【Question title】: Using Python requests with Google search
【Posted】: 2021-04-26 23:26:20
【Question】:

I'm new to Python. I wrote this code in PyCharm:

import requests
from bs4 import BeautifulSoup

response = requests.get(f"https://www.google.com/search?q=fitness+wear")
soup = BeautifulSoup(response.text, 'html.parser')
print(soup)

Instead of the HTML of the search results, I get the HTML of a different page (the consent popup the answers below describe).

I used the same code in a script on pythonanywhere.com and it works fine there. I've tried many of the solutions I found, but the result is always the same, so now I'm stuck.

【Comments】:

Tags: python beautifulsoup python-requests google-search


【Answer 1】:

I think this should work:

import requests
from bs4 import BeautifulSoup

with requests.Session() as s:
    url = "https://www.google.com/search?q=fitness+wear"
    headers = {
        # The header value is just the URL, without a "referer:" prefix
        "referer": "https://www.google.com/",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36"
    }
    # Initial request so the session can pick up any cookies Google sets
    s.post(url, headers=headers)
    # The session sends those cookies back on the actual search request
    response = s.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
print(soup)

It uses a requests session and a POST request to create any initial cookies (I'm not entirely sure about this), which then lets you scrape.
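
One way to see what a session contributes, without touching the network, is to prepare a request and inspect it. This is only a minimal sketch: the cookie name `demo_cookie` and the user-agent string are made-up placeholders, and in real use the cookies would come from Google's responses rather than being set by hand.

```python
import requests

url = "https://www.google.com/search?q=fitness+wear"

with requests.Session() as s:
    # Headers set on the session are sent with every request made through it.
    s.headers.update({"user-agent": "demo-agent/1.0"})  # placeholder value
    # Stand-in for a cookie a server would normally set via Set-Cookie.
    s.cookies.set("demo_cookie", "1")  # hypothetical cookie name

    # Build the request the session would send, without actually sending it.
    prepared = s.prepare_request(requests.Request("GET", url))

sent_user_agent = prepared.headers["user-agent"]
sent_cookie = prepared.headers.get("Cookie", "")
print(sent_user_agent)
print(sent_cookie)
```

Anything stored in the session's cookie jar is attached to later requests automatically, which is why the second `s.get(...)` in the answer can succeed where a bare `requests.get(...)` does not.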

【Comments】:

    【Answer 2】:

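    If you open a private window in your browser and go to google.com, you should see the same popup asking for your consent. This is because you aren't sending a session cookie.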
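
To confirm from a script that you landed on the consent page rather than the results, you could check the final URL after redirects. The sketch below rests on an assumption that is not stated in this thread: that the consent page is served from a host beginning with `consent.` (for example `consent.google.com`). The helper name `looks_like_consent_page` is hypothetical.

```python
from urllib.parse import urlparse


def looks_like_consent_page(final_url: str) -> bool:
    """Return True if the URL's host starts with 'consent.' (an assumed convention)."""
    host = urlparse(final_url).hostname or ""
    return host.startswith("consent.")


# With requests you would pass response.url, which holds the URL after redirects.
print(looks_like_consent_page("https://consent.google.com/ml?continue=x"))
print(looks_like_consent_page("https://www.google.com/search?q=fitness+wear"))
```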

    There are different ways to solve this. One is to send the cookies you can observe on the website directly, like this:

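    import requests
    # ... add any other cookies you observe on the site to this dict
    cookies = {"CONSENT": "YES+shp.gws-20210330-0-RC1.de+FX+412"}
    
    # Note: the module is `requests`, not `request`
    resp = requests.get("https://www.google.com/search?q=fitness+wear", cookies=cookies)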
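
To check what a `cookies=` dict turns into on the wire, you can prepare the request without sending it. This is a small sketch that uses only the `CONSENT` value quoted above; whether Google still accepts that value is not something this example can tell you.

```python
import requests

cookies = {"CONSENT": "YES+shp.gws-20210330-0-RC1.de+FX+412"}
url = "https://www.google.com/search?q=fitness+wear"

# prepare() builds the outgoing request (URL, headers, body) but does not send it.
prepared = requests.Request("GET", url, cookies=cookies).prepare()

cookie_header = prepared.headers["Cookie"]
print(cookie_header)
```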
    

    The solution used by @Dimitriy Kruglikov is cleaner; a session is a good way to keep a persistent connection state with a website.

    【Comments】:

      【Answer 3】:

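      Google isn't blocking you; you can still extract data from the HTML.

      Using cookies isn't very convenient, and using a session with both POST and GET requests generates more traffic.

      You can remove this popup with the BS4 methods decompose() or extract():

      • annoying_popup.decompose() destroys the tag and its contents completely. Documentation.

      • annoying_popup.extract() leaves you with two HTML trees: one rooted at the BeautifulSoup object you used to parse the document, and one rooted at the extracted tag. Documentation.

      After that, you can scrape everything you need. You can also scrape without removing the popup at all.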
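
The difference between the two methods can be sketched on a small, made-up document. The `consent-popup` id and the markup below are assumptions for illustration only; the real page uses different markup that you would need to inspect yourself.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a page with a popup and one result.
HTML = """
<html><body>
  <div id="consent-popup"><p>Before you continue</p></div>
  <div class="result"><h3>Fitness wear shop</h3></div>
</body></html>
"""

# decompose(): the popup and everything inside it is destroyed in place.
soup_a = BeautifulSoup(HTML, "html.parser")
soup_a.find("div", id="consent-popup").decompose()

# extract(): the popup is detached from the tree and returned as its own tree.
soup_b = BeautifulSoup(HTML, "html.parser")
popup = soup_b.find("div", id="consent-popup").extract()

titles_a = [h3.get_text(strip=True) for h3 in soup_a.find_all("h3")]
titles_b = [h3.get_text(strip=True) for h3 in soup_b.find_all("h3")]
popup_text = popup.get_text(strip=True)

print(titles_a, titles_b, popup_text)
```

Use `extract()` when you still want to look at the removed tag afterwards, and `decompose()` when you just want it gone.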

      See this Organic Results extraction I did recently. It scrapes the title, snippet, and link from Google search results.
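
The linked extraction is not reproduced here, so the following is only a rough sketch of the general idea, run against made-up HTML. The class names `result` and `snippet` are hypothetical; Google's real class names are different and change over time, so the selectors have to be adapted to the page you actually receive.

```python
from bs4 import BeautifulSoup

# Made-up markup; NOT Google's real structure.
HTML = """
<div class="result">
  <a href="https://example.com/a"><h3>First title</h3></a>
  <span class="snippet">First snippet</span>
</div>
<div class="result">
  <a href="https://example.com/b"><h3>Second title</h3></a>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

results = []
for block in soup.select("div.result"):
    title = block.select_one("h3")
    link = block.select_one("a[href]")
    snippet = block.select_one(".snippet")  # may be missing for some results
    results.append({
        "title": title.get_text(strip=True) if title else None,
        "link": link["href"] if link else None,
        "snippet": snippet.get_text(strip=True) if snippet else None,
    })

for item in results:
    print(item)
```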


      Alternatively, you can use the Google Search Engine Results API from SerpApi. Check out the Playground.

      Code and example in the online IDE:

      from serpapi import GoogleSearch
      import os
      
      params = {
        "engine": "google",
        "q": "fus ro dah",
        "api_key": os.getenv("API_KEY"),
      }
      
      search = GoogleSearch(params)
      results = search.get_dict()
      
      for result in results['organic_results']:
        print(f"Title: {result['title']}\nSnippet: {result['snippet']}\nLink: {result['link']}\n")
      

      Output:

      Title: Skyrim - FUS RO DAH (Dovahkiin) HD - YouTube
      Snippet: I looked around for a fan made track that included Fus Ro Dah, but the ones that I found were pretty bad - some ...
      Link: https://www.youtube.com/watch?v=JblD-FN3tgs
      
      Title: Unrelenting Force (Skyrim) | Elder Scrolls | Fandom
      Snippet: If the general subtitles are turned on, it can be seen that the text for the Draugr's Unrelenting Force is misspelled: "Fus Rah Do" instead of the proper "Fus Ro Dah." ...
      Link: https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)
      
      Title: Fus Ro Dah | Know Your Meme
      Snippet: Origin. "Fus Ro Dah" are the words for the "unrelenting force" thu'um shout in the game Elder Scrolls V: Skyrim. After reaching the first town of ...
      Link: https://knowyourmeme.com/memes/fus-ro-dah
      
      Title: Fus ro dah - Urban Dictionary
      Snippet: 1. A dragon shout used in The Elder Scrolls V: Skyrim. 2.An international term for oral sex given by a female. ex.1. The Dragonborn yelled "Fus ...
      Link: https://www.urbandictionary.com/define.php?term=Fus%20ro%20dah
      

      Part of the JSON:

      "organic_results": [
        {
          "position": 1,
          "title": "Unrelenting Force (Skyrim) | Elder Scrolls | Fandom",
          "link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)",
          "displayed_link": "https://elderscrolls.fandom.com › wiki › Unrelenting_F...",
          "snippet": "If the general subtitles are turned on, it can be seen that the text for the Draugr's Unrelenting Force is misspelled: \"Fus Rah Do\" instead of the proper \"Fus Ro Dah.\" ...",
          "sitelinks": {
            "inline": [
              {
                "title": "Location",
                "link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)#Location"
              },
              {
                "title": "Effect",
                "link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)#Effect"
              },
              {
                "title": "Usage",
                "link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)#Usage"
              },
              {
                "title": "Word Wall",
                "link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)#Word_Wall"
              }
            ]
          },
          "cached_page_link": "https://webcache.googleusercontent.com/search?q=cache:K3LEBjvPps0J:https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)+&cd=17&hl=en&ct=clnk&gl=us"
        }
      ]
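
Because a response like the one above is a plain dictionary, it can be handled defensively with `.get()` in case a field is absent. The sketch below runs on a hand-written sample: the first entry is abridged from the JSON above, and the second entry (without a snippet) is a hypothetical one added for illustration. The helper name `describe` is also made up.

```python
# Hand-written sample shaped like the partial JSON above.
results = {
    "organic_results": [
        {
            "position": 1,
            "title": "Unrelenting Force (Skyrim) | Elder Scrolls | Fandom",
            "link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)",
            "snippet": "If the general subtitles are turned on, ...",
            "sitelinks": {"inline": [{"title": "Location", "link": "#Location"}]},
        },
        # Hypothetical entry with no snippet and no sitelinks.
        {"position": 2, "title": "Entry without a snippet", "link": "https://example.com/"},
    ]
}


def describe(result: dict) -> str:
    """Format one organic result, tolerating missing keys."""
    sitelinks = result.get("sitelinks", {}).get("inline", [])
    return (
        f"Title: {result.get('title', '(no title)')}\n"
        f"Snippet: {result.get('snippet', '(no snippet)')}\n"
        f"Link: {result.get('link', '(no link)')}\n"
        f"Sitelinks: {len(sitelinks)}"
    )


described = [describe(r) for r in results.get("organic_results", [])]
print("\n\n".join(described))
```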
      

      Disclaimer: I work for SerpApi.

      【Comments】: