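As the other answer mentioned, this happens because no user-agent is specified. The default requests user-agent is python-requests, so Google blocks the request because it knows it comes from a bot and not from a "real" user.
Setting a user-agent makes the request look like a real browser visit by sending that string in the HTTP request headers. You can do this by passing custom headers (check what your own user-agent is):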
headers = {
    'User-agent':
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}
requests.get("YOUR_URL", headers=headers)
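To confirm that the custom header really replaces the default one, you can build the request without sending it. This is only a sketch: it assumes the `requests` package is installed, and `https://example.com` is a placeholder URL, not something taken from the answer.

```python
import requests

headers = {
    "User-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

# The user-agent requests would send if none were given
default_ua = requests.utils.default_user_agent()

# Prepare (but do not send) the request through a session, so the session
# defaults and the custom headers are merged the way requests.get() merges them.
session = requests.Session()
prepared = session.prepare_request(
    requests.Request("GET", "https://example.com", headers=headers)  # placeholder URL
)
sent_ua = prepared.headers["User-Agent"]  # header lookup is case-insensitive

print(default_ua)
print(sent_ua)
```

Nothing goes over the network here, so this is a cheap way to check the headers before pointing the code at a real site.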
Additionally, to get more accurate results you can pass URL parameters:
params = {
    "q": "samurai cop, what does katana mean",  # query
    "gl": "in",                                 # country to search from
    "hl": "en"                                  # language
    # other parameters
}
requests.get("YOUR_URL", params=params)
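If you are curious what the `params` dict turns into, the sketch below prepares the request without sending it and decodes the resulting query string again. It assumes `requests` is installed. The names `final_url` and `decoded` are only illustrative.

```python
from urllib.parse import parse_qs, urlsplit

import requests

params = {
    "q": "samurai cop, what does katana mean",  # query
    "gl": "in",                                 # country to search from
    "hl": "en",                                 # language
}

# Build the request object only; no HTTP request is made here
prepared = requests.Request(
    "GET", "https://www.google.com/search", params=params
).prepare()

final_url = prepared.url  # the URL with the encoded query string appended
decoded = parse_qs(urlsplit(final_url).query)  # decode it back to check the round trip

print(final_url)
print(decoded)
```

Letting requests do the encoding avoids hand-escaping spaces, commas, and other special characters in the query.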
Code and full example in the online IDE (the code from the other answer will throw an error because the CSS selectors have changed):
from bs4 import BeautifulSoup
import requests, lxml

# Browser-like user-agent so the request is not sent as python-requests
headers = {
    'User-agent':
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

params = {
    "q": "samurai cop what does katana mean",  # query
    "gl": "in",                                # country to search from
    "hl": "en"                                 # language
}

html = requests.get("https://www.google.com/search", headers=headers, params=params)
soup = BeautifulSoup(html.text, 'lxml')

# Each organic result sits in a .tF2Cxc container (these selectors can change over time)
for result in soup.select('.tF2Cxc'):
    title = result.select_one('.DKV0Md').text
    link = result.select_one('.yuRUbf a')['href']
    print(f'{title}\n{link}\n')
-------
'''
Samurai Cop - He speaks fluent Japanese - YouTube
https://www.youtube.com/watch?v=paTW3wOyIYw
Samurai Cop - What does "katana" mean? - Quotes.net
https://www.quotes.net/mquote/1060647
Samurai Cop (1991) - Mathew Karedas as Joe Marshall - IMDb
https://www.imdb.com/title/tt0130236/characters/nm0360481
...
'''
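Because the selectors can change again, it may help to make the extraction step tolerate missing elements instead of raising. The sketch below runs against a small hypothetical HTML snippet (not real Google markup) that is shaped only to match the selectors used above. It assumes `beautifulsoup4` is installed and uses the built-in `html.parser` so that `lxml` is not required. The function name `extract_results` is an illustrative choice.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, shaped only to match the selectors from the answer
SAMPLE_HTML = """
<div class="tF2Cxc">
  <div class="yuRUbf"><a href="https://example.com/a"><h3 class="DKV0Md">First result</h3></a></div>
</div>
<div class="tF2Cxc">
  <div class="yuRUbf"><a href="https://example.com/b"></a></div>
</div>
"""

def extract_results(html):
    """Return (title, link) pairs, skipping containers with missing parts."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for result in soup.select(".tF2Cxc"):
        title_tag = result.select_one(".DKV0Md")
        link_tag = result.select_one(".yuRUbf a")
        # select_one() returns None when nothing matches, so check before using it
        if title_tag is None or link_tag is None or not link_tag.get("href"):
            continue
        results.append((title_tag.get_text(strip=True), link_tag["href"]))
    return results

parsed = extract_results(SAMPLE_HTML)
print(parsed)
```

The second container has no title element, so it is skipped rather than causing an `AttributeError`. An empty result list is then a hint that the selectors need updating.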
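Alternatively, you can achieve the same thing by using the Google Organic Results API from SerpApi. It's a paid API with a free plan.
The difference in your case is that you only need to iterate over structured JSON and quickly get the data you want, rather than figuring out why certain things don't work as expected and then maintaining the parser over time.
Code to integrate: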
import os
from serpapi import GoogleSearch

params = {
    "engine": "google",
    "q": "samurai cop what does katana mean",
    "hl": "en",
    "gl": "in",
    "api_key": os.getenv("API_KEY"),  # API key read from the API_KEY environment variable
}

search = GoogleSearch(params)
results = search.get_dict()

# Organic results arrive as a list of dicts under "organic_results"
for result in results["organic_results"]:
    print(result['title'])
    print(result['link'])
    print()
------
'''
Samurai Cop - He speaks fluent Japanese - YouTube
https://www.youtube.com/watch?v=paTW3wOyIYw
Samurai Cop - What does "katana" mean? - Quotes.net
https://www.quotes.net/mquote/1060647
...
'''
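If a response might come back without organic results, a small helper can iterate the JSON without raising `KeyError`. This is a sketch under assumptions: the sample dict below is hypothetical and only mirrors the `organic_results`, `title` and `link` keys used above, and `iter_organic_results` is an illustrative name, not part of the SerpApi library.

```python
def iter_organic_results(results):
    """Yield (title, link) pairs, tolerating missing keys."""
    for item in results.get("organic_results", []):
        title = item.get("title")
        link = item.get("link")
        if title and link:
            yield title, link

# Hypothetical response, shaped like the keys used in the code above
sample_response = {
    "organic_results": [
        {"title": "Example result", "link": "https://example.com/a"},
        {"title": "Result without a link"},
    ]
}

pairs = list(iter_organic_results(sample_response))
empty = list(iter_organic_results({}))  # no "organic_results" key at all

print(pairs)
print(empty)
```

In the real code you would pass the dict returned by `search.get_dict()` instead of `sample_response`.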
Disclaimer, I work for SerpApi.