您需要包含所有所需数据的select_one() 元素(容器)并检查if 它是否存在,如果存在,则抓取数据。
确保您使用user-agent 充当“真正的”用户访问,否则您的请求可能会被阻止,或者您会收到带有不同选择器的不同 HTML。 Check what's your user-agent.
Code and example in the online IDE:
from bs4 import BeautifulSoup
import requests, lxml
params = {
"q": "caracas arepa bar google",
"gl": "us"
}
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.87 Safari/537.36"
}
html = requests.get("https://www.google.com/search", params=params, headers=headers, timeout=30)
soup = BeautifulSoup(html.text, "lxml")
# if right side knowledge graph is present -> parse the data.
if soup.select_one(".liYKde"):
place_name = soup.select_one(".PZPZlf.q8U8x span").text
place_type = soup.select_one(".YhemCb+ .YhemCb").text
place_reviews = soup.select_one(".hqzQac span").text
place_rating = soup.select_one(".Aq14fc").text
print(place_name, place_type, place_reviews, place_rating, sep="\n")
# output:
'''
Caracas Arepa Bar
Venezuelan restaurant
1,123 Google reviews
4.5
'''
或者,您可以使用来自 SerpApi 的 Google Knowledge Graph API 来实现相同的目的。这是一个带有免费计划的付费 API。
最大的不同是你不需要弄清楚如何解析数据、增加请求数量、绕过谷歌和其他搜索引擎的屏蔽。
from serpapi import GoogleSearch
params = {
"api_key": "YOUR_API_KEY",
"engine": "google",
"q": "caracas arepa bar place",
"hl": "en"
}
search = GoogleSearch(params)
results = search.get_dict()
print(json.dumps([results["knowledge_graph"]], indent=2))
# part of the output:
'''
[
{
"title": "Caracas Arepa Bar",
"type": "Venezuelan restaurant",
"place_id": "ChIJVcQ2ll9ZwokRwmkvsArPXyo",
"website": "http://caracasarepabar.com/",
"description": "Arepa specialist offering creative, low-priced renditions of the Venezuelan corn-flour staple.",
"local_map": {
"image": "https://www.google.com/maps/vt/data=TF2Rd51PtEnU2M3pkZHYHKdSwhMDJ_ZwRfg0vfwlDRAmv1u919sgFl8hs_lo832ziTWxCZM9BKECs6Af-TA1hh0NLjuYAzOLFA1-RBEmj-8poygymcRX2KLNVTGGZZKDerZrKW6fnkONAM4Ui-BVN8XwFrwigoqqxObPg8bqFIgeM3LPCg",
"link": "https://www.google.com/maps/place/Caracas+Arepa+Bar/@40.7131972,-73.9574167,15z/data=!4m2!3m1!1s0x0:0x2a5fcf0ab02f69c2?sa=X&hl=en",
"gps_coordinates": {
"latitude": 40.7131972,
"longitude": -73.9574167,
"altitude": 15
}
} ... much more results including place images, popular times, user reviews.
}
]
'''
免责声明,我为 SerpApi 工作。