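Make sure you send a user-agent header so the request looks like a visit from a real user; otherwise Google may block the request. List of user agents.
To pick elements from a page visually, you can use the SelectorGadget Chrome extension to grab CSS selectors.
Code and example in the online IDE: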
from bs4 import BeautifulSoup
import requests  # lxml only needs to be installed for the 'lxml' parser, not imported

# Browser-like User-Agent so the request is less likely to be blocked
headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

response = requests.get('https://www.google.com/search?q=simens', headers=headers).text
soup = BeautifulSoup(response, 'lxml')

# select_one() returns None when nothing matches, so .text raises AttributeError
# if the knowledge panel is missing or Google changes these class names
title = soup.select_one('.SPZz6b h2').text
subtitle = soup.select_one('.wwUB2c span').text
website = soup.select_one('.ellip .ellip').text
snippet = soup.select_one('.Uo8X3b+ span').text

print(f'{title}\n{subtitle}\n{website}\n{snippet}')
Output:
Siemens
Automation company
siemens.com
Siemens AG is a German multinational conglomerate company headquartered in Munich and the largest industrial manufacturing company in Europe with branch offices abroad.
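The request above puts the query directly into the URL, which is fine for a single word. A query with spaces or non-ASCII characters has to be URL-encoded first. Below is a minimal sketch using only the standard library; build_search_url is a hypothetical helper name, and hl (interface language) is an optional parameter that the original code does not use.

```python
from typing import Optional
from urllib.parse import urlencode

GOOGLE_SEARCH_URL = "https://www.google.com/search"


def build_search_url(query: str, hl: Optional[str] = None) -> str:
    """Return a search URL with the query string URL-encoded.

    Hypothetical helper for illustration; not part of the original answer.
    """
    params = {"q": query}
    if hl is not None:
        params["hl"] = hl  # optional interface-language parameter (assumption)
    return f"{GOOGLE_SEARCH_URL}?{urlencode(params)}"


print(build_search_url("siemens ag", hl="en"))
# https://www.google.com/search?q=siemens+ag&hl=en
```

The resulting URL can be passed to requests.get together with the same headers. requests can also do the encoding itself if the values are passed through its params argument.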
Alternatively, you can use the Google Search Engine Results API from SerpApi. It's a paid API with a free plan.
Code to integrate:
import os
from serpapi import GoogleSearch

params = {
    "engine": "google",               # search engine to query
    "q": "simens",                    # search query
    "api_key": os.getenv("API_KEY"),  # SerpApi key read from the environment
}

search = GoogleSearch(params)
results = search.get_dict()  # response parsed into a Python dict

# These lookups raise KeyError if the response has no knowledge graph
title = results["knowledge_graph"]["title"]
subtitle = results["knowledge_graph"]["type"]
website = results["knowledge_graph"]["website"]
snippet = results["knowledge_graph"]["description"]

print(f'{title}\n{subtitle}\n{website}\n{snippet}')
Output:
Siemens
Automation company
http://www.siemens.com/
Siemens AG is a German multinational conglomerate company headquartered in Munich and the largest industrial manufacturing company in Europe with branch offices abroad.
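Not every query comes with a knowledge graph, so the direct lookups above can fail with a KeyError. Below is a minimal sketch of a more defensive lookup; knowledge_graph_field is a hypothetical helper, and the sample dict is made up to mirror only the fields used above, not a full API response.

```python
from typing import Any, Mapping, Optional


def knowledge_graph_field(results: Mapping[str, Any], field: str) -> Optional[Any]:
    """Return results["knowledge_graph"][field], or None if either key is missing.

    Hypothetical helper for illustration; not part of the SerpApi client.
    """
    knowledge_graph = results.get("knowledge_graph") or {}
    return knowledge_graph.get(field)


# Made-up sample shaped like the fields read in the code above
sample = {"knowledge_graph": {"title": "Siemens", "type": "Automation company"}}

print(knowledge_graph_field(sample, "title"))    # Siemens
print(knowledge_graph_field(sample, "website"))  # None
print(knowledge_graph_field({}, "title"))        # None
```

With the real response, the same helper would be called on the dict returned by search.get_dict().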
Disclaimer: I work for SerpApi.