【Posted】: 2021-02-17 11:17:09
【Question】:
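I am trying to iterate over the list "company", run a Google search for each element, scrape the result, and append the scraped value to that element.
The company variable looks like this and consists of 895 inner lists: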
company = [['24/7 CUSTOMER Private Limited'], ['3 K TECHNOLOGIES Limited'], ['3I INFOTECH B P O Limited'], ['3I INFOTECH CONSULTANCY SERVICES Limited'], ['3I INFOTECH Limited'], ['4D CORPORATION Private Limited'], ['8K MILES SOFTWARE SERVICES Limited'], ['A B P Private Limited'], ...]
I would like the output to be:
[['24/7 CUSTOMER Private Limited', 'New Delhi India'], ['3 K TECHNOLOGIES Limited', 'Palo Alto United States'], ['3I INFOTECH B P O Limited', 'New Delhi India'], ['3I INFOTECH CONSULTANCY SERVICES Limited', 'New York United States'], ['3I INFOTECH Limited', 'New York United States'], ['4D CORPORATION Private Limited', 'Mumbai India'], ['8K MILES SOFTWARE SERVICES Limited', 'New Delhi India'], ['A B P Private Limited', 'New Delhi India'], ...]
This is a function that takes a company name as its argument and returns the scraped result:
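import re
import requests
from bs4 import BeautifulSoup

def scrape(row):
    # Build the search URL for "<company name> headquarters"
    query = "https://www.google.com/search?q=" + row + " headquarters"
    r = requests.get(query)
    html_doc = r.text
    soup = BeautifulSoup(html_doc, 'html.parser')
    # Pattern that matches any HTML tag, used to strip markup from the snippet
    cleanr = re.compile('<.*?>')
    # Take the first div with this class and remove its tags
    snippett = re.sub(cleanr, '', str(soup.find_all('div', attrs={'class':'BNeawe s3v9rd AP7Wnd'})[0]))
    return snippett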
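The query in scrape() is built by plain string concatenation, and names such as "24/7 CUSTOMER Private Limited" contain a slash and spaces. A minimal sketch of encoding the search term with the standard library is shown here; build_query is a hypothetical helper name, not part of the original code.

```python
from urllib.parse import urlencode

def build_query(company_name):
    # Hypothetical helper: urlencode percent-encodes "/" and turns spaces
    # into "+", so the whole name stays inside the q parameter.
    params = {"q": company_name + " headquarters"}
    return "https://www.google.com/search?" + urlencode(params)

url = build_query("24/7 CUSTOMER Private Limited")
print(url)
```

If requests is used, passing params={"q": ...} to requests.get() should give the same encoding without building the string by hand.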
The function is then called by iterating over the company list and appending each result:
for lst in company:
    # Each inner list holds a single company name
    hq_result = scrape(lst[0])
    lst.append(hq_result)
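The goal of this loop is to append one scraped value to each inner list. A minimal sketch of that pattern follows, assuming every inner list holds exactly one name; fake_scrape and append_results are hypothetical names, and fake_scrape returns a placeholder string instead of querying Google so the loop can be checked on its own.

```python
def fake_scrape(name):
    # Hypothetical stand-in for scrape(); returns a placeholder, not real data.
    return "HQ of " + name

def append_results(companies, scraper):
    # Iterate over the list itself (no call parentheses) and append the
    # scraped value to the inner list, not to the name string.
    for entry in companies:
        entry.append(scraper(entry[0]))
    return companies

company = [["3I INFOTECH Limited"], ["A B P Private Limited"]]
result = append_results(company, fake_scrape)
print(result)
```

Appending to the inner list while indexing entry[0] avoids iterating over a list that is growing at the same time.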
This error occurs:
IndexError: list index out of range
【Comments】:
- It looks like soup.find_all() is returning an empty list.
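If find_all() does return an empty list, indexing it with [0] raises exactly this IndexError; that can happen whenever the page served to the script has no div with that class. A minimal sketch of guarding the lookup is shown here; first_or_none is a hypothetical helper, and plain lists stand in for the find_all() result.

```python
def first_or_none(matches):
    # Hypothetical helper: return the first match, or None when the
    # list is empty, instead of letting matches[0] raise IndexError.
    return matches[0] if matches else None

found = first_or_none(["<div>Mumbai India</div>"])
missing = first_or_none([])
print(found, missing)
```

In scrape() this could wrap the find_all() call, with the function returning None for companies where nothing matched so the loop keeps going.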
Tags: python list web-scraping iteration