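【Posted】:2017-07-27 10:13:30
【Question】:
I have code that works for a single keyword. Next, I want to scrape 10 different keywords and save all the results in one CSV file, with the keyword name in its own column/row. I think a CSV file could serve as input, so the script picks up the keywords one by one and scrapes each of them. Here is the code: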
import requests
from bs4 import BeautifulSoup
import pandas as pd
base_url = "http://www.amazon.in/s/ref=sr_pg_2?rh=n%3A4772060031%2Ck%3Ahelmets+for+men&keywords=helmets+for+men&ie=UTF8"
# the page parameter is left out of base_url; it is appended per request below
res = []
for page in range(1,3):
    request = requests.get(base_url + '&page=' + str(page), headers={'User-Agent':'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}) # here adding page
    if request.status_code == 404: #added just in case of error
        break
    soup = BeautifulSoup(request.content, "lxml")
    for url in soup.find_all('li', class_ = 's-result-item'):
        res.append([url.get('data-asin'), url.get('id')])
df = pd.DataFrame(data=res, columns=['Asin', 'Result'])
df.to_csv('hel.csv')
【Discussion】:
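One possible way to do what the question describes is to wrap the single-keyword loop in an outer loop over keywords and store the keyword alongside each result. The following is only a sketch under a few assumptions: the input file name `keywords.csv`, its column name `keyword`, the output name `results.csv`, and the helper names `build_search_url`, `parse_results`, `scrape_keywords` and `run` are all hypothetical. It also assumes the search URL keeps the same shape as in the question with only the keyword swapped; the category node `n%3A4772060031` is copied from the question and may not suit other keywords. `html.parser` is used so the sketch does not depend on lxml; `"lxml"` as in the question should work as well.

```python
from urllib.parse import quote_plus

import pandas as pd
import requests
from bs4 import BeautifulSoup

# User-Agent header copied from the question.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
}
SEARCH_URL = "http://www.amazon.in/s/ref=sr_pg_2"


def build_search_url(keyword, page):
    """Rebuild the question's URL for an arbitrary keyword and page.

    Assumption: only the keyword changes; the category node is kept as-is.
    """
    kw = quote_plus(keyword)  # "helmets for men" -> "helmets+for+men"
    return (f"{SEARCH_URL}?rh=n%3A4772060031%2Ck%3A{kw}"
            f"&keywords={kw}&ie=UTF8&page={page}")


def parse_results(html, keyword):
    """Return [keyword, asin, result id] for every result item in the page."""
    soup = BeautifulSoup(html, "html.parser")
    return [[keyword, li.get("data-asin"), li.get("id")]
            for li in soup.find_all("li", class_="s-result-item")]


def scrape_keywords(keywords, pages=range(1, 3)):
    """Scrape each keyword over the given pages into one DataFrame."""
    rows = []
    for keyword in keywords:
        for page in pages:
            response = requests.get(build_search_url(keyword, page),
                                    headers=HEADERS)
            if response.status_code == 404:  # same guard as in the question
                break
            rows.extend(parse_results(response.content, keyword))
    return pd.DataFrame(rows, columns=["Keyword", "Asin", "Result"])


def run(input_csv, output_csv):
    """Read keywords from a CSV column named 'keyword' and write one CSV."""
    keywords = pd.read_csv(input_csv)["keyword"].dropna().astype(str).tolist()
    scrape_keywords(keywords).to_csv(output_csv, index=False)
```

With a `keywords.csv` that has one keyword per row under a `keyword` header, calling `run("keywords.csv", "results.csv")` would write all results to a single file, with the keyword in the first column so the rows can be filtered or grouped later.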
Tags: python python-3.x pandas beautifulsoup