[Posted]: 2017-06-27 02:32:41
[Question]:
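I want to get the HTML of the second page of a website. I'm used to URLs that include the page number, so I can scrape multiple pages by changing that number in the URL. Here the page number is not part of the URL, so I tried sending it in the body of a POST request: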
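For comparison, the URL-based pagination the question is used to can be sketched like this. The listing URL `https://example.com/exercises` and its `page` query parameter are hypothetical placeholders, not the site in question; the point is only that when the page number lives in the URL, paging is a loop that rewrites one query parameter.

```python
from urllib.parse import urlencode

# Hypothetical listing URL: the page number appears in the query string.
BASE_URL = "https://example.com/exercises"


def page_url(page: int) -> str:
    """Return the URL for `page`, changing only the `page` query parameter."""
    return f"{BASE_URL}?{urlencode({'page': page})}"


# Scraping several pages is then just a loop over page numbers.
urls = [page_url(n) for n in range(1, 4)]
```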
```python
import requests
from bs4 import BeautifulSoup as bs

my_url = 'https://www.bodybuilding.com/exercises/finder/lookup/filter/muscle/id/1/muscle/chest'
# Headers that make the request look like the page's own AJAX (XHR) call
headers = {'referer': my_url,
           'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
           'x-requested-with': 'XMLHttpRequest'}
# Form body: sort order plus the page number to fetch
payload = {'orderByField': 'exerciseName',
           'orderByDirection': 'ASC',
           'page': 30}
# Query-string filters; requests sends list values as repeated keys
params = {'muscleID': 1,
          'exerciseTypeID': [2, 6, 4, 7, 1, 3, 5],
          'equipmentID': [9, 14, 2, 10, 5, 6, 4, 15, 1, 8, 11, 3, 7],
          'mechanicTypeID': [1, 2, 11]}
r = requests.post(my_url, data=payload, headers=headers, params=params, verify=True)
soup = bs(r.text, 'html.parser')
```
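A minimal sketch of paging through the POST body instead of the URL, assuming (as the question's payload implies, but not verified here) that the endpoint reads the page number from the `page` form field and that the headers and filters above are still accepted. The helper names `build_payload` and `fetch_pages` are hypothetical. `fetch_pages` takes any object with a requests-style `post()` method rather than calling `requests` directly, so the loop can be exercised without network access.

```python
# Values copied from the question; whether the endpoint still accepts them is unverified.
MY_URL = ('https://www.bodybuilding.com/exercises/finder/lookup/filter/'
          'muscle/id/1/muscle/chest')
HEADERS = {'referer': MY_URL,
           'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                         '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
           'x-requested-with': 'XMLHttpRequest'}
PARAMS = {'muscleID': 1,
          'exerciseTypeID': [2, 6, 4, 7, 1, 3, 5],
          'equipmentID': [9, 14, 2, 10, 5, 6, 4, 15, 1, 8, 11, 3, 7],
          'mechanicTypeID': [1, 2, 11]}


def build_payload(page):
    """Hypothetical helper: form body for one page; only `page` changes between requests."""
    # Assumption: the server reads the page number from the `page` form field.
    return {'orderByField': 'exerciseName',
            'orderByDirection': 'ASC',
            'page': page}


def fetch_pages(session, pages):
    """Hypothetical helper: POST once per page number and return {page: html_text}.

    `session` is any object with a requests-style post() method,
    for example requests.Session().
    """
    results = {}
    for page in pages:
        r = session.post(MY_URL, data=build_payload(page),
                         headers=HEADERS, params=PARAMS)
        r.raise_for_status()  # stop on HTTP errors instead of parsing an error page
        results[page] = r.text
    return results
```

With `requests` installed, something like `with requests.Session() as s: html = fetch_pages(s, range(1, 3))[2]` would request pages 1 and 2 and keep the second page's HTML for `bs(html, 'html.parser')`, assuming pages are numbered from 1. A `Session` reuses the connection across the loop, and passing it in as a parameter also makes it easy to substitute a stub when testing.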
Tags: python web-scraping python-requests