【Posted】: 2021-07-19 06:52:23
【Question】:
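I want to scrape https://www.ketto.org/crowdfunding/fundraisers. I found a URL, https://nn2uorrizx-dsn.algolia.net/1/indexes/*/queries, that returns the data in response to a POST request, but the response I get is 400 instead of 200. Please help me scrape the data!
Here is my code: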
import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:90.0) Gecko/20100101 Firefox/90.0',
    'Accept': 'application/json',
    'Accept-Language': 'en-US,en;q=0.5',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Origin': 'https://www.ketto.org',
    'Sec-Fetch-Dest': 'empty',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Site': 'cross-site',
    'Referer': 'https://www.ketto.org/',
    'Connection': 'keep-alive',
}
params = (
    ('x-algolia-agent', 'Algolia for JavaScript (3.35.1); Browser (lite); angular (8.2.14); angular-instantsearch (3.0.0-beta.4); instantsearch.js (3.7.0); JS Helper (2.28.1)'),
    ('x-algolia-application-id', 'NN2UORRIZX'),
    ('x-algolia-api-key', 'b2caa1b0589e8db9398d5fe2a40bbaed'),
)
data = [
    ('{requests:[{indexName:fundraiser_prod,params:query', ''),
    ('hitsPerPage', '9'),
    ('hitsPerPage', '1'),
    ('hitsPerPage', '1'),
    ('hitsPerPage', '1'),
    ('maxValuesPerFacet', '10'),
    ('maxValuesPerFacet', '10'),
    ('maxValuesPerFacet', '10'),
    ('maxValuesPerFacet', '10'),
    ('page', '1'),
    ('page', '0'),
    ('page', '0'),
    ('page', '0'),
    ('highlightPreTag', '__ais-highlight__'),
    ('highlightPreTag', '__ais-highlight__'),
    ('highlightPreTag', '__ais-highlight__'),
    ('highlightPreTag', '__ais-highlight__'),
    ('highlightPostTag', '__/ais-highlight__'),
    ('highlightPostTag', '__/ais-highlight__'),
    ('highlightPostTag', '__/ais-highlight__'),
    ('highlightPostTag', '__/ais-highlight__'),
    ('facets', '["cause.label","tags","address"]'),
    ('facets', '["cause.label"]'),
    ('facets', '["tags"]'),
    ('facets', '["address"]'),
    ('tagFilters', ''),
    ('tagFilters', ''),
    ('tagFilters', ''),
    ('tagFilters', ''),
    ('facetFilters', '[["cause.label:"],["tags:"],["address:"]]},{indexName:fundraiser_prod,params:query='),
    ('facetFilters', '[["tags:"],["address:"]]},{indexName:fundraiser_prod,params:query='),
    ('facetFilters', '[["cause.label:"],["address:"]]},{indexName:fundraiser_prod,params:query='),
    ('facetFilters', '[["cause.label:"],["tags:"]]}]}'),
    ('attributesToRetrieve', '[]'),
    ('attributesToRetrieve', '[]'),
    ('attributesToRetrieve', '[]'),
    ('attributesToHighlight', '[]'),
    ('attributesToHighlight', '[]'),
    ('attributesToHighlight', '[]'),
    ('attributesToSnippet', '[]'),
    ('attributesToSnippet', '[]'),
    ('attributesToSnippet', '[]'),
    ('analytics', 'false'),
    ('analytics', 'false'),
    ('analytics', 'false'),
    ('clickAnalytics', 'false'),
    ('clickAnalytics', 'false'),
    ('clickAnalytics', 'false'),
]
response = requests.post('https://nn2uorrizx-dsn.algolia.net/1/indexes/*/queries', headers=headers, params=params, data=data)
print(response)
If there are any other suggestions for scraping https://www.ketto.org/crowdfunding/fundraisers with the Python requests module, I would be glad to hear them. Thank you in advance.
【Discussion】:
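One thing that may be worth checking: the keys in the question's `data` list (for example `{requests:[{indexName:fundraiser_prod,params:query`) look like a JSON body that was split apart as if it were form fields, which could plausibly explain the 400. Algolia's multi-query endpoint takes a JSON object with a `requests` list, where each entry has an `indexName` and a URL-encoded `params` string. Below is a minimal sketch under that assumption. The helper names `build_payload` and `fetch_page` are hypothetical, only a single query with a few of the original parameters is sent, and it has not been verified against the live endpoint.

```python
import json
from urllib.parse import urlencode

# URL and credentials are copied from the question.
ALGOLIA_URL = "https://nn2uorrizx-dsn.algolia.net/1/indexes/*/queries"
ALGOLIA_PARAMS = {
    "x-algolia-application-id": "NN2UORRIZX",
    "x-algolia-api-key": "b2caa1b0589e8db9398d5fe2a40bbaed",
}


def build_payload(page=0, hits_per_page=9, index_name="fundraiser_prod"):
    """Build the request body as a dict (hypothetical helper).

    The search options go into one URL-encoded string under "params";
    the surrounding structure is plain JSON.
    """
    search_params = urlencode({
        "query": "",
        "hitsPerPage": hits_per_page,
        "page": page,
        "facets": json.dumps(["cause.label", "tags", "address"]),
    })
    return {"requests": [{"indexName": index_name, "params": search_params}]}


def fetch_page(page=0):
    """POST one query and return the decoded response (needs network access)."""
    import requests  # imported here so build_payload can be used without it

    response = requests.post(
        ALGOLIA_URL,
        params=ALGOLIA_PARAMS,
        json=build_payload(page),  # serialises the dict as a JSON body
        timeout=30,
    )
    response.raise_for_status()  # raises on a 4xx/5xx status such as 400
    return response.json()
```

If the request succeeds, `fetch_page(0)["results"][0]["hits"]` should hold the fundraiser records for the first page. Whether the API key in the question is still valid is an open assumption.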
Tags: python-3.x post web-scraping python-requests web-crawler