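【Posted】: 2021-04-26 05:30:27
【Question】:
I am trying to extract the table rows, selected by their class attribute, from the website https://www.programmableweb.com/category/all/apis. My code works for every page except https://www.programmableweb.com/category/all/apis?page=2092.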
from bs4 import BeautifulSoup
import requests
url = 'https://www.programmableweb.com/category/all/apis?page=2092'
response = requests.get(url)
data = response.text
soup = BeautifulSoup(data, 'html.parser')
apis = soup.find_all('tr',{'class':['odd views-row-first', 'odd','even','even views-row-last']})
print(apis)
On page 2092, I get only the following single tr element:
[<tr class="odd views-row-first views-row-last"><td class="views-field views-field-pw-version-title"> <a href="/api/inkling">Inkling API</a><br/></td><td class="views-field views-field-search-api-excerpt views-field-field-api-description hidden-xs visible-md visible-sm col-md-8"> Our REST API allows you to replicate much of the functionality in our hosted marketplace solution to build custom widgets and stock tickers for your Intranet, create custom reports, add trading...</td><td class="views-field views-field-field-article-primary-category"> <a href="/category/financial">Financial</a></td><td class="views-field views-field-pw-version-links"> <a href="/api/inkling-rest-api">REST v0.0</a></td></tr>]
For any other page (such as https://www.programmableweb.com/category/all/apis?page=2091), I get all of the rows. The HTML structure appears to be similar across all pages.
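A minimal offline sketch of how the class filter behaves may help narrow this down. The HTML snippet below is made up for illustration (it is not the real page), and it assumes Beautiful Soup's documented matching rules: a list passed to class_ matches a tag if any one of its CSS classes is in the list. The single row printed above carries both views-row-first and views-row-last, which would be consistent with the table on page 2092 containing only one row at the time of the request, although that is an assumption rather than something the question confirms.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the row classes seen in the question.
html = """
<table>
  <tr class="odd views-row-first"><td>A</td></tr>
  <tr class="even"><td>B</td></tr>
  <tr class="odd views-row-first views-row-last"><td>C</td></tr>
  <tr class="header"><td>H</td></tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')

# A tag matches when any one of its classes appears in the list,
# so 'odd' and 'even' alone cover the first/last variants as well.
rows = soup.find_all('tr', class_=['odd', 'even'])
cells = [row.td.get_text(strip=True) for row in rows]

print(len(rows), cells)
```

If this reading is right, the original filter is not dropping rows; comparing len(apis) with the number of rows visible in the browser for the same URL would be one way to check.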
【Comments】:
-
It works fine for me. Maybe try sleeping for a few seconds between requests.
-
@AhmedSoliman, could you paste the output you get? I am new here; could you also help with the sleep code you suggested?
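The sleep suggestion could be sketched roughly as follows. This is not the commenter's code: fetch_all, delay_seconds, and the injectable fetch and sleep parameters are hypothetical names chosen for illustration, and the two-second default is an arbitrary assumption rather than a limit published by the site.

```python
import time


def fetch_all(urls, fetch, delay_seconds=2.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first one.
            sleep(delay_seconds)
        results.append(fetch(url))
    return results


# Possible usage with requests (page numbers taken from the question):
# import requests
# base = 'https://www.programmableweb.com/category/all/apis?page={}'
# pages = fetch_all([base.format(n) for n in (2091, 2092)],
#                   fetch=lambda u: requests.get(u).text)
```

Passing the fetch function in as an argument keeps the delay logic separate from the HTTP library, so it can be tried with a stub before hitting the real site.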
Tags: html web-scraping beautifulsoup