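【Posted】: 2019-11-24 07:50:00
【Question】:
Using only bs4, I'm trying to isolate the "Career history" list of teams a player has played for, which is one section of the table on NFL QBs' Wikipedia pages:
My desired output is: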
['St. Louis Rams (2005–2006)', 'Cincinnati Bengals (2007–2008)', 'Buffalo Bills (2009–2012)', 'Tennessee Titans (2013)', 'Houston Texans (2014)', 'New York Jets (2015–2016)', 'Tampa Bay Buccaneers (2017–2018)', 'Miami Dolphins (2019–present)']
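My code is:
import requests
from bs4 import BeautifulSoup

url = 'https://en.wikipedia.org/wiki/Ryan_Fitzpatrick'
player_wiki = requests.get(url)  # fetch the page; this call was missing from the snippet
table = BeautifulSoup(player_wiki.text, 'html.parser')
# Loops over every <ul> inside the first <tbody>, not only the career history one
for tr in table.find('tbody').find_all('ul'):
    v = [li.text for li in tr.find_all('li')]
    print(v)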
Current output:
['St. Louis Rams (2005–2006)', 'Cincinnati Bengals (2007–2008)', 'Buffalo Bills (2009–2012)', 'Tennessee Titans (2013)', 'Houston Texans (2014)', 'New York Jets (2015–2016)', 'Tampa Bay Buccaneers (2017–2018)', 'Miami Dolphins (2019–present)']
['Ivy League Player of the Year (2004)', 'First-team All–Ivy League (2004)', 'George H. “Bulger” Lowe Award (2004)']
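I'm fairly sure the problem is the 'ul' tag in my outer loop: it matches every list in the table, including the awards list. How can I narrow the scope of my find_all() so it skips the unwanted data? Any tips? I'm new to web scraping.
【Discussion】: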
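One way to narrow the search, sketched here under assumptions: locate the header cell whose text is "Career history" and take only the first ul that follows it, instead of every ul in the tbody. The SAMPLE_HTML below is a made-up, shortened stand-in for the page's table, and the helper name career_history is hypothetical. The real Wikipedia markup may differ (and may change), so treat this as a starting point rather than a confirmed fix.

```python
from bs4 import BeautifulSoup

# Made-up, shortened stand-in for the page's table (an assumption, not the real markup).
SAMPLE_HTML = """
<table class="infobox">
<tbody>
<tr><th colspan="2">Career history</th></tr>
<tr><td colspan="2"><ul>
<li><a href="#">St. Louis Rams</a> (2005–2006)</li>
<li><a href="#">Cincinnati Bengals</a> (2007–2008)</li>
</ul></td></tr>
<tr><th colspan="2">Career highlights and awards</th></tr>
<tr><td colspan="2"><ul>
<li>Ivy League Player of the Year (2004)</li>
</ul></td></tr>
</tbody>
</table>
"""


def career_history(html, heading="Career history"):
    """Return the <li> texts of the first <ul> after the matching <th>."""
    soup = BeautifulSoup(html, "html.parser")
    # Find the header cell by its text rather than by its position in the table.
    header = soup.find(lambda tag: tag.name == "th" and heading in tag.get_text())
    if header is None:
        return []
    # find_next() walks forward in document order, so it stops at the first
    # <ul> after the header and never reaches the awards list.
    ul = header.find_next("ul")
    if ul is None:
        return []
    return [li.get_text().strip() for li in ul.find_all("li")]


teams = career_history(SAMPLE_HTML)
print(teams)
```

Against the live page you would pass player_wiki.text in place of SAMPLE_HTML. Anchoring on the heading text keeps the awards list out without relying on the order of the ul elements, though it does depend on the heading being worded exactly "Career history".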
Tags: python web-scraping beautifulsoup