【Question title】: Web scraping columns from a Wikipedia table
【Posted】: 2020-09-18 11:41:21
【Question】:

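I am trying to get the Bowler and Team columns from the following Wikipedia page: https://en.wikipedia.org/wiki/List_of_bowlers_who_have_taken_300_or_more_wickets_in_Test_cricket

My code gets the Bowler column, but the Team column is proving difficult. Maybe one for the cricket fans out there, but any help is welcome!

Here is my code: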

import requests
from bs4 import BeautifulSoup

wiki = "https://en.wikipedia.org/wiki/List_of_bowlers_who_have_taken_300_or_more_wickets_in_Test_cricket"
website_url = requests.get(wiki).text
soup = BeautifulSoup(website_url, "lxml")

my_table = soup.find("table", {"class":"wikitable sortable plainrowheaders"})



bowler = []
team = []

for row in my_table.find_all("tr")[1:]:
    bowler_cell = row.find_all("a")[0]
    bowler.append(bowler_cell.text)
print(bowler)
for row in my_table.find_all("td"):
    team_cell = row.find_all("a")[0]
    team.append(team_cell.text)
print(team)
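
A likely reason the second loop fails: it visits every `td` in the table, and any cell that contains no link makes `row.find_all("a")[0]` raise `IndexError`. Below is a minimal sketch of a guarded version, run on a hand-written two-cell row (the markup is an assumption for illustration, not copied from the page):

```python
from bs4 import BeautifulSoup

# Hypothetical row: one cell with a link, one plain numeric cell.
row_html = '<table><tr><td><a href="#">Australia</a></td><td>708</td></tr></table>'
soup = BeautifulSoup(row_html, "html.parser")

linked = []
for cell in soup.find_all("td"):
    link = cell.find("a")  # find() returns None when the cell has no link
    if link is not None:
        linked.append(link.text)
print(linked)
```

This only avoids the exception. It still collects the first link of every linked cell, so on its own it does not isolate the Team column.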

【Question comments】:

    Tags: python web-scraping beautifulsoup


    【Solution 1】:

    Here you go:

    import requests
    from bs4 import BeautifulSoup
    
    wiki = "https://en.wikipedia.org/wiki/List_of_bowlers_who_have_taken_300_or_more_wickets_in_Test_cricket"
    website_url = requests.get(wiki).text
    soup = BeautifulSoup(website_url, "lxml")
    
    my_table = soup.find("table", {"class":"wikitable sortable plainrowheaders"})
    
    
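    bowler = []
    team = []
    
    # Every link in the table that has a title attribute, in document order.
    x = my_table.select('a[title]')
    
    # Skip the first 11 links (presumably those in the table header), then
    # read the remaining links as alternating bowler/team pairs.
    i = 11
    while i + 1 < len(x):  # stop early rather than index past an unpaired link
        bowler.append(x[i].text)
        team.append(x[i + 1].text)
        i += 2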
    print(bowler)
    print(team)
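
For scraping the Team column on its own, one alternative to the fixed offset of 11 links is to find the column by its header text and read the cell at that position in each row. The sketch below is only an illustration: `column_texts` is a hypothetical helper, `SAMPLE_HTML` is a hand-written stand-in for the page (the real markup and header labels may differ), and tables that use `rowspan` or `colspan` are not handled.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the Wikipedia table; the real markup may differ.
SAMPLE_HTML = """
<table class="wikitable sortable plainrowheaders">
  <tr><th>Rank</th><th>Bowler</th><th>Team</th><th>Wickets</th></tr>
  <tr><td>1</td>
      <th scope="row"><a href="#" title="Muttiah Muralitharan">Muttiah Muralitharan</a></th>
      <td><a href="#" title="Sri Lanka national cricket team">Sri Lanka</a></td>
      <td>800</td></tr>
  <tr><td>2</td>
      <th scope="row"><a href="#" title="Shane Warne">Shane Warne</a></th>
      <td><a href="#" title="Australia national cricket team">Australia</a></td>
      <td>708</td></tr>
</table>
"""


def column_texts(table, header):
    """Return the text of each cell in the column whose header equals `header`."""
    rows = table.find_all("tr")
    # Header cells of the first row give the position of each column.
    headers = [cell.get_text(" ", strip=True) for cell in rows[0].find_all(["th", "td"])]
    index = headers.index(header)  # raises ValueError if the header is missing
    values = []
    for row in rows[1:]:
        cells = row.find_all(["th", "td"])
        if index < len(cells):  # skip rows shorter than the header row
            values.append(cells[index].get_text(" ", strip=True))
    return values


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table", class_="wikitable")
team = column_texts(table, "Team")
bowler = column_texts(table, "Bowler")
print(team)
print(bowler)
```

To try this against the live page, `SAMPLE_HTML` would be replaced with `requests.get(wiki).text`, and the header strings may need adjusting to whatever the page actually uses.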
    

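    【Comments】:

    • Thanks - it does indeed. If you or anyone else knows, I would still like to find a solution that scrapes the Team column on its own (even if just for my own learning).
    • The second piece of code handles specifically what you asked for.
    • Good to see a different approach, and it does indeed - thank you!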