【Question title】: How to get certain text from URL links
【Posted】: 2016-06-10 22:09:28
【Question】:

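I am trying to get all the statistics from the statistics box on each team's page. An example of what such a page looks like is at the link below. For each team I would like to print something like this:

Month: win% Month: win% All time: win%

I am not sure how to write that code, though, because the last piece of code I wrote in main gives me an error.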

http://www.gosugamers.net/counterstrike/teams/16448-nasty-gravy-runners

    import time
    import requests
    from bs4 import BeautifulSoup


    def get_all(url, base):  # When called, yields the link to every team page
        r = requests.get(url)
        page = r.text

        soup = BeautifulSoup(page, 'html.parser')

        for team_links in soup.select('div.details h3 a'):
            members = int(team_links.find_next('th', text='Members:').find_next_sibling('td').text.strip().split()[0])
            if members < 5:
                continue
            yield base + team_links['href']

        next_page = soup.find('div', {'class': 'pages'}).find('span', text='Next')


        while next_page:
            # Gives the server a break
            time.sleep(0.2)

            # Use the `base` argument rather than the global BASE_URL
            r = requests.get(base + next_page.find_previous('a')['href'])
            page = r.text
            soup = BeautifulSoup(page, 'html.parser')
            for team_links in soup.select('div.details h3 a'):
                yield base + team_links['href']
            next_page = soup.find('div', {'class': 'pages'}).find('span', text='Next')


    if __name__ == '__main__':

        BASE_URL = 'http://www.gosugamers.net'
        URL = 'http://www.gosugamers.net/counterstrike/teams'

        for links in get_all(URL, BASE_URL): # When run it will generate all the links for all the teams
           r = requests.get(links)
           page = r.content
           soup = BeautifulSoup(page)

           for statistics in soup.select('div.statistics tr'):
               win_rate = int(statistics.find('th', text='Winrate:').find_next_sibling('td'))
               print(win_rate)

【Comments】:

  • What exactly are you trying to get?

Tags: python-3.x web-scraping beautifulsoup


【Solution 1】:

I'm not sure exactly what you want, but this will get all of the team statistics:

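    from bs4 import BeautifulSoup, Tag
    import requests

    url = "http://www.gosugamers.net/counterstrike/teams/16448-nasty-gravy-runners"
    # Name the parser explicitly so bs4 does not have to pick one itself
    soup = BeautifulSoup(requests.get(url).content, "html.parser")

    table = soup.select_one("table.stats-table")
    # Column headings (the months and "All time"); skip the empty corner cell
    head1 = [th.text.strip() for th in table.select("tr.header th") if th.text]
    # Second row: the "Winrate:" label and its values (Tag children only)
    head2 = [th.text.strip() for th in table.select_one("tr + tr") if isinstance(th, Tag)]
    # Third row: the "Matches played:" label and its values
    scores = [th.text.strip() for th in table.select_one("tr + tr + tr") if isinstance(th, Tag)]

    print(head1, head2, scores)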

Output:

([u'Jun', u'May', u'All time'], [u'Winrate:', u'0%', u'0%', u'0%'], [u'Matches played:', u'0 / 0 / 0', u'0 / 0 / 0', u'0 / 0 / 0'])
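
If the goal is the "Month: win%" line from the question, the headings and the Winrate row can be paired up. The following is only a sketch: it runs on a small hypothetical `SAMPLE_HTML` string modelled on the lists printed above, so the markup, the `format_win_rates` helper name, and the assumption that the label sits in a `th` followed by `td` cells are all illustrative and may not match the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the printed lists; the live page may differ.
SAMPLE_HTML = """
<table class="stats-table">
  <tr class="header"><th></th><th>Jun</th><th>May</th><th>All time</th></tr>
  <tr><th>Winrate:</th><td>0%</td><td>0%</td><td>0%</td></tr>
  <tr><th>Matches played:</th><td>0 / 0 / 0</td><td>0 / 0 / 0</td><td>0 / 0 / 0</td></tr>
</table>
"""


def format_win_rates(html):
    """Return 'period: win%' pairs joined by spaces, or '' if nothing is found."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.select_one("table.stats-table")
    if table is None:
        return ""
    # Column headings, dropping the empty corner cell
    periods = [th.get_text(strip=True) for th in table.select("tr.header th")]
    periods = [p for p in periods if p]
    # Locate the row label, then read the cells that follow it
    label = table.find("th", string="Winrate:")
    if label is None:
        return ""
    rates = [td.get_text(strip=True) for td in label.find_next_siblings("td")]
    return " ".join(f"{p}: {r}" for p, r in zip(periods, rates))


print(format_win_rates(SAMPLE_HTML))  # prints: Jun: 0% May: 0% All time: 0%
```

Note that `find_next_sibling` returns a tag, not a string, so it cannot be passed to `int()` directly. To get a number, take the cell's text and strip the percent sign first, for example `int("0%".rstrip("%"))`.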

