【Question Title】: How can I loop through pages using Selenium?
【Posted】: 2021-06-08 00:58:20
【Question Description】:

I am trying to scrape data from Oddsportal, but my code is incomplete.

How can I loop through the pages of matches and seasons?

I have only just started using Selenium and am very new to it.

My current code is:

# Requires: pip install selenium pandas lxml
import pandas as pd
from selenium import webdriver

browser = webdriver.Chrome()
browser.get("https://www.oddsportal.com/soccer/england/premier-league/results/")

# read_html returns every <table> on the page; the results grid is the first one
df = pd.read_html(browser.page_source, header=0)[0]

dateList = []
gameList = []
scoreList = []
home_odds = []
draw_odds = []
away_odds = []

date = None  # set by the date-header rows before the first game row
for row in df.itertuples():
    if not isinstance(row[1], str):
        continue
    elif ':' not in row[1]:
        # rows without a kick-off time are date headers, e.g. "05 Jun 2021 - Premier League"
        date = row[1].split('-')[0]
        continue
    time = row[1]
    dateList.append(date)
    gameList.append(row[2])
    scoreList.append(row[3])
    home_odds.append(row[4])
    draw_odds.append(row[5])
    away_odds.append(row[6])

result = pd.DataFrame({'date': dateList,
                       'game': gameList,
                       'score': scoreList,
                       'Home': home_odds,
                       'Draw': draw_odds,
                       'Away': away_odds})
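The loop above relies on a row-shape trick: rows whose first column lacks a ':' are date headers, while rows with a kick-off time are games. A minimal sketch on synthetic data (the column values here are made up for illustration, not taken from the site) shows the dispatch:

```python
import pandas as pd

# Synthetic stand-in for the table pd.read_html returns: one date-header
# row followed by two game rows (illustrative values only).
df = pd.DataFrame({
    'col1': ['05 Jun 2021 - Premier League', '15:00', '17:30'],
    'col2': ['', 'Team A - Team B', 'Team C - Team D'],
})

dates, games = [], []
date = None
for row in df.itertuples():
    if ':' not in row[1]:
        # header row: keep the date part, drop the league suffix
        date = row[1].split('-')[0].strip()
        continue
    dates.append(date)
    games.append(row[2])

print(dates)  # ['05 Jun 2021', '05 Jun 2021']
print(games)  # ['Team A - Team B', 'Team C - Team D']
```

Each game row is stamped with the most recently seen header date, which is why the running `date` variable must be carried across iterations.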

【Question Discussion】:

    Tags: python web-scraping selenium-chromedriver


    【Solution 1】:

    You first need a for loop over the URLs you want to scrape:

    import pandas as pd
    from bs4 import BeautifulSoup as bs
    from selenium import webdriver
    
    browser = webdriver.Chrome()
    
    
    class GameData:
    
        def __init__(self):
            self.date = []
            self.time = []
            self.game = []
            self.score = []
            self.home_odds = []
            self.draw_odds = []
            self.away_odds = []
            self.country = []
            self.league = []
    
    
    def parse_data(url):
        browser.get(url)
        df = pd.read_html(browser.page_source, header=0)[0]
        soup = bs(browser.page_source, "lxml")
        cont = soup.find('div', {'id': 'wrap'})
        content = cont.find('div', {'id': 'col-content'})
        # both attributes belong in one dict; passing two dicts to find() is a bug
        content = content.find('table', {'class': 'table-main', 'id': 'tournamentTable'})
        main = content.find('th', {'class': 'first2 tl'})
        if main is None:
            return None
        count = main.findAll('a')
        country = count[1].text
        league = count[2].text
        game_data = GameData()
        game_date = None
        for row in df.itertuples():
            if not isinstance(row[1], str):
                continue
            elif ':' not in row[1]:
                # rows without a kick-off time are date headers
                game_date = row[1].split('-')[0]
                continue
            game_data.date.append(game_date)
            game_data.time.append(row[1])
            game_data.game.append(row[2])
            game_data.score.append(row[3])
            game_data.home_odds.append(row[4])
            game_data.draw_odds.append(row[5])
            game_data.away_odds.append(row[6])
            game_data.country.append(country)
            game_data.league.append(league)
        return game_data
    
    # input URLs here
    urls = {}
    
    if __name__ == '__main__':
    
        results = None
    
        for url in urls:
            game_data = parse_data(url)
            if game_data is None:
                continue
            result = pd.DataFrame(game_data.__dict__)
            if results is None:
                results = result
            else:
                # DataFrame.append() was removed in pandas 2.0; use pd.concat instead
                results = pd.concat([results, result], ignore_index=True)
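The `urls` container above is deliberately left empty; how you fill it depends on which seasons and pages you want. As a hedged sketch, archive pages on Oddsportal are commonly addressed with a season suffix plus a `#/page/N/` fragment, but that pattern is an assumption here (as is the `build_urls` helper name), so verify it in a browser before relying on it:

```python
# Sketch: generate candidate result-page URLs for several seasons.
# The "-{season}/results/#/page/{n}/" pattern is an assumption about how
# Oddsportal structures its archive; check it against the live site first.
def build_urls(base, seasons, pages):
    urls = []
    for season in seasons:
        for n in range(1, pages + 1):
            urls.append(f"{base}-{season}/results/#/page/{n}/")
    return urls

urls = build_urls(
    "https://www.oddsportal.com/soccer/england/premier-league",
    seasons=("2019-2020", "2020-2021"),
    pages=2,
)
```

Each generated URL can then be passed to `parse_data()` in the `for url in urls:` loop, which is the looping structure the question asks about.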
    

    【Discussion】:
