【Question Title】: Is there a way to alternate fetching elements with selenium or beautifulsoup4?
【Posted】: 2019-09-23 00:27:50
【Question Description】:

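Interesting problem. I'm scraping a betting site with selenium and then processing the page with bs4. The issue is that the site loads the odds differently from the way it loads the team names. For example: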

London v Tokyo            2/1   4/1
Amsterdam v Helsinki      5/1   3/1

New York v California     7/1   10/1

When I pull this and iterate over it, I get:

Names = [London, Tokyo, Amsterdam, Helsinki]
Odds = [2/1, 5/1, 4/1, 3/1, 7/1, 10/1]

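The odds are loaded top to bottom, then left to right, in blocks of varying length. This means that when I try to zip the names and odds together, they don't match up.

My question is: how can I solve this? In the end I want to lay the information out so that each team name is followed by its odds: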

Games = [London, 2/1, Tokyo, 4/1, Amsterdam, 5/1, Helsinki, 3/1, New York, 7/1, California, 10/1]
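Assuming the names and odds can first be brought into the same order (which is the real difficulty in this question), producing this alternating layout is a short step. A minimal sketch with hard-coded sample values from the example above: `zip` pairs each name with its odds, and `itertools.chain.from_iterable` flattens the pairs into one list.

```python
from itertools import chain

# Assumes the lists are already aligned: odds[i] belongs to names[i]
names = ["London", "Tokyo", "Amsterdam", "Helsinki", "New York", "California"]
odds = ["2/1", "4/1", "5/1", "3/1", "7/1", "10/1"]

# zip yields (name, odds) pairs; chain.from_iterable flattens them into one list
games = list(chain.from_iterable(zip(names, odds)))
print(games)
```

Note that `zip` stops silently at the shorter list, so a length mismatch drops entries instead of raising an error.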

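** Update ** The URL is: https://www.bet365.com/#/AC/B151/C1/D50/E2/F163/ If you get the landing page, just click through. Then click "Esports" in the left-hand panel, then "All Matches" in the middle of the page.

Code: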

from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup

url = "https://www.bet365.com/#/AC/B151/C1/D50/E2/F163/"
driver = webdriver.Chrome()
driver.get(url)

# Then I'm navigating to the "All Matches" page

soup = BeautifulSoup(driver.page_source, 'html.parser')
# find_elements(By.CLASS_NAME, ...) replaces the find_elements_by_class_name
# helpers, which newer Selenium releases have removed
teams = driver.find_elements(By.CLASS_NAME, "sl-CouponParticipantWithBookCloses_Name")
odds_raw = driver.find_elements(By.CLASS_NAME, "gl-ParticipantOddsOnly_Odds")

odds = []
teams_text = []
new_teams = []
new_odds = []

for name in teams:
    teams_text.append(name.text)

The teams come in as chunks, e.g. "London v Tokyo". So to separate the team names, I iterate over them and split them:

for name in teams_text:
    first, second = name.split(" v ")
    new_teams.append(first)
    new_teams.append(second)

Then I convert the odds I receive into decimals:

for odd in odds_raw:
    odds.append(odd.text)
for odd in odds:
    first, second = odd.split("/")
    new_odd = (int(first) / int(second)) + 1
    new_odds.append(round(new_odd, 2))
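The same conversion can be written with `fractions.Fraction`, which parses strings such as `"2/1"` directly and keeps the arithmetic exact until the final rounding. A minimal sketch; `fractional_to_decimal` is a hypothetical helper name, and it assumes every odds string is a plain `a/b` fraction (any other text makes `Fraction` raise `ValueError`).

```python
from fractions import Fraction


def fractional_to_decimal(odd: str) -> float:
    """Convert fractional odds like '2/1' into decimal odds, rounded to 2 places."""
    # Decimal odds = numerator / denominator + 1 (the + 1 is the returned stake)
    return round(float(Fraction(odd) + 1), 2)


for odd in ["2/1", "4/6", "10/1"]:
    print(odd, "->", fractional_to_decimal(odd))
```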

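So now I have a list of all the team names and a list of decimal odds. This is where my problem lies: bet365 lays out the match odds in vertical blocks of different lengths, one per division.

Suppose the odds look like this: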

Division 1
London v Tokyo        1   2
Amsterdam v Helsinki  3   4
Division 2
New York v California 5   6
Division 3
Sydney v Brisbane     7   8
Bali v Singapore      9   10
Berlin v Paris        11  12

Then when I pull them, the odds come out like this:

[1, 3, 2, 4, 5, 6, 7, 9, 11, 8, 10, 12]

Because the divisions have different lengths, I'm struggling to work out how to handle this.
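If the number of matches in each division is known (for example by counting the name containers inside each division block; obtaining those counts from the live page is an assumption not covered here), the column-by-column list can be regrouped into one (team 1, team 2) pair per match. A minimal sketch using the numbers from the example above; `reorder_odds` and `division_sizes` are hypothetical names.

```python
def reorder_odds(flat_odds, division_sizes):
    """Regroup column-by-column odds into (team1, team2) pairs, one per match.

    Within a division of n matches, the page lists the n team-1 odds first,
    then the n team-2 odds, so each division occupies 2 * n consecutive values.
    """
    pairs = []
    pos = 0
    for n in division_sizes:
        block = flat_odds[pos:pos + 2 * n]
        left, right = block[:n], block[n:]  # team-1 column, team-2 column
        pairs.extend(zip(left, right))
        pos += 2 * n
    if pos != len(flat_odds):
        raise ValueError("division sizes do not account for every odds value")
    return pairs


flat = [1, 3, 2, 4, 5, 6, 7, 9, 11, 8, 10, 12]
print(reorder_odds(flat, [2, 1, 3]))
```

The length check at the end catches the case where the division sizes and the scraped odds disagree, which would otherwise shift every later pair.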

【Question Discussion】:

  • Updated the link and code
  • Yes, after converting them to decimals I'd like it stored as: [Gen.G, 1.44, Team Envy, 2.62], but that's not what I get

Tags: python beautifulsoup selenium-chromedriver


【Solution 1】:

You can use a regular expression to capture the elements.

import re
s = '''London v Tokyo 2/1 4/1 Amsterdam v Helsinki 5/1 3/1 New York v California 7/1 10/1'''
# (\w+(?: \w+)*) also matches multi-word names such as "New York"
re.findall(r'(\w+(?: \w+)*)\s+v\s+(\w+(?: \w+)*)\s+(\d+/\d+)\s+(\d+/\d+)', s)

[('London', 'Tokyo', '2/1', '4/1'),
 ('Amsterdam', 'Helsinki', '5/1', '3/1'),
 ('New York', 'California', '7/1', '10/1')]
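The matched tuples can then be flattened into the alternating `Games` layout from the question. A minimal sketch; the pattern accepts multi-word names such as `New York`, and it assumes the page text has already been joined into one string in row order, which may not hold for the live site.

```python
import re

s = "London v Tokyo 2/1 4/1 Amsterdam v Helsinki 5/1 3/1 New York v California 7/1 10/1"

# Team names may contain spaces ("New York"), so allow space-separated words
pattern = r"(\w+(?: \w+)*)\s+v\s+(\w+(?: \w+)*)\s+(\d+/\d+)\s+(\d+/\d+)"

games = []
for home, away, home_odds, away_odds in re.findall(pattern, s):
    # Each name is immediately followed by its own odds
    games.extend([home, home_odds, away, away_odds])
print(games)
```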

【Discussion】:

【Solution 2】:

You can use a for loop like this to get the output you want:

Names = ["London", "Tokyo", "Amsterdam", "Helsinki", "New York", "California"]
Odds = [2/1, 5/1, 4/1, 3/1, 7/1, 10/1]
start_nmb = 1

# Insert each odd right after the name at the current position
for odd in Odds:
    Names.insert(start_nmb, odd)
    start_nmb += 2
print(Names)
    

Output:

['London', 2.0, 'Tokyo', 5.0, 'Amsterdam', 4.0, 'Helsinki', 3.0, 'New York', 7.0, 'California', 10.0]

Hope this helps!

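【Discussion】:

• Tried it, but unfortunately it doesn't seem to work! I've updated the question with more information if you're willing to read it. Thanks for the suggestion.

【Solution 3】: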

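Here's a long-winded approach you can try. The odds blocks at odd positions (1st, 3rd, ..., i.e. even `i` in the loop) go to team 1 (the left-hand side of "team 1 v team 2"); the blocks at even positions go to team 2. The resulting lists of lists are flattened. The lists are then combined so that their members alternate, as shown in the answer here by @user942640.

Note: this relies on the lists having equal lengths to return accurate results.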

import itertools
from bs4 import BeautifulSoup as bs
# your existing code to get to the page and wait for the presence of all elements
soup = bs(driver.page_source, 'lxml')
# Each name container holds "Team 1 v Team 2"; split it into the two names
teams = [item.text.split(' v ') for item in soup.select('.sl-CouponParticipantWithBookCloses_NameContainer')]

team1 = []
team2 = []

# Odds blocks alternate: team 1's block, then team 2's, for each division
for i, item in enumerate(soup.select('.sl-MarketCouponValuesExplicit2')):
    column = [cell.text for cell in item.select('div:not(.gl-MarketColumnHeader)')]
    if i % 2 == 0:
        team1.append(column)
    else:
        team2.append(column)

# Flatten the lists of lists
team1 = [item for sublist in team1 for item in sublist]
team2 = [item for sublist in team2 for item in sublist]
teams = [item for sublist in teams for item in sublist]

# Interleave team 1/team 2 odds, then interleave names with odds
team_odds = [x for x in itertools.chain.from_iterable(itertools.zip_longest(team1, team2)) if x]
final = [x for x in itertools.chain.from_iterable(itertools.zip_longest(teams, team_odds)) if x]
print(final)
    

So, something like this (note that the odds update constantly):

from selenium import webdriver
import itertools
from bs4 import BeautifulSoup as bs
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get('https://www.bet365.com/#/HO/')
driver.get('https://www.bet365.com/#/AC/B151/C1/D50/E2/F163/')
# Wait up to 10 seconds for the odds blocks to render before parsing
WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".sl-MarketCouponValuesExplicit2")))
soup = bs(driver.page_source, 'lxml')
# Each name container holds "Team 1 v Team 2"; split it into the two names
teams = [item.text.split(' v ') for item in soup.select('.sl-CouponParticipantWithBookCloses_NameContainer')]

team1 = []
team2 = []

# Odds blocks alternate: team 1's block, then team 2's, for each division
for i, item in enumerate(soup.select('.sl-MarketCouponValuesExplicit2')):
    column = [cell.text for cell in item.select('div:not(.gl-MarketColumnHeader)')]
    if i % 2 == 0:
        team1.append(column)
    else:
        team2.append(column)

# Flatten the lists of lists
team1 = [item for sublist in team1 for item in sublist]
team2 = [item for sublist in team2 for item in sublist]
teams = [item for sublist in teams for item in sublist]

# Interleave team 1/team 2 odds, then interleave names with odds
team_odds = [x for x in itertools.chain.from_iterable(itertools.zip_longest(team1, team2)) if x]
final = [x for x in itertools.chain.from_iterable(itertools.zip_longest(teams, team_odds)) if x]
print(final)
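To check the alternation logic without opening a browser, the same steps can be run on the numbered example from the question. A minimal sketch: `columns` and `teams` are hypothetical stand-ins for the scraped text (one inner list per odds block, in page order), and it swaps the answer's `zip_longest` plus `if x` filter for plain `zip` with an explicit length check, so a mismatch raises instead of quietly shifting the pairs.

```python
import itertools

# Hypothetical scraped data: one inner list per odds block, in page order
# (division 1 team-1 block, division 1 team-2 block, division 2 team-1 block, ...)
columns = [["1", "3"], ["2", "4"], ["5"], ["6"], ["7", "9", "11"], ["8", "10", "12"]]
teams = [["London", "Tokyo"], ["Amsterdam", "Helsinki"], ["New York", "California"],
         ["Sydney", "Brisbane"], ["Bali", "Singapore"], ["Berlin", "Paris"]]

# Even-indexed blocks hold team-1 odds, odd-indexed blocks team-2 odds
team1 = [odd for col in columns[0::2] for odd in col]
team2 = [odd for col in columns[1::2] for odd in col]
flat_teams = [name for pair in teams for name in pair]

if not (len(team1) == len(team2) == len(teams)):
    raise ValueError("odds blocks and matches are out of step")

# Alternate team-1/team-2 odds, then alternate names with odds
team_odds = list(itertools.chain.from_iterable(zip(team1, team2)))
final = list(itertools.chain.from_iterable(zip(flat_teams, team_odds)))
print(final)
```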
    

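【Discussion】:

• Wow. This is really great. Some nice, elegant solutions that will also shorten my code; I'll take note. Excellent! I learned a lot from this, thanks so much :)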