硒
试试这个(硒):
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
import time
res =[]
url = 'http://www.nowgoal.group/nba/Schedule.aspx?f=ft2&date=2020-07-29'
driver = webdriver.Firefox(executable_path='c:/program/geckodriver.exe')
driver.get(url)
time.sleep(2)
page = driver.page_source
driver.close()
soup = BeautifulSoup(page, 'html.parser')
span = soup.select_one('span#live')
tables = span.select('table')
for table in tables:
if table.get('class'):
competition = table.select_one('a b font').text
else:
for home, away in zip(table.select('tr.b1')[0::2], table.select('tr.b1')[1::2]):
res.append([f"{competition}",
f"{home.select_one('td a').text}",
f"{away.select_one('td a').text}",
f"{home.select_one('td.red').text}",
f"{away.select_one('td.red').text}",
f"{home.select_one('td.odds1').text}",
f"{away.select_one('td.odds1').text}",
f"{home.select('td font')[0].text}/{home.select('td font')[1].text}",
f"{away.select('td font')[0].text}/{away.select('td font')[1].text}",
f"{home.select('td div a')[-1].get('href')}"])
df = pd.DataFrame(res, columns=['competition',
'home',
'away',
'home score',
'away score',
'home odds',
'away odds',
'home ht',
'away ht',
'odds'
])
print(df.to_string())
df.to_csv('Res.csv')
打印:
competition home away home score away score home odds away odds home ht away ht odds
0 National Basketball Association Portland Trail Blazers Oklahoma City Thunder 120 131 2.72 1.45 50/70 63/68 http://data.nowgoal.group/OddsCompBasket/387520.html
1 National Basketball Association Houston Rockets Boston Celtics 137 112 1.49 2.58 77/60 60/52 http://data.nowgoal.group/OddsCompBasket/387521.html
2 National Basketball Association Philadelphia 76ers Dallas Mavericks 115 118 2.04 1.76 39/64 48/55 http://data.nowgoal.group/OddsCompBasket/387522.html
3 Women’s National Basketball Association Connecticut Sun Washington Mystics 89 94 2.28 1.59 52/37 48/46 http://data.nowgoal.group/OddsCompBasket/385886.html
4 Women’s National Basketball Association Chicago Sky Los Angeles Sparks 96 78 2.72 1.43 40/56 36/42 http://data.nowgoal.group/OddsCompBasket/385618.html
5 Women’s National Basketball Association Seattle Storm Minnesota Lynx 90 66 1.21 4.19 41/49 35/31 http://data.nowgoal.group/OddsCompBasket/385884.html
6 Friendly Competition Labas Pasauli LT Balduasenaras 85 78 52/33 31/47 http://data.nowgoal.group/OddsCompBasket/387769.html
7 Friendly Competition BC Vikings Nemuno Banga KK 66 72 29/37 30/42 http://data.nowgoal.group/OddsCompBasket/387771.html
8 Friendly Competition NRG Kiev Hizhaki 51 76 31/20 28/48 http://data.nowgoal.group/OddsCompBasket/387766.html
9 Friendly Competition Finland Estonia 97 76 2.77 1.40 48/49 29/47 http://data.nowgoal.group/OddsCompBasket/387740.html
10 Friendly Competition Synkarb Sk nemenchine 82 79 37/45 38/41 http://data.nowgoal.group/OddsCompBasket/387770.html
等等……
并保存一个看起来像这样的Res.csv:
请求
试试这个(请求):
import pandas as pd
from bs4 import BeautifulSoup
import requests
res = []
url = 'http://www.nowgoal.group/GetNbaWithTimeZone.aspx?date=2020-07-29&timezone=2&kind=0&t=1596143185000'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
items = soup.find_all('h')
for item in items:
values = item.text.split('^')
res.append([f'{values[1]}', f'{values[8]}', f'{values[10]}', f'{values[11]}', f'{values[12]}'])
df = pd.DataFrame(res, columns=['competition', 'home', 'away', 'home score', 'away score'])
print(df.to_string())
df.to_csv('Res.csv')
打印:
competition home away home score away score
0 NBA Portland Trail Blazers Oklahoma City Thunder 120 131
1 NBA Houston Rockets Boston Celtics 137 112
2 NBA Philadelphia 76ers Dallas Mavericks 115 118
3 WNBA Connecticut Sun Washington Mystics 89 94
4 WNBA Chicago Sky Los Angeles Sparks 96 78
5 WNBA Seattle Storm Minnesota Lynx 90 66
6 FC Labas Pasauli LT Balduasenaras 85 78
7 FC BC Vikings Nemuno Banga KK 66 72
8 FC NRG Kiev Hizhaki 51 76
并保存如下所示的 Res.csv:
如果您不想要索引列,您可以简单地将index=False 添加到df.to_csv('Res.csv'),所以它看起来像这样df.to_csv('Res.csv', index=False)
注意 selenium:您需要 selenium 和 geckodriver 并且在此代码中 geckodriver 设置为从 c:/program/geckodriver.exe 导入
selenium 版本较慢,但无需通过devtools 获取和查找XML 文件