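【Posted】: 2021-01-23 06:21:35
【Question】:
I'm trying to build an NHL betting system and need to scrape live data from the web every day. The IMPORTHTML() function doesn't work on the table I need to scrape. I'm trying to use Python instead, but I haven't found a good tutorial for beginners. I need help.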
>>> from bs4 import BeautifulSoup
>>> import requests
>>> from selenium import webdriver
>>> import pandas as pd
>>> PATH = "C:/webdrivers/chromedriver.exe"
>>> table_name = "table_container"
>>> csv_name = 'nhl_season_stats.csv'
>>> URL = "https://www.hockey-reference.com/leagues/NHL_2021.html"
>>> def get_nhl_stats(URL):
...     # Load the page in Chrome and hand the rendered HTML to BeautifulSoup
...     driver = webdriver.Chrome(PATH)
...     driver.get(URL)
...     soup = BeautifulSoup(driver.page_source, 'html.parser')
...     driver.quit()
...     # Collect every table whose id matches table_name
...     tables = soup.find_all('table', {"id": [table_name]})
...     for table in tables:
...         tab_name = table['id']
...         tab_data = [[cell.text for cell in row.find_all(["th", "td"])]
...                     for row in table.find_all("tr")]
...         df = pd.DataFrame(tab_data)
...         # Promote the first row to column headers, then drop it from the data
...         df.columns = df.iloc[0, :]
...         df.drop(index=0, inplace=True)
...         df.to_csv(csv_name, index=False)
...         print(tab_name)
...         print(df)
...
>>> get_nhl_stats(URL)
I keep getting this error:
DevTools listening on ws://127.0.0.1:59353/devtools/browser/2ad39b85-94a0-4f64-a738-994c69f7373c
[10572:2256:0123/020420.281:ERROR:device_event_log_impl.cc(211)] [02:04:20.281] USB: usb_device_handle_win.cc:1049 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
[10572:2256:0123/020420.283:ERROR:device_event_log_impl.cc(211)] [02:04:20.283] USB: usb_device_handle_win.cc:1049 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
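【Comments】:
-
Please provide the code you have already tried.
-
@goalie1998 OK, done.
-
@Mason Just curious, but why use Selenium? You could simply 1) use requests and BeautifulSoup to get that data, 2) use pandas, or 3) use the API on nhl.com. All of these options are faster than simulating opening a browser and then parsing the data.
-
@goalie1998 I got the script from someone on YouTube. I really don't know what I'm doing; I was just trying to copy him.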
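The suggestion in the comments, fetching the page over plain HTTP and parsing the table without a browser, might look roughly like the sketch below. To keep it runnable without installing anything, it swaps requests and BeautifulSoup for the standard library's urllib and html.parser; the idea is the same. The table id "stats", the sample rows, and the helper names (TableParser, parse_table, fetch_html, save_csv) are all assumptions made for illustration, and the parser assumes well-formed markup with explicit closing tags and no nested tables.

```python
import csv
import urllib.request
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the rows of the <table> whose id equals table_id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.rows = []          # finished rows, each a list of cell strings
        self._in_table = False  # True while inside the target table
        self._row = None        # cells of the row currently being read
        self._cell = None       # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == self.table_id:
            self._in_table = True
        elif self._in_table and tag == "tr":
            self._row = []
        elif self._row is not None and tag in ("th", "td"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag == "table" and self._in_table:
            self._in_table = False


def parse_table(html, table_id):
    """Return the target table as a list of rows (header row first)."""
    parser = TableParser(table_id)
    parser.feed(html)
    parser.close()
    return parser.rows


def fetch_html(url):
    """Download a page as text; not called in the offline demo below."""
    # Assumption: a browser-like User-Agent, since some sites reject urllib's default.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")


def save_csv(rows, path):
    """Write the parsed rows to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)


# Made-up placeholder data so the sketch runs offline; not real standings.
SAMPLE_HTML = """
<table id="stats">
  <tr><th>Team</th><th>GP</th><th>PTS</th></tr>
  <tr><td>Team A</td><td>5</td><td>8</td></tr>
  <tr><td>Team B</td><td>4</td><td>6</td></tr>
</table>
"""

rows = parse_table(SAMPLE_HTML, "stats")
print(rows)
```

Against the live page, the same pieces would be chained as parse_table(fetch_html(URL), table_id) followed by save_csv(rows, csv_name), where table_id has to be read from the page source rather than guessed. If the table turns out to be built by JavaScript after the page loads, a plain HTTP fetch will not see it, and a browser-driven approach like the original script may still be needed.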
Tags: python web-scraping google-sheets