Web scraping returns 'None': answers

【Question title】: Web scraping returns 'None'
【Posted】: 2021-12-06 15:04:11
【Question】:

I'm new to Python and I'm trying to build a web scraping script.

I'm trying to scrape the 'href' URLs.

My code:

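import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

URL = 'https://www.rotowire.com/basketball/team.php?team=UTA'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
service = Service(ChromeDriverManager().install())
for link in soup.find_all({"aria-colindex" : "3"}):
    print(link.get('href'))
driver = webdriver.Chrome(service = service)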

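But this returns nothing. I also tried {'style' : "width: 96px; left: 190px; top: 0px;"} instead of {"aria-colindex" : "3"}, but that also returns 'None'. I'm not sure what I'm doing wrong, so any help would be much appreciated :)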

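【Comments】:

Try opening the site with the driver instead of requests, and then use BeautifulSoup.
Please don't post images, just copy and paste the source code! Thanks.
soup.find_all({"aria-colindex" : "3"}) will find div tags, which have no href attribute, so it makes sense that link.get('href') returns None. You need to look at the a child of each div.
OK, but how do I do that? Thanks for the reply :)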
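
As an illustration of the last two comments, here is a minimal sketch of reading the a child of each matching div. The markup in SAMPLE_HTML and the helper name extract_links are made up for this example and are not taken from the real page. Note that attribute filters go in the attrs= keyword: a dict passed as the first positional argument of find_all is treated as the name (tag name) argument, not as attribute filters. This only helps when the cells are actually present in the HTML that was parsed, which may not be the case for a page that fills its table with JavaScript.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modelled on the grid cells described in the question.
SAMPLE_HTML = """
<div class="row">
  <div aria-colindex="2">Mon, Dec 6</div>
  <div aria-colindex="3"><a href="/basketball/box-score/1">W 120-110</a></div>
</div>
<div class="row">
  <div aria-colindex="3"><a href="/basketball/box-score/2">L 98-101</a></div>
</div>
<div class="row">
  <div aria-colindex="3">Postponed</div>
</div>
"""


def extract_links(html):
    """Return the href of the first <a> inside every div with aria-colindex="3"."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # attrs= filters on attributes; the first positional argument is the tag name.
    for cell in soup.find_all("div", attrs={"aria-colindex": "3"}):
        anchor = cell.find("a", href=True)  # the div itself has no href
        if anchor is not None:              # skip cells without a link
            links.append(anchor["href"])
    return links


print(extract_links(SAMPLE_HTML))
# ['/basketball/box-score/1', '/basketball/box-score/2']
```

The same loop could be run on driver.page_source after Selenium has rendered the page, as the first comment suggests.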

Tags: python selenium-webdriver web-scraping beautifulsoup

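【Solution 1】:

The data is loaded dynamically from an API, so it is easier to retrieve the links directly from that API. Here is an implementation using pandas: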

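import pandas as pd
from bs4 import BeautifulSoup

# Load the schedule JSON that the page requests in the background
df = pd.read_json('https://www.rotowire.com/basketball/tables/team-schedule.php?team=UTA')
# Each 'score' cell holds an HTML fragment; take the href of its first <a>
df['url'] = df['score'].apply(lambda x: BeautifulSoup(x, 'html.parser').find('a')['href'])
df.to_csv('output.csv')  # export to CSV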
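
If some rows of the score column have no link (for example games that have not been played yet), find('a') returns None and the lambda raises a TypeError. Whether the real API returns such rows is an assumption here; the sketch below uses made-up records and a hypothetical helper, first_href, to show one defensive variant.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical records standing in for the API response; the real columns may differ.
records = [
    {"opp": "BOS", "score": '<a href="/basketball/box-score/1">W 120-110</a>'},
    {"opp": "LAL", "score": ""},    # no result yet
    {"opp": "DEN", "score": None},  # missing value
]


def first_href(fragment):
    """Return the href of the first <a> in an HTML fragment, or None if there is none."""
    if not isinstance(fragment, str) or not fragment:
        return None
    anchor = BeautifulSoup(fragment, "html.parser").find("a", href=True)
    return anchor["href"] if anchor is not None else None


df = pd.DataFrame(records)
df["url"] = df["score"].apply(first_href)
print(df[["opp", "url"]])
```

Rows without a link end up with a missing value in url instead of stopping the script, and they can be dropped afterwards with df.dropna(subset=["url"]) if only played games are wanted.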

【Discussion】: