【Posted】: 2019-04-15 15:04:33
【Question】:
I'm new to Beautiful Soup, and I'm trying to extract data from this website.
import requests
import pandas as pd
from bs4 import BeautifulSoup


class roto_PlayerStats:
    class roto_Player:
        def __init__(self):
            self.name = ""
            self.team = ""
            self.pos = ""
            self.salary = 0
            self.minutes = 0
            self.reb = 0
            self.ast = 0
            self.stl = 0
            self.blk = 0
            self.to = 0
            self.pts = 0
            self.usg = 0
            self.fpts = 0

    def __init__(self):
        self.players = []

    def load_data(self):
        response = requests.get("https://rotogrinders.com/game-stats/nba-player?site=draftkings&range=season")
        soup = BeautifulSoup(response.content, "html.parser")
        # find_all() treats its first argument as a tag name, so '"id' matches no tags
        for x in soup.find_all('"id'):
            # code to load the individual data?
            pass
The data on the page sits in an array structured as shown below. Is this the right way to load each individual player's data?
$(document).ready(function() {
var data = [{"id":915,"player":"J.R. Smith","team":"CLE","pos":"SHW","salary":null,"opp":"N\/A","gp":8,"min":"150.00","fgm":18,"fga":51,"ftm":8,"fta":8,"3pm":9,"3pa":27,"reb":13,"ast":13,"stl":10,"blk":2,"to":9,"pts":53,"usg":"18.08","pace":64,"fpts":"115.10"}, {}...]
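For reference, here is a minimal sketch of one way to read that array, assuming the page source is already available as a string. `PAGE_SOURCE` and `extract_records` are hypothetical names used only for illustration, and the record is the one shown above. Because the array lives inside JavaScript rather than in HTML tags, the sketch locates the `data =` assignment with a regular expression and hands the rest to the standard library's JSON decoder.

```python
import json
import re

# Hypothetical stand-in for response.text; the record is the one shown above.
PAGE_SOURCE = r"""
$(document).ready(function() {
var data = [{"id":915,"player":"J.R. Smith","team":"CLE","pos":"SHW","salary":null,"opp":"N\/A","gp":8,"min":"150.00","fgm":18,"fga":51,"ftm":8,"fta":8,"3pm":9,"3pa":27,"reb":13,"ast":13,"stl":10,"blk":2,"to":9,"pts":53,"usg":"18.08","pace":64,"fpts":"115.10"}];
});
"""


def extract_records(page_source):
    """Return the list assigned to `data` in the page source, or [] if none is found."""
    # The lookahead keeps match.end() pointing at the opening bracket.
    match = re.search(r"\bdata\s*=\s*(?=\[)", page_source)
    if match is None:
        return []
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (here the trailing ";" and JavaScript).
    records, _end = json.JSONDecoder().raw_decode(page_source, match.end())
    return records


records = extract_records(PAGE_SOURCE)
for record in records:
    # Numeric-looking strings such as "115.10" need an explicit conversion.
    print(record["player"], record["team"], float(record["fpts"]))
```

Each record comes back as a plain dict, so its values could be copied onto a `roto_Player` instance field by field. Whether the live page still embeds the data this way is an assumption.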
Here is a new function for a different page on the same domain:
import json
import re

import requests
from bs4 import BeautifulSoup


class grinder_Team:
    def __init__(self):
        self.name = ""
        self.gp = 0
        self.minutes = 0
        self.reb = 0
        self.ast = 0
        self.stl = 0
        self.blk = 0
        self.to = 0
        self.pts = 0
        self.pace = 0
        self.fpts = 0


class grinder_TeamStats:
    def __init__(self):
        self.teams = []
        response = requests.get("https://rotogrinders.com/team-stats/nba-earned?site=draftkings&range=season")
        soup = BeautifulSoup(response.content, 'html.parser')
        proj_stats = soup.find('section', {'class': 'pag bdy'})
        script = proj_stats.find('script')
        data = re.search(r"data\s*=\s*(.*);", script.text).group(1)
        stats = json.loads(data)
        for team in stats:
            # do x
            pass
        print("finished")
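I'm getting an error on this line:
data = re.search(r"data\s*=\s*(.*);", script.text).group(1)
It says:
AttributeError: 'NoneType' object has no attribute 'group'
I'm not sure why this happens, because I printed the script and script.text variables for both links, and their output looks very similar.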
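A small sketch of how this error can arise. `re.search` returns `None` when the pattern does not match, and calling `.group(1)` on `None` raises exactly this `AttributeError`. One plausible cause, not confirmed for this page, is that the array is wrapped across several lines: by default `.` does not match a newline, so `(.*);` cannot reach the closing `;`. The script bodies and the `TOLERANT` pattern below are hypothetical and for illustration only.

```python
import json
import re

ORIGINAL = re.compile(r"data\s*=\s*(.*);")
# Assumed alternative: anchor on the brackets and let "." cross newlines.
TOLERANT = re.compile(r"data\s*=\s*(\[.*?\])\s*;", re.DOTALL)

# Hypothetical script bodies; the real ones would come from script.text.
one_line = 'var data = [{"name": "A", "pts": 1}];'
wrapped = 'var data = [\n  {"name": "A", "pts": 1}\n];\n});'


def load_stats(script_text, pattern):
    """Return the parsed array, or None when the pattern does not match."""
    match = pattern.search(script_text)
    if match is None:  # calling .group() here would raise AttributeError
        return None
    return json.loads(match.group(1))


print(load_stats(one_line, ORIGINAL))   # matches: everything is on one line
print(load_stats(wrapped, ORIGINAL))    # no match: "." stops at the newline
print(load_stats(wrapped, TOLERANT))    # matches across the line breaks
```

The non-greedy bracket pattern stops at the first `]` that is followed by `;`, so it would break if a string value inside the array contained that sequence. It is also worth checking that `proj_stats.find('script')` returns the script that actually holds the `data` assignment, since `find` only returns the first match.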
【Comments】:
- Do the values you see in the JSON string inside the script match what is shown on the page?
Tags: python url web-scraping beautifulsoup