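I love those sports-reference.com sites. Trenton's solution is perfect, so don't change the accepted answer. I just wanted to offer this alternative data source for probable pitchers in case you're interested.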
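It looks like mlb.com has a publicly available API for pulling that information (I assume this is probably where Baseball Reference gets the data for its probable pitchers page). What I like about it is that it returns a lot more data to analyze. It also lets you request a wider date range, both to get historical data and possibly to get probable pitchers 2 or 3 days ahead. So take a look at this code as well, play with it, and practice with it.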
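The date range is controlled by two `YYYY-MM-DD` strings sent as `startDate` and `endDate`. Here is a minimal sketch of building them with `timedelta`, assuming you want some number of days back (history) and some number of days ahead (upcoming probable pitchers). `date_window` is a hypothetical helper name. It is not part of the MLB API.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def date_window(days_back: int, days_ahead: int,
                today: Optional[date] = None) -> Tuple[str, str]:
    """Hypothetical helper: return (startDate, endDate) as YYYY-MM-DD strings.

    days_back  -> how far into the past to reach (historical games)
    days_ahead -> how far into the future to reach (upcoming probable pitchers)
    """
    today = today or date.today()
    start = today - timedelta(days=days_back)
    end = today + timedelta(days=days_ahead)
    # date.isoformat() yields exactly the YYYY-MM-DD form
    return start.isoformat(), end.isoformat()


# Example: one week of history plus three days ahead of 2020-08-19
start_date, end_date = date_window(7, 3, today=date(2020, 8, 19))
payload_dates = {'startDate': start_date, 'endDate': end_date}
print(payload_dates)  # {'startDate': '2020-08-12', 'endDate': '2020-08-22'}
```

You would merge `payload_dates` into the request parameters in place of `today`. I have not checked whether the endpoint limits how wide a range it will return in one call.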
This might also get you started on your first machine learning project.
PS: If you do bother looking at that data and figure out what strikeZoneBottom and strikeZoneTop mean, please let me know. I haven't been able to work out what they represent.
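I also wonder whether there is any data on ballparks. Pitcher stats include a fly ball to ground ball ratio, for example. With ballpark data, you could compare a fly-ball pitcher in a park that gives up lots of home runs against the same pitcher in a park where fly balls don't carry as far, or where the fences are deeper (so home runs turn into warning-track fly outs, and vice versa).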
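If you can find or build per-park data, one way to explore this is to join a park table onto the pitcher rows by `venue`. You can then flag fly-ball pitchers (a `groundOutsToAirouts` below 1.0) who are starting in home-run-friendly parks. This is a minimal pandas sketch. The three pitcher ratios are taken from the output further down. The `parks` table, its `parkHrFactor` column, and both 1.0 cutoffs are made-up placeholders, not real MLB data.

```python
import pandas as pd

# A few pitcher rows shaped like the DataFrame built from the schedule API
pitchers = pd.DataFrame({
    'probablePitcher': ['Tanner Roark', 'Tommy Milone', 'Alec Mills'],
    'venue': ['Oriole Park at Camden Yards', 'Oriole Park at Camden Yards', 'Wrigley Field'],
    'groundOutsToAirouts': [0.60, 1.00, 2.15],
})

# HYPOTHETICAL park table: parkHrFactor > 1.0 is meant to read as
# "more home runs than an average park" (placeholder values only)
parks = pd.DataFrame({
    'venue': ['Oriole Park at Camden Yards', 'Wrigley Field'],
    'parkHrFactor': [1.20, 0.95],
})

# Left join keeps every pitcher row even if a park is missing from the table
merged = pitchers.merge(parks, on='venue', how='left')

# Fly-ball pitcher (more air outs than ground outs) in a park that boosts home runs
merged['flyBallInHrPark'] = (merged['groundOutsToAirouts'] < 1.0) & (merged['parkHrFactor'] > 1.0)
print(merged[['probablePitcher', 'venue', 'flyBallInHrPark']])
```

With real data, you would swap in a genuine park factor source and probably feed the two columns to a model directly instead of hard thresholds.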
Code:
import requests
import pandas as pd
from datetime import datetime, timedelta

url = 'https://statsapi.mlb.com/api/v1/schedule'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'}

yesterday = datetime.strftime(datetime.now() - timedelta(1), '%Y-%m-%d')
today = datetime.strftime(datetime.now(), '%Y-%m-%d')
tomorrow = datetime.strftime(datetime.now() + timedelta(1), '%Y-%m-%d')

# To get 7 days earlier; notice the minus sign
#pastDate = datetime.strftime(datetime.now() - timedelta(7), '%Y-%m-%d')
# To get 3 days later; notice the plus sign
#futureDate = datetime.strftime(datetime.now() + timedelta(3), '%Y-%m-%d')

# The hydrate parameter asks the API to return extra data elements. Not sure exactly how to alter it yet; would have to play around.
# But without hydrate, it doesn't return probable pitchers.
payload = {
    'sportId': '1',
    'startDate': today,  # <-- Change these to get a wider range of games (e.g. historical stats for machine learning)
    'endDate': today,    # <-- ...or to get probable pitchers for the next few days (just adjust the timedelta above)
    'hydrate': 'team(leaders(showOnPreview(leaderCategories=[homeRuns,runsBattedIn,battingAverage],statGroup=[pitching,hitting]))),linescore(matchup,runners),flags,liveLookin,review,broadcasts(all),venue(location),decisions,person,probablePitcher,stats,homeRuns,previousPlay,game(content(media(featured,epg),summary),tickets),seriesStatus(useOverride=true)'}

jsonData = requests.get(url, headers=headers, params=payload).json()

dates = jsonData['dates']
rows = []
for date in dates:
    games = date['games']
    for game in games:
        dayNight = game['dayNight']
        gameDate = game['gameDate']
        city = game['venue']['location']['city']
        venue = game['venue']['name']
        teams = game['teams']
        # One row per team ('away' and 'home') in each game
        for k, v in teams.items():
            row = {}
            row.update({'dayNight': dayNight,
                        'gameDate': gameDate,
                        'city': city,
                        'venue': venue})
            homeAway = k
            teamName = v['team']['name']
            if 'probablePitcher' not in v.keys():
                # No probable pitcher announced yet: keep the game/team info only
                row.update({'homeAway': homeAway,
                            'teamName': teamName})
                rows.append(row)
            else:
                probablePitcher = v['probablePitcher']
                fullName = probablePitcher['fullName']
                pitchHand = probablePitcher['pitchHand']['code']
                strikeZoneBottom = probablePitcher['strikeZoneBottom']
                strikeZoneTop = probablePitcher['strikeZoneTop']
                row.update({'homeAway': homeAway,
                            'teamName': teamName,
                            'probablePitcher': fullName,
                            'pitchHand': pitchHand,
                            'strikeZoneBottom': strikeZoneBottom,
                            'strikeZoneTop': strikeZoneTop})
                # Keep only the current season's pitching line
                stats = probablePitcher['stats']
                for stat in stats:
                    if stat['type']['displayName'] == 'statsSingleSeason' and stat['group']['displayName'] == 'pitching':
                        playerStats = stat['stats']
                        row.update(playerStats)
                rows.append(row)

df = pd.DataFrame(rows)
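To experiment with the flattening logic without calling the API every time, you can wrap the same loop in a function and feed it a saved or hand-made response. This is a minimal sketch. `parse_schedule` and the small `sample` dict are hypothetical. `sample` keeps only the keys the loop reads, and real responses contain far more. The function uses `.get` so that a missing field yields NaN instead of a `KeyError`. That is a design change from the code above.

```python
import pandas as pd


def parse_schedule(json_data: dict) -> pd.DataFrame:
    """Hypothetical helper: flatten a /schedule response into one row per team per game."""
    rows = []
    for day in json_data.get('dates', []):
        for game in day.get('games', []):
            venue = game.get('venue', {})
            base = {
                'dayNight': game.get('dayNight'),
                'gameDate': game.get('gameDate'),
                'city': venue.get('location', {}).get('city'),
                'venue': venue.get('name'),
            }
            for home_away, side in game.get('teams', {}).items():
                row = dict(base, homeAway=home_away, teamName=side['team']['name'])
                pitcher = side.get('probablePitcher')
                if pitcher is not None:
                    row['probablePitcher'] = pitcher.get('fullName')
                    row['pitchHand'] = pitcher.get('pitchHand', {}).get('code')
                    row['strikeZoneBottom'] = pitcher.get('strikeZoneBottom')
                    row['strikeZoneTop'] = pitcher.get('strikeZoneTop')
                    # Same filter as above: season-to-date pitching stats only
                    for stat in pitcher.get('stats', []):
                        if (stat['type']['displayName'] == 'statsSingleSeason'
                                and stat['group']['displayName'] == 'pitching'):
                            row.update(stat['stats'])
                rows.append(row)
    return pd.DataFrame(rows)


# Hand-made stand-in for a response (illustrative values, trimmed to the keys used above)
sample = {'dates': [{'games': [{
    'dayNight': 'day',
    'gameDate': '2020-08-19T17:05:00Z',
    'venue': {'name': 'Oriole Park at Camden Yards', 'location': {'city': 'Baltimore'}},
    'teams': {
        'away': {'team': {'name': 'Toronto Blue Jays'},
                 'probablePitcher': {'fullName': 'Tanner Roark', 'pitchHand': {'code': 'R'},
                                     'stats': [{'type': {'displayName': 'statsSingleSeason'},
                                                'group': {'displayName': 'pitching'},
                                                'stats': {'era': '6.00', 'wins': 1}}]}},
        'home': {'team': {'name': 'Baltimore Orioles'}},  # no probable pitcher announced
    },
}]}]}

df_sample = parse_schedule(sample)
print(df_sample[['homeAway', 'teamName', 'probablePitcher', 'era']])
```

Once this works on a saved response, you can pass it the real `jsonData` instead of `sample`.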
Output (first 10 rows):
print(df.head(10).to_string())
airOuts atBats balks baseOnBalls blownSaves catchersInterference caughtStealing city completeGames dayNight doubles earnedRuns era gameDate gamesFinished gamesPitched gamesPlayed gamesStarted groundOuts groundOutsToAirouts hitBatsmen hitByPitch hits hitsPer9Inn holds homeAway homeRuns homeRunsPer9 inheritedRunners inheritedRunnersScored inningsPitched intentionalWalks losses obp outs pickoffs pitchHand probablePitcher rbi runs runsScoredPer9 sacBunts sacFlies saveOpportunities saves shutouts stolenBasePercentage stolenBases strikeOuts strikeZoneBottom strikeZoneTop strikeoutWalkRatio strikeoutsPer9Inn teamName triples venue walksPer9Inn whip wildPitches winPercentage wins
0 15.0 44.0 0.0 9.0 0.0 0.0 0.0 Baltimore 0.0 day 2.0 8.0 6.00 2020-08-19T17:05:00Z 0.0 3.0 3.0 3.0 9.0 0.60 0.0 0.0 10.0 7.50 0.0 away 3.0 2.25 0.0 0.0 12.0 0.0 1.0 .358 36.0 0.0 R Tanner Roark 0.0 8.0 6.00 0.0 0.0 0.0 0.0 0.0 1.000 1.0 10.0 1.589 3.467 1.11 7.50 Toronto Blue Jays 0.0 Oriole Park at Camden Yards 6.75 1.58 0.0 .500 1.0
1 18.0 74.0 0.0 3.0 0.0 0.0 0.0 Baltimore 0.0 day 5.0 8.0 4.00 2020-08-19T17:05:00Z 0.0 4.0 4.0 4.0 18.0 1.00 1.0 1.0 22.0 11.00 0.0 home 1.0 0.50 0.0 0.0 18.0 0.0 2.0 .329 54.0 1.0 L Tommy Milone 0.0 11.0 5.50 1.0 1.0 0.0 0.0 0.0 1.000 1.0 18.0 1.535 3.371 6.00 9.00 Baltimore Orioles 1.0 Oriole Park at Camden Yards 1.50 1.39 1.0 .333 1.0
2 14.0 59.0 0.0 2.0 0.0 0.0 0.0 Boston 0.0 day 3.0 7.0 4.02 2020-08-19T17:35:00Z 0.0 3.0 3.0 3.0 14.0 1.00 0.0 0.0 17.0 9.77 0.0 away 2.0 1.15 0.0 0.0 15.2 0.0 2.0 .311 47.0 0.0 R Jake Arrieta 0.0 7.0 4.02 0.0 0.0 0.0 0.0 0.0 .--- 0.0 14.0 1.627 3.549 7.00 8.04 Philadelphia Phillies 0.0 Fenway Park 1.15 1.21 2.0 .333 1.0
3 2.0 14.0 1.0 3.0 0.0 0.0 0.0 Boston 0.0 day 1.0 5.0 22.50 2020-08-19T17:35:00Z 0.0 1.0 1.0 1.0 1.0 0.50 0.0 0.0 7.0 31.50 0.0 home 2.0 9.00 0.0 0.0 2.0 0.0 1.0 .588 6.0 0.0 L Kyle Hart 0.0 7.0 31.50 0.0 0.0 0.0 0.0 0.0 .--- 0.0 4.0 1.681 3.575 1.33 18.00 Boston Red Sox 0.0 Fenway Park 13.50 5.00 0.0 .000 0.0
4 8.0 27.0 0.0 0.0 0.0 0.0 0.0 Chicago 0.0 day 0.0 2.0 2.57 2020-08-19T18:20:00Z 0.0 1.0 1.0 1.0 7.0 0.88 0.0 0.0 6.0 7.71 0.0 away 0.0 0.00 0.0 0.0 7.0 0.0 0.0 .222 21.0 0.0 R Jack Flaherty 0.0 2.0 2.57 0.0 0.0 0.0 0.0 0.0 .--- 0.0 6.0 1.627 3.549 -.-- 7.71 St. Louis Cardinals 0.0 Wrigley Field 0.00 0.86 0.0 1.000 1.0
5 13.0 65.0 0.0 6.0 0.0 0.0 1.0 Chicago 0.0 day 2.0 6.0 2.84 2020-08-19T18:20:00Z 0.0 3.0 3.0 3.0 28.0 2.15 1.0 1.0 10.0 4.74 0.0 home 2.0 0.95 0.0 0.0 19.0 0.0 1.0 .236 57.0 0.0 R Alec Mills 0.0 6.0 2.84 0.0 0.0 0.0 0.0 0.0 .000 0.0 14.0 1.627 3.549 2.33 6.63 Chicago Cubs 0.0 Wrigley Field 2.84 0.84 0.0 .667 2.0
6 NaN NaN NaN NaN NaN NaN NaN Chicago NaN night NaN NaN NaN 2020-08-19T03:33:00Z NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN away NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Chicago Cubs NaN Wrigley Field NaN NaN NaN NaN NaN
7 NaN NaN NaN NaN NaN NaN NaN Chicago NaN night NaN NaN NaN 2020-08-19T03:33:00Z NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN home NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN St. Louis Cardinals NaN Wrigley Field NaN NaN NaN NaN NaN
8 13.0 92.0 0.0 8.0 0.0 0.0 1.0 Kansas City 0.0 day 6.0 10.0 3.91 2020-08-19T21:05:00Z 0.0 4.0 4.0 4.0 24.0 1.85 0.0 0.0 25.0 9.78 0.0 away 1.0 0.39 0.0 0.0 23.0 0.0 2.0 .327 69.0 0.0 R Luis Castillo 0.0 12.0 4.70 0.0 1.0 0.0 0.0 0.0 .000 0.0 31.0 1.589 3.467 3.88 12.13 Cincinnati Reds 1.0 Kauffman Stadium 3.13 1.43 0.0 .000 0.0
9 10.0 36.0 0.0 5.0 0.0 0.0 0.0 Kansas City 0.0 day 0.0 0.0 0.00 2020-08-19T21:05:00Z 0.0 2.0 2.0 2.0 11.0 1.10 1.0 1.0 5.0 4.09 0.0 home 0.0 0.00 0.0 0.0 11.0 0.0 0.0 .262 33.0 0.0 R Brad Keller 0.0 0.0 0.00 0.0 0.0 0.0 0.0 0.0 .--- 0.0 10.0 1.681 3.575 2.00 8.18 Kansas City Royals 0.0 Kauffman Stadium 4.09 0.91 0.0 1.000 2.0
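Before you feed these columns to a model, note that some rate stats come back as text. `obp` and `winPercentage` print as `.358` and `.500`. When a rate is undefined, the API uses placeholders such as `.---` (`stolenBasePercentage`) or `-.--` (`strikeoutWalkRatio`), as in the rows above. Rows 6 and 7 also have no probable pitcher at all. Below is a minimal sketch that drops such rows and coerces the text to floats. It uses a small hand-built slice of the output. Which columns to coerce is an assumption, so adjust `rate_cols` to your needs.

```python
import pandas as pd

# A slice shaped like the output above: rate stats arrive as text,
# and '.---' / '-.--' mean "undefined"
stats_df = pd.DataFrame({
    'probablePitcher': ['Tanner Roark', 'Jake Arrieta', 'Jack Flaherty', None],
    'obp': ['.358', '.311', '.222', None],
    'stolenBasePercentage': ['1.000', '.---', '.---', None],
    'strikeoutWalkRatio': ['1.11', '7.00', '-.--', None],
})

rate_cols = ['obp', 'stolenBasePercentage', 'strikeoutWalkRatio']  # assumption: pick the columns you need

# Keep only rows where a probable pitcher was announced
stats_df = stats_df.dropna(subset=['probablePitcher']).copy()

# errors='coerce' turns unparseable placeholders like '.---' into NaN
stats_df[rate_cols] = stats_df[rate_cols].apply(pd.to_numeric, errors='coerce')
print(stats_df.dtypes)
```

On the real data, you would apply the same `dropna` and `to_numeric` lines to `df` after it is built.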