【Question Title】: How to calculate a win streak in Python/Pandas
【Posted】: 2019-06-05 00:24:50
【Question】:

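I am trying to calculate win and loss streaks for a set of games. My goal is to make betting decisions based on these streaks or on recent records. I am new to Python and Pandas (and to programming in general), so any detailed explanation of what the code does would be welcome.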

Here is my data:

    Season               Game Date                   Game Index  Away Team               Away Score  Home Team             Home Score  Winner                Loser
 0  2014 Regular Season  Saturday, March 22, 2014    2014032201  Los Angeles Dodgers              3  Arizona D'Backs                1  Los Angeles Dodgers   Arizona D'Backs
 1  2014 Regular Season  Sunday, March 23, 2014      2014032301  Los Angeles Dodgers              7  Arizona D'Backs                5  Los Angeles Dodgers   Arizona D'Backs
 2  2014 Regular Season  Sunday, March 30, 2014      2014033001  Los Angeles Dodgers              1  San Diego Padres               3  San Diego Padres      Los Angeles Dodgers
 3  2014 Regular Season  Monday, March 31, 2014      2014033101  Seattle Mariners                10  Los Angeles Angels             3  Seattle Mariners      Los Angeles Angels
 4  2014 Regular Season  Monday, March 31, 2014      2014033102  San Francisco Giants             9  Arizona D'Backs                8  San Francisco Giants  Arizona D'Backs
 5  2014 Regular Season  Monday, March 31, 2014      2014033103  Boston Red Sox                   1  Baltimore Orioles              2  Baltimore Orioles     Boston Red Sox
 6  2014 Regular Season  Monday, March 31, 2014      2014033104  Minnesota Twins                  3  Chicago White Sox              5  Chicago White Sox     Minnesota Twins
 7  2014 Regular Season  Monday, March 31, 2014      2014033105  St. Louis Cardinals              1  Cincinnati Reds                0  St. Louis Cardinals   Cincinnati Reds
 8  2014 Regular Season  Monday, March 31, 2014      2014033106  Kansas City Royals               3  Detroit Tigers                 4  Detroit Tigers        Kansas City Royals
 9  2014 Regular Season  Monday, March 31, 2014      2014033107  Colorado Rockies                 1  Miami Marlins                 10  Miami Marlins         Colorado Rockies

The first five rows as a dictionary:

{'Away Score': {0: 3, 1: 7, 2: 1, 3: 10, 4: 9},
 'Away Team': {0: 'Los Angeles Dodgers',
  1: 'Los Angeles Dodgers',
  2: 'Los Angeles Dodgers',
  3: 'Seattle Mariners',
  4: 'San Francisco Giants'},
 'Game Date': {0: 'Saturday, March 22, 2014',
  1: 'Sunday, March 23, 2014',
  2: 'Sunday, March 30, 2014',
  3: 'Monday, March 31, 2014',
  4: 'Monday, March 31, 2014'},
 'Game Index': {0: 2014032201,
  1: 2014032301,
  2: 2014033001,
  3: 2014033101,
  4: 2014033102},
 'Home Score': {0: 1, 1: 5, 2: 3, 3: 3, 4: 8},
 'Home Team': {0: "Arizona D'Backs",
  1: "Arizona D'Backs",
  2: 'San Diego Padres',
  3: 'Los Angeles Angels',
  4: "Arizona D'Backs"},
 'Loser': {0: "Arizona D'Backs",
  1: "Arizona D'Backs",
  2: 'Los Angeles Dodgers',
  3: 'Los Angeles Angels',
  4: "Arizona D'Backs"},
 'Season': {0: '2014 Regular Season',
  1: '2014 Regular Season',
  2: '2014 Regular Season',
  3: '2014 Regular Season',
  4: '2014 Regular Season'},
 'Winner': {0: 'Los Angeles Dodgers',
  1: 'Los Angeles Dodgers',
  2: 'San Diego Padres',
  3: 'Seattle Mariners',
  4: 'San Francisco Giants'}}

I tried looping through the seasons and the teams, and then building a streak count based on this GitHub project: https://github.com/nhcamp/EPL-Betting/blob/master/EPL%20Match%20Results%20DF.ipynb

I run into key errors early on while building the loop, and I cannot work out what is wrong with the data.

game_table = pd.read_csv('MLB_Scores_2014_2018.csv')

# Get Team List
team_list = game_table['Away Team'].unique()

# Get Season List
season_list = game_table['Season'].unique()

#Defining "chunks" to append gamedata to the total dataframe
chunks = []

for season in season_list:
    # Looping through seasons. Streaks reset for each season
    season_games = game_table[game_table['Season'] == season]

    for team in team_list:
        # Looping through teams
        season_team_games = season_games[(season_games['Away Team'] == team | season_games['Home Team'] == team)]

        #Setting streak list and streak counter values
        streak_list = []
        streak = 0

        # Looping through each game
        for game in season_team_games.iterrow():
            # Check if team is a winner, and up the streak
            if game_table['Winner'] == team:
                streak_list.append(streak)
                streak += 1
            # If not the winner, append streak and set to zero
            elif game_table['Winner'] != team:
                streak_list.append(streak)
                streak = 0
            # Just in case something weird happens with the scores
            else:
                streak_list.append(streak)
        game_table['Streak'] = streak_list
        chunk_list.append(game_table)

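This is where I get lost. How do I append the streak separately depending on whether each team is the home team or the away team? Is there a better way to present this data?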

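In general, I want to add the win streak and/or loss streak for each team to every game. The header would look like this:

| Season | Game Date | Game Index | Away Team | Away Score | Home Team | Home Score | Winner | Loser | Away Win Streak | Away Lose Streak | Home Win Streak | Home Lose Streak |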

Edit: this error message has been resolved.

I was also getting an error when creating the dataframe "season_team_games":

TypeError: cannot compare a dtyped [object] array with a scalar of type [bool]

【Comments】:

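  • Could you add your sample data and expected output as text? Five rows of each will do. I think this is a case for groupby with cumsum and transform. Depending on your data, it may need to be unstacked first, but if you provide text data for others to work with, I think they will be able to help easily.
  • Thanks - I have added the sample data. The output is a bit harder, because I cannot actually get the code to finish, and for a small sample of 10 records it is mostly 1s and 0s, which is not very informative.
  • Great. My laptop is in for repair so I cannot answer right now, but if this has not been answered by tomorrow morning I will work through it on my PC. Good luck, and hopefully someone else picks this up!
  • Could you print df.head(5).to_dict() and paste it into the question? I am having trouble using your sample.
  • Sure - I just updated the question. It looks like it is just the team and score data. Is that what you were looking for?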
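
The groupby/cumsum idea from the first comment can be sketched on a single team's results. This is only an illustration under assumptions: the `won` series is made-up data, and the names `block` and `win_streak` are hypothetical. Each loss starts a new block, and the running count of wins inside a block is the current streak.

```python
import pandas as pd

# Hypothetical win/loss sequence for one team, in chronological order.
won = pd.Series([True, True, False, True, True, True, False])

# Every loss opens a new block; the cumulative count of losses labels the blocks.
block = (~won).astype(int).cumsum()

# Within a block, the running sum of wins is the streak after each game.
win_streak = won.astype(int).groupby(block).cumsum()

print(win_streak.tolist())  # [1, 2, 0, 1, 2, 3, 0]
```

Swapping `won` for `~won` gives the loss streak in the same way.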

Tags: python pandas


【Solution 1】:

The error you are seeing comes from the statement

season_team_games = season_games[(season_games['Away Team'] == team | season_games['Home Team'] == team)]

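When you combine two boolean conditions, you need to wrap each one in parentheses. This is because the | operator has higher precedence than the == operator, so without parentheses Python tries to evaluate team | season_games['Home Team'] before either comparison. So this should become: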

season_team_games = season_games[(season_games['Away Team'] == team) | (season_games['Home Team'] == team)]
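
As a quick check of the corrected filter, here is a minimal sketch on a made-up three-game frame. The column names follow the question; the rows and the intermediate `mask` variable are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical three-game frame using the question's column names.
season_games = pd.DataFrame({
    "Away Team": ["Los Angeles Dodgers", "Seattle Mariners", "Boston Red Sox"],
    "Home Team": ["Arizona D'Backs", "Los Angeles Dodgers", "Baltimore Orioles"],
})
team = "Los Angeles Dodgers"

# Each comparison is parenthesized so that == is evaluated before |.
mask = (season_games["Away Team"] == team) | (season_games["Home Team"] == team)
season_team_games = season_games[mask]

print(season_team_games)
```

Only the first two rows survive, since the Dodgers appear once as the away team and once as the home team.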

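I know there is more to this question than this error, but as mentioned in the comments, it will probably be easier to help once you provide some text-based data.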
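
For the rest of the question, note that the loop has a few other problems: DataFrame has `iterrows`, not `iterrow`; `game_table['Winner'] == team` compares the whole column rather than the current game; and the list is defined as `chunks` but appended to as `chunk_list`. One possible way to avoid the loop altogether is sketched below, using the five sample rows from the question. It rests on several assumptions: "Game Index" sorts games chronologically, streaks reset each season, and the streak shown for a game is the one the team carries into it (as in your loop, which appends before incrementing). The `run_length` helper and the `long`, `wide`, and `Side` names are hypothetical.

```python
import pandas as pd

games = pd.DataFrame({
    "Season": ["2014 Regular Season"] * 5,
    "Game Index": [2014032201, 2014032301, 2014033001, 2014033101, 2014033102],
    "Away Team": ["Los Angeles Dodgers", "Los Angeles Dodgers", "Los Angeles Dodgers",
                  "Seattle Mariners", "San Francisco Giants"],
    "Home Team": ["Arizona D'Backs", "Arizona D'Backs", "San Diego Padres",
                  "Los Angeles Angels", "Arizona D'Backs"],
    "Winner": ["Los Angeles Dodgers", "Los Angeles Dodgers", "San Diego Padres",
               "Seattle Mariners", "San Francisco Giants"],
})

# One row per (game, team): stack the away view and the home view.
cols = ["Season", "Game Index", "Team", "Winner"]
long = pd.concat(
    [games.rename(columns={f"{side} Team": "Team"})[cols].assign(Side=side)
     for side in ("Away", "Home")]
).sort_values(["Season", "Team", "Game Index"]).reset_index(drop=True)
long["Won"] = long["Team"] == long["Winner"]
keys = [long["Season"], long["Team"]]


def run_length(flag, keys):
    """Length of the current run of True values, restarting per key group."""
    # A False value opens a new block within each (season, team) group.
    block = (~flag).astype(int).groupby(keys).cumsum()
    return flag.astype(int).groupby(keys + [block]).cumsum()


# Shift by one game so each row holds the streak carried INTO that game.
for name, flag in (("Win Streak", long["Won"]), ("Lose Streak", ~long["Won"])):
    after_game = run_length(flag, keys)
    long[name] = after_game.groupby(keys).shift().fillna(0).astype(int)

# Back to one row per game, with separate away and home columns.
wide = long.pivot(index="Game Index", columns="Side",
                  values=["Win Streak", "Lose Streak"])
wide.columns = [f"{side} {name}" for name, side in wide.columns]
result = games.join(wide, on="Game Index")

print(result[["Game Index", "Away Win Streak", "Home Lose Streak"]])
```

On this sample, the Dodgers enter their three away games on win streaks of 0, 1 and 2, and the D'Backs enter their three home games on losing streaks of 0, 1 and 2. Dropping the `shift` step would instead give the streak after each game.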
