【Title】: Creating a new column which finds the difference in dates based on a conditional
【Posted】: 2019-05-17 03:36:26
【Question】:

I have the following dataframe:

df=

Date     Team1     Team2     
6/1      Boston    New York  
6/13     New York  Chicago   
6/27     Boston    New York  
6/28     Chicago   Boston

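I want to create a new column that finds the difference in dates, conditional on Team1. For example, when Chicago is Team1, I want the number of days since their last game, regardless of whether they were Team1 or Team2 in that previous game.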

df=

Date     Team1     Team2      Days since Team1 played
6/1      Boston    New York   0
6/13     New York  Chicago    12
6/27     Boston    New York   26
6/28     Chicago   Boston     15

【Comments】:

Tags: python pandas dataframe conditional multiple-columns

【Solution 1】:

Your expected output is close, but I would create a MultiIndex instead.

Use melt and diff, then pivot:

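# assumes df['Date'] already holds datetimes, e.g. via pd.to_datetime
# melt to get the teams into one column
melt = df.melt('Date').sort_values('Date')

# group by team and find the difference between consecutive games
melt['diff'] = melt.groupby('value')['Date'].diff()

# pivot to go back to the original df format (keyword arguments are required in pandas 2.0+)
melt.pivot(index='Date', columns='variable')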

                  value              diff
variable     Team1    Team2     Team1     Team2
Date
2018-06-01  Boston   New York    NaT       NaT
2018-06-13  New York Chicago     12 days   NaT
2018-06-27  Boston   New York    26 days   14 days
2018-06-28  Chicago  Boston      15 days   1 days
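
For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. The year 2018 is an assumption taken from the printed output (the question only gives month/day), and the names `long`, `wide` and `days_since_team1` are illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data from the question; the year 2018 is assumed.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2018-06-01", "2018-06-13", "2018-06-27", "2018-06-28"]),
    "Team1": ["Boston", "New York", "Boston", "Chicago"],
    "Team2": ["New York", "Chicago", "New York", "Boston"],
})

# One row per (date, team) appearance, in chronological order.
long = df.melt(id_vars="Date").sort_values("Date")

# Gap since each team's previous game, whichever slot it played in.
long["diff"] = long.groupby("value")["Date"].diff()

# Back to one row per date, with (value/diff, Team1/Team2) column pairs.
wide = long.pivot(index="Date", columns="variable")

# Keep only the Team1 gap, expressed in whole days (NaN for a first game).
days_since_team1 = pd.to_timedelta(wide[("diff", "Team1")]).dt.days
print(days_since_team1)
```

The first game of each team has no previous game, so its gap is missing rather than 0; fill it with 0 if that is the convention you want.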

Update

Based on your comment, here is an update:

# assume this df
    Date         Team1   Team2
0   2018-06-01  Boston    New York
1   2018-06-13  New York  Chicago
2   2018-06-27  Boston    New York
3   2018-06-28  Chicago   Boston
4   2018-06-28  New York  Detroit

Code:

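import pandas as pd

# melt df (same as above example)
melt = df.melt('Date').sort_values('Date')

# find the difference
melt['diff'] = melt.groupby('value')['Date'].diff()

# use pivot_table, not pivot, and index on both Date and diff so repeated dates can be reshaped
# aggfunc='first' keeps the single team name in each cell
piv = melt.pivot_table(index=['Date', 'diff'], columns='variable', values='value', aggfunc='first')

# move diff out of the index and keep only the rows that have a Team1 entry
piv.reset_index(level=1, inplace=True)
piv = piv[~piv['Team1'].isna()]

# merge your original df with the diff and Team1 columns; a team's first game has no gap, so fill it with 0 days
pd.merge(df, piv[piv.columns[:-1]], on=['Date','Team1'], how='outer').fillna({'diff': pd.Timedelta(0)})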

         Date   Team1     Team2     diff
0   2018-06-01  Boston    New York  0 days
1   2018-06-13  New York  Chicago   12 days
2   2018-06-27  Boston    New York  26 days
3   2018-06-28  Chicago   Boston    15 days
4   2018-06-28  New York  Detroit   1 days

Note that this time the difference is only the number of days since Team1's previous game.
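
As an alternative sketch (not the method used above), the same column can be computed without pivoting at all, by carrying the original row label through the melt and mapping the gaps back to the Team1 rows. This avoids depending on each (Date, diff) pair being unique. The column names `slot`, `team` and `gap` are illustrative, and the year 2018 is again assumed.

```python
import pandas as pd

# Sample data from the update, including two games on 2018-06-28.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2018-06-01", "2018-06-13", "2018-06-27",
                            "2018-06-28", "2018-06-28"]),
    "Team1": ["Boston", "New York", "Boston", "Chicago", "New York"],
    "Team2": ["New York", "Chicago", "New York", "Boston", "Detroit"],
})

# Keep the original row label ("index") so every appearance maps back to its game.
long = (
    df.reset_index()
      .melt(id_vars=["index", "Date"], value_vars=["Team1", "Team2"],
            var_name="slot", value_name="team")
      .sort_values("Date", kind="stable")
)

# Gap since each team's previous game, whichever slot it played in.
long["gap"] = long.groupby("team")["Date"].diff()

# Select the Team1 appearances and realign them with the original rows.
team1_gap = long.loc[long["slot"] == "Team1"].set_index("index")["gap"]

# Whole days, with 0 for a team's first recorded game.
df["Days since Team1 played"] = team1_gap.dt.days.fillna(0).astype(int)
print(df)
```

Because the assignment aligns on the original row labels, several games on the same date do not interfere with each other.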

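【Discussion】:

You're right. I've corrected the typo in the original post.
Unfortunately, when I applied this to a larger dataframe I got this error: ValueError: Index contains duplicate entries, cannot reshape... The error occurs specifically at the third and final step you listed above.
@RyanG73 That error indicates the dates are not unique. Does your data contain multiple teams playing on the same date?
Yes, it does. It covers a full season of games, and on some nights there are several games.
@RyanG73 See the update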
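
The duplicate-entries error discussed in these comments can be reproduced with a small sketch. It assumes the five-row sample from the update (year 2018 assumed), where two games share 2018-06-28, so `pivot` finds two Team1 values for the same date and cannot place them in one cell.

```python
import pandas as pd

# Two games share 2018-06-28, so Date alone is not a unique key.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2018-06-01", "2018-06-13", "2018-06-27",
                            "2018-06-28", "2018-06-28"]),
    "Team1": ["Boston", "New York", "Boston", "Chicago", "New York"],
    "Team2": ["New York", "Chicago", "New York", "Boston", "Detroit"],
})

long = df.melt(id_vars="Date").sort_values("Date")

# pivot needs each (index, column) pair to appear at most once.
try:
    long.pivot(index="Date", columns="variable", values="value")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

has_repeated_dates = bool(df["Date"].duplicated().any())
print(error_message)
print(has_repeated_dates)
```

Checking `df["Date"].duplicated().any()` before pivoting is a quick way to tell whether a dataset will hit this error.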