【Question title】: How to apply a function to a date-indexed DataFrame
【Posted】: 2012-12-07 03:27:46
【Question】:

I'm having a lot of trouble working with a DataFrame that has dates as its index.

from pandas import DataFrame, date_range
# Create a dataframe with dates as your index
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
idx = date_range('1/1/2012', periods=10, freq='MS')
df = DataFrame(data, index=idx, columns=['Revenue'])
df['State'] = ['NY', 'NY', 'NY', 'NY', 'FL', 'FL', 'GA', 'GA', 'FL', 'FL'] 

In [6]: df
Out[6]: 
            Revenue State
2012-01-01        1    NY
2012-02-01        2    NY
2012-03-01        3    NY
2012-04-01        4    NY
2012-05-01        5    FL
2012-06-01        6    FL
2012-07-01        7    GA
2012-08-01        8    GA
2012-09-01        9    FL
2012-10-01       10    FL

I'm trying to add an extra column named 'Mean' that holds the group mean for each row's state.

I tried this, but it doesn't work:

df2 = df
df2['Mean'] = df.groupby(['State'])['Revenue'].apply(lambda x: mean(x))

In [9]: df2.head(10)
Out[9]:
            Revenue State  Mean
2012-01-01        1    NY   NaN
2012-02-01        2    NY   NaN
2012-03-01        3    NY   NaN
2012-04-01        4    NY   NaN
2012-05-01        5    FL   NaN
2012-06-01        6    FL   NaN
2012-07-01        7    GA   NaN
2012-08-01        8    GA   NaN
2012-09-01        9    FL   NaN
2012-10-01       10    FL   NaN

But what I want is:

            Revenue State  Mean
2012-01-01        1    NY   2.5
2012-02-01        2    NY   2.5
2012-03-01        3    NY   2.5
2012-04-01        4    NY   2.5
2012-05-01        5    FL   7.5
2012-06-01        6    FL   7.5
2012-07-01        7    GA   7.5
2012-08-01        8    GA   7.5
2012-09-01        9    FL   7.5
2012-10-01       10    FL   7.5

How can I get this DataFrame?

【Question comments】:

    Tags: indexing group-by pandas


    【Solution 1】:

    Using join (or merge) also works:

    In [68]: revs = df.groupby('State').Revenue.mean()
    
    In [69]: revs.name = 'Mean Revenue'
    
    In [70]: df.join(revs, on='State')
    Out[70]: 
                Revenue State  Mean Revenue
    2012-01-01        1    NY           2.5
    2012-02-01        2    NY           2.5
    2012-03-01        3    NY           2.5
    2012-04-01        4    NY           2.5
    2012-05-01        5    FL           7.5
    2012-06-01        6    FL           7.5
    2012-07-01        7    GA           7.5
    2012-08-01        8    GA           7.5
    2012-09-01        9    FL           7.5
    2012-10-01       10    FL           7.5
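
A self-contained sketch of the join approach above, with the equivalent merge call next to it. The frame is rebuilt from the question; the variable names `joined` and `merged` are illustrative and not from the original answer, and `how='left'` is passed to merge on the assumption that every row of `df` should be kept, as join does.

```python
import pandas as pd

# Rebuild the example frame from the question.
idx = pd.date_range('1/1/2012', periods=10, freq='MS')
df = pd.DataFrame(
    {'Revenue': range(1, 11),
     'State': ['NY'] * 4 + ['FL'] * 2 + ['GA'] * 2 + ['FL'] * 2},
    index=idx,
)

# Per-state mean as a Series indexed by State; the name becomes the new column label.
revs = df.groupby('State')['Revenue'].mean().rename('Mean Revenue')

# join looks up df's 'State' column in the index of revs and keeps the date index.
joined = df.join(revs, on='State')

# The same lookup written as a merge: left key is a column, right key is the index.
merged = df.merge(revs, left_on='State', right_index=True, how='left')

print(joined)
```

Both calls leave `df` itself unchanged and return a new frame, so assign the result if the extra column should persist.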
    

    【Comments】:

      【Solution 2】:

      You almost had it! First compute the per-state means with groupby:

      means = df.groupby('State').mean()
      
      In [5]: means
      Out[5]: 
             Revenue
      State         
      FL         7.5
      GA         7.5
      NY         2.5
      

      Then apply this lookup to each row's State in the DataFrame:

      # .ix has been removed from recent pandas versions; .loc does the same label-based lookup
      df['mean'] = df['State'].apply(lambda x: means.loc[x, 'Revenue'])
      
      In [7]: df
      Out[7]: 
                  Revenue State  mean
      2012-01-01        1    NY   2.5
      2012-02-01        2    NY   2.5
      2012-03-01        3    NY   2.5
      2012-04-01        4    NY   2.5
      2012-05-01        5    FL   7.5
      2012-06-01        6    FL   7.5
      2012-07-01        7    GA   7.5
      2012-08-01        8    GA   7.5
      2012-09-01        9    FL   7.5
      2012-10-01       10    FL   7.5
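
The attempt in the question produced NaN because the groupby result is indexed by State while `df` is indexed by date, and column assignment aligns on index labels, so nothing matched. The sketch below shows two alternatives that are not part of the original answers: `Series.map`, which performs the same per-row lookup as the apply above without a Python-level lambda, and `transform('mean')`, which returns a result already aligned to the date index. The column name `mean_transform` is only illustrative.

```python
import pandas as pd

# Rebuild the example frame from the question.
idx = pd.date_range('1/1/2012', periods=10, freq='MS')
df = pd.DataFrame(
    {'Revenue': range(1, 11),
     'State': ['NY'] * 4 + ['FL'] * 2 + ['GA'] * 2 + ['FL'] * 2},
    index=idx,
)

# Series of means indexed by State (FL, GA, NY).
means = df.groupby('State')['Revenue'].mean()

# map replaces each State label with the matching value from `means`.
df['mean'] = df['State'].map(means)

# transform broadcasts each group's mean back onto that group's rows,
# so the result shares df's date index and can be assigned directly.
df['mean_transform'] = df.groupby('State')['Revenue'].transform('mean')

print(df)
```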
      

      【Comments】:
