【Question title】: How to split one DataFrame into two smaller ones
【Posted】: 2013-03-03 02:46:01
【Question】:

I have a large DataFrame like this:

            count   mean  median    min    max   std
datet                                               
2001-05-16     17    NaN     NaN    NaN    NaN   NaN
2001-05-17     24   8.28    8.27   8.15   8.46  0.09
2001-05-18     24   8.41    8.31   8.18   8.85  0.19
2001-05-19     24  10.44   10.64   9.03  10.98  0.60
2001-05-20     24  10.53   10.56   9.98  10.92  0.28
2001-05-21     24  10.28   10.31   9.90  10.66  0.23
2001-05-22     24  10.40   10.42  10.17  10.67  0.17
2001-05-23     24  10.04   10.03   9.87  10.17  0.08
2001-05-24     24   9.63    9.66   9.41   9.88  0.15
2001-05-25     24   9.21    9.22   9.01   9.41  0.11

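How can I split this DataFrame into two smaller ones, one holding the rows up to and including the date "2001-05-20" and the other holding the rows after it? Like this: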

df1:
            count   mean  median    min    max   std
datet                                               
2001-05-16     17    NaN     NaN    NaN    NaN   NaN
2001-05-17     24   8.28    8.27   8.15   8.46  0.09
2001-05-18     24   8.41    8.31   8.18   8.85  0.19
2001-05-19     24  10.44   10.64   9.03  10.98  0.60
2001-05-20     24  10.53   10.56   9.98  10.92  0.28

df2:
            count   mean  median    min    max   std
datet                                               
2001-05-21     24  10.28   10.31   9.90  10.66  0.23
2001-05-22     24  10.40   10.42  10.17  10.67  0.17
2001-05-23     24  10.04   10.03   9.87  10.17  0.08
2001-05-24     24   9.63    9.66   9.41   9.88  0.15
2001-05-25     24   9.21    9.22   9.01   9.41  0.11

【Comments】:

    Tags: python pandas


    【Solution 1】:

    For a single before/after split, I think grouping by a boolean criterion is the most direct approach.

    In [1]: # assumes: import numpy as np; import pandas as pd
       ...: df = pd.DataFrame(np.random.randn(10),
       ...:                   index=pd.date_range('2001-05-16', '2001-05-25'))
    
    In [2]: grouper = df.groupby(df.index < pd.Timestamp('2001-05-21'))
    
    In [3]: before, after = grouper.get_group(True), grouper.get_group(False)
    
    In [4]: before
    Out[4]: 
                   0
    2001-05-16  2.560516
    2001-05-17 -2.207314
    2001-05-18  0.646882
    2001-05-19  0.660611
    2001-05-20  0.437303
    

    `after` comes out the same way, holding the remaining rows. Can anyone improve on my In [3]?

    【Comments】:
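
One possible way to shorten In [3] is to skip `groupby` and index the frame with the boolean mask directly. This is only a sketch under the same assumptions as the session above (random data, daily index from 2001-05-16 to 2001-05-25); the name `mask` is introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Same shape as the example above: ten daily rows of random values.
df = pd.DataFrame(np.random.randn(10),
                  index=pd.date_range('2001-05-16', '2001-05-25'))

# True for rows strictly before 2001-05-21, i.e. up to and including 2001-05-20.
mask = df.index < pd.Timestamp('2001-05-21')

before = df[mask]    # rows where the mask is True
after = df[~mask]    # rows where the mask is False
```

Because `after` is built from the negated mask, every row lands in exactly one of the two frames.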

      【Solution 2】:

      In 0.11-dev (.ix would work equivalently):

      In [16]: df.loc[:'20010520']
      Out[16]: 
                         0
      2001-05-16  0.105445
      2001-05-17  1.660771
      2001-05-18  0.485668
      2001-05-19 -0.102616
      2001-05-20 -0.228228
      
      In [17]: df.loc['20010521':]
      Out[17]: 
                         0
      2001-05-21 -0.024324
      2001-05-22 -1.004362
      2001-05-23  2.342225
      2001-05-24  1.124695
      2001-05-25 -0.291302
      

      Or (ix would also work here; this is just more explicit):

      In [27]: i = df.index.get_loc('20010520')
      
      In [28]: df.iloc[:i+1]
      Out[28]: 
                         0
      2001-05-16  0.105445
      2001-05-17  1.660771
      2001-05-18  0.485668
      2001-05-19 -0.102616
      2001-05-20 -0.228228
      
      In [29]: df.iloc[i+1:]
      Out[29]: 
                         0
      2001-05-21 -0.024324
      2001-05-22 -1.004362
      2001-05-23  2.342225
      2001-05-24  1.124695
      2001-05-25 -0.291302
      

      【Comments】:
