【Question title】: How to get the last 5 months of data from a Pandas DataFrame?
【Posted】: 2017-09-26 11:42:56
【Question】:

I have a Pandas DataFrame that was built by fetching data from a KDB database with QPython.

First, the Date column comes back with an odd-looking dtype: dtype('<M8[ns]')

df = conn.sync("select Date, Open, High, Low, Close from stocktable", pandas=True)
df["Date"].dtype
# dtype('<M8[ns]')

However, when I inspect the contents of the column, the bottom line reports the dtype as a datetime.

0      2017-04-17
1      2017-04-13
2      2017-04-12
3      2017-04-11
4      2017-04-10
5      2017-04-07
6      2017-04-06
7      2017-04-05
8      2017-04-04
9      2017-04-03
10     2017-03-31
11     2017-03-30
          ...    

3180   2004-08-27
3181   2004-08-26
3182   2004-08-25
3183   2004-08-24
3184   2004-08-23
3185   2004-08-20
3186   2004-08-19
Name: Date, dtype: datetime64[ns]
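
The two spellings above appear to describe the same dtype: `<M8[ns]` is NumPy's compact type string (byte order, kind code `M`, 8 bytes, nanosecond unit), while `datetime64[ns]` is the readable name. A minimal sketch using only NumPy, with variable names chosen here for illustration:

```python
import numpy as np

# Build the dtype from the compact type string shown by df["Date"].dtype
short = np.dtype("<M8[ns]")
# Build the dtype from the readable name shown at the bottom of the Series
full = np.dtype("datetime64[ns]")

print(short.name)  # datetime64[ns]
print(short.kind)  # M  (NumPy's kind code for datetime)
print(full.str)    # typically '<M8[ns]' on little-endian machines
```

So the dtype itself is unlikely to be the reason `last()` misbehaves.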

Also, the last() method does not work as expected. I ask for the last 5 months of data, but all of the data is returned.

# Expected to only return last 5 months of data, but returns it all.
df.set_index("Date").last("5M")

How can I get only the last 5 months of rows from this DataFrame?

【Comments】:

    Tags: python pandas dataframe


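    【Answer 1】:

    Solved. The problem was that the data returned by KDB is sorted in descending (DESC) order, which confused the last() method.

    The solution is to add a sort clause to the query (in the Q language, this is the backtick-prefixed column name followed by the keyword xasc):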

    # last() needs a DatetimeIndex, so set the Date column as the index first
    df = conn.sync("`Date xasc select Date, Open, High, Low, Close from stocktable", pandas=True) \
         .set_index("Date") \
         .last("5M")
    

    Alternatively, sort the data in the Pandas DataFrame itself.

    # Sort ascending by date, index by date, then take the last 5 months
    df_sorted = stocktable.dataframe() \
        .sort_values(by="Date", ascending=True) \
        .set_index("Date") \
        .last("5M")
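
If sorting at the source is not an option, a boolean mask is one order-independent alternative. The sketch below uses synthetic data in descending order (the column names mirror the question, the values are made up) and a plain calendar cutoff via `pd.DateOffset`. Note that this is an assumption-laden substitute: it may not select exactly the same rows as `last("5M")`, which anchors on month ends, and it sidesteps `DataFrame.last`, which I believe is deprecated in recent pandas releases.

```python
import pandas as pd

# Synthetic daily data in DESCENDING date order, like the KDB result
dates = pd.date_range("2016-01-01", "2017-04-17", freq="D")[::-1]
df = pd.DataFrame({"Date": dates, "Close": range(len(dates))})

# Cutoff: five calendar months before the most recent date in the column
cutoff = df["Date"].max() - pd.DateOffset(months=5)

# The mask does not care about row order; sort afterwards for readability
recent = df[df["Date"] > cutoff].sort_values("Date").set_index("Date")

print(recent.index.min(), recent.index.max())
```

Because the mask compares every row against the cutoff, it gives the same rows whether the frame arrives ascending, descending, or shuffled.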
    

    【Comments】:

      【Answer 2】:

      It works fine for me:

      import pandas as pd

      rng = pd.date_range('2017-04-03', periods=10, freq='20D')
      df = pd.DataFrame({'Date': rng, 'a': range(10)})
      print(df)
              Date  a
      0 2017-04-03  0
      1 2017-04-23  1
      2 2017-05-13  2
      3 2017-06-02  3
      4 2017-06-22  4
      5 2017-07-12  5
      6 2017-08-01  6
      7 2017-08-21  7
      8 2017-09-10  8
      9 2017-09-30  9
      
      df = df.set_index('Date').last('5M')
      print (df)
                  a
      Date         
      2017-05-13  2
      2017-06-02  3
      2017-06-22  4
      2017-07-12  5
      2017-08-01  6
      2017-08-21  7
      2017-09-10  8
      2017-09-30  9
      

      It also works with duplicate dates; the datetime column just needs to be sorted first:

      rng = pd.date_range('2017-04-03', periods=10, freq='20D')
      df = pd.DataFrame({'Date': rng, 'a': range(10)})  
      df = pd.concat([df,df], ignore_index=True).sort_values('Date')
      print (df)
               Date  a
      0  2017-04-03  0
      10 2017-04-03  0
      1  2017-04-23  1
      11 2017-04-23  1
      2  2017-05-13  2
      12 2017-05-13  2
      3  2017-06-02  3
      13 2017-06-02  3
      4  2017-06-22  4
      14 2017-06-22  4
      5  2017-07-12  5
      15 2017-07-12  5
      6  2017-08-01  6
      16 2017-08-01  6
      17 2017-08-21  7
      7  2017-08-21  7
      18 2017-09-10  8
      8  2017-09-10  8
      9  2017-09-30  9
      19 2017-09-30  9
      
      df = df.set_index('Date').last('5M')
      print (df)
                  a
      Date         
      2017-05-13  2
      2017-05-13  2
      2017-06-02  3
      2017-06-02  3
      2017-06-22  4
      2017-06-22  4
      2017-07-12  5
      2017-07-12  5
      2017-08-01  6
      2017-08-01  6
      2017-08-21  7
      2017-08-21  7
      2017-09-10  8
      2017-09-10  8
      2017-09-30  9
      2017-09-30  9
      
      Without the sort, the same call silently returns only part of the data (here, one copy of each duplicated row instead of both):
      
      rng = pd.date_range('2017-04-03', periods=10, freq='20D')
      df = pd.DataFrame({'Date': rng, 'a': range(10)})  
      df = pd.concat([df,df], ignore_index=True)
      print (df)
               Date  a
      0  2017-04-03  0
      1  2017-04-23  1
      2  2017-05-13  2
      3  2017-06-02  3
      4  2017-06-22  4
      5  2017-07-12  5
      6  2017-08-01  6
      7  2017-08-21  7
      8  2017-09-10  8
      9  2017-09-30  9
      10 2017-04-03  0
      11 2017-04-23  1
      12 2017-05-13  2
      13 2017-06-02  3
      14 2017-06-22  4
      15 2017-07-12  5
      16 2017-08-01  6
      17 2017-08-21  7
      18 2017-09-10  8
      19 2017-09-30  9
      
      df = df.set_index('Date').last('5M')
      print (df)
                  a
      Date         
      2017-05-13  2
      2017-06-02  3
      2017-06-22  4
      2017-07-12  5
      2017-08-01  6
      2017-08-21  7
      2017-09-10  8
      2017-09-30  9
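
Since the result depends on the index being sorted, one way to catch the problem early is to check `Index.is_monotonic_increasing` before doing any date-based selection and call `sort_index()` when it is false. A small sketch reusing the same synthetic data as this answer; the names `unsorted` and `fixed` are chosen here for illustration:

```python
import pandas as pd

rng = pd.date_range("2017-04-03", periods=10, freq="20D")
df = pd.DataFrame({"Date": rng, "a": range(10)})

# Concatenating two copies leaves the dates out of order
unsorted = pd.concat([df, df], ignore_index=True).set_index("Date")
print(unsorted.index.is_monotonic_increasing)  # False

# sort_index() restores chronological order; duplicate dates are allowed
fixed = unsorted.sort_index()
print(fixed.index.is_monotonic_increasing)     # True
```

The check is cheap, so it can sit in front of any code that assumes chronological order.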
      

      【Comments】:

        【Answer 3】:

        It works for me.

        Demo:

        In [71]: from pandas_datareader import data as web
        
        In [72]: df = web.DataReader('AAPL', 'yahoo', '2010-04-01')
        
        In [73]: df
        Out[73]:
                          Open        High         Low       Close     Volume   Adj Close
        Date
        2010-04-01  237.410000  238.730003  232.750000  235.969994  150786300   30.572166
        2010-04-05  234.980011  238.509998  234.769993  238.489998  171126900   30.898657
        2010-04-06  238.200005  240.239998  237.000004  239.540009  111754300   31.034696
        2010-04-07  239.549995  241.920010  238.659988  240.600006  157125500   31.172029
        2010-04-08  240.440002  241.540001  238.040001  239.950005  143247300   31.087815
        2010-04-09  241.430012  241.889996  240.460003  241.789993   83545700   31.326203
        2010-04-12  242.199989  243.069996  241.809994  242.290005   83256600   31.390984
        2010-04-13  241.860008  242.800003  241.110004  242.430008   76552700   31.409123
        2010-04-14  245.280006  245.810005  244.069992  245.690002  101019100   31.831486
        2010-04-15  245.779991  249.029999  245.509998  248.920010   94196200   32.249965
        ...                ...         ...         ...         ...        ...         ...
        2017-04-13  141.910004  142.380005  141.050003  141.050003   17652900  141.050003
        2017-04-17  141.479996  141.880005  140.869995  141.830002   16424000  141.830002
        2017-04-18  141.410004  142.039993  141.110001  141.199997   14660800  141.199997
        2017-04-19  141.880005  142.000000  140.449997  140.679993   17271300  140.679993
        2017-04-20  141.220001  142.919998  141.160004  142.440002   23251100  142.440002
        2017-04-21  142.440002  142.679993  141.850006  142.270004   17245200  142.270004
        2017-04-24  143.500000  143.949997  143.179993  143.639999   17099200  143.639999
        2017-04-25  143.910004  144.899994  143.869995  144.529999   18290300  144.529999
        2017-04-26  144.470001  144.600006  143.380005  143.679993   19769400  143.679993
        2017-04-27  143.919998  144.160004  143.309998  143.789993   14106100  143.789993
        
        [1781 rows x 6 columns]
        
        In [74]: df.last('5M')
        Out[74]:
                          Open        High         Low       Close    Volume   Adj Close
        Date
        2016-12-01  110.370003  110.940002  109.029999  109.489998  37086900  109.017344
        2016-12-02  109.169998  110.089996  108.849998  109.900002  26528000  109.425578
        2016-12-05  110.000000  110.029999  108.250000  109.110001  34324500  108.638987
        2016-12-06  109.500000  110.360001  109.190002  109.949997  26195500  109.475358
        2016-12-07  109.260002  111.190002  109.160004  111.029999  29998700  110.550697
        2016-12-08  110.860001  112.430000  110.599998  112.120003  27068300  111.635996
        2016-12-09  112.309998  114.699997  112.309998  113.949997  34402600  113.458090
        2016-12-12  113.290001  115.000000  112.489998  113.300003  26374400  112.810902
        2016-12-13  113.839996  115.919998  113.750000  115.190002  43733800  114.692743
        2016-12-14  115.040001  116.199997  114.980003  115.190002  34031800  114.692743
        ...                ...         ...         ...         ...       ...         ...
        2017-04-13  141.910004  142.380005  141.050003  141.050003  17652900  141.050003
        2017-04-17  141.479996  141.880005  140.869995  141.830002  16424000  141.830002
        2017-04-18  141.410004  142.039993  141.110001  141.199997  14660800  141.199997
        2017-04-19  141.880005  142.000000  140.449997  140.679993  17271300  140.679993
        2017-04-20  141.220001  142.919998  141.160004  142.440002  23251100  142.440002
        2017-04-21  142.440002  142.679993  141.850006  142.270004  17245200  142.270004
        2017-04-24  143.500000  143.949997  143.179993  143.639999  17099200  143.639999
        2017-04-25  143.910004  144.899994  143.869995  144.529999  18290300  144.529999
        2017-04-26  144.470001  144.600006  143.380005  143.679993  19769400  143.679993
        2017-04-27  143.919998  144.160004  143.309998  143.789993  14106100  143.789993
        
        [101 rows x 6 columns]
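
The output above starts on 2016-12-01 rather than exactly five months before 2017-04-27. That is consistent with `"5M"` being read as five month-end offsets, which is how the `M` alias was defined in pandas versions of that period. The arithmetic can be reproduced on its own, as a sketch of the cutoff only and not of what `last()` does internally:

```python
import pandas as pd

# Last date in the AAPL demo
last_date = pd.Timestamp("2017-04-27")

# Stepping back five month ends: Mar 31, Feb 28, Jan 31, Dec 31, Nov 30
cutoff = last_date - pd.offsets.MonthEnd(5)
print(cutoff)  # 2016-11-30 00:00:00

# Same arithmetic for Answer 2, whose last date already sits on a month end
cutoff_2 = pd.Timestamp("2017-09-30") - pd.offsets.MonthEnd(5)
print(cutoff_2)  # 2017-04-30 00:00:00
```

Rows strictly after the cutoff are kept, which matches the first rows shown in both demos (2016-12-01 here, 2017-05-13 in Answer 2).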
        

        【Comments】:
