【Question title】: Pandas: select the last 20 days of data
【Posted】: 2018-03-09 14:33:42
【Question】:

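I have a simple question, but I can't seem to find a definitive answer.

Suppose I have a DataFrame with date, open, high, low, close, and volume columns.

The first thing I do is get the current date, which I can do with:

today = pd.Timestamp.today().normalize()

My problem is selecting the last 20 days of data relative to the current date.

I need to select the last 20 rows, because I need to find the highest and lowest values in the close column of this dataset.

Any pointers would be a great help. I have been searching Google for a while and keep finding different answers.

Thanks!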

【Question comments】:

    Tags: python pandas data-science


    【Solution 1】:

    Setup

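    import numpy as np
    import pandas as pd

    # pd.datetime was removed in pandas 2.0; a normalized Timestamp is midnight today
    today = pd.Timestamp.today().normalize()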
    
    df = pd.DataFrame(
        np.random.rand(20, 4),
        pd.date_range(end=today, periods=20, freq='3D'),
        columns=['O', 'H', 'L', 'C'])
    
    df
    
                       O         H         L         C
    2017-08-01  0.821996  0.894122  0.829814  0.429701
    2017-08-04  0.883512  0.668642  0.524440  0.914845
    2017-08-07  0.035753  0.231787  0.421547  0.163865
    2017-08-10  0.742781  0.293591  0.874033  0.054421
    2017-08-13  0.252422  0.632991  0.547044  0.650622
    2017-08-16  0.316752  0.190016  0.504701  0.827450
    2017-08-19  0.777069  0.533121  0.329742  0.603473
    2017-08-22  0.843260  0.546845  0.600270  0.060620
    2017-08-25  0.834180  0.395653  0.189499  0.820043
    2017-08-28  0.806369  0.850968  0.753335  0.902687
    2017-08-31  0.336096  0.145325  0.876519  0.114923
    2017-09-03  0.590195  0.946520  0.009151  0.832992
    2017-09-06  0.901101  0.616852  0.375829  0.332625
    2017-09-09  0.537892  0.852527  0.082807  0.966297
    2017-09-12  0.104929  0.803415  0.345942  0.245934
    2017-09-15  0.085703  0.743497  0.256762  0.530267
    2017-09-18  0.823960  0.397983  0.173706  0.091678
    2017-09-21  0.211412  0.980942  0.833802  0.763510
    2017-09-24  0.312950  0.850760  0.913519  0.846466
    2017-09-27  0.921168  0.568595  0.460656  0.016313
    

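    Solution
    Use pandas datetime index slicing. It is simple and direct, and it is how the pandas developers intended this problem to be solved. Note: this does not care how many rows fall within the last 20 days; it simply grabs all of them. That is what I believe the OP wanted.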

    df[today - pd.offsets.Day(20):]
    
                       O         H         L         C
    2017-09-09  0.537892  0.852527  0.082807  0.966297
    2017-09-12  0.104929  0.803415  0.345942  0.245934
    2017-09-15  0.085703  0.743497  0.256762  0.530267
    2017-09-18  0.823960  0.397983  0.173706  0.091678
    2017-09-21  0.211412  0.980942  0.833802  0.763510
    2017-09-24  0.312950  0.850760  0.913519  0.846466
    2017-09-27  0.921168  0.568595  0.460656  0.016313
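
To connect the slice back to the question's goal (highest and lowest close), here is a minimal sketch. It assumes a fixed reference date and seeded random data so the result is reproducible; the names `recent`, `highest_close`, and `lowest_close` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Fixed reference date (an assumption for reproducibility; the answer uses today's date).
today = pd.Timestamp("2017-09-27")

# Same shape as the setup above: 20 rows spaced 3 days apart, ending at `today`.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.random((20, 4)),
    index=pd.date_range(end=today, periods=20, freq="3D"),
    columns=["O", "H", "L", "C"],
)

# Label-based slice on the sorted DatetimeIndex: every row from 20 days ago onward.
recent = df.loc[today - pd.Timedelta(days=20):]

# Highest and lowest close within that window.
highest_close = recent["C"].max()
lowest_close = recent["C"].min()
print(len(recent), highest_close, lowest_close)
```

Using `.loc` makes it explicit that the slice is by label rather than by position. The slice relies on the index being a sorted `DatetimeIndex`.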
    

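    【Comments】:

    • You will get my upvote in 8 hours, sir. There is a lot to learn here.
    【Solution 2】:

    If you only need the last 20 rows of the DataFrame, you can use df[-20:]. If you instead want the date that starts a 20-day window ending today, use pd.Timestamp.today().normalize() - pd.Timedelta(days=19) (19 days back plus today makes 20 days).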

    In [1]: index = pd.date_range(start=pd.Timestamp.today().normalize() - pd.Timedelta(days=30), periods=31)
    
    In [2]: df = pd.DataFrame(np.random.rand(31, 4), index=index, columns=['O', 'H', 'L', 'C'])
    
    In [3]: df = df.reset_index().rename(columns={'index': 'Date'})
    
    In [4]: df
    Out[4]:
             Date         O         H         L         C
    0  2017-08-28  0.616856  0.518961  0.378005  0.716371
    1  2017-08-29  0.300977  0.652217  0.713013  0.842369
    2  2017-08-30  0.875668  0.232998  0.566047  0.969647
    3  2017-08-31  0.273934  0.086575  0.386617  0.390749
    4  2017-09-01  0.667561  0.336419  0.648809  0.619215
    5  2017-09-02  0.988234  0.563675  0.402908  0.671333
    6  2017-09-03  0.111710  0.549302  0.321546  0.201828
    7  2017-09-04  0.469041  0.736152  0.345069  0.336593
    8  2017-09-05  0.674844  0.276839  0.350289  0.862777
    9  2017-09-06  0.128124  0.968918  0.713846  0.415061
    10 2017-09-07  0.920488  0.252980  0.573531  0.270999
    11 2017-09-08  0.113368  0.781649  0.190273  0.758834
    12 2017-09-09  0.414453  0.545572  0.761805  0.586717
    13 2017-09-10  0.348459  0.830177  0.779591  0.783887
    14 2017-09-11  0.571877  0.230465  0.262744  0.360188
    15 2017-09-12  0.844286  0.821388  0.312319  0.473672
    16 2017-09-13  0.605548  0.570590  0.457141  0.882498
    17 2017-09-14  0.242154  0.066617  0.028913  0.969698
    18 2017-09-15  0.725521  0.742362  0.904866  0.890942
    19 2017-09-16  0.460858  0.749581  0.429131  0.723394
    20 2017-09-17  0.767445  0.452113  0.906294  0.978368
    21 2017-09-18  0.342970  0.702579  0.029031  0.743489
    22 2017-09-19  0.221478  0.339948  0.403478  0.349097
    23 2017-09-20  0.147785  0.633542  0.692545  0.194496
    24 2017-09-21  0.656189  0.419257  0.099094  0.708530
    25 2017-09-22  0.329901  0.087101  0.683207  0.558431
    26 2017-09-23  0.902550  0.155262  0.304506  0.756210
    27 2017-09-24  0.072132  0.045242  0.058175  0.755649
    28 2017-09-25  0.149873  0.340870  0.198454  0.725051
    29 2017-09-26  0.972721  0.505842  0.886602  0.231916
    30 2017-09-27  0.511109  0.990975  0.330336  0.898291
    
    In [5]: df[-20:]
    Out[5]:
             Date         O         H         L         C
    11 2017-09-08  0.113368  0.781649  0.190273  0.758834
    12 2017-09-09  0.414453  0.545572  0.761805  0.586717
    13 2017-09-10  0.348459  0.830177  0.779591  0.783887
    14 2017-09-11  0.571877  0.230465  0.262744  0.360188
    15 2017-09-12  0.844286  0.821388  0.312319  0.473672
    16 2017-09-13  0.605548  0.570590  0.457141  0.882498
    17 2017-09-14  0.242154  0.066617  0.028913  0.969698
    18 2017-09-15  0.725521  0.742362  0.904866  0.890942
    19 2017-09-16  0.460858  0.749581  0.429131  0.723394
    20 2017-09-17  0.767445  0.452113  0.906294  0.978368
    21 2017-09-18  0.342970  0.702579  0.029031  0.743489
    22 2017-09-19  0.221478  0.339948  0.403478  0.349097
    23 2017-09-20  0.147785  0.633542  0.692545  0.194496
    24 2017-09-21  0.656189  0.419257  0.099094  0.708530
    25 2017-09-22  0.329901  0.087101  0.683207  0.558431
    26 2017-09-23  0.902550  0.155262  0.304506  0.756210
    27 2017-09-24  0.072132  0.045242  0.058175  0.755649
    28 2017-09-25  0.149873  0.340870  0.198454  0.725051
    29 2017-09-26  0.972721  0.505842  0.886602  0.231916
    30 2017-09-27  0.511109  0.990975  0.330336  0.898291
    
    In [6]: df[df.Date.isin(pd.date_range(pd.Timestamp.today().normalize() - pd.Timedelta(days=19), periods=20))]
    Out[6]:
             Date         O         H         L         C
    11 2017-09-08  0.113368  0.781649  0.190273  0.758834
    12 2017-09-09  0.414453  0.545572  0.761805  0.586717
    13 2017-09-10  0.348459  0.830177  0.779591  0.783887
    14 2017-09-11  0.571877  0.230465  0.262744  0.360188
    15 2017-09-12  0.844286  0.821388  0.312319  0.473672
    16 2017-09-13  0.605548  0.570590  0.457141  0.882498
    17 2017-09-14  0.242154  0.066617  0.028913  0.969698
    18 2017-09-15  0.725521  0.742362  0.904866  0.890942
    19 2017-09-16  0.460858  0.749581  0.429131  0.723394
    20 2017-09-17  0.767445  0.452113  0.906294  0.978368
    21 2017-09-18  0.342970  0.702579  0.029031  0.743489
    22 2017-09-19  0.221478  0.339948  0.403478  0.349097
    23 2017-09-20  0.147785  0.633542  0.692545  0.194496
    24 2017-09-21  0.656189  0.419257  0.099094  0.708530
    25 2017-09-22  0.329901  0.087101  0.683207  0.558431
    26 2017-09-23  0.902550  0.155262  0.304506  0.756210
    27 2017-09-24  0.072132  0.045242  0.058175  0.755649
    28 2017-09-25  0.149873  0.340870  0.198454  0.725051
    29 2017-09-26  0.972721  0.505842  0.886602  0.231916
    30 2017-09-27  0.511109  0.990975  0.330336  0.898291
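
In the session above both selections return the same rows because the data has exactly one row per day. With gaps in the data the two approaches diverge. The sketch below illustrates this under assumed conditions: hypothetical business-day data (no weekends), a fixed end date, and seeded random closes.

```python
import numpy as np
import pandas as pd

# Hypothetical business-day data: weekends are missing, as in typical price series.
end = pd.Timestamp("2017-09-27")  # fixed date, assumed for reproducibility
dates = pd.bdate_range(end=end, periods=40)
rng = np.random.default_rng(1)
df = pd.DataFrame({"Date": dates, "C": rng.random(40)})

# Last 20 rows: always 20 trading days, regardless of calendar gaps.
last_20_rows = df.tail(20)

# Last 20 calendar days (inclusive of `end`): fewer rows, because weekends are absent.
start = end - pd.Timedelta(days=19)
last_20_days = df[df["Date"].between(start, end)]

print(len(last_20_rows), len(last_20_days))
```

Here the calendar window holds only 14 rows while `tail(20)` holds 20, so which one is right depends on whether "last 20 days" means trading days or calendar days.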
    

    【Comments】:

      【Solution 3】:

      Here is an example.

      import pandas as pd
      import numpy as np
      
      # pd.datetime was removed in pandas 2.0; use a normalized Timestamp instead
      today = pd.Timestamp.today().normalize()
      
      df = pd.DataFrame({"date": pd.date_range('20170901','20170930', freq='D'),
                         "open": np.random.rand(30),
                         "high": np.random.rand(30),
                         "low": np.random.rand(30),
                         "close": np.random.rand(30),
                         "volume": np.random.rand(30)},
                        columns = ["date", "open", "high", "low", "close", "volume"])
      # date_range includes both endpoints, so this covers 21 calendar days (today and the 20 before it)
      selected_date = pd.date_range(today - pd.to_timedelta(20, unit='d'), today, freq='D')
      df_selected = df[df["date"].isin(selected_date)]
      # Out[40]:
      #          date      open      high       low     close    volume
      # 7  2017-09-08  0.790424  0.999621  0.139619  0.669588  0.476784
      # 8  2017-09-09  0.190239  0.439975  0.362905  0.018472  0.905773
      # 9  2017-09-10  0.184327  0.686411  0.124636  0.741130  0.132774
      # 10 2017-09-11  0.346019  0.022173  0.422704  0.159098  0.011801
      # 11 2017-09-12  0.549928  0.228514  0.851650  0.824209  0.756816
      # 12 2017-09-13  0.413550  0.994019  0.340958  0.905432  0.289316
      # 13 2017-09-14  0.435034  0.485978  0.768520  0.534148  0.276084
      # 14 2017-09-15  0.839840  0.775490  0.481123  0.911378  0.928908
      # 15 2017-09-16  0.442393  0.512893  0.519516  0.844619  0.813230
      # 16 2017-09-17  0.723789  0.646345  0.081776  0.388496  0.391421
      # 17 2017-09-18  0.964289  0.849776  0.156879  0.663885  0.062165
      # 18 2017-09-19  0.001000  0.174666  0.694151  0.777330  0.739554
      # 19 2017-09-20  0.426997  0.541273  0.789910  0.218263  0.748694
      # 20 2017-09-21  0.217904  0.295377  0.087909  0.765242  0.555663
      # 21 2017-09-22  0.910734  0.848182  0.476946  0.374580  0.079900
      # 22 2017-09-23  0.160963  0.795219  0.956262  0.744048  0.645552
      # 23 2017-09-24  0.412634  0.722252  0.226693  0.524794  0.910259
      # 24 2017-09-25  0.535072  0.131761  0.931164  0.618055  0.542512
      # 25 2017-09-26  0.697222  0.552784  0.537899  0.773403  0.916538
      # 26 2017-09-27  0.257628  0.479550  0.539444  0.540076  0.344933
      # 27 2017-09-28  0.270114  0.914036  0.137004  0.939907  0.736016
      

      In addition, the maximum and minimum of the close column are obtained as follows.

      df_max = df_selected[df_selected['close'] == df_selected['close'].max()]
      # Out[48]:
      #          date      open      high       low     close    volume
      # 27 2017-09-28  0.270114  0.914036  0.137004  0.939907  0.736016
      
      df_min = df_selected[df_selected['close'] == df_selected['close'].min()]
      # Out[49]:
      #         date      open      high       low     close    volume
      # 8 2017-09-09  0.190239  0.439975  0.362905  0.018472  0.905773
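
As an alternative to the boolean masks above, `idxmax` and `idxmin` return the index label of the extreme value directly. This is a minimal sketch, not the answer author's method; the small `df_selected` frame is rebuilt here with seeded random data so the block runs on its own.

```python
import numpy as np
import pandas as pd

# Stand-in for df_selected from the answer (assumed data, seeded for reproducibility).
rng = np.random.default_rng(2)
df_selected = pd.DataFrame({
    "date": pd.date_range("2017-09-08", "2017-09-28", freq="D"),
    "close": rng.random(21),
})

# idxmax/idxmin return the index label of the first occurrence of the extreme value.
row_max = df_selected.loc[df_selected["close"].idxmax()]
row_min = df_selected.loc[df_selected["close"].idxmin()]

# Both extremes at once, as plain numbers.
summary = df_selected["close"].agg(["min", "max"])
print(row_max["date"], row_min["date"], summary.to_dict())
```

One difference in behaviour: the boolean mask returns every row that ties for the extreme value, while `idxmax` and `idxmin` return only the first one.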
      

      【Comments】:
