【Question Title】: Pandas: Slice Dataframe by Datetime (that may not exist) and Return View
【Posted】: 2015-04-10 19:50:54
【Question】:

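I have a large DataFrame that I want to slice so that I can run some calculations on the slice and have the values updated in the original data. In addition, I am slicing by start and end times that may not exist in the index. Below is a simplified example; in practice I want to update several columns based on different calculations.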

In [1]: df
Out[1]:

                         A        B         C
TIME
2014-01-02 14:00:00 -1.172285  1.706200    NaN
2014-01-02 14:05:00  0.039511 -0.320798    NaN
2014-01-02 14:10:00 -0.192179 -0.539397    NaN
2014-01-02 14:15:00 -0.475917 -0.280055    NaN
2014-01-02 14:20:00  0.163376  1.124602    NaN
2014-01-02 14:25:00 -2.477812  0.656750    NaN

I have tried all of the following statements to create sdf as a view over my time range:

from datetime import datetime

start = datetime.strptime('2014-01-02 14:07:00', '%Y-%m-%d %H:%M:%S')
end = datetime.strptime('2014-01-02 14:22:00', '%Y-%m-%d %H:%M:%S')

sdf = df[start:end]
sdf = df[start < df.index < end]
sdf = df.ix[start:end]
sdf = df.loc[start:end]
sdf = df.truncate(before=start, after=end, copy=False)

sdf['C'] = 100

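Most of these return a copy, and I get a SettingWithCopyWarning. The loc function reports that the index is incompatible with datetime. Is this something I should be able to do? After updating the slice, the result I want is: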

In [1]: df
Out[1]:

                         A        B         C
TIME
2014-01-02 14:00:00 -1.172285  1.706200    NaN
2014-01-02 14:05:00  0.039511 -0.320798    NaN
2014-01-02 14:10:00 -0.192179 -0.539397    100
2014-01-02 14:15:00 -0.475917 -0.280055    100
2014-01-02 14:20:00  0.163376  1.124602    100
2014-01-02 14:25:00 -2.477812  0.656750    NaN

Can anyone suggest an approach? Am I going about this the wrong way?

Thanks

【Question Discussion】:

    Tags: python pandas


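    【Solution 1】:

    One approach is to use loc, wrapping each condition in parentheses and combining them with the bitwise operator &. The bitwise operator is needed because you are comparing arrays of values rather than single values, and the parentheses are required because & binds more tightly than the comparison operators. The resulting boolean mask can then be passed to loc for label-based selection, setting column 'C' like so: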

    In [15]:
    
    import datetime as dt
    start = dt.datetime.strptime('2014-01-02 14:07:00', '%Y-%m-%d %H:%M:%S')
    end = dt.datetime.strptime('2014-01-02 14:22:00', '%Y-%m-%d %H:%M:%S')
    df.loc[(df.index > start) & (df.index < end), 'C'] = 100
    df
    Out[15]:
                                A         B    C
    TIME                                        
    2014-01-02 14:00:00 -1.172285  1.706200  NaN
    2014-01-02 14:05:00  0.039511 -0.320798  NaN
    2014-01-02 14:10:00 -0.192179 -0.539397  100
    2014-01-02 14:15:00 -0.475917 -0.280055  100
    2014-01-02 14:20:00  0.163376  1.124602  100
    2014-01-02 14:25:00 -2.477812  0.656750  NaN
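
    For readers who want to reproduce this without the original data file, here is a minimal self-contained sketch. The frame is rebuilt by hand from the values printed in the question, and pd.Timestamp is used in place of strptime purely for brevity; both are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question (values copied from the post).
index = pd.date_range("2014-01-02 14:00:00", periods=6, freq="5min", name="TIME")
df = pd.DataFrame(
    {
        "A": [-1.172285, 0.039511, -0.192179, -0.475917, 0.163376, -2.477812],
        "B": [1.706200, -0.320798, -0.539397, -0.280055, 1.124602, 0.656750],
        "C": np.nan,
    },
    index=index,
)

# Neither bound exists in the index, which is fine for a boolean mask.
start = pd.Timestamp("2014-01-02 14:07:00")
end = pd.Timestamp("2014-01-02 14:22:00")

# Each comparison yields a boolean array; & combines them element-wise.
mask = (df.index > start) & (df.index < end)

# Assigning through .loc on the original frame writes in place,
# so no intermediate slice (and no SettingWithCopyWarning) is involved.
df.loc[mask, "C"] = 100
```

    The key design choice is that the assignment targets df itself rather than a separate sliced object, so the question of view versus copy never arises.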
    

    If we go through each method you tried and why it did not work:

    sdf = df[start:end]  # raises KeyError if start and end are not in the index (label slicing with absent endpoints needs a sorted index)
    sdf = df[start < df.index < end]  # raises ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all(). The chained comparison works on arrays of values, not single scalars
    sdf = df.ix[start:end]  # raises KeyError, same as the first example (.ix was removed in pandas 1.0)
    sdf = df.loc[start:end]  # raises KeyError, same as the first example
    sdf = df.truncate(before=start, after=end, copy=False)  # produces the correct rows, but operations on the result raise SettingWithCopyWarning, as you found
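
    The chained-comparison failure can be seen in isolation with the sketch below. The small frame and the variable names chained_error, selected, and sliced are made up for illustration. The last statement also shows that, on recent pandas versions with a sorted DatetimeIndex, label slicing tolerates endpoints that are missing from the index, so the KeyError noted above may depend on the index being unsorted or on the pandas version in use.

```python
import pandas as pd

index = pd.date_range("2014-01-02 14:00:00", periods=6, freq="5min", name="TIME")
df = pd.DataFrame({"A": range(6)}, index=index)
start = pd.Timestamp("2014-01-02 14:07:00")
end = pd.Timestamp("2014-01-02 14:22:00")

# Python expands `start < df.index < end` into
# `(start < df.index) and (df.index < end)`. The `and` needs one bool,
# which a multi-element boolean array cannot supply.
try:
    df[start < df.index < end]
    chained_error = None
except ValueError as exc:
    chained_error = exc

# The element-wise form with & and parentheses works.
selected = df[(df.index > start) & (df.index < end)]

# On a sorted DatetimeIndex, label slicing accepts absent endpoints.
sliced = df.loc[start:end]
```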
    

    Edit

    You can assign the mask to sdf and use it with loc to set your 'C' column:

    In [7]:
    
    import datetime as dt
    start = dt.datetime.strptime('2014-01-02 14:07:00', '%Y-%m-%d %H:%M:%S')
    end = dt.datetime.strptime('2014-01-02 14:22:00', '%Y-%m-%d %H:%M:%S')
    sdf = (df.index > start) & (df.index < end)
    df.loc[sdf,'C'] = 100
    df
    Out[7]:
                                A         B    C
    TIME                                        
    2014-01-02 14:00:00 -1.172285  1.706200  NaN
    2014-01-02 14:05:00  0.039511 -0.320798  NaN
    2014-01-02 14:10:00 -0.192179 -0.539397  100
    2014-01-02 14:15:00 -0.475917 -0.280055  100
    2014-01-02 14:20:00  0.163376  1.124602  100
    2014-01-02 14:25:00 -2.477812  0.656750  NaN
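
    Since the question mentions updating several columns from different calculations, the same mask can be reused for each assignment. The sketch below rebuilds the sample frame by hand, and the two calculations (a sum into 'C' and an absolute value into 'B') are hypothetical placeholders, not anything the asker specified.

```python
import numpy as np
import pandas as pd

index = pd.date_range("2014-01-02 14:00:00", periods=6, freq="5min", name="TIME")
df = pd.DataFrame(
    {
        "A": [-1.172285, 0.039511, -0.192179, -0.475917, 0.163376, -2.477812],
        "B": [1.706200, -0.320798, -0.539397, -0.280055, 1.124602, 0.656750],
        "C": np.nan,
    },
    index=index,
)
start = pd.Timestamp("2014-01-02 14:07:00")
end = pd.Timestamp("2014-01-02 14:22:00")

# Build the mask once and reuse it for every update.
mask = (df.index > start) & (df.index < end)

# Hypothetical calculation 1: C becomes A + B inside the window.
df.loc[mask, "C"] = df.loc[mask, "A"] + df.loc[mask, "B"]

# Hypothetical calculation 2: B is replaced by its absolute value inside the window.
df.loc[mask, "B"] = df.loc[mask, "B"].abs()
```

    Rows outside the window are left untouched, because every read and write goes through the same boolean mask on the original frame.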
    
