【Question title】: Logical element-wise operations in a pandas time-series DataFrame
【Posted】: 2017-09-10 09:33:27
【Question】:

I have a pandas DataFrame with a timestamp (datetime) index and a value for each date. For example, df = pd.DataFrame(['0.11', '0.07', '0.04', '-0.11', '-0.04', '0.08', '0.1'], index=['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04', '2017-01-05', '2017-01-06', '2017-01-07'], columns=['values']).
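
The example stores the numbers as strings and uses a plain string index, so comparing an entry with 0.1 raises a TypeError in Python 3. A minimal sketch of building the same frame with a real DatetimeIndex and float values, assuming that is what the actual data looks like:

```python
import pandas as pd

raw = ['0.11', '0.07', '0.04', '-0.11', '-0.04', '0.08', '0.1']
dates = ['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04',
         '2017-01-05', '2017-01-06', '2017-01-07']

# Parse the date strings into a DatetimeIndex and the value strings into floats
df = pd.DataFrame({'values': raw}, index=pd.to_datetime(dates))
df['values'] = df['values'].astype(float)

print(df.dtypes)
print(type(df.index).__name__)  # DatetimeIndex
```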

I want to create an additional column (call it 'new_value') based on the current and past values of this DataFrame.

The logic should be:

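If the value is greater than or equal to 0.1, 'new_value' should be set to -1.
Once 'new_value' is set to -1, it should stay -1 until a value less than or equal to 0.05 is registered.
If the value is less than or equal to -0.1, 'new_value' should be set to +1.
Once 'new_value' is set to +1, it should stay +1 until a value greater than or equal to -0.05 is registered.
Otherwise 'new_value' equals 0.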
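
Because each row depends on the state carried over from earlier rows, these rules describe a small state machine. A minimal plain-Python sketch, assuming the values are already floats; the function name `rule_signal` is hypothetical, and checking a new ±0.1 trigger before the release conditions is one interpretation of the rules above:

```python
def rule_signal(values, upper=0.1, upper_release=0.05,
                lower=-0.1, lower_release=-0.05):
    """Hypothetical helper: one signal per value following the rules above.

    -1 is triggered by v >= upper and held while v > upper_release;
    +1 is triggered by v <= lower and held while v < lower_release;
    everything else is 0.
    """
    state = 0
    out = []
    for v in values:
        if v >= upper:
            state = -1
        elif v <= lower:
            state = 1
        elif state == -1 and v > upper_release:
            pass  # still above 0.05: keep -1
        elif state == 1 and v < lower_release:
            pass  # still below -0.05: keep +1
        else:
            state = 0
        out.append(state)
    return out


print(rule_signal([0.11, 0.07, 0.04, -0.11, -0.04, 0.08, 0.1]))
# [-1, -1, 0, 1, 0, 0, -1]
```

With a float column this could be attached as `df['new_value'] = rule_signal(df['values'].astype(float))`. A loop like this is easy to check against the rules, at the cost of being slower than a vectorized version on large frames.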

I have tried several approaches but can't seem to solve this. For example:

new_frame = pd.DataFrame(np.zeros(len(df.index)), index=df.index, columns=['new_value'])
for date in df.index:
    if df['values'][date.strftime('%Y-%m-%d')] > 0.1:
        new_frame.set_value(date.strftime("%Y-%m-%d"), 'new_value', -1)

But I get the error: 'ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().'
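
This error appears whenever a pandas Series, rather than a single number, ends up in an `if` condition: `series > 0.1` is itself a boolean Series, and Python cannot decide whether the whole Series is "true". A minimal reproduction with made-up values:

```python
import pandas as pd

s = pd.Series([0.11, 0.07])  # hypothetical two-row selection

mask = s > 0.1  # element-wise comparison -> boolean Series, not a bool
try:
    if mask:
        pass
except ValueError as exc:
    message = str(exc)
    print(message)  # The truth value of a Series is ambiguous. ...

# Reduce the mask explicitly when a single True/False is needed
print(mask.any())  # True: at least one value exceeds 0.1
print(mask.all())  # False: not every value exceeds 0.1
```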

If I change the third line (the `if` statement) to:

if df['values'][date.strftime('%Y-%m-%d')].item() > 0.1:

then I get the error: 'ValueError: can only convert an array of size 1 to a Python scalar'
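
`.item()` only works when the selection holds exactly one element, so this second error means the date lookup returned more than one row. One way that can happen, assumed here because the real index is not shown, is a DatetimeIndex with several timestamps per day (or duplicate dates): looking up a date-only string is then a partial-string slice that returns every row for that day. A minimal sketch:

```python
import pandas as pd

# Hypothetical intraday index: two timestamps fall on 2017-01-01
idx = pd.to_datetime(['2017-01-01 09:00', '2017-01-01 16:00', '2017-01-02 09:00'])
s = pd.Series([0.11, 0.07, 0.04], index=idx)

day = s.loc['2017-01-01']  # partial-string indexing -> Series with 2 rows
print(len(day))  # 2

try:
    day.item()
except ValueError as exc:
    item_error = str(exc)
    print(item_error)  # can only convert an array of size 1 to a Python scalar

# Compare element-wise instead of forcing a single scalar
print((day >= 0.1).tolist())  # [True, False]
```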

【Question discussion】:

Tags: pandas numpy time-series logical-operators

【Solution 1】:

`numpy.searchsorted`

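import numpy as np
import pandas as pd

s = df['values'].astype(float)   # the values are strings; convert to float
al = np.array([-.1, -.05])        # left breakpoints
bl = np.array([1, np.nan, 0])     # left mapped values: +1, hold (NaN), 0
ar = np.array([.05, .1])          # right breakpoints
br = np.array([0, np.nan, -1])    # right mapped values: 0, hold (NaN), -1
l = bl[al.searchsorted(s.values)]
r = br[ar.searchsorted(s.values, side='right')]

# NaN + anything is NaN, so "hold" rows become NaN and ffill carries the last state forward
df.assign(new_values=pd.Series(l + r, s.index).ffill())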

           values  new_values
2017-01-01   0.11        -1.0
2017-01-02   0.07        -1.0
2017-01-03   0.04         0.0
2017-01-04  -0.11         1.0
2017-01-05  -0.04         0.0
2017-01-06   0.08         0.0
2017-01-07    0.1        -1.0

How it works

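Get a Series of floats and call it s (the original values are strings, hence astype(float)).
Set the left breakpoints al.
Set the left mapped values bl.
Set the right breakpoints ar.
Set the right mapped values br.
searchsorted finds, for each value, the index at which it would be inserted to keep the breakpoints sorted.
Use those indices from searchsorted to look up the mapped values.
For the right side I use side='right', so that a value exactly equal to 0.1 gets index 2 and maps to -1,
then map the values again.
Add the left and right results. NaN + any value is NaN, which marks the "hold" rows.
ffill propagates the last non-NaN value forward.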
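
To see the bucketing on its own, this small sketch prints the indices `searchsorted` returns for a few probe values, including values sitting exactly on the breakpoints. It also shows a boundary detail of the code above: with these breakpoints a value of exactly -0.05 or 0.05 lands in the NaN ("hold") bucket, whereas the question says those values should end the +1 or -1 state.

```python
import numpy as np

al = np.array([-.1, -.05])
ar = np.array([.05, .1])
probe = np.array([-0.11, -0.1, -0.05, 0.0, 0.05, 0.1, 0.11])

left_idx = al.searchsorted(probe)                 # side='left' (the default)
right_idx = ar.searchsorted(probe, side='right')

for v, li, ri in zip(probe, left_idx, right_idx):
    print(f"{v:+.2f}  left={li}  right={ri}")

# -0.05 -> left index 1 -> bl[1] is NaN (hold)
#  0.05 -> right index 1 -> br[1] is NaN (hold)
```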

Setup
Assume df is the DataFrame given by the OP:

import pandas as pd

# positional arguments: data, index, columns
df = pd.DataFrame(
    ['0.11', '0.07', '0.04', '-0.11',
     '-0.04', '0.08', '0.1'],
    ['2017-01-01', '2017-01-02',
     '2017-01-03', '2017-01-04',
     '2017-01-05', '2017-01-06',
     '2017-01-07'],
    ['values']
)

           values
2017-01-01   0.11
2017-01-02   0.07
2017-01-03   0.04
2017-01-04  -0.11
2017-01-05  -0.04
2017-01-06   0.08
2017-01-07    0.1
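
If the ±0.05 boundaries matter, one alternative is to spell out the buckets as explicit inclusive conditions with `np.select` instead of `searchsorted` (a swapped-in technique, not part of the answer above), keeping the same NaN + ffill idea. The helper name `select_signal` is hypothetical:

```python
import numpy as np
import pandas as pd


def select_signal(values):
    """Hypothetical helper: the same left/right buckets written with np.select."""
    s = pd.Series(values).astype(float)
    v = s.to_numpy()
    # Left side: +1 at or below -0.1, 0 at or above -0.05, NaN ("hold") in between
    left = np.select([v <= -0.1, v >= -0.05], [1.0, 0.0], default=np.nan)
    # Right side: -1 at or above 0.1, 0 at or below 0.05, NaN ("hold") in between
    right = np.select([v >= 0.1, v <= 0.05], [-1.0, 0.0], default=np.nan)
    # NaN + anything is NaN; ffill carries the last state, fillna(0) covers leading holds
    return pd.Series(left + right, index=s.index).ffill().fillna(0)


df = pd.DataFrame(
    ['0.11', '0.07', '0.04', '-0.11', '-0.04', '0.08', '0.1'],
    ['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04',
     '2017-01-05', '2017-01-06', '2017-01-07'],
    ['values'],
)
print(df.assign(new_values=select_signal(df['values'])))
```

Like the original answer, this treats every value strictly between 0.05 and 0.1 (or between -0.1 and -0.05) as "hold", so a +1 followed directly by, say, 0.07 stays +1 instead of resetting to 0. A row-by-row loop is needed if that case must follow the rules exactly.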

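【Discussion】:

This is a great answer, thanks. To build my understanding, could someone explain why my original approach doesn't work?
@Panda_User See this answer. If it helps, feel free to upvote :-)
@Panda_User I see you got stuck at the first step... Yes, what I gave you is a vectorized solution. You were using a loop; now you don't need one. Many people fall into the trap of evaluating a Series as a boolean when writing loops.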