Python Pandas Dataframe: length of index does not match - df['column'] = ndarray

【Question title】: Python Pandas Dataframe: length of index does not match - df['column'] = ndarray
【Posted】: 2018-09-19 05:17:21
【Question description】:

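I have a pandas DataFrame containing EOD financial data (OHLC) for analysis.

I am using the https://github.com/cirla/tulipy library to generate technical indicator values, which take a time period as an option. For example, ADX with timeperiod=5 shows the ADX over the last 5 days.

Because of this time period, the generated array of indicator values is always shorter than the DataFrame, since the prices of the first 5 days are used to generate the ADX for day 6.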

    pdi14, mdi14 = ti.di(
        high=highData, low=lowData, close=closeData, period=14)

    df['mdi_14'] = mdi14
    df['pdi_14'] = pdi14
    # ValueError: Length of values does not match length of index

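Unfortunately, unlike TA-Lib, the tulip library does not provide NaN values for those first few empty days.

Is there an easy way to prepend these NaNs to the ndarray? Or to insert into the df at a certain index and have it automatically create NaNs for the rows before it?

Thanks in advance, I have been researching this for days!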

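【Question comments】:

Tags: python pandas dataframe time-series valueerror

【Solution 1】:

The C version of the tulip library includes a start function for each indicator (reference: https://tulipindicators.org/usage), which can be used to determine the indicator's output length for a given set of input options. Unfortunately, the Python bindings, tulipy, do not appear to include this functionality. Instead, you have to resort to dynamically reassigning the index values to align the output with the original DataFrame.

Here is an example using the price series from the tulipy documentation: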

import numpy as np
import pandas as pd
import tulipy as ti

# Create the DataFrame with close prices. Use a list, not a set, so that row order
# is preserved; values are listed in the row order of the output shown below.
prices = pd.DataFrame(data=[81.06, 81.59, 82.87, 83, 83.61, 83.15, 82.84, 83.99, 84.55,
                            84.36, 85.53, 86.54, 86.89, 87.77, 87.29], columns=['close'])

# Compute the technical indicator using tulipy and save the result in a DataFrame
bbands = pd.DataFrame(data=np.transpose(ti.bbands(real=prices['close'].to_numpy(), period=5, stddev=2)))

# Dynamically realign the index; note from the tulip library documentation that the price/volume data is expected to be ordered "oldest to newest (index 0 is oldest)"
bbands.index += prices.index.max() - bbands.index.max()

# Put the indicator values with the original DataFrame
prices[['BBANDS_5_2_low', 'BBANDS_5_2_mid', 'BBANDS_5_2_up']] = bbands
prices.head(15)

    close  BBANDS_5_2_low  BBANDS_5_2_mid  BBANDS_5_2_up
0   81.06             NaN             NaN            NaN
1   81.59             NaN             NaN            NaN
2   82.87             NaN             NaN            NaN
3   83.00             NaN             NaN            NaN
4   83.61       80.530042          82.426      84.321958
5   83.15       81.494061          82.844      84.193939
6   82.84       82.533343          83.094      83.654657
7   83.99       82.471983          83.318      84.164017
8   84.55       82.417750          83.628      84.838250
9   84.36       82.435203          83.778      85.120797
10  85.53       82.511331          84.254      85.996669
11  86.54       83.142618          84.994      86.845382
12  86.89       83.536488          85.574      87.611512
13  87.77       83.870324          86.218      88.565676
14  87.29       85.288871          86.804      88.319129
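
The `bbands.index += ...` shift above works because both frames use a default integer index. As a minimal sketch of the same alignment idea for an arbitrary index (dates, for example), one could label the short output with the last rows of the original index and let pandas fill the rest with NaN. The toy frame, the dates, and the moving average (a NumPy stand-in for a tulipy call) are assumptions made for illustration and are not part of the answer; the sketch also assumes the indicator output corresponds to the most recent rows.

```python
import numpy as np
import pandas as pd

# Toy frame with a date index (hypothetical data for illustration).
df = pd.DataFrame(
    {"close": [81.06, 81.59, 82.87, 83.0, 83.61, 83.15, 82.84]},
    index=pd.date_range("2018-09-10", periods=7, freq="D"),
)

# Stand-in for an indicator whose output is shorter than its input:
# a 3-day simple moving average; mode="valid" drops the warm-up rows.
period = 3
sma = np.convolve(df["close"].to_numpy(), np.ones(period) / period, mode="valid")

# Label the short array with the trailing index entries, then assign.
# pandas aligns on the index and leaves the unmatched leading rows as NaN.
offset = len(df) - len(sma)
df["sma_3"] = pd.Series(sma, index=df.index[offset:])
```

Because the offset is derived from the two lengths, the indicator's start rule does not need to be known in advance.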

【Comments】:

【Solution 2】:

Maybe do the conversion yourself in your code?

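import numpy as np
import tulipy as ti

period = 14
pdi14, mdi14 = ti.di(
    high=highData, low=lowData, close=closeData, period=period
)

# Start with an all-NaN column, then fill from row `period - 1` onward.
# A single .iloc assignment avoids chained indexing (df[col][rows] = ...).
df['mdi_14'] = np.nan
df.iloc[period - 1:, df.columns.get_loc('mdi_14')] = mdi14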

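I hope they will pad the leading values with NaN in the library in the future. Time-series data like this without any labels is dangerous.

【Comments】:

Thanks!! This works. I expect the full MCVE works too, but this is easier to integrate.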
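
The `period - 1` offset in Solution 2 depends on the indicator, so a hard-coded slice may not carry over to other functions. A minimal sketch of a reusable variant that derives the pad length from the two lengths instead is given here; `pad_front` is a hypothetical helper name, and the short array stands in for an indicator output, on the assumption that the output lines up with the newest rows.

```python
import numpy as np
import pandas as pd


def pad_front(values, target_len):
    """Left-pad a 1-D array with NaN so that it has target_len entries."""
    values = np.asarray(values, dtype=float)
    missing = target_len - len(values)
    if missing < 0:
        raise ValueError("values is longer than the target length")
    return np.concatenate([np.full(missing, np.nan), values])


df = pd.DataFrame({"close": np.arange(10.0)})  # toy frame, 10 rows
short = np.array([1.0, 2.0, 3.0])              # stand-in for an indicator output

# The padded array now matches len(df), so plain column assignment works.
df["indicator"] = pad_front(short, len(df))
```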

【Solution 3】:

A full MCVE

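import numpy as np
import pandas as pd

df = pd.DataFrame(1, range(10), list('ABC'))

a = np.full((len(df) - 6, df.shape[1]), 2)  # stand-in for the shorter indicator output
b = np.full((6, df.shape[1]), np.nan)       # NaN block for the leading rows

# Stack the NaN block on top of the values (np.row_stack is deprecated in NumPy 2.0)
c = np.vstack([b, a])

d = pd.DataFrame(c, df.index, df.columns)
d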

     A    B    C
0  NaN  NaN  NaN
1  NaN  NaN  NaN
2  NaN  NaN  NaN
3  NaN  NaN  NaN
4  NaN  NaN  NaN
5  NaN  NaN  NaN
6  2.0  2.0  2.0
7  2.0  2.0  2.0
8  2.0  2.0  2.0
9  2.0  2.0  2.0
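
The same stacking idea might be applied to an indicator that returns several arrays at once (such as the three Bollinger bands). The following is only a sketch under assumptions: the three arrays are made-up stand-ins for a multi-output indicator, and the column names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"close": np.arange(1.0, 9.0)})  # toy frame, 8 rows

# Stand-ins for a multi-output indicator: three equally long arrays,
# each shorter than the frame.
lower = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
middle = lower + 1.0
upper = lower + 2.0

out = np.column_stack([lower, middle, upper])               # shape (5, 3)
pad = np.full((len(df) - len(out), out.shape[1]), np.nan)   # shape (3, 3)

# NaN rows on top, indicator rows below, labeled with the original index.
bands = pd.DataFrame(np.vstack([pad, out]), index=df.index,
                     columns=["low", "mid", "up"])
result = df.join(bands)
```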

【Comments】: