【Question title】: alphalens.utils returns KeyError: "None of [DatetimeIndex (['<dates>, dtype='datetime64[ns]', freq=None)] are in the [index]"
【Posted】: 2020-10-03 11:09:24
【Question】:

I am trying to run sentiment analysis on 10-K financial statements, but I hit an error with the following call:

data = al.utils.get_clean_factor_and_forward_returns(cs_df.stack(), pricing.loc[cs_df.index], quantiles=5, bins=None, periods=[1])

When I print(data), the following KeyError is returned:

KeyError: "None of [DatetimeIndex(['2012-01-01', '2013-01-01', '2014-01-01', '2015-01-01', '2016-01-01', '2017-01-01', '2018-01-01', '2019-01-01'], dtype='datetime64[ns]', freq=None)] are in the [index]"

Relevant code block:

import alphalens as al

factor_data = {}
skipped_sentiments = []

for sentiment in sentiments:
    cs_df = cosine_similarities_df[(cosine_similarities_df['sentiment'] == sentiment)]
    cs_df = cs_df.pivot(index='date', columns='ticker', values='value')

    try:
        data = al.utils.get_clean_factor_and_forward_returns(cs_df.stack(), pricing.loc[cs_df.index], quantiles=5, bins=None, periods=[1])
        factor_data[sentiment] = data

    except:
        skipped_sentiments.append(sentiment)

When I run this, every one of my sentiments ends up in the skipped list, because execution jumps to the except: branch of the for loop.
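Because the bare except: discards the exception, the loop gives no hint about why each sentiment was skipped. A minimal sketch of recording the cause instead is shown below. The helper run_all and the stand-in fake_compute are hypothetical names used only for illustration; in the real loop the callable would wrap the al.utils.get_clean_factor_and_forward_returns call.

```python
def run_all(sentiments, compute):
    """Call compute(sentiment) for each item and keep the reason for any failure."""
    results, skipped = {}, {}
    for sentiment in sentiments:
        try:
            results[sentiment] = compute(sentiment)
        except KeyError as exc:
            # Store the error text so the failure can be inspected afterwards.
            skipped[sentiment] = repr(exc)
    return results, skipped


def fake_compute(sentiment):
    # Hypothetical stand-in for the alphalens call; fails for one input on purpose.
    if sentiment == "negative":
        raise KeyError("None of [...] are in the [index]")
    return sentiment.upper()


results, skipped = run_all(["positive", "negative"], fake_compute)
print(skipped)
```

Catching only KeyError, rather than everything, also lets unrelated bugs surface as ordinary tracebacks.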

After the pivot operation, my DataFrame cs_df looks like this:

ticker           AMZN
date                 
2012-01-01 0.99706919
2013-01-01 0.99931987
2014-01-01 0.99841510
2015-01-01 0.99805389
2016-01-01 0.99884780
2017-01-01 0.96819148
2018-01-01 0.99957861
2019-01-01 0.99767198

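Inside the for loop, I tried creating a new column from the date index with cs_df['date'] = cs_df.index and changing pricing.loc[cs_df.index] to pricing.loc[cs_df.date] in the al.utils call, but that did not help. I also tried removing the column and index names with: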

cs_df.columns.name = None
cs_df.index.name = None

Clearing the column and index names produces the following DataFrame, but the same error is still raised:

                 AMZN
2012-01-01 0.99706919
2013-01-01 0.99931987
2014-01-01 0.99841510
2015-01-01 0.99805389
2016-01-01 0.99884780
2017-01-01 0.96819148
2018-01-01 0.99957861
2019-01-01 0.99767198

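Resetting the index with cs_df = cs_df.reset_index(drop=True) did not help either; it only got rid of my dates. I am fairly sure the problem lies in how I am using the al.utils function, and I do not think the pivot is causing it. Any ideas? Thanks for your help and time!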

FULL TRACEBACK
---
Traceback (most recent call last):
  File "~FinancialReportSentimentAnalysis.py", line 514, in <module>
    print(al.utils.get_clean_factor_and_forward_returns(cs_df.stack(), pricing.loc[cs_df.index], quantiles=5, bins=None, periods=[1]))
  File "~opt/anaconda3/envs/workspace/lib/python3.8/site-packages/pandas/core/indexing.py", line 879, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "~opt/anaconda3/envs/workspace/lib/python3.8/site-packages/pandas/core/indexing.py", line 1099, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "~opt/anaconda3/envs/workspace/lib/python3.8/site-packages/pandas/core/indexing.py", line 1037, in _getitem_iterable
    keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
  File "~opt/anaconda3/envs/workspace/lib/python3.8/site-packages/pandas/core/indexing.py", line 1254, in _get_listlike_indexer
    self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
  File "~opt/anaconda3/envs/workspace/lib/python3.8/site-packages/pandas/core/indexing.py", line 1298, in _validate_read_indexer
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [DatetimeIndex(['2012-01-01', '2013-01-01', '2014-01-01', '2015-01-01', '2016-01-01', '2017-01-01', '2018-01-01', '2019-01-01'], dtype='datetime64[ns]', name='date', freq=None)] are in the [index]"

Process finished with exit code 1

【Comments】:

Tags: python pandas dataframe pivot sentiment-analysis


【Solution 1】:

For anyone who stumbles upon this.

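The problem was in matching the dates in the financial documents against the dates in the pricing data table. I was trying to match the first day of each year (New Year's Day), when the markets are closed and there is no pricing data, so no match was possible. I therefore changed my data to monthly data, so that aggregated pricing data exists for 1 January YYYY.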
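A quick way to confirm this kind of mismatch is to compare the two indexes directly before calling .loc. The sketch below uses made-up data: the pricing frame, its values, and factor_dates are assumptions standing in for the real pricing table and cs_df.index.

```python
import pandas as pd

# Hypothetical pricing table: business days only, so 2012-01-01 (a Sunday) has no row.
pricing = pd.DataFrame(
    {"AMZN": [float(i) for i in range(10)]},
    index=pd.bdate_range("2012-01-02", periods=10),
)

# Hypothetical factor dates, standing in for cs_df.index.
factor_dates = pd.DatetimeIndex(["2012-01-01", "2012-01-03"], name="date")

# Dates requested by the factor data that the pricing index does not contain.
missing = factor_dates.difference(pricing.index)
present = factor_dates.intersection(pricing.index)
print("missing from pricing:", list(missing))

# A list-like .loc lookup with labels that are absent from the index raises KeyError.
try:
    pricing.loc[factor_dates]
    lookup_failed = False
except KeyError:
    lookup_failed = True
print("lookup failed:", lookup_failed)
```

If missing is non-empty, the .loc lookup is the failing step, independent of anything alphalens does afterwards.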

pricing.loc[cs_df.index] was the root of the problem.
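One possible way to produce such monthly data is sketched below, assuming the pricing table is a daily DataFrame with a DatetimeIndex. The answer does not say which aggregation was used; resampling to month-start labels ("MS") and taking the first available price is an assumption made here for illustration, and the data is made up.

```python
import pandas as pd

# Hypothetical daily pricing on business days, with New Year's Day removed
# to mimic a market holiday.
days = pd.bdate_range("2012-01-02", "2013-01-31")
days = days[~((days.month == 1) & (days.day == 1))]
pricing = pd.DataFrame(
    {"AMZN": [100.0 + i for i in range(len(days))]},
    index=days,
)

# Hypothetical factor dates, standing in for cs_df.index.
factor_dates = pd.DatetimeIndex(["2012-01-01", "2013-01-01"], name="date")

# Aggregate to one row per month, labelled with the first calendar day of the
# month and holding the first price observed in that month.
monthly_pricing = pricing.resample("MS").first()

# Every factor date now has a matching label, so the lookup succeeds.
aligned = monthly_pricing.loc[factor_dates]
print(aligned)
```

Note that each monthly row is stamped with a date on which no trade may have happened, so the price for 1 January is really the first traded price of that month; whether that is acceptable depends on how the forward returns are meant to be interpreted.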

【Discussion】:
