How to solve the 'Reindexing only valid with uniquely valued Index objects' error

【Question title】: how to solve 'Reindexing only valid with uniquely valued Index objects' error
【Posted】: 2020-07-26 15:59:05
【Question description】:

I have a dataframe that looks like this:

           date        holiday  item_cnt_day    shop_id      cnt_sem    cnt_mes     cnt_year
0        2013-01-01       1         0.0           59         0.000000   0.000000    0.000000
1        2013-01-02       1         0.0           59         0.000000   0.000000    0.000000
2        2013-01-03       1         0.0           59         0.000000   0.000000    0.000000
3        2013-01-04       1         0.0           59         0.000000   0.000000    0.000000
4        2013-01-05       0         0.0           59         0.000000   0.000000    0.000000
          ......         ...        ...           ...           ...        ...         ...
1029    2015-10-27        0         4.0           36         1.142857   0.321429    0.024658
1030    2015-10-28        0         1.0           36         1.285714   0.357143    0.027397
1031    2015-10-29        0         1.0           36         1.142857   0.392857    0.030137
1032    2015-10-30        0         4.0           36         1.714286   0.535714    0.041096
1033    2015-10-31        0         1.0           36         1.857143   0.571429    0.043836

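The dates run from 2013-01-01 to 2015-10-31, and this same date range is present for every shop_id, so the dates are repeated across shops. What I am trying to do is keep, for each shop_id, only the dates that come after the first 365 days. I am trying to do that with this function: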

def no_todos(df, shops):
    # shops is a list of shops and there are 60 shops in this list
    # df is the dataframe to be operated in the loop

    new_df = pd.DataFrame(df)

    # Here I'm trying to only keep those observations which come after the first 365 days for each shop
    for t in shops:
        new_df['shop_id'][t] = df[365::]
    return new_df

However, I get this error: Reindexing only valid with uniquely valued Index objects. Does anyone know how to solve this? Thanks in advance.

【Question comments】:

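I'm not sure I understood the question correctly: you want to keep only the days after the first 365 days for each shop and store them in your new dataframe, right?
Yes, the answer you provided is exactly what I wanted.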

Tags: python pandas reindex

【Solution 1】:

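First sort the dataframe, then group it, then take a negative tail. A negative tail is not implemented in the groupby tail method, so you need to write your own function. This will skip the first n rows of each group (365 in this case).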

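# sort_values returns a new DataFrame, so assign the result back
df = df.sort_values(['shop_id', 'date'], ascending=[True, True])

def negative_tail(group, n):
    # cumcount numbers the rows inside each group from 0,
    # so ">= n" keeps everything after the first n rows of each group
    return group._selected_obj[group.cumcount(ascending=True) >= n]

final_result = negative_tail(df.groupby('shop_id'), 365).copy()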
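
To see the idea on something small, here is a minimal self-contained sketch of the same cumcount approach. The toy data, the helper name `skip_first_n` and the choice of n = 3 are made up for illustration and are not from the question. It also uses only public pandas calls instead of `_selected_obj`, and it resets the index after sorting, on the assumption that the repeated row labels per shop are what trigger the reindexing error.

```python
import pandas as pd

# Toy data (made up for illustration): 2 shops, 5 days each.
dates = pd.date_range("2013-01-01", periods=5)
df = pd.concat(
    [pd.DataFrame({"date": dates, "shop_id": shop, "item_cnt_day": range(5)})
     for shop in (59, 36)]
)  # the index is 0..4 twice, so it is not unique

def skip_first_n(frame, key, n):
    """Drop the first n rows of each group (ordered by date), keep the rest."""
    # Sort so "first n" means the earliest n dates, and give every row a unique label.
    ordered = frame.sort_values([key, "date"]).reset_index(drop=True)
    # Position of each row inside its group: 0, 1, 2, ...
    position = ordered.groupby(key).cumcount()
    return ordered[position >= n].copy()

result = skip_first_n(df, "shop_id", 3)
print(result)
```

With the real data the call would presumably be `skip_first_n(df, "shop_id", 365)`, which keeps only the rows after the first 365 dates of each shop.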

【Comments】: