Python, Pandas: Reindex/Slice DataFrame with duplicate Index values

【Question title】: Python, Pandas: Reindex/Slice DataFrame with duplicate Index values
【Posted】: 2017-01-16 19:52:52
【Question】:

Consider a DataFrame that holds one row of 2 values for each day of January 2010:

date_range = pd.date_range(dt(2010,1,1), dt(2010,1,31), freq='1D')
df = pd.DataFrame(data = np.random.rand(len(date_range),2), index = date_range)

And a second time series that is sparse and has duplicate index values:

observations = pd.DataFrame(data =np.random.rand(7,2), index = (dt(2010,1,12),
dt(2010,1,18), dt(2010,1,20), dt(2010,1,20), dt(2010,1,22), dt(2010,1,22),dt(2010,1,28)))

I split the first DataFrame, df, into a list of 5 DataFrames, each containing one week of the original data:

df_weeks = [g for n, g in df.groupby(pd.TimeGrouper('W'))]

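Now I want to split the data of the second DataFrame by the same 5 weeks. In this particular case, that means ending up with a variable obs_weeks that contains 5 DataFrames spanning the same time ranges as df_weeks, 2 of which are empty.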

I tried using reindex, as in this question: Python, Pandas: Use the GroupBy.groups description to apply it to another grouping

and periods:

p1 =[x.to_period() for x in list(df.groupby(pd.TimeGrouper('W')).groups.keys())]
p1 = sorted(p1)
dfs=[]
for p in p1:
    dff = observations.truncate(p.start_time, p.end_time)
    dfs.append(dff)

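(See this question: Python, Pandas: Boolean Indexing Comparing DateTimeIndex to Period)

The problem is that neither approach works when some values in the index of observations are duplicated, which is the case here. I also tried turning the index of observations into a regular column and slicing on that column, but that gave me an error message as well.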
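The reindex failure can be reproduced in isolation. The following is a minimal sketch, assuming a recent pandas release. The question does not show the actual reindex call, so the small frame `obs` and the one-week index `target` are hypothetical stand-ins. pandas refuses to reindex an axis whose labels are not unique and raises a ValueError, whereas a boolean mask on the index places no such requirement on the labels.

```python
import pandas as pd

# Small frame with a duplicated timestamp, mirroring `observations` (hypothetical data).
idx = pd.DatetimeIndex(["2010-01-18", "2010-01-20", "2010-01-20", "2010-01-22"])
obs = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0]}, index=idx)

# Hypothetical target: every day of the week of 18-24 January 2010.
target = pd.date_range("2010-01-18", "2010-01-24", freq="D")

try:
    obs.reindex(target)
    reindex_failed = False
except ValueError as exc:
    # Reindexing needs unique labels on the axis being reindexed.
    reindex_failed = True
    print("reindex raised ValueError:", exc)

# A boolean mask does not need unique labels, so this selection succeeds.
in_week = obs.loc[(obs.index >= target[0]) & (obs.index <= target[-1])]
print(len(in_week))  # 4
```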

【Comments on the question】:

Tags: python pandas

【Solution 1】:

You can achieve this with a simple filter:

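# pd.Grouper(freq='W') and .loc stand in for pd.TimeGrouper('W') and .ix, which newer pandas removed
p1 = [x.to_period('W') for x in df.groupby(pd.Grouper(freq='W')).groups.keys()]
p1 = sorted(p1)
dfs = []
for p in p1:
    # a boolean mask works even when the index holds duplicate values
    dff = observations.loc[
        (observations.index >= p.start_time) &
        (observations.index <= p.end_time)]
    dfs.append(dff)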
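For reference, here is a self-contained sketch of the filter approach above that should run on a recent pandas release. It assumes `pd.Grouper(freq="W")` in place of `pd.TimeGrouper('W')`, `.loc` in place of `.ix`, and an explicit `"W"` frequency for `to_period`. It also uses an inclusive upper bound so that a timestamp falling exactly on `end_time` would not be dropped. The names `week_ends`, `periods`, and `obs_weeks` are chosen for illustration.

```python
from datetime import datetime as dt

import numpy as np
import pandas as pd

# Daily frame for January 2010, as in the question.
date_range = pd.date_range(dt(2010, 1, 1), dt(2010, 1, 31), freq="D")
df = pd.DataFrame(np.random.rand(len(date_range), 2), index=date_range)

# Sparse observations; 20 and 22 January each appear twice in the index.
obs_index = pd.DatetimeIndex([
    dt(2010, 1, 12), dt(2010, 1, 18), dt(2010, 1, 20), dt(2010, 1, 20),
    dt(2010, 1, 22), dt(2010, 1, 22), dt(2010, 1, 28),
])
observations = pd.DataFrame(np.random.rand(7, 2), index=obs_index)

# Weekly bin labels (week-ending Sundays) taken from the daily frame.
week_ends = sorted(df.groupby(pd.Grouper(freq="W")).groups.keys())
periods = [ts.to_period("W") for ts in week_ends]

# One slice of `observations` per week; the boolean mask tolerates duplicate labels.
obs_weeks = []
for p in periods:
    mask = (observations.index >= p.start_time) & (observations.index <= p.end_time)
    obs_weeks.append(observations.loc[mask])

print([len(w) for w in obs_weeks])  # [0, 0, 1, 5, 1]
```

The first two weeks come back as empty DataFrames, which matches the 5 frames with 2 empty ones that the question asks for.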

【Comments】: