pandas: get integer indices where the MultiIndex changes

【Question title】: pandas get integer indices where multiindex changes
【Posted】: 2016-07-12 18:38:24
【Question description】:

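I have a very large DataFrame with a MultiIndex. I need to pass one column to C so that an operation can run quickly. For that operation I need to know the positions at which the MultiIndex changes value. Because the DataFrame is large, I don't want to iterate over the rows or the index in Python. A small example: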

import numpy as np
import pandas as pd
a = np.array([['bar', 'one', 0, 0],
       ['bar', 'two', 1, 2],
       ['bar', 'one', 2, 4],
       ['bar', 'two', 3, 6],
       ['foo', 'one', 4, 8],
       ['foo', 'two', 5, 10],
       ['bar', 'one', 6, 12],
       ['bar', 'two', 7, 14]], dtype=object)
df = pd.DataFrame(a, columns=['ix0', 'ix1', 'cd0', 'cd1'])
df.sort_values(['ix0', 'ix1'], inplace=True)
df.set_index(['ix0', 'ix1'], inplace=True)

The DataFrame looks like this:

In [7]: df
Out[7]: 
        cd0 cd1
ix0 ix1        
bar one   0   0
    one   2   4
    one   6  12
    two   1   2
    two   3   6
    two   7  14
foo one   4   8
    two   5  10

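Now I want an array or list giving the positions at which the values in the MultiIndex change, i.e. the integer indices where (bar, one) changes to (bar, two), (bar, two) changes to (foo, one), and so on.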

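It seems this data must already exist inside the index, since it is needed to build the hierarchical output. Is there any way to get at it?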

The sample output I am looking for is: [0, 3, 6, 7].

Thanks

【Question discussion】:

Tags: python pandas

【Solution 1】:

You can use np.unique with return_index=True:

In [69]: uniques, indices = np.unique(df.index, return_index=True)

In [70]: indices
Out[70]: array([0, 3, 6, 7])

【Discussion】:

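Nice, this does work. It is a bit slow, though: on my real data it is an order of magnitude slower than creating the index itself.

%time df.set_index(key_cols, inplace=True)
CPU times: user 886 ms, sys: 209 ms, total: 1.09 s
Wall time: 1.11 s

%time uniques, indices = np.unique(df.index, return_index=True)
CPU times: user 13.3 s, sys: 0 ns, total: 13.3 s
Wall time: 13.3 s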
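The slowdown is probably because np.unique has to turn the MultiIndex into an object array of tuples and sort it. The sketch below is one possible alternative, not part of the original answer: it compares the integer level codes of adjacent rows instead. It assumes pandas 0.24 or later, where `MultiIndex.codes` exists (older versions called this attribute `labels`). The helper name `multiindex_change_positions` is made up for illustration. It reports every position where a row's key differs from the previous row's key, so it matches the np.unique result only when the index is sorted, as it is in the question. No timings were measured for it.

```python
import numpy as np
import pandas as pd


def multiindex_change_positions(index):
    """Return integer positions where a MultiIndex key differs from the previous row.

    Hypothetical helper for illustration. Position 0 is always included
    for a non-empty index, because the first row starts the first group.
    """
    # One integer code array per level; stack them into shape (n_rows, n_levels).
    codes = np.column_stack([np.asarray(c) for c in index.codes])
    if len(codes) == 0:
        return np.array([], dtype=np.intp)
    # True at i when row i + 1 differs from row i in at least one level.
    changed = np.any(codes[1:] != codes[:-1], axis=1)
    return np.concatenate(([0], np.flatnonzero(changed) + 1))


# Same sample data as in the question.
a = np.array([['bar', 'one', 0, 0],
              ['bar', 'two', 1, 2],
              ['bar', 'one', 2, 4],
              ['bar', 'two', 3, 6],
              ['foo', 'one', 4, 8],
              ['foo', 'two', 5, 10],
              ['bar', 'one', 6, 12],
              ['bar', 'two', 7, 14]], dtype=object)
df = pd.DataFrame(a, columns=['ix0', 'ix1', 'cd0', 'cd1'])
df = df.sort_values(['ix0', 'ix1']).set_index(['ix0', 'ix1'])

positions = multiindex_change_positions(df.index)
print(positions.tolist())  # [0, 3, 6, 7] for this sorted sample
```

On an unsorted index the same key can start several runs, so each run start is reported, whereas np.unique returns only the first occurrence of each key.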