pandas DataFrame: drop rows by MultiIndex (answers)

【Question title】: pandas dataframe drop rows by multiindex
【Posted】: 2015-06-13 16:06:08
【Question】:

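I want to drop rows from a pandas DataFrame using MultiIndex values.

I have tried many things; below is the attempt I think comes closest. (I'll explain the full problem, since there may be alternative solutions that take a completely different approach.) From a correlation matrix, I want to get the most highly correlated pairs of columns. I use unstack and put the result in a DataFrame: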

In [263]: corr_df = pd.DataFrame(total.corr().unstack())

Then I select the higher correlations (actually, I should pick up the negative ones too):

In [264]: high = corr_df[(corr_df[0] > 0.5) & (corr_df[0] < 1.0)]

In [236]: print high
                                                  0
residual sugar       density               0.552517
free sulfur dioxide  total sulfur dioxide  0.720934
total sulfur dioxide free sulfur dioxide   0.720934
                     wine                  0.700357
density              residual sugar        0.552517
wine                 total sulfur dioxide  0.700357
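As the question notes, strongly negative correlations should be picked up as well. A minimal sketch of one way to do that, assuming a small made-up frame in place of the asker's `total` (the data and the names `strong` and `off_diag` are hypothetical): filter on the absolute value, and exclude the diagonal by comparing the two index levels rather than testing for exactly 1.0.

```python
import pandas as pd

# Hypothetical toy data standing in for the asker's `total` frame
total = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [5.0, 4.0, 3.0, 2.0, 1.5],   # strongly negatively correlated with "a"
    "c": [2.0, 1.0, 4.0, 1.0, 3.0],   # only weakly correlated with the others
})

# Series indexed by (column, column) pairs
corr = total.corr().unstack()

# True wherever the two labels of a pair differ, i.e. off the diagonal
off_diag = corr.index.get_level_values(0) != corr.index.get_level_values(1)

# Keep pairs whose absolute correlation exceeds 0.5, positive or negative
strong = corr[(corr.abs() > 0.5) & off_diag]
```

The mirrored duplicates are still present in `strong`; this only widens the filter.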

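Close enough, but there are duplicates, which is inherent to a correlation matrix since it is symmetric. To clean them up, my idea was to iterate over high and drop the mirrored entries: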

In [267]:
for row in high.iterrows():
    print row[0][0], ",", row[0][1]
    print high.loc[row[0][1]].loc[row[0][0]].index
    high.drop(high.loc[row[0][1]].loc[row[0][0]].index)
residual sugar , density
Int64Index([0], dtype='int64')
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-267-1258da2a4772> in <module>()
      2     print row[0][0], ",", row[0][1]
      3     print high.loc[row[0][1]].loc[row[0][0]].index
----> 4     high.drop(high.loc[row[0][1]].loc[row[0][0]].index)

...
[huge stack of errors]
...
KeyError: 0

The drop method works fine when the index is an ordinary one (see drop), but how do I build the label when I have a MultiIndex?
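For reference, `drop` takes a MultiIndex label as a tuple with one element per level. In the loop above, the chained `.loc[...]` calls appear to return the column index (`Int64Index([0])`), so `drop` looks for a row labelled `0` and raises `KeyError: 0`. A minimal sketch, assuming a hand-built frame shaped like `high` with values copied from the printed output (the names `dropped_one` and `dropped_many` are illustrative):

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([
    ("residual sugar", "density"),
    ("free sulfur dioxide", "total sulfur dioxide"),
    ("total sulfur dioxide", "free sulfur dioxide"),
    ("total sulfur dioxide", "wine"),
    ("density", "residual sugar"),
    ("wine", "total sulfur dioxide"),
])
high = pd.DataFrame(
    {0: [0.552517, 0.720934, 0.720934, 0.700357, 0.552517, 0.700357]},
    index=idx,
)

# A full row label on a two-level MultiIndex is a 2-tuple
dropped_one = high.drop(index=("density", "residual sugar"))

# Several rows at once: pass a list of tuples
dropped_many = high.drop(index=[("density", "residual sugar"),
                                ("wine", "total sulfur dioxide")])
```

Note that `drop` returns a new object and leaves `high` unchanged, so the result has to be assigned.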

【Comments】:

Tags: python python-2.7 pandas

【Solution 1】:

import pandas as pd

corr_df = pd.DataFrame(
{'residual sugar': [1, 0, 0, 0.552517, 0], 
'free sulfur dioxide': [0, 1, 0.720934, 0, 0], 
'total sulfur dioxide': [0, 0.720934, 1, 0, 0.700357],
'density': [0.552517, 0, 0, 1, 0],
'wine': [0, 0, 0.700357, 0, 1]}, 
index=['residual sugar', 'free sulfur dioxide', 'total sulfur dioxide', 'density', 'wine']).unstack()

# Notice the slight modification to the original
high = corr_df[(corr_df > 0.5) & (corr_df < 1.0)]

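# Sort by index, then stably by value, so that the two mirrored entries
# of each pair end up adjacent. (Series.sort() has been removed from pandas;
# sort_values replaces it, and sort_index returns a new object.)
high = high.sort_index().sort_values(kind='mergesort')

# Drop every other value (i.e. keep the even positions)
result = high.iloc[::2]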
>>> result
density               residual sugar          0.552517
total sulfur dioxide  wine                    0.700357
free sulfur dioxide   total sulfur dioxide    0.720934
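
The every-other-row approach relies on the two mirrored entries of each pair sorting next to each other, which could break if two different pairs happened to share the same value. An alternative sketch (not part of the original answer) keeps only the entries whose first index label sorts before the second, which selects exactly one entry per unordered pair regardless of the values. The names `labels`, `first` and `second` are illustrative:

```python
import pandas as pd

labels = ['residual sugar', 'free sulfur dioxide', 'total sulfur dioxide',
          'density', 'wine']
corr_df = pd.DataFrame(
    {'residual sugar': [1, 0, 0, 0.552517, 0],
     'free sulfur dioxide': [0, 1, 0.720934, 0, 0],
     'total sulfur dioxide': [0, 0.720934, 1, 0, 0.700357],
     'density': [0.552517, 0, 0, 1, 0],
     'wine': [0, 0, 0.700357, 0, 1]},
    index=labels).unstack()

high = corr_df[(corr_df > 0.5) & (corr_df < 1.0)]

# Compare the two index levels element by element
first = high.index.get_level_values(0)
second = high.index.get_level_values(1)

# (a, b) and (b, a) cannot both satisfy a < b, so one of each pair survives
result = high[first < second]
```

This also removes the diagonal implicitly, since a label is never strictly less than itself.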

【Discussion】: