Get row and column index of pandas cells based on conditions: answers

【Question title】: Get row and column index of pandas cells based on conditions
【Posted】: 2021-07-27 11:27:00
【Question description】:

I have a pandas DataFrame that contains only numbers.

I want to get a list of (row_index, column_index) for all cells whose value is >= 1.

I wrote a nested for loop, but it is really slow.

res = []
for i in range(df.shape[0]):
    # Only cells above the diagonal (j > i) are scanned
    for j in range(i + 1, df.shape[0]):
        if df.iloc[i, j] >= 1:
            res.append([i, j, df.iloc[i, j]])

Is there a faster way to do this? The matrix is symmetric, so I only consider one half of the DataFrame.

DataFrame:

1 2 0
0 0 1
0 1 0

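Expected output:

The first row of the expected output (0 0 1) means that at row 0, column 0 the cell value is >= 1, and that value is 1.

【Question comments】:

Please share the DataFrame along with the expected output.

Tags: python pandas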

【Solution 1】:

I like query:

df = df.rename_axis(index='idx', columns='cols')
df.stack().reset_index(name='value').query('value >= 1')

Output:

   idx  cols  value
0    0     0      1
1    0     1      2
5    1     2      1
7    2     1      1

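rename_axis names the index and column axes so the result gets tidy column names; stack and reset_index then reshape the frame to one row per cell, and query filters those rows.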
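For reference, here is a self-contained sketch of the same steps on the sample frame from the question, ending with plain [row, column, value] records like the ones the original loop collected. The names long_df and records are only illustrative and are not part of the answer.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame([[1, 2, 0], [0, 0, 1], [0, 1, 0]])

long_df = (
    df.rename_axis(index='idx', columns='cols')  # name both axes
      .stack()                                   # one row per cell, indexed by (idx, cols)
      .reset_index(name='value')                 # index levels become ordinary columns
      .query('value >= 1')                       # keep only cells with value >= 1
)

# Plain Python lists of [row, column, value]
records = long_df.to_numpy().tolist()
print(records)
```

If only the positions are needed, the value column can be dropped before the conversion.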

【Comments】:

【Solution 2】:

We can use stack and then keep the values that are greater than or equal to 1 (ge):

output = (
    df.stack()
        .loc[lambda f: f.ge(1)]
        .rename_axis(['index', 'column'])
        .reset_index(name='value')
)

Output:

   index  column  value
0      0       0      1
1      0       1      2
2      1       2      1
3      2       1      1

stack reshapes the DataFrame so that the row and column labels together form the row index:

output = df.stack()

0  0    1
   1    2
   2    0
1  0    0
   1    0
   2    1
2  0    0
   1    1
   2    0
dtype: int64

loc can be used to chain the filter:

output = df.stack().loc[lambda f: f.ge(1)]

This can also be done in two steps:

output = df.stack()
output = output[output.ge(1)]

0  0    1
   1    2
1  2    1
2  1    1
dtype: int64
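Since the question asks for a list of (row_index, column_index), it may be worth noting that the filtered Series already carries those pairs in its MultiIndex. A minimal sketch, assuming the sample frame from the question (the names stacked, filtered and pairs are illustrative):

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame([[1, 2, 0], [0, 0, 1], [0, 1, 0]])

stacked = df.stack()
filtered = stacked[stacked.ge(1)]

# Each MultiIndex entry is a (row label, column label) tuple
pairs = filtered.index.tolist()
print(pairs)
```

For the sample frame this yields the pairs (0, 0), (0, 1), (1, 2) and (2, 1). These are index labels, which coincide with positions here only because the frame uses the default integer index.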

rename_axis adds labels to the MultiIndex levels:

output = (
    df.stack()
        .loc[lambda f: f.ge(1)]
        .rename_axis(['index', 'column'])
)

index  column
0      0          1
       1          2
1      2          1
2      1          1
dtype: int64

Then reset_index turns the MultiIndex into columns:

output = (
    df.stack()
        .loc[lambda f: f.ge(1)]
        .rename_axis(['index', 'column'])
        .reset_index(name='value')
)

Or:

output = df.stack()
output = output[output.ge(1)].rename_axis(['index', 'column']).reset_index(name='value')

   index  column  value
0      0       0      1
1      0       1      2
2      1       2      1
3      2       1      1
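The result above covers the whole frame, while the question mentions that the matrix is symmetric and only one half matters. One possible way to restrict the search to the cells above the diagonal, as the original loop does, is sketched below. It uses NumPy (np.triu and np.nonzero) instead of stack, and the symmetric frame is made up for illustration because the sample data in the question is not symmetric.

```python
import numpy as np
import pandas as pd

# Hypothetical symmetric frame, made up for illustration
df = pd.DataFrame([[1, 2, 0], [2, 0, 1], [0, 1, 0]])

values = df.to_numpy()

# True where the value is >= 1 AND the cell lies strictly above the diagonal
mask = np.triu(values >= 1, k=1)

# Positions of the True cells, in row-major order
rows, cols = np.nonzero(mask)

# Same [i, j, value] records that the nested loop builds
res = [[int(i), int(j), int(values[i, j])] for i, j in zip(rows, cols)]
print(res)
```

Like the iloc-based loop, this returns integer positions rather than index labels. Passing k=0 instead of k=1 would include the diagonal.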

【Comments】:

【Solution 3】:

Another way, using melt() and sort_values():

out=(df.reset_index()
       .melt('index',var_name='column')
       .query('value>=1')
       .sort_values('index'))

Output of out:

   index column  value
0      0      0      1
3      0      1      2
7      1      2      1
5      2      1      1

Sample DataFrame used:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 0], [0, 0, 1], [0, 1, 0]]))
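To see why the sort is part of this approach: melt emits the cells one column at a time, so the filtered rows come out grouped by column rather than by row. The sketch below shows the order before and after sorting. Sorting on both 'index' and 'column' is an assumption about the desired order, not something the answer requires; it just makes the order of rows that share an index value explicit.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 0], [0, 0, 1], [0, 1, 0]]))

melted = (
    df.reset_index()                       # row labels become an 'index' column
      .melt('index', var_name='column')    # one row per cell, column by column
      .query('value >= 1')
)

# Order straight out of melt: grouped by column
before = melted[['index', 'column']].to_numpy().tolist()

# Sort by row label first, then by column label
out = melted.sort_values(['index', 'column'])
after = out[['index', 'column']].to_numpy().tolist()

print(before)
print(after)
```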

【Comments】: