How to create a pandas DataFrame column based on the existence of values in a subset of columns, by row? Answers

【Question title】: How to create a pandas DataFrame column based on the existence of values in a subset of columns, by row?
【Posted】: 2017-04-19 05:58:38
【Question】:

I have a pandas DataFrame as follows:

import pandas as pd
data1 = {"column1": ["A", "B", "C", "D", "E", "F", "G"],
         "column2": [338, 519, 871, 1731, 2693, 2963, 3379],
         "column3": [5, 1, 8, 3, 731, 189, 9], 
         "columnA" : [5, 0, 75, 150, 0, 0, 0], 
         "columnB" : [0, 32, 0, 96, 0, 51, 0], 
         "columnC" : [0, 42, 0, 42, 0, 42, 42]}

df = pd.DataFrame(data1)

df
>>>     column1   column2   column3   columnA   columnB   columnC
0         A       338         5         5         0         0
1         B       519         1         0        32        42
2         C       871         8        75         0         0
3         D      1731         3       150        96        42
4         E      2693       731         0         0         0
5         F      2963       189         0        51        42
6         G      3379         9         0         0        42

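The values in columnA, columnB, and columnC are either nonzero integers or zero. I want to check these three columns, row by row, for the case where columnC holds a nonzero integer while columnA and columnB are both zero.

If columnC has a nonzero value and columnA and columnB are both zero, I want a 1 in the new column newcolumn. Otherwise, the value in newcolumn should be 0.

The resulting DataFrame should be: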

>>>     column1   column2   column3   columnA   columnB   columnC     newcolumn
0         A       338         5         5         0         0          0
1         B       519         1         0        32        42          0
2         C       871         8        75         0         0          0
3         D      1731         3       150        96        42          0
4         E      2693       731         0         0         0          0
5         F      2963       189         0        51        42          0
6         G      3379         9         0         0        42          1
....                 .....                         ...........

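I know how to check values column by column (for example, with df.columnA == 0), and creating a new column is straightforward. But how do I check this "by row"?

【Comments】:

Tags: python pandas dataframe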

【Solution 1】:

You can use np.where:

import numpy as np

df['newcolumn'] = np.where((df.columnA == 0) & (df.columnB == 0) & (df.columnC != 0), 1, 0)


    column1 column2 column3 columnA columnB columnC newcolumn
0   A       338     5       5       0       0       0
1   B       519     1       0       32      42      0
2   C       871     8       75      0       0       0
3   D       1731    3       150     96      42      0
4   E       2693    731     0       0       0       0
5   F       2963    189     0       51      42      0
6   G       3379    9       0       0       42      1
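
For reference, the np.where approach can be run end to end as below. This is a minimal sketch that rebuilds only the three relevant columns from the question; the intermediate name `mask` is introduced here for illustration and is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Only the columns that take part in the condition (values from the question).
df = pd.DataFrame({
    "columnA": [5, 0, 75, 150, 0, 0, 0],
    "columnB": [0, 32, 0, 96, 0, 51, 0],
    "columnC": [0, 42, 0, 42, 0, 42, 42],
})

# Each comparison yields a boolean Series aligned on the row index;
# & combines them element-wise, so the test is evaluated per row.
mask = (df["columnA"] == 0) & (df["columnB"] == 0) & (df["columnC"] != 0)

# np.where picks 1 where mask is True and 0 elsewhere.
df["newcolumn"] = np.where(mask, 1, 0)
print(df)
```

The parentheses around each comparison matter, because & binds more tightly than == in Python.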

【Comments】:

【Solution 2】:

You can combine multiple conditions with the boolean & operator, as follows:

df['new column'] = (df['columnA'] == 0) & (df['columnB'] == 0) & (df['columnC'] != 0)
df['new column'] = df['new column'].astype(int)
df

Result:

 column1  column2  column3  columnA  columnB  columnC  new column
0       A      338        5        5        0        0           0
1       B      519        1        0       32       42           0
2       C      871        8       75        0        0           0
3       D     1731        3      150       96       42           0
4       E     2693      731        0        0        0           0
5       F     2963      189        0       51       42           0
6       G     3379        9        0        0       42           1

【Comments】:

I never knew that a Series of booleans turns into 0/1 dummy values when cast to integers.
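
The behavior mentioned in the comment can be checked on a small Series. A minimal sketch; the names `flags`, `as_int`, and `n_true` are made up for this illustration.

```python
import pandas as pd

flags = pd.Series([True, False, True])

# Casting booleans to int maps True to 1 and False to 0.
as_int = flags.astype(int)
print(as_int.tolist())

# Because True counts as 1, summing a boolean Series counts the matching rows.
n_true = int(flags.sum())
print(n_true)
```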

【Solution 3】:

You can use the DataFrame.eval method:

In [146]: df['newcolumn'] = df.eval("columnA == 0 and columnB == 0 and columnC != 0") \
                              .astype(np.uint8)

In [147]: df
Out[147]:
  column1  column2  column3  columnA  columnB  columnC  newcolumn
0       A      338        5        5        0        0          0
1       B      519        1        0       32       42          0
2       C      871        8       75        0        0          0
3       D     1731        3      150       96       42          0
4       E     2693      731        0        0        0          0
5       F     2963      189        0       51       42          0
6       G     3379        9        0        0       42          1
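
Outside an IPython session, the same idea can be written as a plain script. The sketch below uses the & spelling with explicit parentheses instead of `and`; it is expected to select the same rows, and the three-column frame is a cut-down copy of the question's data. The name `cond` is introduced only for this example.

```python
import numpy as np
import pandas as pd

# Cut-down copy of the question's data: only the columns used in the condition.
df = pd.DataFrame({
    "columnA": [5, 0, 75, 150, 0, 0, 0],
    "columnB": [0, 32, 0, 96, 0, 51, 0],
    "columnC": [0, 42, 0, 42, 0, 42, 42],
})

# DataFrame.eval resolves bare names in the string to columns of df
# and returns a boolean Series for this expression.
cond = df.eval("(columnA == 0) & (columnB == 0) & (columnC != 0)")

# Store the flag compactly as unsigned 8-bit integers (0 or 1).
df["newcolumn"] = cond.astype(np.uint8)
print(df)
```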

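【Comments】:

【Solution 4】:

# clever regex... might even make a good screen name
# it keeps the columns whose names end in a letter (columnA, columnB, columnC)
# you might want to select the columns explicitly instead
# (reindex_axis has been removed from current pandas; reindex(columns=...) replaces it):
# v = df.reindex(columns=['columnA', 'columnB', 'columnC']).values == 0
import numpy as np

v = df.filter(regex='[A-Za-z]$').values == 0
v[:, -1] = ~v[:, -1]  # negate the last column, so it tests columnC != 0
df.assign(New=v.all(1).astype(np.uint8))  # all(1): True only if every column in the row is True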

  column1  column2  column3  columnA  columnB  columnC  New
0       A      338        5        5        0        0    0
1       B      519        1        0       32       42    0
2       C      871        8       75        0        0    0
3       D     1731        3      150       96       42    0
4       E     2693      731        0        0        0    0
5       F     2963      189        0       51       42    0
6       G     3379        9        0        0       42    1
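
The same row-wise test can also stay in pandas, which avoids modifying an intermediate array in place. This is a sketch, not the answer's method: `flag_rows`, `zero_cols`, and `nonzero_cols` are hypothetical names chosen for the example.

```python
import pandas as pd

def flag_rows(df, zero_cols, nonzero_cols):
    """Return 1 for rows where every column in zero_cols is 0 and every
    column in nonzero_cols is nonzero, and 0 for all other rows."""
    # all(axis=1) reduces across the selected columns, giving one boolean per row.
    all_zero = df[zero_cols].eq(0).all(axis=1)
    all_nonzero = df[nonzero_cols].ne(0).all(axis=1)
    return (all_zero & all_nonzero).astype(int)

# Cut-down copy of the question's data.
df = pd.DataFrame({
    "columnA": [5, 0, 75, 150, 0, 0, 0],
    "columnB": [0, 32, 0, 96, 0, 51, 0],
    "columnC": [0, 42, 0, 42, 0, 42, 42],
})

df["newcolumn"] = flag_rows(df, ["columnA", "columnB"], ["columnC"])
print(df)
```

Passing lists of column names makes it easy to extend the check to more columns without writing another comparison by hand.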

It is also fast.

Timing test (the benchmark output is not present in the source).

【Comments】: