【Question Title】: How to create a single column from multiple columns in Pandas using .isin() and a list?
【Posted】: 2021-11-14 02:58:32
【Question Description】:

I have broken a more complex problem down into a simpler one. The real problem has longer lists and more columns.

Starting with this df:

i | COL1      | COL2        | COL3      | COL4      | Revenue    | QTY | Products
0 | Coin      | Gold Krug   | Gold Coin | Coins     | 2333677473 | 21  | 12
1 | Gold Coin | Coins       | Gold Coin | Coins     | 2564774784 | 28  | 14
2 | Gold Coin | Coins       | Gold Krug | Coins     | 3256666647 | 35  | 16
3 | Gold Coin | Coins       | Coins     | Gold Krug | 3456788    | 42  | 18
4 | Gold Krug | Gold Coin   | Coins     | Coins     | 4588960    | 49  | 20
5 | Gold Coin | Coins       | Gold Krug | Coins     | 346869909  | 56  | 22
6 | Gold Coin | Coins       | Gold Coin | Coins     | 3777989    | 63  | 24
7 | Gold Coin | Silver Krug | Gold Coin | Coins     | 37687589   | 70  | 26
8 | Gold Coin | Coins       | Gold Coin | Coins     | 45789889   | 77  | 28
9 | Gold Coin | Gold Krug   | Gold Coin | Coins     | 468        | 84  | 30
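To make the snippets in the answers runnable, the sample frame above can be rebuilt in code. This is a sketch that assumes the i column is the row index rather than a data column (the question does not say), and it defines KRUG exactly as the question does further down.

```python
import pandas as pd

# Assumption: the "i" column shown in the question is the row index (0..9)
df = pd.DataFrame({
    'COL1': ['Coin', 'Gold Coin', 'Gold Coin', 'Gold Coin', 'Gold Krug',
             'Gold Coin', 'Gold Coin', 'Gold Coin', 'Gold Coin', 'Gold Coin'],
    'COL2': ['Gold Krug', 'Coins', 'Coins', 'Coins', 'Gold Coin',
             'Coins', 'Coins', 'Silver Krug', 'Coins', 'Gold Krug'],
    'COL3': ['Gold Coin', 'Gold Coin', 'Gold Krug', 'Coins', 'Coins',
             'Gold Krug', 'Gold Coin', 'Gold Coin', 'Gold Coin', 'Gold Coin'],
    'COL4': ['Coins', 'Coins', 'Coins', 'Gold Krug', 'Coins',
             'Coins', 'Coins', 'Coins', 'Coins', 'Coins'],
    'Revenue': [2333677473, 2564774784, 3256666647, 3456788, 4588960,
                346869909, 3777989, 37687589, 45789889, 468],
    'QTY': [21, 28, 35, 42, 49, 56, 63, 70, 77, 84],
    'Products': [12, 14, 16, 18, 20, 22, 24, 26, 28, 30],
})

# The list of names to look for, as given in the question
KRUG = ['Gold Krug', 'Silver Krug', 'Gold Maple', 'Gold Eagle']
```

With this frame, the question's isin(KRUG).any(axis=1) filter keeps rows 0, 2, 3, 4, 5, 7 and 9.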

I want the output to be a DataFrame with a new Category column, like this:

i | Category    | Revenue    | QTY | Products
0 | Gold Krug   | 2333677473 | 21  | 12
2 | Gold Krug   | 3256666647 | 35  | 16
3 | Gold Krug   | 3456788    | 42  | 18
4 | Gold Krug   | 4588960    | 49  | 20
5 | Gold Krug   | 346869909  | 56  | 22
7 | Silver Krug | 37687589   | 70  | 26
9 | Gold Krug   | 468        | 84  | 30

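I have used the code below, which filters the rows correctly, but I can't work out how to create the new column from whichever list value matched: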

KRUG = ['Gold Krug', 'Silver Krug', 'Gold Maple', 'Gold Eagle']

# keep only the rows where at least one of COL1..COL4 holds a value from KRUG
df = df[df[['COL1', 'COL2', 'COL3', 'COL4']].isin(KRUG).any(axis=1)]

print(df)

Output:

i | COL1      | COL2        | COL3      | COL4      | Revenue    | QTY | Products
0 | Coin      | Gold Krug   | Gold Coin | Coins     | 2333677473 | 21  | 12
2 | Gold Coin | Coins       | Gold Krug | Coins     | 3256666647 | 35  | 16
3 | Gold Coin | Coins       | Coins     | Gold Krug | 3456788    | 42  | 18
4 | Gold Krug | Gold Coin   | Coins     | Coins     | 4588960    | 49  | 20
5 | Gold Coin | Coins       | Gold Krug | Coins     | 346869909  | 56  | 22
7 | Gold Coin | Silver Krug | Gold Coin | Coins     | 37687589   | 70  | 26
9 | Gold Coin | Gold Krug   | Gold Coin | Coins     | 468        | 84  | 30
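The boolean mask that .isin(KRUG) already produces can also supply the matched name itself. The sketch below is an alternative not taken from either answer: argmax on each boolean row gives the position of the first True cell (the leftmost matching column), and that position is used to pick the value. It uses a hypothetical four-row subset of the sample data (rows 0, 1, 3 and 7) and assumes i is the index.

```python
import numpy as np
import pandas as pd

KRUG = ['Gold Krug', 'Silver Krug', 'Gold Maple', 'Gold Eagle']

# Illustrative subset: rows 0, 1, 3 and 7 of the question's data ("i" as index)
df = pd.DataFrame(
    {
        'COL1': ['Coin', 'Gold Coin', 'Gold Coin', 'Gold Coin'],
        'COL2': ['Gold Krug', 'Coins', 'Coins', 'Silver Krug'],
        'COL3': ['Gold Coin', 'Gold Coin', 'Coins', 'Gold Coin'],
        'COL4': ['Coins', 'Coins', 'Gold Krug', 'Coins'],
        'Revenue': [2333677473, 2564774784, 3456788, 37687589],
        'QTY': [21, 28, 42, 70],
        'Products': [12, 14, 18, 26],
    },
    index=[0, 1, 3, 7],
)

cols = ['COL1', 'COL2', 'COL3', 'COL4']
mask = df[cols].isin(KRUG)      # True where a cell holds a KRUG name
has_match = mask.any(axis=1)    # the question's own row filter

# argmax of a boolean row is the position of its first True, i.e. the leftmost
# matching column; rows with no match get 0, but has_match drops them below
first_pos = mask.to_numpy().argmax(axis=1)
matched = df[cols].to_numpy()[np.arange(len(df)), first_pos]
category = pd.Series(matched, index=df.index)

result = (df.loc[has_match, ['Revenue', 'QTY', 'Products']]
            .assign(Category=category))        # aligned on the index
result = result[['Category', 'Revenue', 'QTY', 'Products']]
print(result)
```

If a row could contain two KRUG names, this picks the one in the leftmost column.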

【Question Discussion】:

    Tags: python pandas dataframe multiple-columns isin


    【Solution 1】:

    Split the search into two parts, then concatenate the results:

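    import pandas as pd

    # Part 1: join each row's COL* values into one string and extract the
    # leftmost KRUG name; rows without a match give NaN and are dropped
    category = (df.filter(like='COL')
                  .agg(','.join, axis=1)
                  .str.extract(fr"({'|'.join(KRUG)})")
                  .dropna()
                  .set_axis(['category'], axis='columns')
                )

    # Part 2: the numeric columns of the rows that contain any KRUG value
    others = df.loc[df.filter(like='COL').isin(KRUG).any(axis=1),
                    ['Revenue', 'QTY', 'Products']]

    # pd.concat aligns the two parts on the row index
    pd.concat([category, others], axis='columns')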
    
          category     Revenue  QTY  Products
    0    Gold Krug  2333677473   21        12
    2    Gold Krug  3256666647   35        16
    3    Gold Krug     3456788   42        18
    4    Gold Krug     4588960   49        20
    5    Gold Krug   346869909   56        22
    7  Silver Krug    37687589   70        26
    9    Gold Krug         468   84        30
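    A slightly hardened variant of the str.extract step above, as a sketch: it escapes each name with re.escape before building the alternation, and passes expand=False so that extract returns a Series and the set_axis rename is unnecessary. The four rows (0, 1, 7, 9) are a subset of the question's data chosen for illustration.

```python
import re
import pandas as pd

KRUG = ['Gold Krug', 'Silver Krug', 'Gold Maple', 'Gold Eagle']

# Illustrative subset: rows 0, 1, 7 and 9 of the question's data ("i" as index)
df = pd.DataFrame(
    {
        'COL1': ['Coin', 'Gold Coin', 'Gold Coin', 'Gold Coin'],
        'COL2': ['Gold Krug', 'Coins', 'Silver Krug', 'Gold Krug'],
        'COL3': ['Gold Coin', 'Gold Coin', 'Gold Coin', 'Gold Coin'],
        'COL4': ['Coins', 'Coins', 'Coins', 'Coins'],
        'Revenue': [2333677473, 2564774784, 37687589, 468],
        'QTY': [21, 28, 70, 84],
        'Products': [12, 14, 26, 30],
    },
    index=[0, 1, 7, 9],
)

# Escape each name so characters such as '.' or '(' would be matched literally
pattern = '(' + '|'.join(map(re.escape, KRUG)) + ')'

# One string per row, e.g. 'Coin,Gold Krug,Gold Coin,Coins'
joined = df.filter(like='COL').agg(','.join, axis=1)

# expand=False returns a Series (NaN where nothing matched), so no set_axis is needed
category = joined.str.extract(pattern, expand=False)

result = (df.loc[category.notna(), ['Revenue', 'QTY', 'Products']]
            .assign(category=category))
print(result)
```

    Note that extract matches substrings of the joined text, whereas isin compares whole cell values; a hypothetical cell such as 'Gold Krugerrand' would match here but not with isin.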
    
    

    【Discussion】:

    • No problem. Please open a new question with those details about the null values.
    • I was able to fix the NaN issue, but I couldn't get the "others" part to work, so I used:
      others = df[df[['COL1', 'COL2', 'COL3', 'COL4']].isin(KRUG).any(axis=1)]
      others.drop(['COL1', 'COL2', 'COL3', 'COL4'], axis=1, inplace=True)
      print(others)
      new = pd.concat([category, others], axis='columns')

    【Solution 2】:

    Here is an approach using apply(), although there is probably a simpler way using .str. It should be fine as long as the dataset isn't too large.

    import numpy as np

    # Return the first name in KRUG (in list order) that appears in the row, else NaN
    def get_coin(x):
        for k in KRUG:
            if k in x.tolist():
                return k
        return np.nan

    df['category'] = df[['COL1', 'COL2', 'COL3', 'COL4']].apply(get_coin, axis=1)
    df.drop(['COL1', 'COL2', 'COL3', 'COL4'], axis=1, inplace=True)
    df.dropna(inplace=True)   # drop rows where no KRUG name was found
    
       i     Revenue  QTY  Products     category
    0  0  2333677473   21        12    Gold Krug
    2  2  3256666647   35        16    Gold Krug
    3  3     3456788   42        18    Gold Krug
    4  4     4588960   49        20    Gold Krug
    5  5   346869909   56        22    Gold Krug
    7  7    37687589   70        26  Silver Krug
    9  9         468   84        30    Gold Krug
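    On the remark that there should be a simpler way than apply(): one vectorized equivalent of get_coin, sketched here with np.select (a swapped-in technique, not from the original answer), builds one boolean condition per KRUG name. Because np.select uses the first condition that is True, names earlier in KRUG take priority, which matches get_coin's loop order. The three rows (2, 6, 7) are a subset of the question's data chosen for illustration.

```python
import numpy as np
import pandas as pd

KRUG = ['Gold Krug', 'Silver Krug', 'Gold Maple', 'Gold Eagle']

# Illustrative subset: rows 2, 6 and 7 of the question's data ("i" as index)
df = pd.DataFrame(
    {
        'COL1': ['Gold Coin', 'Gold Coin', 'Gold Coin'],
        'COL2': ['Coins', 'Coins', 'Silver Krug'],
        'COL3': ['Gold Krug', 'Gold Coin', 'Gold Coin'],
        'COL4': ['Coins', 'Coins', 'Coins'],
        'Revenue': [3256666647, 3777989, 37687589],
        'QTY': [35, 63, 70],
        'Products': [16, 24, 26],
    },
    index=[2, 6, 7],
)

cols = ['COL1', 'COL2', 'COL3', 'COL4']

# One boolean Series per KRUG name: does that exact name appear anywhere in the row?
conditions = [df[cols].eq(k).any(axis=1) for k in KRUG]

# np.select takes the choice of the first True condition, so (like get_coin)
# earlier names in KRUG win if a row has several; None where nothing matches
category = pd.Series(np.select(conditions, KRUG, default=None), index=df.index)

result = (df.drop(columns=cols)
            .assign(category=category)
            .dropna(subset=['category']))
print(result)
```

    This loops over the short KRUG list instead of over every row, which is where apply() spends its time on large frames.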
    

    【Discussion】:
