【Question title】: Mapping of multiple columns categorical values in pandas
【Posted】: 2020-02-20 12:55:55
【Question】:

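Suppose I have a DataFrame with three columns of categorical data. I want to encode each distinct combination of the three columns as a single value and add that value to the original DataFrame as a new column. I know this can be done for a single column, but what about multiple columns?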

Example: go from this

>>> import pandas as pd
>>> df = pd.DataFrame({'id': ['0', '1', '2', '3', '4'],
...                    'x': ['tall', 'short', 'tall', 'short', 'tall'],
...                    'y': ['fat', 'thin', 'thin', 'fat', 'fat'],
...                    'z': ['male', 'female', 'female', 'male', 'male']},
...                   dtype='category')

>>> df
  id      x     y       z
0  0   tall   fat    male
1  1  short  thin  female
2  2   tall  thin  female
3  3  short   fat    male
4  4   tall   fat    male

to this, by mapping on the columns x, y, and z:

>>> df
  id      x     y       z  map
0  0   tall   fat    male    0
1  1  short  thin  female    1
2  2   tall  thin  female    2
3  3  short   fat    male    3
4  4   tall   fat    male    0

【Comments】:

    Tags: python pandas


    【Solution 1】:

    This is a job for groupby().ngroup(). Passing sort=False numbers the groups in order of first appearance:

    df['map'] = df.groupby(['x','y','z'], sort=False).ngroup()
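
A minimal, self-contained sketch of the ngroup approach follows. It assumes plain string columns rather than the categorical dtype used in the question; with categorical keys, groupby may additionally need observed=True so that only combinations present in the data are counted, which is not verified here.

```python
import pandas as pd

# Same data as in the question, but with ordinary string columns (an assumption).
df = pd.DataFrame({
    'id': ['0', '1', '2', '3', '4'],
    'x': ['tall', 'short', 'tall', 'short', 'tall'],
    'y': ['fat', 'thin', 'thin', 'fat', 'fat'],
    'z': ['male', 'female', 'female', 'male', 'male'],
})

# sort=False numbers the groups in order of first appearance
# instead of in sorted key order.
df['map'] = df.groupby(['x', 'y', 'z'], sort=False).ngroup()

print(df)
```

Rows 0 and 4 share the combination (tall, fat, male), so they receive the same code.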
    

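    Alternatively, if your columns hold strings, you can concatenate them, ideally with a separator character, and then apply the single-column approach (pd.factorize). For the categorical frame in the question, cast the columns with .astype(str) first, because categorical columns do not support string addition:

    # the '&' separator guards against ambiguous concatenations
    # (e.g. 'ab' + 'c' vs 'a' + 'bc'); it may not be needed for this data
    df['map'] = pd.factorize(df[['x','y','z']].add('&').sum(1))[0]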
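
A sketch of the same idea that also works on the categorical frame from the question. It swaps the add/sum concatenation for a row-wise '&'.join, which is a substitution of mine and not the answer's exact code; the '&' separator is assumed not to occur in the data.

```python
import pandas as pd

# Rebuild the question's frame and convert every column to category dtype.
df = pd.DataFrame({
    'id': ['0', '1', '2', '3', '4'],
    'x': ['tall', 'short', 'tall', 'short', 'tall'],
    'y': ['fat', 'thin', 'thin', 'fat', 'fat'],
    'z': ['male', 'female', 'female', 'male', 'male'],
}).astype('category')

# Cast to str so the values can be joined, then build one key per row.
keys = df[['x', 'y', 'z']].astype(str).agg('&'.join, axis=1)

# factorize returns (codes, uniques); codes follow first-appearance order.
codes, uniques = pd.factorize(keys)
df['map'] = codes

print(df)
```

The uniques returned by factorize double as a lookup from each code back to its joined key.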
    

    Output:

       id      x     y       z  map
    0   0   tall   fat    male    0
    1   1  short  thin  female    1
    2   2   tall  thin  female    2
    3   3  short   fat    male    3
    4   4   tall   fat    male    0
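
To check an output like the one above, it can help to list which combination each code stands for. The sketch below assumes string columns and uses a hypothetical name, lookup, for the resulting table; it is a sanity check, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['0', '1', '2', '3', '4'],
    'x': ['tall', 'short', 'tall', 'short', 'tall'],
    'y': ['fat', 'thin', 'thin', 'fat', 'fat'],
    'z': ['male', 'female', 'female', 'male', 'male'],
})
df['map'] = df.groupby(['x', 'y', 'z'], sort=False).ngroup()

# One row per distinct combination, indexed by its code (hypothetical helper table).
lookup = df[['map', 'x', 'y', 'z']].drop_duplicates().set_index('map')

# The number of codes should equal the number of distinct combinations.
n_combos = len(df[['x', 'y', 'z']].drop_duplicates())

print(lookup)
```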
    
