【Question title】: Replace column values using a dictionary
【Posted】: 2018-02-08 23:20:32
【Question】:

I have this DataFrame, in which Gender should be either male or female.

from io import StringIO
import pandas as pd

audit_trail = StringIO('''
course_id AcademicYear_to months TotalFee Gender
260 2017 24 100 male
260 2018 12 140 male
274 2016 36 300 mail
274 2017 24 340 female
274 2018 12 200 animal
285 2017 24 300 bird
285 2018 12 200 maela
''')

df11 = pd.read_csv(audit_trail, sep=" ")

I can correct the spelling mistakes with a dictionary.

corrections={'mail':'male', 'mael':'male', 'maae':'male'}
df11.Gender.replace(corrections)

But I am looking for a way to keep only male/female and put all remaining values into an "other" category. Expected output:

0      male
1      male
2      male
3    female
4    other
5    other
6      male
Name: Gender, dtype: object

【Comments】:

    Tags: python pandas dictionary dataframe replace


    【Solution 1】:

    Add two more dummy entries to your corrections dictionary:

    corrections = {'male'   : 'male',    # dummy entry for male
                   'female' : 'female',  # dummy entry for female
                   'mail'   : 'male', 
                   'maela'  : 'male', 
                   'maae'   : 'male'}
    

    Now, use map and fillna:

    df11.Gender = df11.Gender.map(corrections).fillna('other')
    df11
    
       course_id  AcademicYear_to  months  TotalFee  Gender
    0        260             2017      24       100    male
    1        260             2018      12       140    male
    2        274             2016      36       300    male
    3        274             2017      24       340  female
    4        274             2018      12       200   other
    5        285             2017      24       300   other
    6        285             2018      12       200    male
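
To see why the dummy entries matter: Series.map with a dictionary yields NaN for any value that is not a key, so without 'male' and 'female' in the dictionary the correctly spelled rows would also end up as 'other'. A minimal sketch, assuming the Gender column from the question rebuilt by hand (the names gender, typos_only, and with_identity are only for illustration):

```python
import pandas as pd

# The Gender column from the question, rebuilt by hand for a standalone demo.
gender = pd.Series(
    ['male', 'male', 'mail', 'female', 'animal', 'bird', 'maela'],
    name='Gender',
)

# Dictionary holding only the typo fixes (no identity entries).
typos_only = {'mail': 'male', 'maela': 'male', 'maae': 'male'}
# Same dictionary plus identity entries for the two valid values.
with_identity = {'male': 'male', 'female': 'female', **typos_only}

# map() returns NaN for every value that is not a key, so here the
# correctly spelled 'male'/'female' rows are also turned into 'other'.
without_dummies = gender.map(typos_only).fillna('other')

# With the identity entries, valid values survive and only unknowns become 'other'.
with_dummies = gender.map(with_identity).fillna('other')

print(without_dummies.tolist())
print(with_dummies.tolist())
```

The second result matches the expected output in the question; the first shows what happens if the identity entries are left out.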
    

    【Comments】:

      【Solution 2】:

      You can use:

      corrections={'mail':'male', 'maela':'male', 'maae':'male', 'male':'male', 'female':'female'}
      df11[['Gender']] = df11[['Gender']].applymap(corrections.get).fillna('other')
      print(df11)
         course_id  AcademicYear_to  months  TotalFee  Gender
      0        260             2017      24       100    male
      1        260             2018      12       140    male
      2        274             2016      36       300    male
      3        274             2017      24       340  female
      4        274             2018      12       200   other
      5        285             2017      24       300   other
      6        285             2018      12       200    male
      

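      Edit:

      For replacing just one column, cᴏʟᴅsᴘᴇᴇᴅ's answer is the better one. If you need to replace values in several columns, applymap is the better choice.

      【Comments】:

      • Yes, sometimes I overcomplicate things. ;)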
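
A sketch of the multi-column case, under a few assumptions: the frame and its second column (ParentGender) are hypothetical and not part of the question. DataFrame.applymap has been deprecated since pandas 2.1 in favor of DataFrame.map, so this version applies Series.map to each column instead, which should behave the same on older and newer releases:

```python
import pandas as pd

# Hypothetical frame: two columns that share the same vocabulary.
df = pd.DataFrame({
    'Gender': ['male', 'mail', 'female', 'bird'],
    'ParentGender': ['female', 'maela', 'animal', 'male'],
})

corrections = {'mail': 'male', 'maela': 'male', 'maae': 'male',
               'male': 'male', 'female': 'female'}

cols = ['Gender', 'ParentGender']

# Run every selected column through the dictionary; values that are not
# keys become NaN, and fillna() then labels them 'other'.
df[cols] = df[cols].apply(lambda s: s.map(corrections)).fillna('other')

print(df)
```

Columns not listed in cols are left untouched, so the same dictionary can be applied selectively.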