【Question title】: Replace values in a column based on another dataframe
【Posted】: 2021-03-21 18:20:22
【Question】:

I have a table:

Name        Profession         Character
Ben         cinematographer    Nan
Scarlett    actress            Black Widow
Robert      actor              Iron Man
Chris       actor              Thor
Kevin       producer           Nan

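I created a new dataframe containing a column of the unique values from the table above, sorted in ascending order, plus an incrementing ID column: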

ID    Job
1      actor
2      actress
3      cinematographer
4      producer

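Now I need to replace the values in the Profession column of the original table with the corresponding ID from the new table. Expected output: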

Name        Profession         Character
Ben         3                  Nan
Scarlett    2                  Black Widow
Robert      1                  Iron Man
Chris       1                  Thor
Kevin       4                  Nan

Code so far:

import pandas as pd

df = pd.read_csv(filename)
column = df['Profession'].unique()
new_df = pd.DataFrame(column, columns=['Job'])
new_df = new_df.sort_values(['Job'])
new_df = new_df.reset_index()
new_df.columns.values[0] = 'ID'
new_df['ID'] = new_df.index + 1
df.loc[df['Profession'] == new_df['Job'], 'Profession'] = new_df['ID']

The last line yields 'ValueError: Can only compare identically-labeled Series objects'

【Comments】:

  • Sounds like you really just want to convert that column to a categorical? In that case, cast it with .astype('category')
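
A minimal sketch of what that comment suggests. The sample frame is rebuilt by hand from the question's table, and adding 1 to the codes is an assumption made here so the numbering starts at 1 as in the expected output. For a plain string column, pandas sorts the categories, so the codes follow ascending label order.

```python
import pandas as pd

# Sample data rebuilt from the question's table (Character column omitted).
df = pd.DataFrame({
    "Name": ["Ben", "Scarlett", "Robert", "Chris", "Kevin"],
    "Profession": ["cinematographer", "actress", "actor", "actor", "producer"],
})

# Casting to 'category' assigns each distinct label an integer code (0-based,
# in sorted label order for strings). Adding 1 makes the IDs start at 1.
df["Profession"] = df["Profession"].astype("category").cat.codes + 1

print(df)
```

This skips the separate ID/Job table entirely. If that table is still needed, it has to be built separately from the original labels.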

Tags: python-3.x pandas dataframe numpy


【Solution 1】:

Try replace, where df1 is the original table and df2 is the ID/Job lookup table:

df1.Profession = df1.Profession.replace(df2.set_index('Job').ID)
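
For a self-contained illustration of the same lookup idea, here is a sketch that uses Series.map instead of replace. That is a deliberate swap, not the answer's exact call. Both frames are rebuilt by hand from the tables in the question, and the name lookup is introduced here only for illustration.

```python
import pandas as pd

# Original table from the question (Character column omitted).
df1 = pd.DataFrame({
    "Name": ["Ben", "Scarlett", "Robert", "Chris", "Kevin"],
    "Profession": ["cinematographer", "actress", "actor", "actor", "producer"],
})

# Lookup table: one row per job with its ID.
df2 = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Job": ["actor", "actress", "cinematographer", "producer"],
})

# Series indexed by Job whose values are the IDs.
lookup = df2.set_index("Job")["ID"]

# map looks each Profession up in the Series index and returns the matching ID.
df1["Profession"] = df1["Profession"].map(lookup)

print(df1)
```

One difference to keep in mind: map turns any profession missing from the lookup table into NaN, whereas replace leaves unmatched values unchanged.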
