Mapping columns from one dataframe to another to create a new column [duplicate]: answers

【Question title】: Mapping columns from one dataframe to another to create a new column [duplicate]
【Posted】: 2018-02-13 10:27:44
【Question】:

I have a dataframe, df1:

id  store    address
1    100        xyz
2    200        qwe
3    300        asd
4    400        zxc
5    500        bnm

I have another dataframe, df2:

serialNo    store_code  warehouse
    1          300         Land
    2          500         Sea
    3          100         Land
    4          200         Sea
    5          400         Land

I want my final dataframe to look like this:

id  store    address  warehouse
1    100        xyz     Land
2    200        qwe     Sea
3    300        asd     Land
4    400        zxc     Land
5    500        bnm     Sea

That is, I want to map values from one dataframe to the other to create a new column.

【Comments】:

Tags: python pandas dataframe mapping

【Solution 1】:

`df.merge`

out = (df1.merge(df2, left_on='store', right_on='store_code')
          .reindex(columns=['id', 'store', 'address', 'warehouse']))
print(out)

   id  store address warehouse
0   1    100     xyz      Land
1   2    200     qwe       Sea
2   3    300     asd      Land
3   4    400     zxc      Land
4   5    500     bnm       Sea
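The merge above uses the default inner join, so a row of df1 whose store has no counterpart in df2 would be dropped from the result. The following is a minimal sketch, not part of the original answer, of a left merge that keeps such rows; the store value 999 and the shortened frames are made up for illustration.

```python
import pandas as pd

# Shortened, hypothetical versions of the frames from the question.
df1 = pd.DataFrame({
    'id': [1, 2, 3],
    'store': [100, 200, 999],   # 999 is a made-up store with no match in df2
    'address': ['xyz', 'qwe', 'asd'],
})
df2 = pd.DataFrame({
    'serialNo': [1, 2],
    'store_code': [200, 100],
    'warehouse': ['Sea', 'Land'],
})

# how='left' keeps every row of df1; indicator=True adds a '_merge' column
# that records whether a match was found in df2.
out = df1.merge(df2, how='left', left_on='store', right_on='store_code',
                indicator=True)
out = out[['id', 'store', 'address', 'warehouse', '_merge']]
print(out)
```

Rows marked left_only in the _merge column are the ones that found no warehouse, so their warehouse value is NaN.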

`pd.concat` + `df.sort_values`

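# Sort both frames by their key and reset the indexes so rows align by position
u = df1.sort_values('store').reset_index(drop=True)
v = df2.sort_values('store_code')[['warehouse']].reset_index(drop=True)
# axis must be passed by keyword in recent pandas versions
out = pd.concat([u, v], axis=1)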

print(out)

   id  store address warehouse
0   1    100     xyz      Land
1   2    200     qwe       Sea
2   3    300     asd      Land
3   4    400     zxc      Land
4   5    500     bnm       Sea

If your dataframe is already sorted on store, the first sort call is redundant and can be removed.
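Because this approach pairs rows purely by position, it only gives correct results when the two key columns contain exactly the same values. A minimal sketch of a guard, assuming the frames from the question (the check itself is an addition, not part of the original answer):

```python
import pandas as pd

# The frames from the question.
df1 = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'store': [100, 200, 300, 400, 500],
    'address': ['xyz', 'qwe', 'asd', 'zxc', 'bnm'],
})
df2 = pd.DataFrame({
    'serialNo': [1, 2, 3, 4, 5],
    'store_code': [300, 500, 100, 200, 400],
    'warehouse': ['Land', 'Sea', 'Land', 'Sea', 'Land'],
})

u = df1.sort_values('store').reset_index(drop=True)
v = df2.sort_values('store_code').reset_index(drop=True)

# Positional pairing is only safe if the sorted keys match one-to-one.
if u['store'].tolist() != v['store_code'].tolist():
    raise ValueError('store and store_code do not line up one-to-one')

out = pd.concat([u, v[['warehouse']]], axis=1)
print(out)
```

If the check fails, merge or map is the safer choice, since both match on key values rather than on row position.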

`Series.replace`/`Series.map`

s = df1.store.replace(df2.set_index('store_code')['warehouse'])
print(s) 
0    Land
1     Sea
2    Land
3    Land
4     Sea

df1['warehouse'] = s
print(df1)

   id  store address warehouse
0   1    100     xyz      Land
1   2    200     qwe       Sea
2   3    300     asd      Land
3   4    400     zxc      Land
4   5    500     bnm       Sea

Alternatively, create the mapping explicitly. This works well if you want to reuse it later.

mapping = dict(df2[['store_code', 'warehouse']].values)
df1['warehouse'] = df1.store.map(mapping)
print(df1)

   id  store address warehouse
0   1    100     xyz      Land
1   2    200     qwe       Sea
2   3    300     asd      Land
3   4    400     zxc      Land
4   5    500     bnm       Sea
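On the question's data, replace and map give the same result, but they differ when a store has no entry in the mapping. A minimal sketch of that difference, using a made-up store code 600 that does not appear in df2:

```python
import pandas as pd

# df2 from the question.
df2 = pd.DataFrame({
    'serialNo': [1, 2, 3, 4, 5],
    'store_code': [300, 500, 100, 200, 400],
    'warehouse': ['Land', 'Sea', 'Land', 'Sea', 'Land'],
})
mapping = dict(zip(df2['store_code'], df2['warehouse']))

# 600 is a hypothetical store that is missing from the mapping.
stores = pd.Series([100, 600], name='store')

mapped = stores.map(mapping)        # missing keys become NaN
replaced = stores.replace(mapping)  # missing keys keep their original value

print(mapped)
print(replaced)
```

With map, missing stores are easy to find afterwards via isna(); with replace, the unmapped store codes stay mixed in with the warehouse names.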

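【Comments】:

How would this mapping work with a lot of data, e.g. dataframes of 5 to 10 million rows? I wonder whether the dict would work efficiently.
@DISC-O It depends on the data, but pandas usually does not work well on data of that scale. Consider distributed processing instead, e.g. dask.
Which one is the fastest?
@Pablo That depends on your data; it is best to test it with %timeit statements.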
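Outside IPython, where %timeit is not available, the standard timeit module can be used for the same comparison. The sketch below is only illustrative: the synthetic data, the row count n, and the helper names via_map and via_merge are assumptions, and timings will vary with your data and machine.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data; n is an arbitrary size chosen for illustration.
rng = np.random.default_rng(0)
n = 100_000
codes = np.arange(n)
df2 = pd.DataFrame({
    'store_code': codes,
    'warehouse': rng.choice(['Land', 'Sea'], size=n),
})
df1 = pd.DataFrame({'store': rng.permutation(codes)})

lookup = df2.set_index('store_code')['warehouse']


def via_map():
    """Look up each store in a Series indexed by store_code."""
    return df1['store'].map(lookup)


def via_merge():
    """Left merge on the key columns, then take the warehouse column."""
    merged = df1.merge(df2, how='left', left_on='store', right_on='store_code')
    return merged['warehouse']


for fn in (via_map, via_merge):
    # Best of 3 repeats, 3 calls each, reported per call.
    seconds = min(timeit.repeat(fn, number=3, repeat=3)) / 3
    print(f'{fn.__name__}: {seconds:.4f} s per call')
```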

【Solution 2】:

Use map or join:

df1['warehouse'] = df1['store'].map(df2.set_index('store_code')['warehouse'])
print (df1)
   id  store address warehouse
0   1    100     xyz      Land
1   2    200     qwe       Sea
2   3    300     asd      Land
3   4    400     zxc      Land
4   5    500     bnm       Sea

# drop() no longer accepts axis positionally in recent pandas; use columns=
df1 = df1.join(df2.set_index('store_code'), on='store').drop(columns='serialNo')
print (df1)
   id  store address warehouse
0   1    100     xyz      Land
1   2    200     qwe       Sea
2   3    300     asd      Land
3   4    400     zxc      Land
4   5    500     bnm       Sea

【Comments】:

I get this error when running the .map code on a similar dataset: Reindexing only valid with uniquely valued Index objects
I think the problem is that you have duplicates in store_code in df2. So you need df1['store'].map(df2.drop_duplicates('store_code').set_index('store_code')['warehouse'])
Correct! Thanks :)
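The situation in these comments can be sketched as follows. The data is made up for illustration: store_code 100 appears twice in df2, so the lookup Series has a non-unique index, which map cannot use.

```python
import pandas as pd

df1 = pd.DataFrame({'store': [100, 200]})
df2 = pd.DataFrame({
    'store_code': [100, 100, 200],   # 100 appears twice (made-up data)
    'warehouse': ['Land', 'Sea', 'Sea'],
})

lookup = df2.set_index('store_code')['warehouse']
# False here, because 100 appears twice in the index.
print(lookup.index.is_unique)

# drop_duplicates keeps the first row for each store_code by default.
deduped = df2.drop_duplicates('store_code').set_index('store_code')['warehouse']
df1['warehouse'] = df1['store'].map(deduped)
print(df1)
```

Note that dropping duplicates silently picks one warehouse per store. If the duplicate rows disagree, as they do here for store 100, it is worth deciding deliberately which row to keep.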
