【Posted】: 2021-03-11 16:29:55
【Question】:
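When I merge two DataFrames, the result keeps the columns from both the left and the right DataFrame, with _x and _y appended to their names. I would instead like a single column that "merges" the values of the two columns, so that:
- when the values are the same, only one value is kept
- when the values differ, the value to keep is chosen using another column named "date": the value with the most recent date wins.
I also tried concat. In that case the two columns do end up as one, but it seems to simply append the rows of both DataFrames, so companies that appear in both show up twice.
For example, in the code below, I want the DataFrame df_desired as the output. How can I get it?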
import pandas as pd
import numpy as np
np.random.seed(30)
company1 = ('comA','comB','comC','comD')
df1 = pd.DataFrame(columns=None)
df1['company'] = company1
df1['clv']=[100,200,300,400]
df1['date'] = [20191231,20191231,20191001,20190931]
print("\ndf1:")
print(df1)
company2 = ('comC','comD','comE','comF')
df2 = pd.DataFrame(columns=None)
df2['company'] = company2
df2['clv']=[300,450,500,600]
df2['date'] = [20191231,20191231,20191231,20191231]
print("\ndf2:")
print(df2)
df_desired = pd.DataFrame(columns=None)
df_desired['company'] = ('comA','comB','comC','comD','comE','comF')
df_desired['clv']=[100,200,300,450,500,600]
df_desired['date'] = [20191231,20191231,20191231,20191231,20191231,20191231]
print("\ndf_desired:")
print(df_desired)
# outer merge on 'company'; the overlapping clv and date columns get _x / _y suffixes
df_merge = pd.merge(df1, df2, on='company', how='outer')
print("\ndf_merge:")
print(df_merge)
# alternatively, stack the rows with concat (companies present in both frames appear twice)
df_concat = pd.concat([df1, df2], ignore_index=True, sort=False)
print("\ndf_concat:")
print(df_concat)
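
One possible way to reach df_desired from the concat attempt above is sketched below. It is not necessarily the only or the best approach. It assumes that `date` is an integer in YYYYMMDD form, so that a larger number means a later date, and that `company` alone identifies a row. The name `df_result` is introduced here purely for illustration. The idea is to stack both frames, sort so that the latest date for each company comes last, and then keep only that last row per company.

```python
import pandas as pd

# Same sample data as in the question, built directly from dicts.
df1 = pd.DataFrame({
    "company": ["comA", "comB", "comC", "comD"],
    "clv": [100, 200, 300, 400],
    "date": [20191231, 20191231, 20191001, 20190931],
})
df2 = pd.DataFrame({
    "company": ["comC", "comD", "comE", "comF"],
    "clv": [300, 450, 500, 600],
    "date": [20191231, 20191231, 20191231, 20191231],
})

df_result = (
    pd.concat([df1, df2], ignore_index=True)          # stack the rows of both frames
    .sort_values(["company", "date"])                 # latest date last within each company
    .drop_duplicates(subset="company", keep="last")   # keep only the latest row per company
    .reset_index(drop=True)
)
print(df_result)
```

Because the dates are compared as plain integers, the value 20190931 from the question (which is not a real calendar date) still sorts before 20191231. If the column were converted with pd.to_datetime first, that value would need to be corrected.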
【Comments】:
Tags: python pandas dataframe merge