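[Posted]: 2020-04-28 21:55:58
[Question]:
I have two dataframes. I want to iterate over the elements of each list in the "Companies" column and match them against the company names in my second dataframe, but only when the date in the first dataframe is after the date in the second dataframe. I want two columns for the matched names and two columns for the dates of those matches.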
import pandas as pd
# Built in one call; DataFrame.append was removed in pandas 2.0
df = pd.DataFrame([
    {'Customer': 'Gold', 'Companies': ['Gold Ltd', 'Gold X', 'Gold De'], 'Date': '2019-01-07'},
    {'Customer': 'Micro', 'Companies': ['Microf', 'Micro Inc', 'Micre'], 'Date': '2019-02-10'},
])
  Customer                    Companies        Date
0     Gold  [Gold Ltd, Gold X, Gold De]  2019-01-07
1    Micro   [Microf, Micro Inc, Micre]  2019-02-10
df2 = pd.DataFrame([
    {'Companies': 'Gold Ltd', 'Date': '2019-01-01'},
    {'Companies': 'Gold X', 'Date': '2020-01-07'},
    {'Companies': 'Gold De', 'Date': '2018-07-07'},
    {'Companies': 'Microf', 'Date': '2019-02-18'},
    {'Companies': 'Micro Inc', 'Date': '2017-09-27'},
    {'Companies': 'Micre', 'Date': '2018-12-11'},
])
   Companies        Date
0   Gold Ltd  2019-01-01
1     Gold X  2020-01-07
2    Gold De  2018-07-07
3     Microf  2019-02-18
4  Micro Inc  2017-09-27
5      Micre  2018-12-11
from datetime import datetime

def match_it(d1, d2):
    for companies in d1['Companies']:
        for company in companies:
            # str.contains does a substring (regex) test, not an exact name match
            if d2['Companies'].str.contains(company).any():
                # Find the d1 row whose list holds this company
                mask = d1.Companies.apply(lambda x: company in x)
                dff = d1[mask]
                date1 = datetime.strptime(dff['Date'].values[0], '%Y-%m-%d').date()
                date2 = datetime.strptime(d2[d2['Companies']==company]['Date'].values[0], '%Y-%m-%d').date()
                if date2 < date1:
                    print(d2[d2['Companies']==company])
                    new_row = pd.Series([d2[d2['Companies']==company]['Date'], d2[d2['Companies']==company]['Companies']])
                    # Returns on the first match, so later matches are never collected
                    return new_row
Expected output:
Customer  Companies                    Date        Name_1     Date_1      Name_2   Date_2
Gold      [Gold Ltd, Gold X, Gold De]  2019-01-07  Gold Ltd   2019-01-01  Gold De  2018-07-07
Micro     [Microf, Micro Inc, Micre]   2019-02-10  Micro Inc  2017-09-27  Micre    2018-12-11
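For reference, here is a minimal sketch of one way the expected output above could be produced with a per-row lookup. It is not necessarily the best approach. The names `date_by_company` and `match_row` are made up for illustration, and the sketch assumes company names are unique in `df2`, that names should match exactly, and that all dates are ISO `YYYY-MM-DD` strings.

```python
import pandas as pd

df = pd.DataFrame([
    {'Customer': 'Gold', 'Companies': ['Gold Ltd', 'Gold X', 'Gold De'], 'Date': '2019-01-07'},
    {'Customer': 'Micro', 'Companies': ['Microf', 'Micro Inc', 'Micre'], 'Date': '2019-02-10'},
])
df2 = pd.DataFrame([
    {'Companies': 'Gold Ltd', 'Date': '2019-01-01'},
    {'Companies': 'Gold X', 'Date': '2020-01-07'},
    {'Companies': 'Gold De', 'Date': '2018-07-07'},
    {'Companies': 'Microf', 'Date': '2019-02-18'},
    {'Companies': 'Micro Inc', 'Date': '2017-09-27'},
    {'Companies': 'Micre', 'Date': '2018-12-11'},
])

# Hypothetical helper: exact-name lookup, company -> date string
# (assumes each company appears only once in df2)
date_by_company = dict(zip(df2['Companies'], df2['Date']))

def match_row(companies, date):
    """Collect Name_i/Date_i pairs for companies whose df2 date is before `date`."""
    out = {}
    i = 0
    for company in companies:
        other = date_by_company.get(company)
        # ISO 'YYYY-MM-DD' strings sort chronologically, so string comparison is enough
        if other is not None and other < date:
            i += 1
            out[f'Name_{i}'] = company
            out[f'Date_{i}'] = other
    return out

rows = [match_row(c, d) for c, d in zip(df['Companies'], df['Date'])]
matches = pd.DataFrame(rows, index=df.index)
result = pd.concat([df, matches], axis=1)
print(result)
```

Unlike `match_it`, this collects every qualifying company per row instead of returning at the first one. If a row had fewer matches than another, its unused `Name_i`/`Date_i` cells would be filled with NaN.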
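[Discussion]:
- As I understand it, the Companies lists may have different lengths. What should happen in that case? You would end up with a different number of columns (Name_X).
- I think this question is very similar to yours: stackoverflow.com/questions/53837685/… If df2.Companies is in df.Companies, you are looking to merge the dataframes. You will need some extra logic to drop entries from the output df when the date is not after the second date.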
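The merge idea from the comments could be sketched roughly as follows: explode the lists, merge on the company name, filter by date, then pivot back to one row per customer. This is only an illustration under some assumptions: the helper column names `row_id`, `n` and `Date_match` are invented here, names are matched exactly, and `DataFrame.explode` needs pandas 0.25 or later.

```python
import pandas as pd

df = pd.DataFrame([
    {'Customer': 'Gold', 'Companies': ['Gold Ltd', 'Gold X', 'Gold De'], 'Date': '2019-01-07'},
    {'Customer': 'Micro', 'Companies': ['Microf', 'Micro Inc', 'Micre'], 'Date': '2019-02-10'},
])
df2 = pd.DataFrame([
    {'Companies': 'Gold Ltd', 'Date': '2019-01-01'},
    {'Companies': 'Gold X', 'Date': '2020-01-07'},
    {'Companies': 'Gold De', 'Date': '2018-07-07'},
    {'Companies': 'Microf', 'Date': '2019-02-18'},
    {'Companies': 'Micro Inc', 'Date': '2017-09-27'},
    {'Companies': 'Micre', 'Date': '2018-12-11'},
])

# One row per (customer, candidate company); row_id remembers the original df row
exploded = df.explode('Companies').reset_index().rename(columns={'index': 'row_id'})

# Left merge keeps the order of the exploded rows; df2's date becomes Date_match
merged = exploded.merge(df2, on='Companies', how='left', suffixes=('', '_match'))

# Keep only matches whose df2 date is strictly before the df date
# (unmatched rows have NaT here and are dropped by the comparison)
keep = pd.to_datetime(merged['Date_match']) < pd.to_datetime(merged['Date'])
merged = merged[keep].copy()

# Number the surviving matches within each original row: 1, 2, ...
merged['n'] = merged.groupby('row_id').cumcount() + 1

# Pivot to wide form, then flatten the (value, n) column pairs to Name_n / Date_n
wide = merged.pivot(index='row_id', columns='n', values=['Companies', 'Date_match'])
wide.columns = [f"{'Name' if v == 'Companies' else 'Date'}_{n}" for v, n in wide.columns]
ordered = [f'{p}_{i}' for i in sorted(merged['n'].unique()) for p in ('Name', 'Date')]
wide = wide[ordered]

# Join back on the original index; rows with fewer matches get NaN in unused columns
result = df.join(wide)
print(result)
```

On the variable-length concern: the number of `Name_X`/`Date_X` columns follows the row with the most matches, and shorter rows are padded with NaN. Keeping the long `merged` frame instead of pivoting avoids the variable column count altogether.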