Posted: 2020-06-22 11:51:03
Question:
I'm trying to enrich a DataFrame with data collected from an API. So I'm doing this:
import pandas as pd

for i in df.index:
    if pd.isnull(df.cnpj[i]):
        continue  # skip rows without a CNPJ
    k = get_financials_hnwi(df.cnpj[i])  # this is my API request function, working fine
    df = df.merge(k, on=["cnpj"], how="left")  # here is my problem <----------
Because I run that merge inside the for loop, the result ends up with suffixed columns (_x, _y). So I found this alternative here:
Pandas: merge dataframes without creating new columns
import numpy as np
import pandas as pd

for i in df.index:
    if pd.isnull(df.cnpj[i]):
        continue  # skip rows without a CNPJ
    k = get_financials_hnwi(df.cnpj[i])  # this is my request function, working fine
    val = np.intersect1d(df.cnpj, k.cnpj)
    df_temp = pd.concat([df, k], ignore_index=True)
    df = df_temp[df_temp.cnpj.isin(val)]
But this builds a new DataFrame that throws away the original index, so the pd.isnull(df.cnpj[i]) check can no longer run.
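A minimal sketch of why the loop breaks, using the question's sample data and only its first API response (k stands in for what get_financials_hnwi would return). After a single pass of the concat/isin alternative, only the rows for that one CNPJ survive, and they carry new labels, so the next lookup by an original label fails.

```python
import numpy as np
import pandas as pd

# Sample data from the question; k stands in for one API response.
df = pd.DataFrame({"cnpj": [12, 32, 54, 65],
                   "co_name": ["Johns Market", "T Bone Gril", "Superstore", "XYZ Tech"]})
k = pd.DataFrame({"cnpj": [12], "average_revenues": [687], "years": ["2019,2018,2017"]})

# One pass of the concat/isin alternative, as in the loop above.
val = np.intersect1d(df.cnpj, k.cnpj)            # array([12])
df_temp = pd.concat([df, k], ignore_index=True)  # 5 rows, labels 0..4
df = df_temp[df_temp.cnpj.isin(val)]             # keeps only rows whose cnpj is 12

print(df.index.tolist())  # [0, 4]: the original row and the appended API row

# The next loop iteration looks up the original label 1, which no longer exists.
try:
    df.cnpj[1]
except KeyError:
    print("KeyError: label 1 is gone")
```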
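Is there a clean way to run a merge/join/concat inside a for loop without creating new _x and _y columns? Or is there a way to combine the _x and _y columns, then drop them and condense the values into a single column? I just want one column that holds everything.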
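For the second option, one possibility (a sketch, not tested against the real API) is to fold each suffixed pair back into a single column right after every merge: Series.combine_first keeps the existing value and fills its gaps from the newly merged one, and the two suffixed columns are then dropped. merge_and_coalesce is a hypothetical helper name, it assumes pandas' default suffixes ("_x", "_y"), and the four frames are the sample API responses from the reproducible example below.

```python
import pandas as pd

def merge_and_coalesce(left, right, key="cnpj"):
    """Left-merge right into left, then fold every <col>_x/<col>_y pair into <col>.

    Hypothetical helper; assumes pandas' default merge suffixes ("_x", "_y").
    """
    merged = left.merge(right, on=key, how="left")
    for col in right.columns:
        if col == key:
            continue
        x, y = f"{col}_x", f"{col}_y"
        if x in merged.columns and y in merged.columns:
            # Keep the value already in df; fill missing ones from the new API data.
            merged[col] = merged[x].combine_first(merged[y])
            merged = merged.drop(columns=[x, y])
    return merged

df = pd.DataFrame({"cnpj": [12, 32, 54, 65],
                   "co_name": ["Johns Market", "T Bone Gril", "Superstore", "XYZ Tech"]})
# Sample API responses from the question.
responses = [
    pd.DataFrame({"cnpj": [12], "average_revenues": [687], "years": ["2019,2018,2017"]}),
    pd.DataFrame({"cnpj": [32], "average_revenues": [456], "years": ["2019,2017"]}),
    pd.DataFrame({"cnpj": [53], "average_revenues": [None], "years": [None]}),
    pd.DataFrame({"cnpj": [65], "average_revenues": [4142], "years": ["2019,2018,2015,2013,2012"]}),
]
for k in responses:
    df = merge_and_coalesce(df, k)

print(df)  # columns: cnpj, co_name, average_revenues, years
```

In the original loop, the merge line would become df = merge_and_coalesce(df, k).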
Sample data and reproducible code
import pandas as pd

df=pd.DataFrame({'cnpj':[12,32,54,65],'co_name':['Johns Market','T Bone Gril','Superstore','XYZ Tech']})
#first API request:
k=pd.DataFrame({'cnpj':[12],'average_revenues':[687],'years':['2019,2018,2017']})
df=df.merge(k,on="cnpj", how='left')
#second API request:
k=pd.DataFrame({'cnpj':[32],'average_revenues':[456],'years':['2019,2017']})
df=df.merge(k,on="cnpj", how='left')
#third API request:
k=pd.DataFrame({'cnpj':[53],'average_revenues':[None],'years':[None]})
df=df.merge(k,on="cnpj", how='left')
#fourth API request:
k=pd.DataFrame({'cnpj':[65],'average_revenues':[4142],'years':['2019,2018,2015,2013,2012']})
df=df.merge(k,on="cnpj", how='left')
print(df)
Result:
cnpj co_name average_revenues_x years_x average_revenues_y \
0 12 Johns Market 687.0 2019,2018,2017 NaN
1 32 T Bone Gril NaN NaN 456.0
2 54 Superstore NaN NaN NaN
3 65 XYZ Tech NaN NaN NaN
years_y average_revenues_x years_x average_revenues_y \
0 NaN None None NaN
1 2019,2017 None None NaN
2 NaN None None NaN
3 NaN None None 4142.0
years_y
0 NaN
1 NaN
2 NaN
3 2019,2018,2015,2013,2012
Desired result:
cnpj co_name average_revenues years
0 12 Johns Market 687.0 2019,2018,2017
1 32 T Bone Gril 456.0 2019,2017
2 54 Superstore None None
3 65 XYZ Tech 4142.0 2019,2018,2015,2013,2012
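For the first option, a sketch under the assumption that each API call returns a small DataFrame keyed by cnpj, like the samples: collect all responses first, stack them with pd.concat, and merge once. A single merge never sees the same column name twice, so no suffixes appear and the output holds the same values as the desired result above. The responses list is just the question's sample data; in the real code it would be filled by get_financials_hnwi.

```python
import pandas as pd

df = pd.DataFrame({"cnpj": [12, 32, 54, 65],
                   "co_name": ["Johns Market", "T Bone Gril", "Superstore", "XYZ Tech"]})

# Sample API responses from the question. In the real code this list would be
# built in the loop, e.g. [get_financials_hnwi(c) for c in df["cnpj"].dropna()].
responses = [
    pd.DataFrame({"cnpj": [12], "average_revenues": [687], "years": ["2019,2018,2017"]}),
    pd.DataFrame({"cnpj": [32], "average_revenues": [456], "years": ["2019,2017"]}),
    pd.DataFrame({"cnpj": [53], "average_revenues": [None], "years": [None]}),
    pd.DataFrame({"cnpj": [65], "average_revenues": [4142], "years": ["2019,2018,2015,2013,2012"]}),
]

# Stack the responses into one lookup table with one row per CNPJ
# (a duplicated CNPJ would otherwise duplicate rows in the left merge) ...
financials = pd.concat(responses, ignore_index=True).drop_duplicates(subset="cnpj")
# ... and merge exactly once, so every column name appears a single time.
result = df.merge(financials, on="cnpj", how="left")
print(result)
```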
Comments:
-
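Please add some sample data and the expected output so the problem is clear. From reading your question, I would store the values in a dictionary and then map them onto your target DataFrame.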
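A minimal sketch of the dictionary-and-map idea from this comment. fake_get_financials is a hypothetical stand-in for get_financials_hnwi that returns one-row DataFrames shaped like the sample responses, filled with the question's sample values. Because nothing is merged, no _x/_y columns appear and df keeps its original index.

```python
import pandas as pd

df = pd.DataFrame({"cnpj": [12, 32, 54, 65],
                   "co_name": ["Johns Market", "T Bone Gril", "Superstore", "XYZ Tech"]})

# Hypothetical stand-in for get_financials_hnwi(), using the question's sample values.
_FAKE_API = {
    12: (687, "2019,2018,2017"),
    32: (456, "2019,2017"),
    65: (4142, "2019,2018,2015,2013,2012"),
}

def fake_get_financials(cnpj):
    revenue, years = _FAKE_API.get(cnpj, (None, None))
    return pd.DataFrame({"cnpj": [cnpj], "average_revenues": [revenue], "years": [years]})

# 1) Call the API once per non-null CNPJ and store the values in dicts keyed by CNPJ.
revenues, years = {}, {}
for cnpj in df["cnpj"].dropna():
    k = fake_get_financials(cnpj)
    revenues[cnpj] = k["average_revenues"].iloc[0]
    years[cnpj] = k["years"].iloc[0]

# 2) Map the dicts onto the original frame; CNPJs missing from a dict become NaN.
df["average_revenues"] = df["cnpj"].map(revenues)
df["years"] = df["cnpj"].map(years)
print(df)
```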
-
OK, I'll add some sample data.
-
@Datanovice Since I'm calling an API, providing sample data is tricky. Does this look OK, or would you suggest another way to provide a sample?