【Posted】: 2016-03-31 09:27:22
【Question】:
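I have two DataFrames, r1 and r2, that I want to join on a multi-level index. r1 is defined as
import numpy as np
import pandas as pd
a1=['iso3_o', 'iso3_d', 'year', 'ExportFoodAndLiveAnimals']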
a=np.array([['CAN', 'USA', '1995.0', '5918210.506'],
['CAN', 'USA', '1996.0', '6988508.727'],
['CAN', 'USA', '1997.0', '7792977.258'],
['CAN', 'USA', '1998.0', '8177456.631'],
['CAN', 'USA', '1999.0', '8773990.755'],
['CAN', 'USA', '2000.0', '9650783.071'],
['CAN', 'USA', '2001.0', '10800432.88'],
['CAN', 'USA', '2002.0', '11348837.38'],
['CAN', 'USA', '2003.0', '11313334.46'],
['CAN', 'USA', '2004.0', '12337588.35'],
['CAN', 'USA', '2005.0', '13227226.96'],
['CAN', 'USA', '2006.0', '14236699.34'],
['CAN', 'USA', '2007.0', '15638919.3'],
['CAN', 'USA', '2008.0', '17449901.08'],
['CAN', 'USA', '2009.0', '14813089.89'],
['CAN', 'USA', '2010.0', '16399733.82']])
r1 = pd.DataFrame(a, columns=a1)
r1
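The check below uses only the first two rows of the data above. Every element passed to np.array is a Python string, so NumPy builds a unicode string array. As a result, the year column of r1 holds strings such as '1995.0' rather than numbers. This is a diagnostic sketch, not part of the original question.

```python
import numpy as np
import pandas as pd

# Same construction as r1 in the question, truncated to two rows.
a1 = ['iso3_o', 'iso3_d', 'year', 'ExportFoodAndLiveAnimals']
a = np.array([['CAN', 'USA', '1995.0', '5918210.506'],
              ['CAN', 'USA', '1996.0', '6988508.727']])
r1 = pd.DataFrame(a, columns=a1)

# 'U' means NumPy stored everything as fixed-width unicode strings.
print(a.dtype.kind)
# The year is the string '1995.0', not the number 1995.
print(r1.loc[0, 'year'] == '1995.0')
```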
and r2 is defined as
a1=['iso3_o', 'iso3_d', 'year', 'contig']
a=np.array([['CAN', 'USA', 1995, 1],
['CAN', 'USA', 1996, 1],
['CAN', 'USA', 1997, 1],
['CAN', 'USA', 1998, 1],
['CAN', 'USA', 1999, 1],
['CAN', 'USA', 2000, 1],
['CAN', 'USA', 2001, 1],
['CAN', 'USA', 2002, 1],
['CAN', 'USA', 2003, 1],
['CAN', 'USA', 2004, 1],
['CAN', 'USA', 2005, 1],
['CAN', 'USA', 2006, 1],
['CAN', 'USA', 2007, 1],
['CAN', 'USA', 2008, 1],
['CAN', 'USA', 2009, 1],
['CAN', 'USA', 2010, 1]])
r2 = pd.DataFrame(a, columns=a1)
r2
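The same kind of check for r2, again on the first two rows only. Here the array mixes strings and integers. np.array picks one common dtype for all elements, which is a unicode string type, so the integers are silently converted: 1995 becomes '1995' and 1 becomes '1'. This is a diagnostic sketch, not part of the original question.

```python
import numpy as np
import pandas as pd

# Same construction as r2 in the question, truncated to two rows.
a1 = ['iso3_o', 'iso3_d', 'year', 'contig']
a = np.array([['CAN', 'USA', 1995, 1],
              ['CAN', 'USA', 1996, 1]])
r2 = pd.DataFrame(a, columns=a1)

# Mixed str/int input is promoted to a unicode string array.
print(a.dtype.kind)
# The year is now the string '1995' (note: no '.0' suffix as in r1).
print(r2.loc[0, 'year'] == '1995')
```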
Then I decided to join them on their multi-index levels.
So what I did was move those columns into the index:
multi_r2 = r2.set_index(['iso3_o', 'iso3_d','year'])
multi_r1 = r1.set_index(['iso3_o', 'iso3_d','year'])
df = multi_r2.join(multi_r1)
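The sketch below rebuilds two-row versions of r1 and r2 exactly as in the question and then inspects the index keys. The year level contains '1995.0' on one side and '1995' on the other. No (iso3_o, iso3_d, year) tuple therefore appears in both indexes, and the left join fills the r1 column with NaN.

```python
import numpy as np
import pandas as pd

# Two-row versions of r1 and r2, built the same way as in the question.
r1 = pd.DataFrame(np.array([['CAN', 'USA', '1995.0', '5918210.506'],
                            ['CAN', 'USA', '1996.0', '6988508.727']]),
                  columns=['iso3_o', 'iso3_d', 'year', 'ExportFoodAndLiveAnimals'])
r2 = pd.DataFrame(np.array([['CAN', 'USA', 1995, 1],
                            ['CAN', 'USA', 1996, 1]]),
                  columns=['iso3_o', 'iso3_d', 'year', 'contig'])

multi_r1 = r1.set_index(['iso3_o', 'iso3_d', 'year'])
multi_r2 = r2.set_index(['iso3_o', 'iso3_d', 'year'])

# Compare the 'year' level on both sides: '1995.0' vs '1995'.
print(multi_r1.index.get_level_values('year').tolist())
print(multi_r2.index.get_level_values('year').tolist())

# No index tuple is shared, so the intersection is empty...
common = multi_r1.index.intersection(multi_r2.index)
print(len(common))  # 0

# ...and the left join finds no match for any row of multi_r2.
df = multi_r2.join(multi_r1)
print(df['ExportFoodAndLiveAnimals'].isna().all())  # True
```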
When I join on 'iso3_o', 'iso3_d', 'year', the resulting DataFrame df contains NaN.
Why does this happen?
Thanks in advance.
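One possible fix converts the key and value columns to numbers before calling set_index, so that both year levels hold the same integers. This sketch assumes that year should be an integer key and that the other two columns are numeric. The conversion step is a suggestion and does not come from the thread. pd.to_numeric parses '1995.0' as the float 1995.0, which is why r1's year is cast to int64 afterwards.

```python
import numpy as np
import pandas as pd

# Two-row versions of r1 and r2, built the same way as in the question.
r1 = pd.DataFrame(np.array([['CAN', 'USA', '1995.0', '5918210.506'],
                            ['CAN', 'USA', '1996.0', '6988508.727']]),
                  columns=['iso3_o', 'iso3_d', 'year', 'ExportFoodAndLiveAnimals'])
r2 = pd.DataFrame(np.array([['CAN', 'USA', 1995, 1],
                            ['CAN', 'USA', 1996, 1]]),
                  columns=['iso3_o', 'iso3_d', 'year', 'contig'])

# Normalize the join key: both sides end up with int64 years.
r1['year'] = pd.to_numeric(r1['year']).astype('int64')  # '1995.0' -> 1995
r2['year'] = pd.to_numeric(r2['year'])                  # '1995'   -> 1995

# Also convert the value columns, which np.array turned into strings.
r1['ExportFoodAndLiveAnimals'] = pd.to_numeric(r1['ExportFoodAndLiveAnimals'])
r2['contig'] = pd.to_numeric(r2['contig'])

keys = ['iso3_o', 'iso3_d', 'year']
df = r2.set_index(keys).join(r1.set_index(keys))
print(df)
print(df['ExportFoodAndLiveAnimals'].notna().all())  # True
```

An alternative is to build the DataFrames from Python lists instead of np.array. pandas then infers a dtype per column, so the integers in r2 stay integers. r1's years, however, would still be the strings '1995.0' as written in the question.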
【Discussion】:
-
Why are the data types of your year columns inconsistent? One is str and the other is int?
-
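Also, this should work. I can reproduce your error. If you do
r2.combine_first(r1) and then set the index, it should work. What version of pandas are you using? This looks like a bug to me. Mine is 0.18.0.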
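The sketch below applies the combine_first workaround from the comment to the first two rows. Note that combine_first aligns on the default integer row index and on column names, not on the iso3_o/iso3_d/year keys. It only lines rows up correctly because r1 and r2 happen to list the same country pairs and years in the same order. Where both frames have a value, r2's takes precedence, so the year values kept are r2's strings ('1995', '1996').

```python
import numpy as np
import pandas as pd

# Two-row versions of r1 and r2, built the same way as in the question.
r1 = pd.DataFrame(np.array([['CAN', 'USA', '1995.0', '5918210.506'],
                            ['CAN', 'USA', '1996.0', '6988508.727']]),
                  columns=['iso3_o', 'iso3_d', 'year', 'ExportFoodAndLiveAnimals'])
r2 = pd.DataFrame(np.array([['CAN', 'USA', 1995, 1],
                            ['CAN', 'USA', 1996, 1]]),
                  columns=['iso3_o', 'iso3_d', 'year', 'contig'])

# Row-wise (positional) merge on the shared 0..n-1 index: r2's values win where
# both frames have the column, and r1 contributes ExportFoodAndLiveAnimals.
combined = r2.combine_first(r1)

# Build the MultiIndex afterwards, as the comment suggests.
df = combined.set_index(['iso3_o', 'iso3_d', 'year'])
print(df)
```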
Tags: python join pandas merge dataframe