【Question title】: Pandas merge two data frames with different lengths
【Posted】: 2016-12-10 04:25:23
【Question】:

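I have two data frames of different lengths: one contains population figures and the other contains death figures. I need to merge them. This is the structure of the population table: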

Year     Age      Female         Male        Total
1933       0     984472.26   1015361.55   1999833.81
1933       1    1040496.02   1064088.29   2104584.31
1933       2    1093043.81   1117527.14   2210570.95
1933       3    1107994.31   1135046.59   2243040.90
1933       4    1130624.43   1179513.62   2310138.05
1933       5    1168930.56   1228225.14   2397155.70
1933       6    1190706.56   1238800.33   2429506.89
1933       7    1203816.58   1245575.51   2449392.09
1933       8    1224285.20   1255721.28   2480006.48
1933       9    1230968.73   1254639.67   2485608.40
1933      10    1243608.10   1262739.94   2506348.04

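The death table has the same structure as the population table but different values. Note that the age increases by one on each row. The population table has more rows than the death table, so after merging I expect NaNs in the death columns for the rows that have no match. However, after running the merge code I get the following output: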

  year,p_age,p_female,p_male,p_total,d_age,d_female,d_male,d_total
0,1933,0,984472.26,1015361.55,1999833.81,0,52615.77,68438.11,121053.88
1,1933,0,984472.26,1015361.55,1999833.81,1,8917.13,10329.16,19246.29
2,1933,0,984472.26,1015361.55,1999833.81,2,4336.92,5140.05,9476.97
3,1933,0,984472.26,1015361.55,1999833.81,3,3161.59,3759.88,6921.47
4,1933,0,984472.26,1015361.55,1999833.81,4,2493.84,2932.59,5426.43
5,1933,0,984472.26,1015361.55,1999833.81,5,2139.87,2537.53,4677.4
6,1933,0,984472.26,1015361.55,1999833.81,6,1939.7,2337.76,4277.46
7,1933,0,984472.26,1015361.55,1999833.81,7,1760.47,2163.9,3924.37
8,1933,0,984472.26,1015361.55,1999833.81,8,1602.2,2015.97,3618.17
9,1933,0,984472.26,1015361.55,1999833.81,9,1464.88,1893.96,3358.84
10,1933,0,984472.26,1015361.55,1999833.81,10,1357.91,1805.52,3163.43

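Notice that the population age repeats (p_age stays at 0 while d_age increases), and the data frame grows from just over 900 rows to about 1 million. This is the merge code I'm using: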

df_usa = usa_population.merge(usa_death, how='left', on='year')

I have also tried:

df_usa = pd.merge(usa_population, usa_death, how='left', on='year')

or:

df_usa = pd.merge(usa_population, usa_death, how='inner', on='year')

How can I fix this code?
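A likely explanation for the blow-up: with `on='year'` the join key is not unique on either side, so pandas pairs every population row of a year with every death row of the same year (a many-to-many join). The sketch below reproduces this with a handful of values taken from the tables above; the lowercase column names `year`, `age` and `total` are assumptions for illustration.

```python
import pandas as pd

# Toy data (a few values copied from the question): one year, three ages
# in the population table and only two ages in the death table.
pop = pd.DataFrame({
    "year": [1933, 1933, 1933],
    "age": [0, 1, 2],
    "total": [1999833.81, 2104584.31, 2210570.95],
})
death = pd.DataFrame({
    "year": [1933, 1933],
    "age": [0, 1],
    "total": [121053.88, 19246.29],
})

# Joining on "year" alone: every 1933 population row matches every 1933
# death row, so the result has 3 * 2 = 6 rows instead of 3.
by_year = pop.merge(death, how="left", on="year", suffixes=("_pop", "_death"))
print(len(by_year))
print(by_year[["year", "age_pop", "age_death"]])
```

With about a hundred ages per year on each side, the same effect multiplies the row count roughly a hundredfold, which matches the jump described in the question.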

【Discussion】:

    Tags: python-3.x pandas merge


    【Solution 1】:

    It looks like you also want to merge on the age column. Try this:

    df_usa = usa_population.merge(usa_death, how='left', on=['year','age'])
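    A small-scale check of this fix, as a sketch under assumptions: the column names `year`, `age` and `total` are hypothetical, and the values are a few rows copied from the question. The `validate` and `indicator` arguments are optional safeguards added here, not part of the answer above. Because the output in the question shows prefixed names such as `p_age` and `d_age`, the sketch also shows `left_on`/`right_on` for keys that are named differently on each side.

```python
import pandas as pd

# Toy data reusing a few values from the question.
pop = pd.DataFrame({
    "year": [1933, 1933, 1933],
    "age": [0, 1, 2],
    "total": [1999833.81, 2104584.31, 2210570.95],
})
death = pd.DataFrame({
    "year": [1933, 1933],
    "age": [0, 1],
    "total": [121053.88, 19246.29],
})

# Join on both keys. validate="one_to_one" raises MergeError if either
# side has duplicate (year, age) pairs; indicator=True adds a "_merge"
# column showing which population rows found no death row.
merged = pop.merge(
    death,
    how="left",
    on=["year", "age"],
    suffixes=("_pop", "_death"),
    validate="one_to_one",
    indicator=True,
)
print(merged)  # age 2 has no death row: total_death is NaN, _merge is "left_only"

# If the age columns have different names on each side (e.g. p_age and
# d_age), name the keys per side instead of using on=.
pop_p = pop.rename(columns={"age": "p_age"})
death_d = death.rename(columns={"age": "d_age"})
merged2 = pop_p.merge(
    death_d,
    how="left",
    left_on=["year", "p_age"],
    right_on=["year", "d_age"],
    suffixes=("_pop", "_death"),
)
print(merged2)
```

    The left join keeps every population row exactly once and fills the death columns with NaN where no (year, age) match exists, which is the result the question was after.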
    

