【Question title】: Outer merge on DataFrames within a dictionary key
【Posted】: 2019-09-05 19:59:19
【Question】:

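I'm new to Python and have searched online for a solution to this problem without finding one. I have a dictionary of pandas DataFrames in which each key is a year and each value holds the DataFrames for that year. Here is some sample data: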

import pandas as pd
import numpy as np
from collections import defaultdict

##Creating Dataframes
data1_2018 =[[1,2018,80], [2,2018,70]]
data2_2018 = [[1,2018,77], [3,2018,62]]
data3_2018 = [[1,2018,82], [2,2018,88], [4,2018,66]]

data1_2017 = [[1,2017,80], [5,2017,70]]
data2_2017 = [[1,2017,77], [3,2017,62]]
data3_2017 = [[1,2017,50], [2,2017,52], [4,2017,51]]

df1_2018 = pd.DataFrame(data1_2018, columns = ['ID', 'Year', 'Score_1'])
df2_2018 = pd.DataFrame(data2_2018, columns = ['ID', 'Year', 'Score_2'])
df3_2018 = pd.DataFrame(data3_2018, columns = ['ID', 'Year', 'Score_3'])


df1_2017 = pd.DataFrame(data1_2017, columns = ['ID', 'Year', 'Score_1'])
df2_2017 = pd.DataFrame(data2_2017, columns = ['ID', 'Year', 'Score_2'])
df3_2017 = pd.DataFrame(data3_2017, columns = ['ID', 'Year', 'Score_3'])

###Creating list of all dataframes
all_df_list = [df1_2018,df2_2018,df3_2018,df1_2017,df2_2017,df3_2017]

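I chose to start from a list containing all the DataFrames because that is how the data is imported in my real problem. Once I have the list, I build a dictionary of those DataFrames.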

yearly_dfs = defaultdict(list)
####Loop for creating dict with keys being years and values being dfs for that year
for df in all_df_list:
    for yr, yr_df in df.groupby('Year'):
        yearly_dfs[yr].append(yr_df)

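Now, my question is: can you iterate over each year's group of DataFrames and combine them with an outer merge on 'ID'? The desired output is a list or dictionary with exactly one DataFrame per year. Here are the expected results for each year: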

desired_output_2018 = df1_2018.merge(df2_2018, how = 'outer', on = ['ID', 'Year']).merge(df3_2018, how = 'outer', on = ['ID', 'Year']) 
desired_output_2017 = df1_2017.merge(df2_2017, how = 'outer', on = ['ID', 'Year']).merge(df3_2017, how = 'outer', on = ['ID', 'Year'])

print(desired_output_2018)
   ID  Year  Score_1  Score_2  Score_3
0   1  2018     80.0     77.0     82.0
1   2  2018     70.0      NaN     88.0
2   3  2018      NaN     62.0      NaN
3   4  2018      NaN      NaN     66.0

print(desired_output_2017)
   ID  Year  Score_1  Score_2  Score_3
0   1  2017     80.0     77.0     50.0
1   5  2017     70.0      NaN      NaN
2   3  2017      NaN     62.0      NaN
3   2  2017      NaN      NaN     52.0
4   4  2017      NaN      NaN     51.0

Any help would be greatly appreciated!

Thanks!

【Comments】:

    Tags: python pandas loops dictionary merge


    【Solution 1】:

    Use pandas.concat, then DataFrame.groupby on 'ID' and 'Year' with the aggregation function first, and finally groupby on 'Year' inside a dict comprehension:

    df_all = (pd.concat(all_df_list, sort=False)
              .groupby(['ID', 'Year']).first().reset_index())
    
    df_years = {yr: df for yr, df in df_all.groupby('Year')}
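Why this works can be seen on a smaller input. The sketch below uses two hypothetical frames (`left` and `right` are illustrative names, not from the question) that share the key columns but carry different score columns. It assumes the default behavior of `first()`, which returns the first non-null value per column within each group.

```python
import pandas as pd

# Hypothetical data shaped like the question's frames: same keys, different score columns.
left = pd.DataFrame({'ID': [1, 2], 'Year': [2018, 2018], 'Score_1': [80, 70]})
right = pd.DataFrame({'ID': [1, 3], 'Year': [2018, 2018], 'Score_2': [77, 62]})

# concat stacks the rows; a column that a frame lacks is filled with NaN for its rows.
stacked = pd.concat([left, right], sort=False)

# first() keeps the first non-null value per column in each (ID, Year) group,
# collapsing the stacked rows into a single row per key.
collapsed = stacked.groupby(['ID', 'Year']).first().reset_index()
print(collapsed)
```

One caveat: if two frames contain the same score column for the same ID and Year, `first()` silently keeps only the first non-null value, whereas `merge` would keep both in suffixed columns (`_x`, `_y`).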
    

    Access the result for a given year with:

    df_years[2017]
    
       ID  Year  Score_1  Score_2  Score_3
    0   1  2017     80.0     77.0     50.0
    2   2  2017      NaN      NaN     52.0
    4   3  2017      NaN     62.0      NaN
    6   4  2017      NaN      NaN     51.0
    8   5  2017     70.0      NaN      NaN
    
    df_years[2018]
    
       ID  Year  Score_1  Score_2  Score_3
    1   1  2018     80.0     77.0     82.0
    3   2  2018     70.0      NaN     88.0
    5   3  2018      NaN     62.0      NaN
    7   4  2018      NaN      NaN     66.0
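If you would rather keep the literal outer merge from the question, one possible alternative is to fold each year's list of frames with `functools.reduce`. This is a sketch rather than part of the answer above; `merge_outer` and `merged_by_year` are names introduced here for illustration, and the inputs are rebuilt from the question so the block runs on its own.

```python
from collections import defaultdict
from functools import reduce

import pandas as pd

cols = ['ID', 'Year']

# Sample frames copied from the question.
all_df_list = [
    pd.DataFrame([[1, 2018, 80], [2, 2018, 70]], columns=cols + ['Score_1']),
    pd.DataFrame([[1, 2018, 77], [3, 2018, 62]], columns=cols + ['Score_2']),
    pd.DataFrame([[1, 2018, 82], [2, 2018, 88], [4, 2018, 66]], columns=cols + ['Score_3']),
    pd.DataFrame([[1, 2017, 80], [5, 2017, 70]], columns=cols + ['Score_1']),
    pd.DataFrame([[1, 2017, 77], [3, 2017, 62]], columns=cols + ['Score_2']),
    pd.DataFrame([[1, 2017, 50], [2, 2017, 52], [4, 2017, 51]], columns=cols + ['Score_3']),
]

# Same grouping loop as in the question: year -> list of frames for that year.
yearly_dfs = defaultdict(list)
for df in all_df_list:
    for yr, yr_df in df.groupby('Year'):
        yearly_dfs[yr].append(yr_df)


def merge_outer(left, right):
    """Outer-merge two frames on the shared key columns."""
    return left.merge(right, how='outer', on=cols)


# reduce applies merge_outer pairwise across each year's list, left to right.
merged_by_year = {yr: reduce(merge_outer, dfs) for yr, dfs in yearly_dfs.items()}
print(merged_by_year[2018])
```

The row order of an outer merge can differ between pandas versions, so sort on 'ID' if the order matters.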
    

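    【Discussion】:

    • Thanks for the reply. For some reason I get an error when I run the code: TypeError: cannot concatenate object of type ""; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid
    • There must be a list somewhere among the DataFrames in all_df_list. What does print([type(x) for x in all_df_list]) show? See if you can find the offending object.
    • [x for x in all_df_list if isinstance(x, list)] may also help track it down.
    • Thanks! It was my own user error; your code works great!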
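The check suggested in the comments can be extended slightly to report where each offending entry sits. This is a minimal sketch on a hypothetical list (`frames` and `bad` are illustrative names) in which one entry was accidentally wrapped in a list.

```python
import pandas as pd

# Hypothetical input: the second entry is a list wrapping a DataFrame.
frames = [
    pd.DataFrame({'ID': [1]}),
    [pd.DataFrame({'ID': [2]})],
    pd.DataFrame({'ID': [3]}),
]

# Collect the position and type name of every entry that is neither a Series nor a DataFrame.
bad = [(i, type(x).__name__) for i, x in enumerate(frames)
       if not isinstance(x, (pd.Series, pd.DataFrame))]
print(bad)  # [(1, 'list')]
```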