Merge data frames based on column with different rows

[Question title]: Merge data frames based on column with different rows
[Posted]: 2018-12-03 19:04:08
[Question description]:

I have multiple csv files in a directory, and I read them into individual data frames based on their names, like this

import os
from functools import reduce

import pandas as pd

# ask user for path
path = input('Enter the path for the csv files: ')
os.chdir(path)

# loop over filenames and read into individual dataframes
for fname in os.listdir(path):
    if fname.endswith('Demo.csv'):
        demoRaw = pd.read_csv(fname, encoding='utf-8')
    if fname.endswith('Key2.csv'):
        keyRaw = pd.read_csv(fname, encoding='utf-8')

Then I filter to keep only certain columns

# filter to keep desired columns only
demo = demoRaw.filter(['Key', 'Sex', 'Race', 'Age'], axis=1)
key = keyRaw.filter(['Key', 'Key2', 'Age'], axis=1)

Then I create a list of the data frames above and use reduce to merge them on Key

# create list of data frames for combined sheet
dfs = [demo, key]

# merge the list of data frames on the Key
combined = reduce(lambda left,right: pd.merge(left,right,on='Key'), dfs)

Then I drop the auto-generated column, create an Excel writer, and write the result to an Excel file

# use Key as the index so the auto-generated index column is not written
combined.set_index('Key', inplace=True)

# create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('final.xlsx', engine='xlsxwriter')

# write each data frame to its own sheet in the workbook
combined.to_excel(writer, sheet_name='Combined')
meds.to_excel(writer, sheet_name='Meds')

# Close the Pandas Excel writer and output the Excel file.
writer.close()

The problem is that some files contain keys that are not in the other files. For example:

Demo file

Key   Sex   Race   Age
1      M     W     52
2      F     B     25
3      M     L     78

Key file

Key   Key2   Age
1      7325     52
2      4783     25
3      1367     78
4      9435     21
5      7247     65

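Right now, a row is only included if every file has a matching key (in other words, rows whose key is missing from another file are simply ignored). How can I combine all rows from all files, even when the keys don't match? The final result would look like this: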

Key   Sex   Race   Age   Key2   Age
 1      M     W     52    7325     52
 2      F     B     25    4783     25
 3      M     L     78    1367     78
 4                        9435     21
 5                        7247     65

I don't care whether the empty cells are blank, NaN, #N/A, etc., as long as I can identify them.
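To make the behavior described above concrete, here is a minimal sketch that rebuilds a cut-down version of the two sample files in memory (only the Key, Sex, and Key2 columns, an assumption made to keep it short) and shows that a merge without a `how` argument keeps only the keys shared by both frames.

```python
import pandas as pd

# Cut-down versions of the Demo and Key files from the question.
demo = pd.DataFrame({"Key": [1, 2, 3], "Sex": ["M", "F", "M"]})
key = pd.DataFrame({"Key": [1, 2, 3, 4, 5],
                    "Key2": [7325, 4783, 1367, 9435, 7247]})

# pd.merge defaults to how="inner": only Keys present in both frames survive.
inner = pd.merge(demo, key, on="Key")

# Keys that exist in the key file but were dropped by the inner merge.
dropped = sorted(set(key["Key"].tolist()) - set(inner["Key"].tolist()))
print(len(inner), dropped)
```

Comparing the key sets before and after a merge like this is a quick way to confirm which rows are being lost.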

[Question comments]:

Replace combined = reduce(lambda left,right: pd.merge(left,right,on='Key'), dfs) with: combined = pd.merge(demo, key, how='outer', on='Key'). You have to specify how='outer' to get a full join of the Key and Demo tables.

Tags: python-3.x pandas reduce

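[Solution 1]:

Replace combined = reduce(lambda left,right: pd.merge(left,right,on='Key'), dfs) with: combined = pd.merge(demo, key, how='outer', on='Key'). You have to specify how='outer' to get a full join of the Key and Demo tables; the default, how='inner', keeps only the keys that appear in both.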
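As a rough illustration of this answer, the sketch below applies the outer merge to the sample data from the question. The `suffixes` and `indicator` arguments are optional additions that are not part of the answer: the first names the two overlapping Age columns, and the second adds a `_merge` column that records where each row came from.

```python
import pandas as pd

# Sample frames copied from the question's Demo and Key files.
demo = pd.DataFrame({
    "Key": [1, 2, 3],
    "Sex": ["M", "F", "M"],
    "Race": ["W", "B", "L"],
    "Age": [52, 25, 78],
})
key = pd.DataFrame({
    "Key": [1, 2, 3, 4, 5],
    "Key2": [7325, 4783, 1367, 9435, 7247],
    "Age": [52, 25, 78, 21, 65],
})

# how="outer" keeps every Key found in either frame.
# suffixes/indicator are optional extras added for this sketch.
combined = pd.merge(demo, key, how="outer", on="Key",
                    suffixes=("_demo", "_key"), indicator=True)
print(combined)
```

Rows for keys 4 and 5 come back with NaN in the Demo columns, which satisfies the requirement that missing cells be identifiable. If the two Age columns always agree, merging on both columns (on=['Key', 'Age']) would be another option, though that assumes the ages never differ between files.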

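[Comments]:

Thanks @Bram van Hout. I used your code the way you wrote it (I wasn't sure how to specify outer, but it worked well). I also saw this post, stackoverflow.com/questions/46008957/pandas-merge-df-error, so I merged my two dfs and then merged the result with another merged data frame. Everything works well. Thanks again!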
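For anyone who wants to keep the reduce approach from the question while merging more than two data frames, one possible sketch is to pass how='outer' inside the lambda. The helper name `merge_all` and the third frame `extra` (with its Dose column) are hypothetical, introduced only for illustration.

```python
from functools import reduce

import pandas as pd


def merge_all(frames, on="Key"):
    """Outer-merge a list of data frames pairwise on one shared column."""
    return reduce(
        lambda left, right: pd.merge(left, right, how="outer", on=on),
        frames,
    )


demo = pd.DataFrame({"Key": [1, 2, 3], "Sex": ["M", "F", "M"]})
key = pd.DataFrame({"Key": [1, 2, 3, 4, 5],
                    "Key2": [7325, 4783, 1367, 9435, 7247]})
# Hypothetical third frame, not from the question.
extra = pd.DataFrame({"Key": [5, 6], "Dose": [10, 20]})

combined = merge_all([demo, key, extra])
print(combined)
```

The frames in this sketch share no column other than Key. When other column names overlap, as Age does in the question, pandas appends its default _x and _y suffixes, and repeated merges can then collide on those names, so renaming or dropping the duplicates beforehand is worth considering.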