[Posted]: 2018-12-03 19:04:08
[Question]:
I have several CSV files, and I read each one into its own DataFrame based on its name in the directory, like this:
import os
import pandas as pd

# ask user for path
path = input('Enter the path for the csv files: ')
os.chdir(path)
# loop over filenames and read into individual dataframes
# (list the current directory, since we already changed into path)
for fname in os.listdir():
    if fname.endswith('Demo.csv'):
        demoRaw = pd.read_csv(fname, encoding='utf-8')
    if fname.endswith('Key2.csv'):
        keyRaw = pd.read_csv(fname, encoding='utf-8')
Then I filter to keep only certain columns:
# filter to keep desired columns only
demo = demoRaw.filter(['Key', 'Sex', 'Race', 'Age'], axis=1)
key = keyRaw.filter(['Key', 'Key2', 'Age'], axis=1)
Then I build a list of the DataFrames above and use reduce to merge them on Key:
from functools import reduce

# create list of data frames for combined sheet
dfs = [demo, key]
# merge the list of data frames on the Key
combined = reduce(lambda left, right: pd.merge(left, right, on='Key'), dfs)
Then I drop the auto-generated index column, create an Excel writer, and write the result to an Excel file:
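# drop the auto generated index column by making Key the index
combined.set_index('Key', inplace=True)
# create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('final.xlsx', engine='xlsxwriter')
# write each data frame to its own sheet
combined.to_excel(writer, sheet_name='Combined')
# meds is another data frame, built elsewhere (not shown in this question)
meds.to_excel(writer, sheet_name='Meds')
# Close the Pandas Excel writer and output the Excel file.
# close() saves the file; newer pandas versions no longer provide writer.save()
writer.close()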
The problem is that some files contain keys that are not in the other files. For example:
Demo file
Key  Sex  Race  Age
1    M    W     52
2    F    B     25
3    M    L     78
Key file
Key  Key2  Age
1    7325  52
2    4783  25
3    1367  78
4    9435  21
5    7247  65
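Right now, a row is included only if its key has a match in every file (in other words, rows whose key is missing from another file are silently dropped). How can I combine all rows from all files even when the keys don't match? The final result would look like this: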
Key  Sex  Race  Age  Key2  Age
1    M    W     52   7325  52
2    F    B     25   4783  25
3    M    L     78   1367  78
4                    9435  21
5                    7247  65
I don't care whether the empty cells are blank, NaN, #N/A, etc., as long as I can identify them.
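[Comments]:
- Replace combined = reduce(lambda left,right: pd.merge(left,right,on='Key'), dfs) with combined = pd.merge(demo, key, how='outer', on='Key'). pd.merge defaults to an inner join, so you have to specify how='outer' to get a full join of the Key and Demo tables.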
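A minimal sketch of that outer merge, using the sample rows from the question instead of reading the CSV files. It keeps the reduce call so the same pattern extends to a longer list of frames. The variable missing_from_demo is a name made up for this illustration. Both frames have an Age column, so pandas should rename them with its default suffixes (Age_x from the left frame, Age_y from the right).

```python
from functools import reduce

import pandas as pd

# Sample data copied from the question's Demo and Key files.
demo = pd.DataFrame({
    'Key': [1, 2, 3],
    'Sex': ['M', 'F', 'M'],
    'Race': ['W', 'B', 'L'],
    'Age': [52, 25, 78],
})
key = pd.DataFrame({
    'Key': [1, 2, 3, 4, 5],
    'Key2': [7325, 4783, 1367, 9435, 7247],
    'Age': [52, 25, 78, 21, 65],
})

dfs = [demo, key]

# how='outer' keeps every Key found in either frame; cells that have
# no matching row in the other frame are filled with NaN.
combined = reduce(
    lambda left, right: pd.merge(left, right, on='Key', how='outer'),
    dfs,
)

# Hypothetical helper variable: rows whose Key was absent from the
# Demo file show up with NaN in the Demo-only columns such as 'Sex'.
missing_from_demo = combined[combined['Sex'].isna()]

print(combined)
print(missing_from_demo['Key'].tolist())
```

The NaN cells are written to Excel as empty cells by default, which matches the "blank is fine as long as I can identify it" requirement. If more than two frames share a non-key column name, the default suffixes can collide, so passing explicit suffixes or renaming the columns beforehand may be needed.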
Tags: python-3.x pandas reduce