[Posted]: 2017-01-11 23:00:15
[Question]:
I have 5 CSV files that I am trying to merge with Python pandas. Because of memory issues, I am running 64-bit Python.
All 5 CSV files have the same column names:
['A', 'B', 'C', ... 'Start_time', 'end_time', 'Unique_column']
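Here, Unique_column stands for a column name that is different in each CSV file. I need to merge all 5 files with one another so that I end up with a DataFrame whose columns are: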
['A', 'B', 'C', ... 'Start_time', 'end_time', 'Unique_column1', 'Unique_column2', ... 'Unique_column5']
Should I use pandas.merge or pandas.concat for this?
Update: this is what I tried:
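Either function can produce that layout. Below is a minimal sketch, assuming the shared columns together uniquely identify a row within every file; the three toy frames and the shortened key list `keys` are hypothetical stand-ins for the real CSVs:

```python
from functools import reduce

import pandas as pd

# Hypothetical stand-ins for the CSV files: same key columns, one unique value column each.
keys = ["A", "Start_time", "end_time"]
frames = [
    pd.DataFrame({"A": [1, 2], "Start_time": [0, 5], "end_time": [4, 9],
                  "Unique_column%d" % i: [10 * i, 10 * i + 1]})
    for i in range(1, 4)
]

# Option 1: chain pairwise merges on the key columns (an outer join keeps unmatched keys).
merged = reduce(lambda left, right: pd.merge(left, right, on=keys, how="outer"), frames)

# Option 2: move the keys into the index and concatenate side by side;
# axis=1 aligns the frames on that index, so it should be unique in each frame.
concatenated = pd.concat([df.set_index(keys) for df in frames], axis=1).reset_index()
```

With unique keys both approaches give the same rows and columns. `merge` also tolerates repeated keys (producing one row per matching pair), whereas `concat(axis=1)` on a set index generally needs that index to be unique.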
>>> import os
>>> import glob
>>> import numpy as np
>>> import pandas as pd
>>> dir_name = r'C:\Users\data'
>>> dfs = []
>>> files = glob.glob(os.path.join(dir_name, '*.csv'))
>>> for f in files:
...     df = pd.read_csv(f)
...     dfs.append(df)
...
>>> common_cols = ['Target', 'POS', 'Start_Week', 'End_Week', 'Measure_Metric']
>>> res = pd.concat([df.set_index(common_cols) for df in dfs], axis=1).reset_index()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\Python27x64\lib\site-packages\pandas\tools\merge.py", line 846, in concat
    return op.get_result()
  File "c:\Python27x64\lib\site-packages\pandas\tools\merge.py", line 1031, in get_result
    indexers[ax] = obj_labels.reindex(new_labels)[1]
  File "c:\Python27x64\lib\site-packages\pandas\indexes\multi.py", line 1422, in reindex
    raise Exception("cannot handle a non-unique multi-index!")
Exception: cannot handle a non-unique multi-index!
>>>
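The exception most likely means that, in at least one file, some combination of the `common_cols` values occurs on more than one row, so the MultiIndex built by `set_index` is not unique and `concat` cannot align the frames. A sketch of how one might confirm that and two possible workarounds follows, using hypothetical toy frames and a shortened two-column key. The counter approach additionally assumes that the n-th repeated row in one file corresponds to the n-th repeated row in the others:

```python
from functools import reduce

import pandas as pd

common_cols = ["Target", "POS"]  # hypothetical, shortened key list for illustration

# Hypothetical frames in which the key ("t1", 1) appears twice.
df1 = pd.DataFrame({"Target": ["t1", "t1", "t2"], "POS": [1, 1, 2], "Unique_column1": [1, 2, 3]})
df2 = pd.DataFrame({"Target": ["t1", "t1", "t2"], "POS": [1, 1, 2], "Unique_column2": [4, 5, 6]})
dfs = [df1, df2]

# 1) Count rows whose key combination is repeated (the cause of the non-unique MultiIndex).
dup_counts = {i: int(df.duplicated(subset=common_cols, keep=False).sum())
              for i, df in enumerate(dfs)}

# 2) Workaround A: number repeated keys with cumcount so the index becomes unique.
#    Assumes duplicates line up across files in the same order.
def with_counter(df):
    out = df.copy()
    out["dup_no"] = out.groupby(common_cols).cumcount()
    return out.set_index(common_cols + ["dup_no"])

res = pd.concat([with_counter(df) for df in dfs], axis=1).reset_index()

# 3) Workaround B: merge accepts repeated keys, but pairs every match (many-to-many).
merged = reduce(lambda left, right: pd.merge(left, right, on=common_cols, how="outer"), dfs)
```

Here `res` keeps one row per original row, while `merged` has 2 x 2 rows for the repeated key, which is usually not what is wanted; dropping or aggregating the duplicates first is another option if they are not meaningful.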
[Discussion]: