【Posted】: 2021-03-10 11:14:39
【Question】:
I wrote a simple script that is supposed to merge (union) several dataframes and remove duplicate rows.
For example, for the input:
df_A:
a 1
b 2
df_B:
b 2
c 3
the expected output is:
df_out:
a 1
b 2
c 3
I wrote the following code:
import csv
import pandas as pd

def read_dataframes(filenames, basedir):
    # Read each tab-separated file without a header row.
    return [pd.read_csv(basedir + file, sep='\t', header=None, quoting=csv.QUOTE_NONE) for file in filenames]

def merge_dataframes(dfs, out):
    # Concatenate, drop rows that repeat in columns 0 and 1, and renumber the index.
    merged = pd.concat(dfs).drop_duplicates(subset=[0, 1]).reset_index(drop=True)
    merged = merged.iloc[:, [0, 1, 2, 7, 8, 9]]
    merged.to_csv(out, header=None, index=None, sep='\t')
I call these functions as follows:
merge_dataframes(read_dataframes(filenames, basedir), output)
I get a KeyError exception:
Traceback (most recent call last):
  File "analysis_and_visualization.py", line 70, in <module>
    merge_dataframes(read_dataframes(wild_emb, wild_basedir), 'wild_emb_merged')
  File "analysis_and_visualization.py", line 17, in merge_dataframes
    merged = pd.concat(dfs).drop_duplicates(subset=[0, 1]).reset_index(drop=True)
  File "/Data/user/eliran/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 5112, in drop_duplicates
    duplicated = self.duplicated(subset, keep=keep)
  File "/Data/user/eliran/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 5248, in duplicated
    raise KeyError(diff)
KeyError: Int64Index([1], dtype='int64')
What am I doing wrong?
【Comments】:
- One idea: it looks like the first column is being converted to the index. To prevent that, try using
return [pd.read_csv(basedir + file, sep='\t', header=None, quoting=csv.QUOTE_NONE, index_col=False) for file in filenames]
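To check whether that guess applies, it can help to look at which column labels the concatenated frame actually has before calling drop_duplicates. The sketch below is only an illustration under assumptions: FILE_A, FILE_B, read_frame and missing_columns are hypothetical names, and the in-memory strings stand in for the real files, which are not shown in the question.

```python
import csv
import io

import pandas as pd

# Hypothetical in-memory stand-ins for the tab-separated input files.
FILE_A = "a\t1\nb\t2\n"
FILE_B = "b\t2\nc\t3\n"


def read_frame(text):
    # Same options as in the question, plus index_col=False from the comment.
    return pd.read_csv(io.StringIO(text), sep="\t", header=None,
                       quoting=csv.QUOTE_NONE, index_col=False)


def missing_columns(df, required=(0, 1)):
    # Column labels that drop_duplicates(subset=...) would not find in df.
    return [col for col in required if col not in df.columns]


dfs = [read_frame(FILE_A), read_frame(FILE_B)]
combined = pd.concat(dfs)

# Fail early with a readable message instead of a bare KeyError.
missing = missing_columns(combined)
if missing:
    raise SystemExit(f"columns {missing} not found; got {list(combined.columns)}")

merged = combined.drop_duplicates(subset=[0, 1]).reset_index(drop=True)
print(merged)
```

On the sample data the check passes and merged holds the three rows a 1, b 2, c 3. If the same check reports column 1 as missing on the real files, the frames contain no column labelled 1, which is what the KeyError says; printing combined.columns and combined.head() should then show how the files were actually parsed.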