【Posted】: 2019-06-26 20:53:00
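【Question】:
I have a list of CSV files in the same directory, and I am trying to merge two of them into a new CSV file that contains the contents of both input files. Here is an example of the 2 input files: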
small_example1.csv
CodeClass,Name,Accession,Count
Endogenous,CCNO,NM_021147.4,18
Endogenous,MYC,NM_002467.3,1114
Endogenous,CD79A,NM_001783.3,178
Endogenous,FSTL3,NM_005860.2,529
small_example2.csv
CodeClass,Name,Accession,Count
Endogenous,CCNO,NM_021147.4,196
Endogenous,MYC,NM_002467.3,962
Endogenous,CD79A,NM_001783.3,390
Endogenous,FSTL3,NM_005860.2,67
This is the expected output file (result.csv):
Probe_Name,Accession,Class_Name,small_example1,small_example2
CCNO,NM_021147.4,Endogenous,18,196
MYC,NM_002467.3,Endogenous,1114,962
CD79A,NM_001783.3,Endogenous,178,390
FSTL3,NM_005860.2,Endogenous,529,67
To do this, I wrote the following function in Python 3:
import pandas as pd
filenames = ['small_example1.csv', 'small_example2.csv']
path = '/home/Joy'
def convert(filenames):
    for file in filenames:
        df1 = pd.read_csv(file, skiprows=26, skipfooter=5, sep=',')
        df = df1.merge(df2, on=['CodeClass', 'Name', 'Accession'])
        df = df.rename(columns={'Name': 'Probe_Name',
                                'CodeClass': 'Class_Name',
                                file: file})
    df.to_csv('result.csv')
The result looks like this. The last 2 columns differ from what I expected, in both the headers and the numbers:
   Class_Name Probe_Name    Accession  Count_x  Count_y
0  Endogenous       CCNO  NM_021147.4       18       18
1  Endogenous        MYC  NM_002467.3     1114     1114
2  Endogenous      CD79A  NM_001783.3      178      178
3  Endogenous      FSTL3  NM_005860.2      529      529
Do you know how to solve this?
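For reference, here is one possible way to get the expected layout. It is a minimal sketch, not a definitive fix. The `Count_x` / `Count_y` headers are the default suffixes that pandas' `merge` adds when both frames have a non-key column with the same name, and identical numbers in both columns suggest that both sides of the merge came from the same file. The sketch avoids both problems by renaming `Count` to the file's base name before merging. The function name `combine_counts`, the `KEYS` constant, and the `out_path` parameter are made up for illustration. It reads the sample files exactly as shown above, so it leaves out `skiprows=26` and `skipfooter=5`. If the real files have extra header and footer lines, those arguments would need to go back into `read_csv` (`skipfooter` is only supported by the Python parser engine).

```python
from pathlib import Path

import pandas as pd

# Columns shared by every input file; used as the merge key.
KEYS = ["CodeClass", "Name", "Accession"]


def combine_counts(filenames, out_path=None):
    """Merge the Count column of each CSV into one wide table (hypothetical helper)."""
    if not filenames:
        raise ValueError("need at least one input file")

    merged = None
    for name in filenames:
        sample = Path(name).stem  # 'small_example1.csv' -> 'small_example1'
        # Rename Count before merging so each file contributes a uniquely
        # named column and pandas never has to add _x / _y suffixes.
        df = pd.read_csv(name).rename(columns={"Count": sample})
        merged = df if merged is None else merged.merge(df, on=KEYS)

    merged = merged.rename(columns={"Name": "Probe_Name",
                                    "CodeClass": "Class_Name"})
    # Reorder the columns to match the expected header.
    samples = [Path(name).stem for name in filenames]
    merged = merged[["Probe_Name", "Accession", "Class_Name", *samples]]

    if out_path is not None:
        # index=False leaves out the 0..n row labels seen in the output above.
        merged.to_csv(out_path, index=False)
    return merged
```

Calling `combine_counts(['small_example1.csv', 'small_example2.csv'], 'result.csv')` from the directory that holds the files should then write one row per probe, with one count column per input file.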
【Discussion】:
Tags: python-3.x pandas csv