【Question title】: Parsing and combining CSV files into another CSV file in Python 3
【Posted】: 2019-06-26 20:53:00
【Question】:

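I have a list of CSV files in the same directory, and I am trying to merge two of them into a new CSV file that contains the contents of both input files. Here is an example of the two input files: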

small_example1.csv

    CodeClass,Name,Accession,Count
    Endogenous,CCNO,NM_021147.4,18
    Endogenous,MYC,NM_002467.3,1114
    Endogenous,CD79A,NM_001783.3,178
    Endogenous,FSTL3,NM_005860.2,529

small_example2.csv

    CodeClass,Name,Accession,Count
    Endogenous,CCNO,NM_021147.4,196
    Endogenous,MYC,NM_002467.3,962
    Endogenous,CD79A,NM_001783.3,390
    Endogenous,FSTL3,NM_005860.2,67

This is the expected output file (result.csv):

    Probe_Name,Accession,Class_Name,small_example1,small_example2
    CCNO,NM_021147.4,Endogenous,18,196
    MYC,NM_002467.3,Endogenous,1114,962
    CD79A,NM_001783.3,Endogenous,178,390
    FSTL3,NM_005860.2,Endogenous,529,67

To do this, I wrote the following function in Python 3:

    import pandas as pd
    filenames = ['small_example1.csv', 'small_example2.csv']
    path = '/home/Joy'
    def convert(filenames):
        for file in filenames:
            df1 = pd.read_csv(file, skiprows=26, skipfooter=5, sep=',')
            df = df1.merge(df2, on=['CodeClass', 'Name', 'Accession'])
            df = df.rename(columns={'Name': 'Probe_Name',
                            'CodeClass': 'Class_Name',
                             file: file})
            df.to_csv('result.csv')

The result looks like this; the last two columns differ from what I expected (both the headers and the numbers).

        Class_Name  Probe_Name  Accession    Count_x  Count_y
    0   Endogenous  CCNO        NM_021147.4  18       18
    1   Endogenous  MYC         NM_002467.3  1114     1114
    2   Endogenous  CD79A       NM_001783.3  178      178
    3   Endogenous  FSTL3       NM_005860.2  529      529

Do you know how to fix this?

【Comments】:

    Tags: python-3.x pandas csv


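    【Solution 1】:

    I suggest first loading your dataframes and storing them in a list, then merging them all together (using an inner or an outer join, depending on your needs):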

    import pandas as pd
    from functools import reduce
    
    filenames = ['small_example1.csv', 'small_example2.csv']
    path = '/home/Joy'
    
    def convert(filenames):
        dataframes = []
    
        # load all the dataframes in a list (dataframes)
        for filename in filenames:
            df = pd.read_csv(filename, skiprows=26, skipfooter=5, sep=',')
            df = df.rename(columns={'Count': filename})
            dataframes.append(df)
    
        # merge the dataframes
        df_merged = reduce(lambda x,y: pd.merge(x,y, on=['CodeClass', 'Name', 'Accession'], how='outer'), dataframes)
    
        # rename the columns as you want and export the result
        df_merged = df_merged.rename(columns={'Name': 'Probe_Name', 'CodeClass': 'Class_Name'})
        df_merged.to_csv('result.csv')
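
The load, rename, and merge steps above can be tried without any files on disk. What follows is only a sketch under stated assumptions: the `samples` dict and the `combine` helper are hypothetical names introduced for illustration, the CSV text is held in memory so the `skiprows`/`skipfooter` arguments are left out, and each count column is named after the file stem (via `pathlib.Path(...).stem`) so the header reads `small_example1` rather than `small_example1.csv`. Note that an outer merge sorts the rows by the join keys, so the row order may differ from the input files.

```python
import io
from functools import reduce
from pathlib import Path

import pandas as pd

# Hypothetical in-memory stand-ins for the two files from the question.
samples = {
    "small_example1.csv": (
        "CodeClass,Name,Accession,Count\n"
        "Endogenous,CCNO,NM_021147.4,18\n"
        "Endogenous,MYC,NM_002467.3,1114\n"
        "Endogenous,CD79A,NM_001783.3,178\n"
        "Endogenous,FSTL3,NM_005860.2,529\n"
    ),
    "small_example2.csv": (
        "CodeClass,Name,Accession,Count\n"
        "Endogenous,CCNO,NM_021147.4,196\n"
        "Endogenous,MYC,NM_002467.3,962\n"
        "Endogenous,CD79A,NM_001783.3,390\n"
        "Endogenous,FSTL3,NM_005860.2,67\n"
    ),
}

KEYS = ["CodeClass", "Name", "Accession"]


def combine(sources):
    """Merge a {filename: file-like object} mapping into one wide dataframe."""
    frames = []
    for name, handle in sources.items():
        df = pd.read_csv(handle)
        # Give each Count column a unique name before merging.
        frames.append(df.rename(columns={"Count": Path(name).stem}))

    # Fold the list of frames into one by merging on the shared key columns.
    merged = reduce(lambda left, right: left.merge(right, on=KEYS, how="outer"), frames)
    merged = merged.rename(columns={"Name": "Probe_Name", "CodeClass": "Class_Name"})

    # Put the columns in the order the question asks for.
    stems = [Path(name).stem for name in sources]
    return merged[["Probe_Name", "Accession", "Class_Name"] + stems]


result = combine({name: io.StringIO(text) for name, text in samples.items()})
# index=False keeps the row index out of the CSV output.
csv_text = result.to_csv(index=False)
print(csv_text)
```

With real files, the `io.StringIO` objects would be replaced by the file paths, and `result.to_csv('result.csv', index=False)` would write the output to disk.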
    

    【Comments】:

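      【Solution 2】:

      There are two problems here: the headers and the values.

      If you get the same values twice, it means you have read the same file twice. You should rename the Count column when loading each file, and merge each dataframe into the previously merged result: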

      import pandas as pd
      filenames = ['small_example1.csv', 'small_example2.csv']
      path = '/home/Joy'
      def convert(filenames):
          df = None               # initialize the merged dataframe to None
          for file in filenames:
              # load a new dataframe and rename its Count column after the file
              df1 = pd.read_csv(file, skiprows=26, skipfooter=5, engine='python').rename(columns={'Count': file})
              # merge it into df
              if df is None:
                  df = df1
              else:
                  df = df.merge(df1, on=['CodeClass', 'Name', 'Accession'])
          # rename and reindex the columns
          result = df.rename(columns={'Name': 'Probe_Name', 'CodeClass': 'Class_Name'}
                             ).reindex(['Probe_Name','Accession','Class_Name']+filenames,
                                       axis=1)
          result.to_csv('result.csv', index=False)
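
Both symptoms from the question can be reproduced in a few lines. This is a minimal sketch using a hypothetical two-row frame rather than the real files: when both sides of a merge carry a non-key column named `Count`, pandas disambiguates them with its default suffixes `_x` and `_y`, and merging a frame with itself necessarily gives identical values in both columns. Renaming `Count` before merging avoids the suffixes.

```python
import pandas as pd

keys = ["CodeClass", "Name", "Accession"]

# Hypothetical miniature versions of the two input files.
first = pd.DataFrame({
    "CodeClass": ["Endogenous", "Endogenous"],
    "Name": ["CCNO", "MYC"],
    "Accession": ["NM_021147.4", "NM_002467.3"],
    "Count": [18, 1114],
})
second = first.assign(Count=[196, 962])

# Same frame on both sides: Count_x and Count_y hold identical values.
self_merged = first.merge(first, on=keys)

# Two different frames, Count not renamed: values differ, headers still suffixed.
suffixed = first.merge(second, on=keys)

# Count renamed before merging: no suffixes, one column per input.
renamed = first.rename(columns={"Count": "small_example1"}).merge(
    second.rename(columns={"Count": "small_example2"}), on=keys
)

print(list(self_merged.columns))
print(list(suffixed.columns))
print(list(renamed.columns))
```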
      

      【Comments】:
