【Question Title】: Split CSV File into two files keeping header in both files
【Posted】: 2021-10-14 00:00:47
【Question Description】:

I am trying to split a large CSV file into two files. I am using the following code:

import pandas as pd

#csv file name to be read in
in_csv = 'Master_file.csv'

#get the number of lines of the csv file to be read
number_lines = sum(1 for row in (open(in_csv)))

#size of rows of data to write to the csv,

#you can change the row size according to your need
rowsize = 600000

#start looping through data writing it to a new file for each set
for i in range(0,number_lines,rowsize):

    df = pd.read_csv(in_csv,
          nrows = rowsize,#number of rows to read at each loop
          skiprows = i)#skip rows that have been read

    #csv to write data to a new file with indexed name. input_1.csv etc.
    out_csv = 'File_Number' + str(i) + '.csv'

    df.to_csv(out_csv,
          index=False,
          header=True,
          mode='a',#append data to csv file
          chunksize=rowsize)#size of data to append for each loop

It splits the file, but the header is missing from the second file. How can I fix this?

【Question Comments】:

    Tags: python-3.x csv


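    【Solution 1】:

    When used with chunksize, .read_csv() returns an iterator, and every chunk it yields keeps the column header. An example is below. This should also be much faster: the original code above reads the whole file once to count the lines and then re-reads all previously processed lines on every loop iteration, whereas the version below reads the file only once: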

    import pandas as pd
    
    with pd.read_csv('Master_file.csv', chunksize=60000) as reader:
        for i,chunk in enumerate(reader):
            chunk.to_csv(f'File_Number{i}.csv', index=False, header=True)
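
    For reference, the header goes missing in the original loop because skiprows = i also skips line 0, the header line, so from the second iteration on pandas treats the first remaining data row as the column names. Below is a minimal sketch of the smallest change to that loop, assuming the file has exactly one header line and no quoted fields containing line breaks (the line count would be off otherwise). The function name split_with_skiprows and its prefix parameter are made up for illustration. It still re-scans the file on every iteration, so the chunksize version above remains the better choice for large files.

```python
import pandas as pd


def split_with_skiprows(in_csv, rowsize, prefix="File_Number"):
    """Hypothetical helper: write in_csv out as files of at most rowsize data rows."""
    # Count data rows: physical lines minus the single header line (assumption).
    with open(in_csv) as fh:
        n_data_rows = sum(1 for _ in fh) - 1

    out_paths = []
    for start in range(0, n_data_rows, rowsize):
        df = pd.read_csv(
            in_csv,
            # Skip only the data rows already written; line 0 (the header) is kept.
            skiprows=range(1, start + 1),
            nrows=rowsize,
        )
        out_csv = f"{prefix}{start}.csv"
        # A fresh file per chunk, so the default write mode is enough.
        df.to_csv(out_csv, index=False, header=True)
        out_paths.append(out_csv)
    return out_paths
```

    Calling split_with_skiprows('Master_file.csv', 600000) would produce File_Number0.csv, File_Number600000.csv and so on, each starting with the header row. Note also that using read_csv(..., chunksize=...) in a with statement, as in the example above, needs pandas 1.2 or later; on older versions the reader can be iterated directly without with.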
    

    【Comments】:
