【Posted】: 2016-06-18 07:13:14
【Question】:
I have a data file like this:
# coating file for detector A/R
# column 1 is the angle of incidence (degrees)
# column 2 is the wavelength (microns)
# column 3 is the transmission probability
# column 4 is the reflection probability
14.2000 0.531000 0.0618000 0.938200
14.2000 0.532000 0.0790500 0.920950
14.2000 0.533000 0.0998900 0.900110
# it has lots of other lines
# datafile can be obtained from pastebin
The link to the input data file is: http://pastebin.com/NaNbEm3E
I would like to create 20 files from this input, such that each file contains the comment lines.
That is:
#out1.txt
#comments
first part of one-twentieth data
# out2.txt
# given comments
second part of one-twentieth data
# and so on upto out20.txt
How can we do this in Python?
My initial attempt looks like this:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author : Bhishan Poudel
# Date : May 23, 2016
# Imports
from __future__ import print_function
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# read in comments from the file
infile = 'filecopy_multiple.txt'
outfile = 'comments.txt'
comments = []
with open(infile, 'r') as fi, open(outfile, 'a') as fo:
    for line in fi:
        if line.startswith('#'):
            comments.append(line)
            print(line)
            fo.write(line)
#==============================================================================
# read in a file
#
colnames = ['angle', 'wave', 'trans', 'refl']
print('\nreading file : {}'.format(infile))
# raw string so that \s is passed to the regex engine unchanged
df = pd.read_csv(infile, sep=r'\s+', header=None, skiprows=0,
                 comment='#', names=colnames, usecols=(0, 1, 2, 3))
print('length of df : {}'.format(len(df)))
# write 20 files
nfiles = 20
nrows = len(df) // nfiles  # rows per output file
# floor division (//) keeps the group labels integral on Python 3 as well
groups = df.groupby(np.arange(len(df.index)) // nrows)
for (frameno, frame) in groups:
    frame.to_csv("output_%s.csv" % frameno, index=None, header=None, sep='\t')
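A side note on the grouping step: when the number of rows is not a multiple of 20, labelling rows by floor-dividing their position by a fixed rows-per-file count leaves a small leftover group, i.e. a 21st file. The sketch below shows the effect with plain integers and one possible alternative labelling. The function names and the row count of 103 are made up for illustration.

```python
def labels_fixed_rows(n_rows, n_files):
    """Label each row by position // rows_per_file (the fixed-size scheme)."""
    rows_per_file = n_rows // n_files
    return [i // rows_per_file for i in range(n_rows)]


def labels_balanced(n_rows, n_files):
    """Label each row so that exactly n_files groups come out (n_rows >= n_files)."""
    return [i * n_files // n_rows for i in range(n_rows)]


n_rows, n_files = 103, 20  # hypothetical row count, not taken from the real file
print(len(set(labels_fixed_rows(n_rows, n_files))))  # 21
print(len(set(labels_balanced(n_rows, n_files))))    # 20
```

With the balanced labels the group sizes differ by at most one row, so no tiny trailing file is produced.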
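So far I have the 20 split files. All I want is to copy the comment lines into each of them. But the question is: how to do so?
There should be a simpler way than creating another 20 output files containing only the comments and then appending the 20 split files to them.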
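One possible simpler route, as a sketch rather than a definitive answer: `DataFrame.to_csv` accepts an already open file object, so the comment lines can be written to each output file first and the data frame chunk written to the same handle right after. The function name `split_with_comments` and the `out{}.txt` naming pattern are assumptions made for this example.

```python
import numpy as np
import pandas as pd


def split_with_comments(infile, nfiles, out_pattern="out{}.txt"):
    """Split the data rows of infile into nfiles files, each led by the comments."""
    # Collect the comment lines once so they can be repeated in every output.
    with open(infile) as fi:
        comments = [line for line in fi if line.startswith("#")]

    colnames = ["angle", "wave", "trans", "refl"]
    df = pd.read_csv(infile, sep=r"\s+", header=None, comment="#", names=colnames)

    # Integer labels 0 .. nfiles-1, spread as evenly as possible over the rows.
    labels = np.arange(len(df)) * nfiles // len(df)

    paths = []
    for label, frame in df.groupby(labels):
        path = out_pattern.format(label + 1)
        # newline="" lets pandas control the line endings it writes.
        with open(path, "w", newline="") as fo:
            fo.writelines(comments)  # header first
            frame.to_csv(fo, sep="\t", header=False, index=False)  # then the data
        paths.append(path)
    return paths
```

Calling `split_with_comments('filecopy_multiple.txt', 20)` would then write `out1.txt` through `out20.txt`. Note that pandas re-formats the numbers on output, so a value such as `14.2000` is written as `14.2`.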
Some useful links are:
How to split a dataframe column into multiple columns
How to split a DataFrame column in python
Split a large pandas dataframe
【Comments】:
-
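It's not quite clear why pandas/DataFrames are needed in this case... Do you want to keep the existing file format, or do you want to save the split files as plain CSV or HDF5 files?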
-
@MaxU I want to save the split files as plain CSV files, such that each of the 20 output files has the same header as the input file.
-
Does your original CSV file fit in RAM, or do you have to read it line by line?
-
@MaxU My original CSV file fits in RAM; it is not a very big file.
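Since the file fits in RAM and the output should keep the input's format, the split can also be sketched without pandas: read all lines, separate the comment lines from the data lines, and write each slice of the data after a copy of the comments. This is only an illustration under those assumptions; the function name `split_keep_comments` and the `out{}.txt` pattern are made up here.

```python
def split_keep_comments(infile, n_files, out_pattern="out{}.txt"):
    """Write n_files files, each holding all comment lines plus one data slice."""
    with open(infile) as fi:
        # Normalise line endings so a final line without "\n" is handled too.
        lines = [ln.rstrip("\n") + "\n" for ln in fi]

    comments = [ln for ln in lines if ln.startswith("#")]
    data = [ln for ln in lines if ln.strip() and not ln.startswith("#")]

    paths = []
    for k in range(n_files):
        # Slice boundaries chosen so that chunk sizes differ by at most one row.
        start = k * len(data) // n_files
        stop = (k + 1) * len(data) // n_files
        path = out_pattern.format(k + 1)
        with open(path, "w") as fo:
            fo.writelines(comments)
            fo.writelines(data[start:stop])
        paths.append(path)
    return paths
```

Because the data lines are copied as text, the original spacing and trailing zeros (for example `14.2000`) are preserved in every output file.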