Answers to: CSV file processing in Python

【Question title】: CSV file processing in Python
【Posted】: 2013-04-22 22:24:08
【Question】:

I work with spatial data that is output to a text file in the following format:

COMPANY NAME
P.O. BOX 999999
ZIP CODE , CITY 
+99 999 9999
23 April 2013 09:27:55

PROJECT: Link Ref
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Design DTM is 30MB 2.5X2.5
Stripping applied to design is 0.000

Point Number      Easting     Northing        R.L. Design R.L.  Difference  Tol  Name
     3224808   422092.700  6096059.380       2.520     -19.066     -21.586  --   
     3224809   422092.200  6096059.030       2.510     -19.065     -21.575  --   
<Remainder of lines>
 3273093   422698.920  6096372.550       1.240     -20.057     -21.297  --   

Average height difference is -21.390
RMS  is  21.596
0.00 % above tolerance
98.37 % below tolerance
End of Report

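As shown, the file has a header and a footer. The data is separated by spaces, but the number of spaces between columns is not constant.

What I need is a comma-separated file containing Easting, Northing and Difference.

I don't want to modify hundreds of large files by hand, so I am writing a small script to process them. This is what I have so far: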

#! /usr/bin/env python
import csv,glob,os
from itertools import islice
list_of_files = glob.glob('C:/test/*.txt')
for filename in list_of_files:
    (short_filename, extension )= os.path.splitext(filename)
    print short_filename
    file_out_name = short_filename + '_ed' + extension
    with open (filename, 'rb') as source:
        reader = csv.reader( source) 
        for row in islice(reader, 10, None):
            file_out= open (file_out_name, 'wb')
            writer= csv.writer(file_out)
            writer.writerows(reader)
            print 'Created file: '+ file_out_name
            file_out.close()
print 'All done!'

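Questions:

How do I make the line starting with "Point Number" the header of the output file? I tried using DictReader in place of the reader/writer bits, but could not get it to work.
Writing the output file with the delimiter ',' does work, but it writes a comma in place of every single space, which leaves far too many empty columns in the output file. How do I get around this?
How do I remove the footer?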

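【Comments on the question】:

Could you reward the user who gave you the best advice by clicking the big tick box to the left of their answer?

Tags: python file csv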

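【Answer 1】:

I see a problem in your code: you are creating a new writer for every row, so you will only end up with the last one.

Your code could look like this. There is no need for a CSV reader or writer, because the format is simple enough to be parsed as plain text (it would become a problem if you had text columns, escaped characters and so on).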

import glob, os

list_of_files = glob.glob('C:/test/*.txt')  # same input pattern as in the question

def process_file(source, dest):
  header_found = False
  for line in source:
    line = line.strip()
    if not header_found:
      # ignore everything until we find this text
      header_found = line.startswith('Point Number')
    elif not line:
      return  # we are done when we find an empty line, I guess
    else:
      # write the needed columns: Easting, Northing, Difference
      columns = line.split()
      dest.write(','.join(columns[i] for i in (1, 2, 5)) + '\n')

for filename in list_of_files:
  short_filename, extension = os.path.splitext(filename)
  file_out_name = short_filename + '_ed' + extension
  with open(filename, 'r') as source:
    with open(file_out_name, 'w') as dest:
      process_file(source, dest)

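【Comments】:

Thanks for the quick answer.
Use elif instead of else if.
+1 for mentioning escape characters, which could be a problem that is difficult for the built-in functions to handle.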
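
The caveat about text columns and escape characters can be illustrated with a minimal sketch. The row below is hypothetical (the report in the question has no such text field); it only shows why joining with ',' stops being safe once a value can itself contain a comma, and how csv.writer quotes such a value.

```python
import csv
import io

# Hypothetical row: the third field is free text that contains a comma.
row = ["422092.700", "6096059.380", "Site A, north"]

# Plain joining produces four comma-separated pieces instead of three.
naive = ",".join(row)

# csv.writer quotes the field that contains the delimiter.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerow(row)
quoted = buffer.getvalue().rstrip("\n")

print(naive)
print(quoted)
```

For purely numeric columns like the ones in the question, the two approaches give the same line, which is why plain string joining is enough there.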

【Answer 2】:

This works:

#! /usr/bin/env python

import glob,os

list_of_files = glob.glob('C:/test/*.txt')

def process_file(source, dest):
  header_found = False
  for line in source:
    line = line.strip()
    if not header_found:
      #ignore everything until we find this text
      header_found = line.startswith('Stripping applied') #otherwise, header is lost
    elif not line:
      return #we are done when we find an empty line
    else:
      #write the needed columns
      columns = line.split()
      dest.write(','.join(columns[i] for i in (1, 2, 5)) + "\n")  # adding the newline character was necessary

for filename in list_of_files:
  short_filename, extension = os.path.splitext(filename)
  file_out_name = short_filename + '_ed' + ".csv"
  with open(filename, 'r') as source:
    with open(file_out_name, 'w') as dest:  # text mode; 'wb' rejects str under Python 3
      process_file(source, dest)
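
A possible variant, offered only as a sketch and not as part of the original answer: the column labels in the report contain spaces ("Point Number", "Design R.L."), so splitting the label line on whitespace yields more tokens than a data line has columns, and indices 1, 2 and 5 no longer select Easting, Northing and Difference. The sketch below therefore writes the header explicitly. The function name extract_columns, the wanted parameter and the shortened sample report are assumptions made for illustration.

```python
import csv
import io

def extract_columns(source, dest, wanted=(1, 2, 5)):
    """Write Easting, Northing and Difference from a report as CSV rows."""
    writer = csv.writer(dest, lineterminator="\n")
    # Header labels are written explicitly rather than parsed from the report.
    writer.writerow(["Easting", "Northing", "Difference"])
    in_table = False
    for line in source:
        line = line.strip()
        if not in_table:
            # Skip the page header up to and including the column-label line.
            in_table = line.startswith("Point Number")
        elif not line:
            break  # the first blank line after the table starts the footer
        else:
            columns = line.split()
            writer.writerow([columns[i] for i in wanted])

# Hypothetical, shortened report used only for this demonstration.
sample = io.StringIO("\n".join([
    "Stripping applied to design is 0.000",
    "",
    "Point Number      Easting     Northing        R.L. Design R.L.  Difference  Tol  Name",
    "     3224808   422092.700  6096059.380       2.520     -19.066     -21.586  --",
    "     3224809   422092.200  6096059.030       2.510     -19.065     -21.575  --",
    "",
    "Average height difference is -21.390",
    "End of Report",
]))
out = io.StringIO()
extract_columns(sample, out)
print(out.getvalue())
```

Because the function accepts any iterable of lines and any writable text object, it can be called with the file objects opened in the loop over list_of_files in the same way as process_file.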

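【Comments】:

【Answer 3】:

To answer your first and last questions: it is simply a matter of ignoring the corresponding lines, i.e. not writing them to the output. This is what the if not header_found and elif not line: blocks do in the code proposed by fortran.

The second point is that your file has no dedicated delimiter: columns are separated by one or more spaces, which makes the file hard to parse with the csv module. Calling split() on each line returns the list of whitespace-separated tokens, so only the useful values come back.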
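
The difference can be seen on a single data line from the question. This is a small sketch for illustration; the variable names are arbitrary.

```python
import csv

line = "     3224808   422092.700  6096059.380       2.520     -19.066     -21.586  --"

# csv.reader with a single-space delimiter treats every space as a separator,
# so each run of spaces produces empty fields.
csv_fields = next(csv.reader([line], delimiter=" "))

# str.split() with no argument splits on runs of whitespace and ignores
# leading and trailing whitespace.
tokens = line.split()

print(len(csv_fields), len(tokens))
print(tokens[1], tokens[2], tokens[5])  # Easting, Northing, Difference
```

The csv-based field list is much longer than the token list because of the empty strings, which matches the "too many empty columns" symptom described in the question.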

【Comments】: