【Question Title】: Rearranging Data in CSV by Certain Data in a Column
【Posted】: 2013-05-19 12:34:24
【Question】:

I have a CSV file with about 30,000 rows of data and 24 columns. The last column is a geography column that looks like this:

 Ethiopia
 IL
 IL
 TX
 TX
 MD
 NY
 NY
 Ethiopia
 Ethiopia
 Sweden
 CA
 CA
 HI
 Latvia
 OH

Now I want the CSV to contain only the complete rows whose geography is in the US, which would be the 2-character state abbreviations (CA, HI, OH, etc.).

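Basically, I want to remove all data in the CSV that isn't related to the US or, even better if possible, arrange it so that the first X rows are the US locations and all the other data comes after them, at the end of the CSV.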

Here is my code so far:

import csv

ask = "Y"

while ask != "N":
    inputfile = input("Please enter filename: ")
    filename = open(inputfile, "r")

    data = []
    with filename as f:
        reader = csv.reader(f, delimiter=',')
        for row in reader:
            if len(row[24]) == 3:
                data = row[24]
        datalist = row[0:23].join(data)
        output = open("Newly Created Data.csv","w")
        output.write(datalist)
        print ("Done.")

    output.close()

    ask = input("Another file, Y or N? ")

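It correctly picks out the data in column 24 by reading only the US locations, but I don't know how to handle the rest of the file, i.e. the other 23 columns, so that only the rows matching US locations are kept.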
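A minimal sketch of the filtering the loop above is reaching for: test the last field, but write the whole row, instead of keeping only that field (data = row[24]) and calling .join on a list, which Python lists don't have. It assumes, as in the question, that any value that is two characters long after stripping spaces is a US state code; the names us_rows and filter_file are hypothetical.

```python
import csv


def us_rows(rows):
    """Yield whole rows whose last field is a 2-character code (assumed to be a US state)."""
    for row in rows:
        # Check only the last column, but keep the entire row
        if row and len(row[-1].strip()) == 2:
            yield row


def filter_file(in_path, out_path):
    """Copy only the US rows from in_path to out_path."""
    # newline='' is what the csv module expects for files it reads and writes
    with open(in_path, newline='') as f_in, open(out_path, 'w', newline='') as f_out:
        csv.writer(f_out).writerows(us_rows(csv.reader(f_in)))
```

For example, filter_file("data.csv", "Newly Created Data.csv") would write only the US rows, with all their columns.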

I'm using Python 3, thanks.

【Comments】:

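  • So you want to remove every row whose content doesn't match any of the abbreviations like CA, HI, OH, etc. (US states?)
  • Correct, or preferably sort the US locations to the top of the CSV and the rest to the bottom.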

Tags: python csv


【Answer 1】:

For a pure standard-library solution, something like

import csv

with open('location.csv', newline='') as fp_in:
    reader = csv.reader(fp_in, delimiter=',')
    data = list(reader)

data.sort(key=lambda x: (len(x[-1].strip()) != 2, x[-1].strip()))

with open("locout.csv", "w", newline='') as fp_out:
    writer = csv.writer(fp_out, delimiter=',')
    writer.writerows(data)

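The sort key function, lambda x: (len(x[-1].strip()) != 2, x[-1].strip()), sorts the data first by whether the last column has exactly two characters, putting the two-character locations first (False sorts before True), and then by the name itself, which effectively alphabetizes them, at least as long as they all start with a capital letter.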
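To see the key in action, here is a small illustration with made-up rows (rows and location_key are names used only for this example). The key returns a tuple, Python compares tuples element by element, and since False < True the two-character codes come first, with each group then ordered alphabetically.

```python
rows = [
    ["A", "Ethiopia"],
    ["B", "TX"],
    ["C", "Sweden"],
    ["D", "CA"],
]


def location_key(row):
    loc = row[-1].strip()
    # First element: False for 2-character codes, so they sort first;
    # second element: the name, which breaks ties alphabetically
    return (len(loc) != 2, loc)


ordered = sorted(rows, key=location_key)
for row in ordered:
    print(row[-1], location_key(row))
# CA (False, 'CA')
# TX (False, 'TX')
# Ethiopia (True, 'Ethiopia')
# Sweden (True, 'Sweden')
```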

I'm assuming the file isn't too big: 30,000 rows isn't very many, even with 24 columns, so we might as well work entirely in memory.

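(Incidentally: if you're doing a lot of CSV manipulation, you might be interested in the pandas library; it makes a lot of operations much simpler than they would otherwise be.)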

【Comments】:

  • This works, thanks. Would the sorting also work for comparing strings, e.g. x[-1].strip() != "2012"?
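It should: the key can be any expression, and a boolean key behaves like the first element of the tuple key described above. A minimal sketch with made-up rows, assuming the value to match is in the last column: rows whose last field is "2012" get the key False and therefore move to the front, and because list.sort() is stable the remaining rows keep their original relative order.

```python
rows = [["a", "2011"], ["b", "2012"], ["c", "2013"], ["d", "2012"]]

# Key is False for "2012" rows, so they move to the front;
# sort() is stable, so the other rows keep their original order
rows.sort(key=lambda x: x[-1].strip() != "2012")
print(rows)  # [['b', '2012'], ['d', '2012'], ['a', '2011'], ['c', '2013']]
```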

【Answer 2】:
import csv

# The 50 US state postal abbreviations
states = set(["AL","AK","AZ","AR","CA","CO","CT","DE","FL","GA","HI","ID","IL","IN","IA","KS","KY","LA","ME","MD","MA","MI","MN","MS","MO","MT","NE","NV","NH","NJ","NM","NY","NC","ND","OH","OK","OR","PA","RI","SC","SD","TN","TX","UT","VT","VA","WA","WV","WI","WY"])

with open('file.txt', newline='') as f, open('ofile.txt', 'w', newline='') as o:
    reader = csv.reader(f)
    writer = csv.writer(o)
    # The key is False for state rows, so they sort first; sorted() is stable.
    # strip() guards against stray spaces such as " IL" in the last field.
    writer.writerows(sorted(reader, key=lambda row: row[-1].strip() not in states))
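If you would rather drop the non-US rows entirely (the question's first option), or want the same US-first ordering without sorting, a single pass that partitions the rows also works. A minimal sketch; split_by_state is a hypothetical helper, and its states argument is assumed to be a set of codes like the one above (the example uses a small illustrative subset):

```python
def split_by_state(rows, states):
    """Partition rows into (US rows, other rows), preserving input order."""
    us, other = [], []
    for row in rows:
        # Compare the stripped last field so " IL" still counts as IL
        if row and row[-1].strip() in states:
            us.append(row)
        else:
            other.append(row)
    return us, other


# Example with a small, illustrative subset of state codes
states = {"IL", "TX", "CA"}
rows = [["A", "Ethiopia"], ["B", "IL"], ["C", "Sweden"], ["D", "TX"], ["E", "CA"]]
us, other = split_by_state(rows, states)
print(us)          # [['B', 'IL'], ['D', 'TX'], ['E', 'CA']]
print(us + other)  # US rows first, then the rest, each in original order
```

Writing only us filters the file; writing us + other gives the same order as the stable sort, in one linear pass.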

This would sort a file like

A,B,C,Ethiopia
A,B,C,IL
A,B,C,IL
A,B,C,TX
A,B,C,TX
A,B,C,MD
A,B,C,NY
A,B,C,NY
A,B,C,Ethiopia
A,B,C,Ethiopia
A,B,C,Sweden
A,B,C,CA
A,B,C,CA
A,B,C,HI
A,B,C,Latvia
A,B,C,OH

into

A,B,C,IL
A,B,C,IL
A,B,C,TX
A,B,C,TX
A,B,C,MD
A,B,C,NY
A,B,C,NY
A,B,C,CA
A,B,C,CA
A,B,C,HI
A,B,C,OH
A,B,C,Ethiopia
A,B,C,Ethiopia
A,B,C,Ethiopia
A,B,C,Sweden
A,B,C,Latvia

When read back like this:

with open('ofile.txt') as f:
    for line in csv.reader(f):
        print(line)

it produces:

>>> 
['A', 'B', 'C', 'IL']
['A', 'B', 'C', 'IL']
['A', 'B', 'C', 'TX']
['A', 'B', 'C', 'TX']
['A', 'B', 'C', 'MD']
['A', 'B', 'C', 'NY']
['A', 'B', 'C', 'NY']
['A', 'B', 'C', 'CA']
['A', 'B', 'C', 'CA']
['A', 'B', 'C', 'HI']
['A', 'B', 'C', 'OH']
['A', 'B', 'C', 'Ethiopia']
['A', 'B', 'C', 'Ethiopia']
['A', 'B', 'C', 'Ethiopia']
['A', 'B', 'C', 'Sweden']
['A', 'B', 'C', 'Latvia']

【Comments】:

  • Sorry for the late reply. Hmm, I'm not sure this works with my file; it runs fine but doesn't sort. Here is the file I'm using: uploadmb.com/dw.php?id=1369074562
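One plausible cause, which can't be confirmed without the file: if the last field carries a leading space, as in the question's listing (" IL"), a test like row[-1] in states is False for every row. Every key is then True, and since sorted() is stable the rows come back in their original order, so the script runs but nothing moves. Stripping the field before the lookup avoids this. A small illustration with made-up rows and an illustrative subset of state codes:

```python
states = {"IL", "TX", "CA"}  # illustrative subset of the 50 state codes
rows = [["A", " Ethiopia"], ["B", " IL"], ["C", " Sweden"], ["D", " TX"]]

# Without strip(): " IL" is not "IL", so every key is True and nothing moves
unchanged = sorted(rows, key=lambda row: row[-1] not in states)
print(unchanged == rows)  # True

# With strip(): the state rows move to the front
fixed = sorted(rows, key=lambda row: row[-1].strip() not in states)
print([row[0] for row in fixed])  # ['B', 'D', 'A', 'C']
```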