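It's simple: build a dictionary of lists of records, keyed on the content of a given column (here I've used column 0), then loop over these lists and write each record in a list to one of the two output files, according to the simple rule specified by the OP.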
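from csv import reader, writer

inp = reader(open(...))
# newline='' is what the csv docs recommend for files handed to writer()
outs = [writer(open(fnm, 'w', newline='')) for fnm in ('f30', 'f70')]
column, d = 0, {}
# group the records by the content of the key column
for rec in inp:
    d.setdefault(rec[column], []).append(rec)
# within each group, the first 70% goes to 'f70' and the rest to 'f30'
for recs in d.values():
    l = round(0.7*len(recs))
    for n, rec in enumerate(recs):
        outs[n<l].writerow(rec)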
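A Boolean is a subclass of integer, with value 1 (when n<l) or 0, so n<l can be used to index the list of writers: the first 70% of each group goes to outs[1] ('f70') and the rest to outs[0] ('f30').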
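
To make the indexing trick concrete, here is a minimal sketch that uses a list of labels in place of the list of writers. The names `labels`, `limit` and `targets` are made up for the illustration, and the limit of 7 assumes a group of 10 records.

```python
# bool is a subclass of int, so True and False act as 1 and 0 when indexing
assert issubclass(bool, int)

labels = ['f30', 'f70']  # hypothetical stand-ins, same order as the writers
limit = 7                # 70% of an assumed group of 10 records
# n < limit is True for the first 7 positions, so they pick labels[1]
targets = [labels[n < limit] for n in range(10)]
print(targets)
```

The same choice can be spelled out with a conditional expression, which is what the IPython session in this answer does.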
Here is a check of the method, using an IPython session (lightly edited to reduce whitespace) and some artificial data:
17:22:~ $ ipython
Python 3.7.3 (default, Mar 27 2019, 22:11:17)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.5.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from csv import reader, writer
   ...: from random import randrange, seed
   ...: seed(20190712)
In [2]: data = [','.join(str(randrange(10)) for _ in range(4)) for _ in range(200)]
In [3]: inf = reader(data)
In [4]: of1 = writer(open('dele1', 'w')); of2 = writer(open('dele2', 'w'))
In [5]: d = {}
In [6]: for record in inf:
   ...:     d.setdefault(record[0], []).append(record)
   ...: for key, records in d.items():
   ...:     l1 = round(0.7*len(records))
   ...:     for n, record in enumerate(records):
   ...:         (of1 if n<l1 else of2).writerow(record)
In [7]: Ctrl-D
Do you really want to exit ([y]/n)?
17:23:~ $ wc -l dele?
140 dele1
60 dele2
200 total
17:24:~ $ rm dele?
17:24:~ $
As you can see, the first file gets 70% of the original records (140 of 200), while the second gets the remaining 30%.
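
If you want to reuse this, the same idea can be wrapped in a function. What follows is only a sketch under a few assumptions: the function name `split_by_key` and its parameters are mine, the data is a tiny hand-made sample, and the output goes to in-memory `StringIO` buffers rather than real files so it can be run without touching the disk.

```python
from csv import reader, writer
from io import StringIO

def split_by_key(rows, major_out, minor_out, column=0, fraction=0.7):
    """Group rows on rows[column]; write the first `fraction` of each
    group to major_out and the remainder to minor_out."""
    groups = {}
    for row in rows:
        groups.setdefault(row[column], []).append(row)
    for recs in groups.values():
        limit = round(fraction * len(recs))
        for n, rec in enumerate(recs):
            (major_out if n < limit else minor_out).writerow(rec)

# hypothetical sample: 10 records keyed 'a' and 20 records keyed 'b'
lines = ['a,%d' % i for i in range(10)] + ['b,%d' % i for i in range(20)]
major_buf, minor_buf = StringIO(), StringIO()
split_by_key(reader(lines), writer(major_buf), writer(minor_buf))

major_rows = major_buf.getvalue().splitlines()
minor_rows = minor_buf.getvalue().splitlines()
print(len(major_rows), len(minor_rows))
```

Because the limit is rounded separately for each key, the overall split is only approximately 70/30 when the groups are small or uneven.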