【Question title】: Python File Reduction - part 2
【Posted】: 2013-03-27 11:09:18
【Question】:

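This is a follow-up to my previous question (text file reduction with randomization in Python). I modified the script to run multiple reductions, but only the first output file contains reduced data; the other three files are zero-sized. It must be something obvious that I am not seeing...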

#!/usr/bin/env python
import random
import sys
from itertools import chain, groupby

def choose_random(iterator, fraction, random=random.random):
    """Lazy analog of:

        L = list(iterator)
        k = int(len(L) * fraction + .5) or 1 # get at least one
        result = random.sample(L, k)

    Note: this function doesn't randomize the order of elements
          that would require to keep selected elements in memory
          and number of output elements is not exactly k
    """
    # always yield at least one item if input is not empty
    item = next(iterator)
    it = (x for x in chain([item], iterator) if random() < fraction)
    for x in chain([next(it, item)], it):
        yield x

def getkey(line):
    return line.split("\t")[3] # 4th column

reductions = [0.25, 0.50, 0.75, 1]
filename = "foo"
outfile = [open("-".join([x, filename]), "w") for x in map(str, reductions)]

try:
    with open(filename, "r") as f:
        for ln, k in enumerate(map(float, reductions)):
            for key, group in groupby(f, key=getkey):
                outfile[ln].writelines(choose_random(group, fraction=k))

finally:
    for f in outfile:
        f.close()

The output looks like this (the file 0.25-foo contains the correct reduction; the rest are empty):

-rw-r--r--  1 staff  staff   53326048 Mar 27 03:42 0.25-foo
-rw-r--r--  1 staff  staff          0 Mar 27 03:42 0.5-foo
-rw-r--r--  1 staff  staff          0 Mar 27 03:42 0.75-foo
-rw-r--r--  1 staff  staff          0 Mar 27 03:42 1-foo

【Comments】:

    Tags: python parsing text random reduction


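    【Solution 1】:

    You open foo once but try to iterate over it four times. At the end of the first reduction the file position is at the end of the file, so the remaining three passes have nothing left to read. Either reopen it: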

    try:
        for ln, k in enumerate(map(float, reductions)):
            with open(filename, "r") as f:
                for key, group in groupby(f, key=getkey):
                    outfile[ln].writelines(choose_random(group, fraction=k))
    
    finally:
        for f in outfile:
            f.close()
    

    or rewind after each reduction:

    try:
        with open(filename, "r") as f:
            for ln, k in enumerate(map(float, reductions)):
                for key, group in groupby(f, key=getkey):
                    outfile[ln].writelines(choose_random(group, fraction=k))
                f.seek(0)
    
    finally:
        for f in outfile:
            f.close()
    

    I would open both files at the same time:

    reductions = [0.25, 0.50, 0.75, 1.0]
    filename = "foo"
    
    for fraction in reductions:
        with open(filename, "r") as f, open('%s-%s' % (fraction, filename), 'w') as outfile:
            for key, group in groupby(f, key=getkey):
                outfile.writelines(choose_random(group, fraction=fraction))
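
The exhaustion described in this answer can be reproduced in isolation. The following is a minimal sketch, not part of the original answer: the helper `count_lines_per_pass` and the throwaway temporary file are hypothetical names introduced only for illustration. It iterates over one open file several times, with and without `f.seek(0)`, and records how many lines each pass sees.

```python
import os
import tempfile


def count_lines_per_pass(path, passes, rewind):
    """Iterate over ONE open file `passes` times; return the line count of each pass.

    Hypothetical helper for illustration only.
    """
    counts = []
    with open(path, "r") as f:
        for _ in range(passes):
            # A file object is a one-shot iterator: this consumes it to EOF.
            counts.append(sum(1 for _ in f))
            if rewind:
                f.seek(0)  # move back to the start so the next pass sees data
    return counts


# Build a small throwaway file with three lines.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a\nb\nc\n")
    sample_path = tmp.name

without_rewind = count_lines_per_pass(sample_path, 3, rewind=False)
with_rewind = count_lines_per_pass(sample_path, 3, rewind=True)
os.remove(sample_path)

print(without_rewind)  # [3, 0, 0]
print(with_rewind)     # [3, 3, 3]
```

Without the rewind, only the first pass reads any lines, which matches the single non-empty output file in the question.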
    

    【Comments】:

    • Thanks a lot! This is perfect... I'm just really tired... it's almost 4:30 AM here. ;)
    • ... or you could read the file into memory and reuse the in-memory copy.
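
The in-memory suggestion from the last comment could look roughly like the sketch below. It is an assumption-laden illustration, not code from the thread: `reduce_lines` is a hypothetical, simplified stand-in for `choose_random` that materializes each group as a list, and the `lines` list is made-up sample data standing in for `f.readlines()` on the real input. Because a list can be iterated any number of times, no reopening or `seek(0)` is needed.

```python
import random
from itertools import groupby


def getkey(line):
    return line.split("\t")[3]  # 4th column, as in the question


def reduce_lines(lines, fraction, rand=random.random):
    """Keep roughly `fraction` of each key group, but at least one line per group.

    Hypothetical simplification of choose_random: each group is held in memory.
    """
    kept = []
    for _, group in groupby(lines, key=getkey):
        group = list(group)
        chosen = [ln for ln in group if rand() < fraction]
        kept.extend(chosen or group[:1])  # fall back to the group's first line
    return kept


# Made-up sample data; with a real file this would be: lines = f.readlines()
lines = ["r%d\tx\ty\t%s\n" % (i, key) for key in ("A", "B") for i in range(4)]

# The same list is reused for every reduction.
results = {fraction: reduce_lines(lines, fraction)
           for fraction in (0.25, 0.5, 0.75, 1.0)}
```

Each `results[fraction]` can then be passed to `writelines` on the matching output file. This trades memory for simplicity, so it only suits inputs that fit comfortably in RAM.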