Splitting a column text file into smaller files based on value differences in Python: answers

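[Question title]: Splitting column text file into smaller files based on value differences in Python
[Posted]: 2017-07-25 15:00:08
[Question description]:

I am trying to split a text file with 3 columns into many smaller separate text files, based on jumps in the values in the first column. Here is an example of a small part of the file to be split: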

2457062.30520078 1.00579146 1

2457062.30588184 1.00607543 1

2457062.30656300 1.00605515 1

2457062.71112193 1.00288150 1

2457062.71180299 1.00322454 1

2457062.71248415 1.00430136 1

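Between rows 3 and 4 there is a larger-than-usual jump. That is the point where the data should be split and separate text files created: one from the first three rows and one from the last three. The jump in the first column is always a change of more than 0.1. The goal is for every jump like the one in this example to become a split point between files. Thanks for any insight.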

[Question comments]:

Can you be more specific?

Tags: python list file text split

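[Solution 1]:

I would iterate over the main file and keep writing lines as long as your condition holds. That fits the definition of a while loop exactly. The main complication is that you need two files open at once (the main file and the one you are currently writing to), but that is no problem for Python.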

MAINTEXT = "big_file.txt"
# {:03d} gives zero-padded integers (000, 001, ...). A 'g' format
#  would turn 10, 11, ... into '1e+01' and overwrite earlier files.
SFILE_TEMPL = 'small_file_{:03d}.txt'
# Delimiter is a space in the example you gave, but
#  might be tab (\t) or comma or anything.
DELIMITER = ' '

LIM = .1

# i will count how many files we have created.
i = 0

# Open the main file
with open(MAINTEXT) as mainfile:
    # Read the first line and set up some things
    line = mainfile.readline()
    # Note that we want the first element ([0]) before
    #  the delimiter (.split(DELIMITER)) of the row (line)
    #  as a number (float)
    v_cur = float(line.split(DELIMITER)[0])
    v_prev = v_cur

    # This will stop the loop once we reach end of file (EOF)
    #  as readline() will then return an empty string.
    while line:
        # Open the second file for writing (mode='w').
        with open(SFILE_TEMPL.format(i), mode='w') as subfile:
            # As long as your values are in the limit, keep
            #  writing lines to the current file.
            while line and abs(v_prev - v_cur) < LIM:
                subfile.write(line)
                line = mainfile.readline()
                # Only convert when a line was actually read: at EOF
                #  line is '' and float('') raises ValueError.
                if line:
                    v_prev = v_cur
                    v_cur = float(line.split(DELIMITER)[0])
        # Increment the file counter
        i += 1
        # Make sure we don't get stuck after one file
        #  (If we don't replace v_prev here, the while loop
        #  will never execute after the first time.)
        v_prev = v_cur

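[Comments]:

Wow, this worked fine until the 20th file was created, at which point I seem to get a ValueError for v_cur on line 33: it could not convert string to float. Any idea what causes this?
I think it may actually be the end of the file, since I just realised that the conversion to float happens before any check on whether the next line is empty. The cleanest fix is probably to move the readline() call and the v_prev/v_cur assignments. Alternatively, wrap the v_prev and v_cur assignments in an if line: ... else: break, or in a try. In that case you can also drop the "line and" from the while statement above.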

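[Solution 2]:

Assuming your file is test.txt:

with open('test.txt') as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        # split() with no argument splits on any whitespace
        first_column, second_column, third_column = line.split()

That reads the file, but what exactly are you trying to do?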
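
Building on the same read-and-split idea, here is a minimal sketch of how the rows might be grouped once they are in memory. It assumes whitespace-separated columns and the 0.1 threshold mentioned in the question; the function name `split_on_jumps` and its `threshold` parameter are made up for illustration and are not part of the original answer.

```python
def split_on_jumps(lines, threshold=0.1):
    """Group rows into chunks, starting a new chunk whenever the
    first column differs from the previous row by more than `threshold`."""
    chunks = []
    prev = None
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        value = float(line.split()[0])
        if prev is None or abs(value - prev) > threshold:
            chunks.append([])  # first row or jump detected: start a new chunk
        chunks[-1].append(line)
        prev = value
    return chunks


# The six sample rows from the question.
sample = [
    "2457062.30520078 1.00579146 1",
    "2457062.30588184 1.00607543 1",
    "2457062.30656300 1.00605515 1",
    "2457062.71112193 1.00288150 1",
    "2457062.71180299 1.00322454 1",
    "2457062.71248415 1.00430136 1",
]
chunks = split_on_jumps(sample)
print([len(chunk) for chunk in chunks])  # [3, 3]
```

Each chunk could then be written to its own output file. This keeps the whole input in memory, so it is probably only suitable for files that fit comfortably in RAM.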

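[Comments]:

The goal is to split one large 3-column text file into separate text files wherever the first column makes a large jump. The problem is that these jumps seem to occur at random. Thanks for your help.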

[Solution 3]:

You can detect the jumps while reading the file:

def reader(infile):
    number = float('-infinity')
    for line in infile:
        prev, number = number, float(line.split(' ', 1)[0])
        jump = number - prev >= 0.1
        yield jump, line

for jump, line in reader(infile):
    # jump is True if one must open a new output file
    ...
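
One way the loop body might be filled in is sketched below. It repeats the `reader` generator so the block is self-contained, and it relies on the first row always being reported as a jump (because `prev` starts at negative infinity), which opens the first output file. The helper name `split_file` and the `part_{:03d}.txt` naming pattern are assumptions made for this example.

```python
def reader(infile):
    # Same generator as in the answer: yields (jump, line) pairs.
    number = float('-infinity')
    for line in infile:
        prev, number = number, float(line.split(' ', 1)[0])
        jump = number - prev >= 0.1
        yield jump, line


def split_file(infile, name_template="part_{:03d}.txt"):
    """Write each run of rows to its own file and return the file names."""
    names = []
    out = None
    try:
        for jump, line in reader(infile):
            if jump:
                # Close the previous output file before starting a new one.
                if out is not None:
                    out.close()
                names.append(name_template.format(len(names)))
                out = open(names[-1], "w")
            out.write(line)
    finally:
        # Make sure the last output file is closed even on error.
        if out is not None:
            out.close()
    return names
```

It could be called as `with open("big_file.txt") as infile: split_file(infile)`. Note that this sketch does not skip blank lines; a blank row would make `float()` raise a ValueError.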

[Comments]: