Extract lines from text files using Python

【Question title】: Extract lines from text files using python
【Posted】: 2017-02-18 07:35:06
【Question description】:

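I have more than 100 .out files, which are output files from a statistical package called MPlus. Each file (which can be opened in any text editor) contains a few hundred lines of text, of which only a handful interest me. Those lines look like this: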

 I        ON
    K1                -0.247      0.321     -0.769      0.442
    K2                 0.161      0.232      0.696      0.486

 S        ON
    K1                 0.035      0.143      0.247      0.805
    K2                -0.123      0.154     -0.799      0.424

 Q        ON
    K1                 0.083      0.325      0.255      0.798
    K2                 0.039      0.229      0.169      0.866

 I        ON
    LABTOTF1           0.014      0.018      0.787      0.431
    LABTOTG2           0.011      0.017      0.626      0.532
    UGLABTOT           0.001      0.004      0.272      0.786
    UMLABTOT           0.098      0.147      0.664      0.507

 S        ON
    LABTOTF1          -0.008      0.019     -0.406      0.684
    LABTOTF2           0.000      0.013     -0.018      0.986
    UGLABTOT          -0.001      0.003     -0.209      0.835
    UMLABTOT          -0.063      0.115     -0.548      0.584

 Q        ON
    LABTOTF1          -0.013      0.025     -0.532      0.595
    LABTOTF2          -0.014      0.023     -0.596      0.551
    UGLABTOT           0.007      0.006      1.131      0.258
    UMLABTOT          -0.489      0.171     -2.859      0.004

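The numbers vary, and so do the variables (K1, K2, LABTOTF1, etc.) and the number of variables from file to file. The I ON, S ON, and Q ON headings, however, appear in every file.

I want to extract these lines from the output files and put them into a single output file using a Python script.

My approach so far has been to write nested for loops, which is neither efficient nor effective, because the number of lines varies from file to file.

My first attempt at getting the I ON heading and its values (K1 & K2) used the following code: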

with open("./my_folder/my_file.out", "r") as file:
    lines = file.readlines()
collector = []
for i in range(0, len(lines)):
    if lines[i] == '\n':
        continue
    elif "I        ON\n" in lines[i]:
        collector.append(lines[i])
        # Assumes exactly two variable rows follow the heading.
        collector.append(lines[i+1])
        collector.append(lines[i+2])
        i += 4  # no effect: the for loop reassigns i on the next iteration
        continue

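What is the most efficient and Pythonic way to extract these lines from the text files?

Edit: The lines I am interested in are the "headings" together with the rows that contain a variable and its values. For example, if I want the I ON section, I want to extract the following lines from the example above: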

I        ON
    K1                -0.247      0.321     -0.769      0.442
    K2                 0.161      0.232      0.696      0.486

【Question discussion】:

Tags: python

【Solution 1】:

Assuming the file has this structure:

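# `lines` is the list of lines read from the file, as in the question.
out_lines = []
for line in lines:
    # Headings such as "I        ON" split into exactly two tokens.
    if len(line.split()) == 2:
        out_lines.append(line)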

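【Discussion】:

Sorry, my question apparently wasn't clear. I've updated it to show exactly which lines I want to extract.
You can easily extend my example. Just append every line to out_lines, and whenever the condition on the second line (if len(line.strip().split()) == 2) is true, "flush" the list of lines and start a new one.
Hi Shachar, the reason this doesn't work is its lack of specificity. If the text contains any other line with just two words, that line will also be appended to the output variable.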
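
The "flush and start a new list" idea from the comments, combined with a stricter heading test that addresses the specificity concern, might be sketched roughly as follows. The helper name extract_sections and the HEADING pattern are hypothetical and not from the original answer; the sketch assumes that every section of interest starts with an I ON, S ON, or Q ON heading on its own line and ends at the next blank line, as in the sample in the question.

```python
import re

# Hypothetical pattern: a line consisting only of I, S or Q followed by ON.
HEADING = re.compile(r"^\s*([ISQ])\s+ON\s*$")


def extract_sections(lines):
    """Return a list of sections; each section is a list of lines that
    starts with an 'I ON' / 'S ON' / 'Q ON' heading."""
    sections = []
    current = None  # section being collected, or None when outside a section
    for line in lines:
        if HEADING.match(line):
            # A new heading "flushes" the previous section and starts a new one.
            if current is not None:
                sections.append(current)
            current = [line]
        elif current is not None:
            if line.strip():
                current.append(line)  # a variable + values row
            else:
                sections.append(current)  # a blank line ends the section
                current = None
    if current is not None:  # input ended without a trailing blank line
        sections.append(current)
    return sections


sample = [
    "SOME OTHER\n",  # two words, but not a heading of interest
    " I        ON\n",
    "    K1                -0.247      0.321     -0.769      0.442\n",
    "    K2                 0.161      0.232      0.696      0.486\n",
    "\n",
    " S        ON\n",
    "    K1                 0.035      0.143      0.247      0.805\n",
]
sections = extract_sections(sample)
```

Because the heading test is anchored to the whole line, an unrelated two-word line such as "SOME OTHER" is skipped rather than collected.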

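【Solution 2】:

If you want to search for the exact key structure, you can use a regular expression. The code below handles a single ".out" file only and, for the test data above, produces one output file per heading type.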

import os
import re

file_path = 'E:\\'  # the path to the folder with the .out file
file_name = 'test.out'
# Anchored to the whole line so that words such as "ESTIMATION" do not match.
key_pattern = re.compile(r'^\s*([ISQ])\s+ON\s*$')

# For multiple files, wrap the section below in a loop.
with open(os.path.join(file_path, file_name), 'r') as f:
    line_keys = f.readline()
    while line_keys:  # readline() returns '' at end of file
        key_search = key_pattern.match(line_keys)  # search for the key pattern
        if key_search is not None:  # a section heading was found
            file_output = key_search.group(1) + '.txt'  # I.txt, S.txt or Q.txt
            with open(os.path.join(file_path, file_output), 'a') as f_out:
                f_out.write(line_keys)  # repeat the heading for each section
                while True:  # copy the lines that follow the heading
                    lines_data = f.readline()
                    if lines_data.strip() == "":  # blank line or end of file
                        break
                    f_out.write(lines_data)
                f_out.write('\n')  # separate the sections with a blank line
        line_keys = f.readline()
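
For the multi-file case mentioned in the code comment, and for the single combined output file the question asks for, one possible sketch is shown below. The function name collect_sections, the "# filename" label lines, and the output file name used in the usage note are illustrative assumptions, not part of the original answer. It also assumes each section ends at a blank line, as in the sample data.

```python
import re
from pathlib import Path

# Heading pattern anchored to the whole line (an assumption about the
# MPlus layout, based on the sample in the question).
HEADING = re.compile(r"^\s*[ISQ]\s+ON\s*$")


def collect_sections(folder, out_path):
    """Copy every I/S/Q ON section from each .out file in `folder` into
    `out_path`. Returns the number of sections written.

    `out_path` should not itself match *.out inside `folder`.
    """
    count = 0
    with open(out_path, "w") as f_out:
        for path in sorted(Path(folder).glob("*.out")):
            f_out.write(f"# {path.name}\n")  # label the source file
            in_section = False
            with open(path) as f_in:
                for line in f_in:
                    if HEADING.match(line):
                        in_section = True
                        count += 1
                    elif not line.strip():
                        if in_section:
                            f_out.write("\n")  # blank line closes the section
                        in_section = False
                        continue
                    if in_section:
                        f_out.write(line)
            if in_section:  # file ended in the middle of a section
                f_out.write("\n")
    return count
```

A call such as collect_sections("./my_folder", "combined.txt") would then gather the sections from every .out file in that folder; iterating over the open file object avoids loading each file into memory at once.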

【Discussion】: