Python 3.x print number of lines after a specific header: answers

[Question title]: Python 3.x print number of lines after a specific header
[Posted]: 2017-10-15 22:43:08
[Question]:

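I have a problem I can't seem to solve. Sorry if this is a duplicate, but I have never come across a real answer. I'm extracting specific information from a configuration file that presents its information as blocks of text, and I need to print only one specific block, without its header. For example (using the text format below), I only want to capture the information below header2, and nothing from header3 onward: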

#   The output can contain multiple headers, each with several lines or none. This is an example of what could be present, not an exhaustive format.

header1
-------
line1
line2
line3 # can be muiplies availables or known

header2
-------
line1
line2
line3 # can be muiplies availables or known

header3
-------

header4
-------
line1
line2
line3 # can be multiple linnes or none not known

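This is the code I started with, but I'm stuck on the second loop / boolean logic for printing only the lines of that header's block:

Raw_file = "scrap.txt"
scrape = open(Raw_file, "r")

for fooline in scrape:
    if "header" in fooline:
        # print(fooline)  # prints every header line
        # print the lines under header2 and stop before header3
        pass  # placeholder: this is the part I'm stuck on

scrape.close()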

[Question comments]:

Tags: python python-3.x text printing

[Answer 1]:

Use the detection of header lines to switch on/off a boolean that controls printing:

RAW_FILE = "scrap.txt"

DESIRED = 'header2'

with open(RAW_FILE) as scrape:

    printing = False

    for line in scrape:

        if line.startswith(DESIRED):
            printing = True
        elif line.startswith('header'):
            printing = False
        elif line.startswith('-------'):
            continue
        elif printing:
            print(line, end='')

Output:

> python3 test.py
line1
line2
line3 # can be muiplies availables or known

>

Adjust as needed.
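As one possible adjustment, here is a minimal sketch that generalizes the same toggle idea: instead of printing a single block, it groups every block into a dictionary keyed by header name, skipping the `-------` underline. It assumes, as in the sample, that header lines start with `header` and underlines consist only of dashes; the function name `parse_blocks` is hypothetical, and blank lines inside a block are dropped.

```python
def parse_blocks(lines):
    """Group lines into {header: [body lines]}, assuming headers start with 'header'."""
    blocks = {}
    current = None  # name of the header whose block we are currently inside
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("header"):
            current = line.strip()
            blocks[current] = []
        elif current is None or not line.strip() or set(line.strip()) == {"-"}:
            # skip text before the first header, blank lines and dashed underlines
            continue
        else:
            blocks[current].append(line)
    return blocks


sample = """header1
-------
line1

header2
-------
line1
line2

header3
-------
"""
blocks = parse_blocks(sample.splitlines(keepends=True))
print(blocks["header2"])  # ['line1', 'line2']
```

For the real file, pass the open file object: `with open("scrap.txt") as f: blocks = parse_blocks(f)`. Looking up the exact key `"header2"` also sidesteps a subtlety of `startswith('header2')`, which would match a hypothetical `header20` as well.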

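[Comments]:

Thanks a lot. If I also want to print one item from a line, how would I do that? I tried splitting the line and printing line[0] to get "3" (sample line = "3 man enable none"), but no luck: it keeps returning a None object. Maybe I'm misunderstanding something.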
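The failing code isn't shown in the comment, so the cause can only be guessed. A minimal sketch of getting the first field: `str.split()` with no arguments splits on runs of whitespace and returns a list, so you index that list. One common way to get a `None`-related error in Python 3 is writing `print(line.split())[0]`, which indexes the return value of `print` (always `None`); whether that is what happened here is an assumption.

```python
line = "3 man enable none\n"

fields = line.split()  # splits on any whitespace and drops the trailing newline
first = fields[0]
print(first)   # 3
print(fields)  # ['3', 'man', 'enable', 'none']

# Pitfall: print() returns None, so indexing its result fails:
# print(line.split())[0]  ->  TypeError: 'NoneType' object is not subscriptable
```

Inside the Answer 1 loop, the blank line at the end of a block splits into an empty list, so check `if fields:` before indexing `fields[0]`.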

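[Answer 2]:

You could consider using a regular expression to split the text into blocks.

If the file is of a manageable size, just read it all at once and use a regex such as: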

(^header\d+[\s\S]+?(?=^header|\Z))

to split it into blocks.

Your Python code would then look like this (capturing the text from each header up to the next one):

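import re

fn = "scrap.txt"  # path to the config file

with open(fn) as f:
    txt = f.read()

# re.M makes ^ match at the start of every line, so each match runs from a
# "headerN" line up to (not including) the next header, or to the end of the text
for m in re.finditer(r'(^header\d+[\s\S]+?(?=^header|\Z))', txt, re.M):
    print(m.group(1))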

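If the file is larger than you want to read in one go, you can use mmap together with the regex, or read the file in reasonably large chunks.

If you are only looking for one header, it's even easier: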
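A minimal sketch of the mmap variant, assuming the file is not empty (mapping an empty file raises `ValueError`). Because an mmap object exposes bytes, the pattern has to be a bytes pattern (`rb'...'`), and each match is decoded back to `str`. This maps the file rather than reading it in explicit chunks; the operating system pages data in as the regex scans it. `iter_blocks` is a hypothetical helper name.

```python
import mmap
import re

# Bytes version of the block pattern above (mmap exposes bytes, not str)
BLOCK_RE = re.compile(rb'(^header\d+[\s\S]+?(?=^header|\Z))', re.M)


def iter_blocks(path):
    """Yield each header block of the file at `path` as a str."""
    with open(path, "rb") as f:
        # Map the whole file read-only; no full copy is made as a Python string.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for m in BLOCK_RE.finditer(mm):
                yield m.group(1).decode()
```

Usage: `for block in iter_blocks("scrap.txt"): print(block)`.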

m=re.search(r'(^header2[\s\S]+?(?=^header|\Z))', txt, re.M)
if m:
    print(m.group(1))
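Since the question asks for the block without its header, one variation is to match the header and its dashed underline outside the capture group and capture only the body. A minimal sketch, assuming every header is immediately followed by a line of dashes, as in the sample:

```python
import re

txt = """header1
-------
line1

header2
-------
line1
line2

header3
-------
"""

# The header and underline are matched but kept outside the capture group,
# so group(1) holds only the body, up to the next header or the end of the text.
m = re.search(r'^header2\n-+\n([\s\S]*?)(?=^header|\Z)', txt, re.M)
if m:
    print(m.group(1), end='')
```

Requiring `\n` right after `header2` means a line such as `header20` will not match, and because `*?` allows an empty match, an empty block like header3 yields an empty string instead of no match.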


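[Comments]:

[Answer 3]:

You can set a flag that starts and stops collecting, based on matching the header2 and header3 lines.

With example.txt containing the full sample data provided: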

f = "example.txt"

collect = 0  # 0 = before header2, 1 = collecting, 2 = reached header3
wanted = []

with open(f, "r") as scrape:
    for fooline in scrape:
        if "header2" in fooline:
            collect = 1
        if "header3" in fooline:
            collect = 2

        if collect == 1:
            wanted.append(fooline)
        elif collect == 2:
            break

Contents of wanted:

['header2\n',
 '-------\n',
 'line1\n',
 'line2\n',
 'line3 # can be muiplies availables or known\n',
 '\n']
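The `wanted` list above still contains the header and its underline. A different technique, swapped in here rather than taken from the answer: `itertools.dropwhile` skips everything before the requested header, `islice` drops the header and underline, and `itertools.takewhile` stops at the next header. It assumes the underline is always the single line right after the header; `block_body` is a hypothetical name.

```python
from itertools import dropwhile, islice, takewhile


def block_body(lines, header):
    """Return the lines between `header` and the next line starting with 'header'."""
    # Skip every line until the one that is exactly the requested header.
    start = dropwhile(lambda l: l.rstrip("\n") != header, lines)
    # Drop the header itself and the dashed underline below it.
    body = islice(start, 2, None)
    # Keep lines until the next header begins.
    return list(takewhile(lambda l: not l.startswith("header"), body))


sample = ["header1\n", "-------\n", "line1\n", "\n",
          "header2\n", "-------\n", "line1\n", "line2\n", "\n",
          "header3\n", "-------\n"]
print(block_body(sample, "header2"))  # ['line1\n', 'line2\n', '\n']
```

A file object is an iterator of lines, so `with open("example.txt") as f: wanted = block_body(f, "header2")` works the same way.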

[Comments]:

[Answer 4]:

Start with flag set to False. If a line starts with header2, set flag to True. If a line starts with header3, set flag back to False.

While flag is set, print the line.

Raw_file = "scrap.txt"
flag = False

with open(Raw_file, "r") as scrape:
    for fooline in scrape:
        if fooline.startswith("header3"):
            flag = False  # or break, since nothing after header3 is needed
        if flag:
            print(fooline)
        if fooline.startswith("header2"):
            flag = True

Output:

-------

line1

line2

line3 # can be muiplies availables or known
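The blank lines between the output lines come from `print` adding its own newline to `fooline`, which already ends in `\n`. Passing `end=''` (as Answer 1 does) keeps the file's original spacing. A minimal sketch of the same flag logic wrapped in a function (the name `print_header2_block` is hypothetical), run on an in-memory sample:

```python
import io


def print_header2_block(lines):
    """Answer 4's flag logic, printing each line without adding an extra newline."""
    flag = False
    for fooline in lines:
        if fooline.startswith("header3"):
            break  # nothing after header3 is needed
        if flag:
            print(fooline, end='')  # fooline already ends with '\n'
        if fooline.startswith("header2"):
            flag = True


sample = io.StringIO("header1\n-------\nline1\n\n"
                     "header2\n-------\nline1\nline2\n\n"
                     "header3\n-------\n")
print_header2_block(sample)
```

Like the original, this still prints the dashed underline, because the flag is set on the header line itself.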

[Comments]: