Extract values between two strings in a text file

【Question title】: Extract Values between two strings in a text file
【Posted】: 2025-12-06 07:35:01
【Question description】:

Suppose I have a text file with the following contents:

fdsjhgjhg
fdshkjhk
 Start
     Good Morning
     Hello World
 End
dashjkhjk
dsfjkhk
Start
  hgjkkl
  dfghjjk
  fghjjj
Start
   Good Evening
   Good 
End

I wrote the following code:

infile = open('test.txt','r')
outfile= open('testt.txt','w')
copy = False
for line in infile:
    if line.strip() == "Start":
        copy = True
    elif line.strip() == "End":
        copy = False
    elif copy:
        outfile.write(line)

This is the result I get in outfile:

     Good Morning
     Hello World
     hgjkkl
     dfghjjk
     fghjjj
     Good Evening
     Good

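My problem is that I only want the data between a Start and its matching End, not the data between Start and Start or between End and End.

【Comments】:

You should try using a buffer variable to store what you encounter after "Start" until you reach "End", and then write it to the file.

Tags: python file csv

【Solution 1】:

You can keep a temporary list of lines and commit them only once you know a section meets your criteria. Try something like the following: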

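# Context managers close both files even if an error occurs
with open('test.txt', 'r') as infile, open('testt.txt', 'w') as outfile:
    copy = False
    tmpLines = []
    for line in infile:
        if line.strip() == "Start":
            # A new Start discards anything buffered since the previous Start
            copy = True
            tmpLines = []
        elif line.strip() == "End":
            # Write only if a Start is open, so End...End is not emitted twice
            if copy:
                outfile.writelines(tmpLines)
            copy = False
            tmpLines = []
        elif copy:
            tmpLines.append(line)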

This gives the output:

     Good Morning
     Hello World
   Good Evening
   Good

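【Discussion】:

【Solution 2】:

Great question! This is a bucket problem: every Start needs a matching End.

You get that result because there are two consecutive 'Start' lines.

It is best to store the lines somewhere until an "End" is reached.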

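with open('scores.txt', 'r') as infile, open('testt.txt', 'w') as outfile:
    copy = False
    bucket = []  # defined up front so an End before any Start cannot raise NameError
    for line in infile:

        if line.strip() == "Start":
            # Start a fresh bucket; lines after an unmatched Start are dropped
            bucket = []
            copy = True

        elif line.strip() == "End":
            # Flush the bucket only when it belongs to an open Start
            if copy:
                for strings in bucket:
                    outfile.write(strings + '\n')
            bucket = []
            copy = False

        elif copy:
            bucket.append(line.strip())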
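
The bucket idea can also be tried without touching any files. The following is only a minimal sketch: the function name `extract_blocks`, its parameters, and the shortened sample are assumptions made for illustration and are not part of the original answer.

```python
def extract_blocks(lines, start="Start", end="End"):
    """Return one list of stripped lines per complete Start...End block."""
    blocks = []
    bucket = None  # None means "not currently inside a block"
    for line in lines:
        stripped = line.strip()
        if stripped == start:
            bucket = []  # a new Start discards any unfinished block
        elif stripped == end:
            if bucket is not None:
                blocks.append(bucket)
            bucket = None  # an End with no open Start is ignored
        elif bucket is not None:
            bucket.append(stripped)
    return blocks


# Shortened version of the sample file, with an extra stray End at the bottom
sample = """fdsjhgjhg
 Start
     Good Morning
     Hello World
 End
dashjkhjk
Start
  hgjkkl
Start
   Good Evening
   Good
End
End
"""

print(extract_blocks(sample.splitlines()))
# [['Good Morning', 'Hello World'], ['Good Evening', 'Good']]
```

Using `None` instead of a separate `copy` flag keeps the "inside a block" state and the buffer in one variable, which is one possible design choice rather than the only one.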

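【Discussion】:

【Solution 3】:

Here is a hacky but perhaps more intuitive approach using regular expressions. It finds all text between each "Start" and "End" pair; the capture group leaves out the markers and the print call trims the surrounding whitespace.

import re

with open('test.txt', 'r') as infile:
    text = infile.read()

# re.DOTALL lets "." match newlines, so a match can span several lines.
# Note: text after a "Start" that has no "End" of its own is still included.
matches = re.findall(r'Start(.*?)End', text, re.DOTALL)
for m in matches:
    # str.strip('Start ') would strip characters, not the word, so a
    # capture group is used and only whitespace is trimmed here
    print(m.strip())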

【Discussion】:

【Solution 4】:

If you don't expect nested structures, you can do this:

import re

with open('test.txt', 'r') as infile:
    text = infile.read()

# match everything between "Start" and "End"
occurences = re.findall(r"Start(.*?)End", text, re.DOTALL)
# discard text before duplicated occurences of "Start"
occurences = [oc.rsplit("Start", 1)[-1] for oc in occurences]
# optionally trim whitespaces
occurences = [oc.strip("\n") for oc in occurences]

This prints:

>>> for oc in occurences: print(oc)
     Good Morning
     Hello World
   Good Evening
   Good

If needed, you can include \n as part of the Start and End patterns.
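
One way to read the suggestion about making \n part of the markers is to require that Start and End each sit alone on a line. The sketch below assumes that interpretation; the pattern, the variable names, and the inline sample string are illustrative and not taken from the original answer.

```python
import re

# Shortened copy of the sample file, inlined so the sketch runs on its own
text = (
    "fdsjhgjhg\n"
    " Start\n"
    "     Good Morning\n"
    "     Hello World\n"
    " End\n"
    "dsfjkhk\n"
    "Start\n"
    "  hgjkkl\n"
    "Start\n"
    "   Good Evening\n"
    "   Good \n"
    "End\n"
)

# Assumed pattern: each marker must be alone on its line (optional spaces or
# tabs around it), so the newline after Start is consumed with the marker.
pattern = re.compile(
    r"^[ \t]*Start[ \t]*\n(.*?)^[ \t]*End[ \t]*$",
    re.DOTALL | re.MULTILINE,
)

blocks = []
for body in pattern.findall(text):
    # Keep only what follows the last whole-line Start inside the match
    body = re.split(r"^[ \t]*Start[ \t]*\n", body, flags=re.MULTILINE)[-1]
    blocks.append(body)

for block in blocks:
    print(block, end="")
```

Anchoring the markers to whole lines means a word such as "Starting" inside ordinary text would not be mistaken for a marker, at the cost of a longer pattern.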

【Discussion】:

【Solution 5】:

You can do this with a regular expression. It excludes rogue Start and End lines.

import re

with open('test.txt', 'r') as f:
    txt = f.read()
# re.M makes ^ and $ match at every line boundary; group 1 is the block body
matches = re.findall(r'^\s*Start\s*$\n((?:^\s*(?!Start).*$\n)*?)^\s*End\s*$', txt, flags=re.M)

【Discussion】:

Start\s*((?:(?!Start).*$\s)+?)\s*End would be more efficient.
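
To see whether the two patterns agree, they can be run side by side on a small sample. This is only a sketch: the names `solution5_pattern`, `comment_pattern`, and `blocks` are hypothetical, the sample is a shortened copy of the question's file, and the comparison says nothing about speed.

```python
import re

text = (
    "fdsjhgjhg\n"
    " Start\n"
    "     Good Morning\n"
    "     Hello World\n"
    " End\n"
    "dsfjkhk\n"
    "Start\n"
    "  hgjkkl\n"
    "Start\n"
    "   Good Evening\n"
    "   Good \n"
    "End\n"
)

# Pattern from Solution 5 and the shorter one proposed in the comment
solution5_pattern = r'^\s*Start\s*$\n((?:^\s*(?!Start).*$\n)*?)^\s*End\s*$'
comment_pattern = r'Start\s*((?:(?!Start).*$\s)+?)\s*End'


def blocks(pattern):
    """Return each matched block as a list of whitespace-stripped lines."""
    found = re.findall(pattern, text, flags=re.M)
    return [[line.strip() for line in body.splitlines()] for body in found]


print(blocks(solution5_pattern))
print(blocks(comment_pattern))
```

On this sample both calls print the same two blocks once the lines are stripped. The shorter pattern consumes the indentation of the first line in each block, and both patterns only reject a Start that begins its line, so indented stray Start lines may need a stricter lookahead.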