Python file copying using regex

【Question title】: Python file copying using regex
【Posted】: 2014-02-24 09:12:45
【Question description】:

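I have a large log file. I want to extract the lines that contain java, javax, org, or com followed by . or :. For each such line, I also want to extract the stack trace lines that follow it, which start with at. For example: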

Line1: java.line.something.somethingexception
line 2: at something something
line 3: at something something
line 4: at something something

line 5-20:Junk I don't want to extract.
line 21: javax.line.something.somethingexception
line 22: at something something
line 23: at something something
line 24: at something something

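and so on...

Here I want to copy lines 1-4 and then lines 21-24. So far my code collects the lines containing the keywords, but I can't figure out how to write a variable number of lines after each one, skip some lines, and then start writing again. The number of lines starting with at is arbitrary: there may be 100 of them or 250, so there is no fixed pattern.

This is my code: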

import re
import sys
from itertools import islice

file = open(sys.argv[1], "r")
file1 = open(sys.argv[2],"w")
i = 0
for line in file:
    if re.search(r'[java|javax|org|com]+?[\.|:]+?', line, re.I) and not (re.search(r'at\s', line, re.I) or re.search(r'mdcloginid:|webcontainer|c\.h\.i\.h\.p\.u\.e|threadPoolTaskExecutor|caused\sby', line, re.I)):
          file1.write(line)

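This code extracts only the lines containing the keywords. I'm stuck on the next part: copying the following lines that start with at into the new file, stopping where the at lines end, then searching for the next line containing a keyword and doing the same again.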

【Question discussion】:

Tags: python regex file exception copying

【Solution 1】:

Set a flag that indicates whether the line you are processing is inside an exception block:

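import re
import sys

# Header lines: a package prefix (java, javax, org, com) followed by "." or ":".
# Alternation needs a group; square brackets would form a character class.
header_re = re.compile(r'\b(?:javax?|org|com)[.:]', re.I)
# Lines that must never be treated as exception headers.
skip_re = re.compile(r'at\s|mdcloginid:|webcontainer|c\.h\.i\.h\.p\.u\.e|threadPoolTaskExecutor|caused\sby', re.I)

with open(sys.argv[1], "r") as infile, open(sys.argv[2], "w") as outfile:
    ex = False  # True while we are inside an exception block
    for line in infile:
        if header_re.search(line) and not skip_re.search(line):
            outfile.write(line)
            ex = True
        elif ex:
            # Stack frames are often indented, so ignore leading whitespace.
            if line.lstrip().startswith('at '):
                outfile.write(line)
            else:
                ex = False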
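
A note on the pattern `[java|javax|org|com]+?[\.|:]+?` from the question: square brackets form a character class, so it matches any single one of the listed characters followed by `.`, `|` or `:`, not the whole words. The sketch below shows the difference on two made-up lines. The replacement pattern (a non-capturing group with alternation) is an assumption about what was intended, not something stated in the original post.

```python
import re

# Pattern from the question: the brackets form a character class, so this
# matches any single listed character followed by '.', '|' or ':'.
char_class = re.compile(r'[java|javax|org|com]+?[\.|:]+?', re.I)

# Assumed replacement (not from the original post): a non-capturing group
# with real alternation, followed by '.' or ':'.
alternation = re.compile(r'\b(?:javax?|org|com)[.:]', re.I)

# Hypothetical sample lines, for illustration only.
junk = "user logged in at 10:42 from room."
header = "javax.servlet.ServletException: boom"

print(bool(char_class.search(junk)))     # True: "m." in "room." is enough
print(bool(alternation.search(junk)))    # False: no package prefix present
print(bool(alternation.search(header)))  # True: "javax." at the start
```

With the character class, almost any log line that ends a word with a period or contains a timestamp can be picked up as a header, which is why the question's code needs so many exclusions.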

【Discussion】:

【Solution 2】:

This can be solved with a flag that you set whenever a line meets your condition:

import re

java_regex = re.compile(...)  # pattern for the "java" header lines
at_regex = re.compile(...)    # pattern for the "at" stack trace lines

copy = False  # flag that controls whether lines are copied to the output

for line in file_in:
    if java_regex.search(line):
        # start copying when a "java" header line is found
        copy = True
    elif copy and not at_regex.search(line):
        # stop copying at the first line that is not an "at" line
        copy = False

    if copy:
        file_out.write(line)
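
The flag approach can also be wrapped in a generator, which makes it easy to try on a few lines before running it on the real log. This is a minimal sketch: the two patterns, the function name `extract_traces`, and the sample lines are all assumptions for illustration, since the answer leaves the patterns elided.

```python
import re

# Assumed patterns (the answer above elides them with "..."):
HEADER_RE = re.compile(r'^\s*(?:javax?|org|com)[.:]')  # exception header line
FRAME_RE = re.compile(r'^\s*at\s')                     # stack frame line


def extract_traces(lines):
    """Yield each header line and the 'at' lines that directly follow it."""
    copying = False
    for line in lines:
        if HEADER_RE.search(line):
            copying = True       # a header starts a new block
        elif copying and not FRAME_RE.search(line):
            copying = False      # first non-'at' line ends the block
        if copying:
            yield line


# Hypothetical sample input, for illustration only.
sample = [
    "java.lang.IllegalStateException: bad state\n",
    "\tat com.example.Foo.bar(Foo.java:10)\n",
    "\tat com.example.Foo.main(Foo.java:3)\n",
    "INFO some unrelated log line\n",
    "\tat stray.frame.without.header(X.java:1)\n",
    "javax.servlet.ServletException: boom\n",
    "\tat org.example.Servlet.doGet(Servlet.java:42)\n",
]
result = list(extract_traces(sample))
print("".join(result), end="")
```

Because the function accepts any iterable of lines, an open file can be passed directly, for example `file_out.writelines(extract_traces(file_in))`, so the log is processed one line at a time without loading it into memory.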

【Discussion】: