【Question title】: Grep for a word, and if found print 10 lines before and 10 lines after the pattern match
【Posted】: 2016-03-16 05:06:55
【Question】:

I am working with a huge file. I want to search its lines for a word and, when it is found, print the 10 lines before and the 10 lines after the matching line. How can I do this in Python?

【Comments】:

  • Are you working on Linux or Windows?
  • @mkHun, Red Hat Linux

Tags: python grep


【Solution 1】:
import collections
import itertools
import sys

with open('huge-file') as f:
    before = collections.deque(maxlen=10)
    for line in f:
        if 'word' in line:
            sys.stdout.writelines(before)
            sys.stdout.write(line)
            sys.stdout.writelines(itertools.islice(f, 10))
            break
        before.append(line)

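collections.deque keeps at most the last 10 lines seen before the match, and itertools.islice pulls the next 10 lines after it. Because of the break, only the first match is printed.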
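To make the deque/islice behaviour easier to see, here is a minimal sketch that wraps the same idea in a function and runs it on a small in-memory list. The function name `first_match_with_context` and the sample data are made up for illustration; they are not part of the original answer.

```python
import collections
import itertools


def first_match_with_context(lines, word, context=10):
    """Return (before, match, after) for the first line containing word, or None."""
    it = iter(lines)
    # A bounded deque silently drops its oldest entry once it is full.
    before = collections.deque(maxlen=context)
    for line in it:
        if word in line:
            # islice consumes at most `context` further lines from the same iterator.
            after = list(itertools.islice(it, context))
            return list(before), line, after
        before.append(line)
    return None


# Hypothetical sample data: 30 lines, with the word on line 15.
sample = ["line %d\n" % i for i in range(1, 31)]
sample[14] = "line 15 word\n"

before, match, after = first_match_with_context(sample, "word", context=3)
print(before)  # ['line 12\n', 'line 13\n', 'line 14\n']
print(after)   # ['line 16\n', 'line 17\n', 'line 18\n']
```

Since the function accepts any iterable of lines, an open file object can be passed in directly, which keeps memory use small for a huge file.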


Update: to exclude lines containing IP/MAC addresses:

import collections
import itertools
import re  # <---
import sys

addr_pattern = re.compile(
    r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b|'
    r'\b[\da-f]{2}:[\da-f]{2}:[\da-f]{2}:[\da-f]{2}:[\da-f]{2}:[\da-f]{2}\b',
    flags=re.IGNORECASE
)  # <--

with open('huge-file') as f:
    before = collections.deque(maxlen=10)
    for line in f:
        if addr_pattern.search(line):  # <---
            continue                   # <---
        if 'word' in line:
            sys.stdout.writelines(before)
            sys.stdout.write(line)
            sys.stdout.writelines(itertools.islice(f, 10))
            break
        before.append(line)
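Note that in the code above only the lines before the match are filtered; the islice call still emits address lines that follow the match. If the after-context should skip them too, one possible sketch is to filter the line stream first and then apply the same deque/islice logic. The function name `context_without_addresses` and the shortened MAC pattern are assumptions for illustration.

```python
import collections
import itertools
import re

ADDR_PATTERN = re.compile(
    r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b|'   # IPv4-looking address
    r'\b[\da-f]{2}(?::[\da-f]{2}){5}\b',          # MAC address, colon separated
    flags=re.IGNORECASE,
)


def context_without_addresses(lines, word, context=10):
    """Return the first match with up to `context` address-free lines on each side."""
    # Drop address lines up front so both sides of the match are filtered.
    clean = (line for line in lines if not ADDR_PATTERN.search(line))
    before = collections.deque(maxlen=context)
    for line in clean:
        if word in line:
            after = list(itertools.islice(clean, context))
            return list(before) + [line] + after
        before.append(line)
    return []


# Hypothetical sample data.
sample = [
    "a\n", "10.0.0.1 up\n", "b\n", "the word\n",
    "00:1a:2b:3c:4d:5e seen\n", "c\n", "d\n",
]
print("".join(context_without_addresses(sample, "word", context=2)), end="")
```

As in the original code, a line that contains both the word and an address is skipped rather than reported as a match.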

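【Comments】:

  • @falsetru: I want to ignore lines that contain an IP address and then pick the 10 lines before and after. How can I do that? This is my regex: ("^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$"). How do I ignore those lines?
  • @RowanaRavenclaw, I'm not sure I understand your question, but try this: if 'word' in line and not re.search(r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b', line):. You need to import re.
  • I want to grep for a pattern and, if it is found, print the 10 lines above and below it. But while printing those lines, I want to skip any that contain a MAC address. (I still want 10 lines on each side, just none with a MAC address.) Thanks!
【Solution 2】:

Use grep's -C option, which is the simplest solution: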

grep -C 10 'what_to_search' file.txt

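【Comments】:

  • Could you elaborate? How do I do this in Python?
  • @RowanaRavenclaw Can't you do it with grep? A Python solution won't be as fast as grep.
  • @heemayl, I'm developing a tool, and the results need to be written to separate files. Also, the file I read as input is created dynamically.
  • You can run this grep command and collect the output with subprocess.check_output(). Just drop file.txt and pass the file content on stdin. Popen() with communicate() is also an option.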
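The last comment can be sketched roughly as follows, assuming a grep binary is available on PATH. The helper name `grep_context` is hypothetical. It uses subprocess.run rather than check_output because grep exits with status 1 when nothing matches, which check_output would turn into an exception.

```python
import subprocess


def grep_context(text, pattern, context=10):
    """Feed `text` to grep on stdin and return the matching lines with context."""
    result = subprocess.run(
        # -e keeps a pattern that starts with "-" from being read as an option.
        ["grep", "-C", str(context), "-e", pattern],
        input=text,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # exchange str instead of bytes
    )
    # grep exit status: 0 = match found, 1 = no match, 2 or more = error.
    if result.returncode > 1:
        raise RuntimeError("grep failed with exit status %d" % result.returncode)
    return result.stdout


print(grep_context("a\nb\nneedle\nc\nd\n", "needle", context=1), end="")
```

Passing the arguments as a list avoids shell quoting issues if the pattern contains quotes or spaces.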
【Solution 3】:

Try this:

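#!/usr/bin/python3
import subprocess

filename = "any filename"
string_to_search = "What you want to search"

# The commands module exists only in Python 2; subprocess replaces it.
# An argument list (no shell) avoids quoting problems in the search string.
result = subprocess.run(
    ["grep", "-C", "10", "-e", string_to_search, filename],
    stdout=subprocess.PIPE, universal_newlines=True)
extract = result.stdout.rstrip("\n")

print(extract)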

【Comments】:

    【Solution 4】:

    How about a short piece of Python like this for context grepping:

    $ cat file2
    abcd
    xyz
    print this 1
    print this 2
    line having pattern
    print this 1
    print this 2
    abcd
    fgg
    $ cat p.py 
    import re
    num_lines_cnt=2
    lines=open('file2').readlines()
    print([lines[max(0, i-num_lines_cnt):i+num_lines_cnt+1] for i in range(len(lines)) if re.search('pattern', lines[i]) is not None])
    $ python3 p.py 
    [['print this 1\n', 'print this 2\n', 'line having pattern\n', 'print this 1\n', 'print this 2\n']]
    $
    

    【Comments】:

    • How do I add a newline when printing? Thanks!
    • @RowanaRavenclaw Print normally, or join with \n. Do you want a newline printed between each match?
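One way to print the context blocks as plain text instead of a nested list is sketched below. The helper names `context_blocks` and `format_blocks` are made up for illustration, and the "--" separator simply imitates what grep prints between non-adjacent groups.

```python
import re


def context_blocks(lines, pattern, num_lines_cnt=2):
    """Return one list of lines (match plus context) per matching line."""
    regex = re.compile(pattern)
    return [
        # max(0, ...) keeps a match near the top from producing a negative index.
        lines[max(0, i - num_lines_cnt):i + num_lines_cnt + 1]
        for i, line in enumerate(lines)
        if regex.search(line)
    ]


def format_blocks(blocks, separator="--\n"):
    """Join each block's lines and put the separator between blocks."""
    # The lines already end in "\n", so an empty join string is enough.
    return separator.join("".join(block) for block in blocks)


# Same content as file2 in the answer above.
lines = [
    "abcd\n", "xyz\n", "print this 1\n", "print this 2\n",
    "line having pattern\n", "print this 1\n", "print this 2\n",
    "abcd\n", "fgg\n",
]
print(format_blocks(context_blocks(lines, "pattern")), end="")
```

With several matches, each block is printed in turn with a "--" line between them; overlapping blocks are not merged in this sketch.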
    【Solution 5】:

    We can do this without importing any packages.

    string_to_search = input("Enter the String: ")
    before = int(input("How many lines to print before string match ? "))
    after = int(input("How many lines to print after string match ? "))
    file_to_search = input("Enter the file to search: ")

    def search_string(string_to_search, before, after, file_to_search):
        # readlines() loads the whole file into memory.
        with open(file_to_search) as f:
            all_lines = f.readlines()
        last_line_number = len(all_lines)
        for current_line_no, current_line in enumerate(all_lines):
            if string_to_search in current_line:
                # Clamp the context window to the bounds of the file.
                start_line_no = max(current_line_no - before, 0)
                end_line_no = min(last_line_number, current_line_no + after + 1)
                for i in range(start_line_no, end_line_no):
                    # Each line keeps its own newline, so suppress print's.
                    print(all_lines[i], end="")
                break  # only the first match is reported

    search_string(string_to_search, before, after, file_to_search)


    Explanation:

    string_to_search: the word/pattern you want to grep for
    before: the number of lines to print before the match
    after: the number of lines to print after the match
    file_to_search: the file containing the word/pattern/string (my_file.txt in the example)

    current_line_no holds the zero-based index of the line containing the pattern

    Sample file content:

    $cat my_file.txt
    this is line 1
    this is line 2
    this is line 3
    this is line 4
    this is line 5 my pattern is here
    this is line 6
    this is line 7
    this is line 8
    this is line 9
    this is line 10


    Sample run and output:

    $python grep_3.py
    Enter the String: my pattern
    How many lines to print before string match ? 2
    How many lines to print after string match ? 1000
    Enter the file to search: my_file.txt
    this is line 3
    this is line 4
    this is line 5 my pattern is here
    this is line 6
    this is line 7
    this is line 8
    this is line 9
    this is line 10
    

    For a file with a single match, the code above is equivalent to this Unix grep command:

    $ grep -A 1000 -B 2 'my pattern' my_file.txt
    this is line 3
    this is line 4
    this is line 5 my pattern is here
    this is line 6
    this is line 7
    this is line 8
    this is line 9
    this is line 10
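grep reports every match rather than stopping at the first one, and it prints a line only once even when context windows overlap. A rough sketch of that behaviour, still without any imports, could look like this. The function name `grep_like` is hypothetical, and the "--" group separators that grep prints are left out.

```python
def grep_like(lines, needle, before=0, after=0):
    """Yield (line_number, line) for every line inside any match's context window."""
    last_emitted = -1  # index of the last line already yielded
    for idx, line in enumerate(lines):
        if needle in line:
            # Start after anything already emitted so overlaps are not repeated.
            start = max(idx - before, last_emitted + 1, 0)
            end = min(len(lines), idx + after + 1)
            for i in range(start, end):
                yield i + 1, lines[i]  # 1-based line numbers, as grep -n shows
            last_emitted = max(last_emitted, end - 1)


# Hypothetical sample: two matches whose context windows overlap.
sample = ["a", "match one", "b", "match two", "c"]
for number, text in grep_like(sample, "match", before=1, after=1):
    print(number, text)
```

Here all five lines are printed exactly once, even though line 3 belongs to the context of both matches.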
    

    【Comments】:
