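【Posted】: 2016-03-16 05:06:55
【Question】:
I am working with a huge file. I want to search each line for a word and, once it is found, print the 10 lines before the matching line and the 10 lines after it. How can I do this in Python?
【Question comments】: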
-
Are you working on Linux or Windows?
-
@mkHun, Red Hat Linux
import collections
import itertools
import sys

with open('huge-file') as f:
    before = collections.deque(maxlen=10)
    for line in f:
        if 'word' in line:
            sys.stdout.writelines(before)
            sys.stdout.write(line)
            sys.stdout.writelines(itertools.islice(f, 10))
            break
        before.append(line)
collections.deque keeps at most the last 10 lines seen before the match, and itertools.islice pulls the next 10 lines after the match from the same file iterator.
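The code above stops at the first match. As a minimal sketch of the same deque idea extended to every match in the file, the hypothetical helper `grep_context` below (the name and its `before`/`after` parameters are assumptions, not part of the original answer) streams the input once and merges overlapping contexts:

```python
import collections


def grep_context(lines, word, before=10, after=10):
    """Yield each matching line plus its surrounding context, in one pass."""
    history = collections.deque(maxlen=before)  # recent lines not yet emitted
    remaining_after = 0  # context lines still owed after the latest match
    for line in lines:
        if word in line:
            # Emit the buffered "before" lines, then the match itself.
            yield from history
            history.clear()
            yield line
            remaining_after = after
        elif remaining_after > 0:
            yield line
            remaining_after -= 1
        else:
            history.append(line)


# Small in-memory demonstration; a real file object can be passed instead.
sample = ["line %d\n" % i for i in range(1, 8)]
sample[3] = "line 4 word\n"
print("".join(grep_context(sample, "word", before=1, after=2)), end="")
```

Because a line is either emitted or buffered, never both, no line is printed twice even when two matches sit close together. Memory use stays bounded by `before`, which matters for a huge file.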
Update, to exclude lines that contain IP/MAC addresses:
import collections
import itertools
import re  # <---
import sys

addr_pattern = re.compile(
    r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b|'
    r'\b[\da-f]{2}:[\da-f]{2}:[\da-f]{2}:[\da-f]{2}:[\da-f]{2}:[\da-f]{2}\b',
    flags=re.IGNORECASE
)  # <--

with open('huge-file') as f:
    before = collections.deque(maxlen=10)
    for line in f:
        if addr_pattern.search(line):  # <---
            continue  # <---
        if 'word' in line:
            sys.stdout.writelines(before)
            sys.stdout.write(line)
            sys.stdout.writelines(itertools.islice(f, 10))
            break
        before.append(line)
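To see which lines the address filter would skip, here is a small check of the same pattern against a few made-up log lines (the sample strings are assumptions for illustration only):

```python
import re

addr_pattern = re.compile(
    r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b|'
    r'\b[\da-f]{2}:[\da-f]{2}:[\da-f]{2}:[\da-f]{2}:[\da-f]{2}:[\da-f]{2}\b',
    flags=re.IGNORECASE,
)

# Hypothetical input lines: one with an IPv4 address, one with a MAC
# address, and one with a three-part version number.
samples = [
    "eth0 inet 192.168.0.1 up\n",
    "ether AA:bb:0c:1D:2e:3F\n",
    "kernel version 4.18.0 loaded\n",
]
for line in samples:
    # True means the line would be skipped by the loop above.
    print(bool(addr_pattern.search(line)), line, end="")
```

The pattern only checks the shape of the address, so an out-of-range value such as 999.999.999.999 would also be treated as an IP address. Note also that skipped lines are never added to the deque, so they do not count toward the 10 lines of leading context.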
【Comments】:
if 'word' in line and not re.search(r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b', line): You need to import re.
The simplest solution is grep with the -C option:
grep -C 10 'what_to_search' file.txt
【Comments】:
do it in grep? A Python solution will not be as fast as grep.
subprocess.check_output() to collect the output. Just drop file.txt and pass the file contents on stdin. Popen() and communicate() are also an option.
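A minimal sketch of the stdin approach mentioned in the comment, assuming a grep that supports -C is on the PATH. The helper name `grep_context_stdin` and its parameters are hypothetical; it uses subprocess.run rather than check_output so that "no match" (grep exit status 1) is not raised as an error:

```python
import shutil
import subprocess


def grep_context_stdin(text, pattern, context=10):
    """Run `grep -C` on text supplied via stdin and return its output."""
    result = subprocess.run(
        ["grep", "-C", str(context), "-e", pattern],
        input=text,
        capture_output=True,
        text=True,
    )
    # grep exits with 1 when nothing matches; above 1 signals a real error.
    if result.returncode > 1:
        raise RuntimeError(result.stderr)
    return result.stdout


# Only run the demonstration when grep is actually installed.
if shutil.which("grep"):
    sample = "".join("line %d\n" % i for i in range(1, 6))
    print(grep_context_stdin(sample, "line 3", context=1), end="")
```

Passing the arguments as a list avoids shell quoting problems with the search string. grep still interprets the pattern as a regular expression, so a literal search would need escaping or grep's fixed-string mode.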
Try this:
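#!/usr/bin/env python3
import subprocess

filename = "any filename"
string_to_search = "What you want to search"
# The commands module no longer exists in Python 3; subprocess replaces it.
# An argument list avoids shell quoting problems with the search string.
result = subprocess.run(
    ["grep", "-C", "10", string_to_search, filename],
    capture_output=True, text=True,
)
print(result.stdout)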
【Comments】:
How about doing the context grep in Python with a short snippet like this:
$ cat file2
abcd
xyz
print this 1
print this 2
line having pattern
print this 1
print this 2
abcd
fgg
$ cat p.py
import re
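num_lines_cnt = 2
with open('file2') as f:
    lines = f.readlines()
# max(0, ...) stops the slice start from going negative when a match is near the top of the file
print([lines[max(0, i - num_lines_cnt):i + num_lines_cnt + 1] for i in range(len(lines)) if re.search('pattern', lines[i])])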
$ python3 p.py
[['print this 1\n', 'print this 2\n', 'line having pattern\n', 'print this 1\n', 'print this 2\n']]
$
【Comments】:
join with \n; do you want to print a newline between each match?
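On the joining question: each element returned by readlines() already ends with a newline, so the groups can be joined with an empty string. The sketch below uses the file2 contents inline as a list (an assumption made so the example is self-contained) and places a `--` line between groups, in the style of GNU grep's context separator:

```python
import re

# Contents of file2 from the answer above, inlined for illustration.
lines = [
    "abcd\n", "xyz\n", "print this 1\n", "print this 2\n",
    "line having pattern\n", "print this 1\n", "print this 2\n",
    "abcd\n", "fgg\n",
]
num_lines_cnt = 2

groups = [
    lines[max(0, i - num_lines_cnt):i + num_lines_cnt + 1]
    for i, line in enumerate(lines)
    if re.search("pattern", line)
]
# Lines already end with "\n": join with "" inside a group and put a
# "--" separator line between groups.
text = "--\n".join("".join(group) for group in groups)
print(text, end="")
```

Unlike grep, this approach does not merge groups whose context ranges overlap, so nearby matches would repeat some lines.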
We can do this without importing any package.
string_to_search = input("Enter the String: ")
before = int(input("How many lines to print before string match ? "))
after = int(input("How many lines to print after string match ? "))
file_to_search = input("Enter the file to search: ")

def search_string(string_to_search, before, after, file_to_search):
    with open(file_to_search) as f:
        all_lines = f.readlines()
    last_line_number = len(all_lines)
    for current_line_no, current_line in enumerate(all_lines):
        if string_to_search in current_line:
            start_line_no = max(current_line_no - before, 0)
            end_line_no = min(last_line_number, current_line_no + after + 1)
            # Lines keep their trailing newline, so suppress print's own.
            for i in range(start_line_no, end_line_no):
                print(all_lines[i], end="")
            break  # stop after the first match

search_string(string_to_search, before, after, file_to_search)
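Explanation:
string_to_search: the word/pattern you want to grep for
before: the number of lines to print before the pattern match
after: the number of lines to print after the pattern match
my_file.txt: the file that contains the word/pattern/string
current_line_no holds the zero-based index of the line that contains the pattern
Sample file contents:
$ cat my_file.txt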
this is line 1
this is line 2
this is line 3
this is line 4
this is line 5 my pattern is here
this is line 6
this is line 7
this is line 8
this is line 9
this is line 10
Sample run and output:
$ python grep_3.py
Enter the String: my pattern
How many lines to print before string match ? 2
How many lines to print after string match ? 1000
Enter the file to search: my_file.txt
this is line 3
this is line 4
this is line 5 my pattern is here
this is line 6
this is line 7
this is line 8
this is line 9
this is line 10
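For a file with a single match, the code above is equivalent to the following Unix grep command (the code stops after the first match, whereas grep reports every match):
$ grep -A 1000 -B 2 'my pattern' my_file.txt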
this is line 3
this is line 4
this is line 5 my pattern is here
this is line 6
this is line 7
this is line 8
this is line 9
this is line 10
【Comments】: