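Execute zgrep command and write results to a file

【Question title】: Execute zgrep command and write results to a file
【Posted】: 2016-01-31 17:05:48
【Question description】:

I have a folder containing many files, e.g. file_1.gz through file_250.gz, and the number keeps growing.

The zgrep command that searches through them looks like this: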

zgrep -Pi "\"name\": \"bob\"" ../../LM/DATA/file_*.gz

I want to run this command in a Python subprocess, for example:

out_file = os.path.join(out_file_path, file_name)
search_command = ['zgrep', '-Pi', '"name": "bob"', '../../LM/DATA/file_*.gz']
process = subprocess.Popen(search_command, stdout=out_file)

The problem is that out_file is created but empty, and these errors are raised:

<type 'exceptions.AttributeError'>
'str' object has no attribute 'fileno'

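What is the solution?

【Discussion】:

Do you really need subprocess here? Why not just use os.walk() to get all the files in that folder and then use a regular expression to search for the files you want?
Actually, if you want to run the same command on all the files, you don't need Python at all: find ../../LM/DATA -name 'file*.gz' | xargs zgrep -Pi '"name": "bob"'. If you want to run it in parallel, just use GNU parallel instead of xargs.
The reason is that this piece of code is part of a large project that searches log files and then returns the results to the client.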
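
One reading of the first comment is to do the search in Python itself rather than shelling out. The following is a minimal sketch of that idea, assuming Python 3 and UTF-8 log files; the function name `search_gz_files` and its parameters are hypothetical, and Python's `re` syntax is close to but not identical to the PCRE syntax that `zgrep -P` uses.

```python
import glob
import gzip
import re


def search_gz_files(pattern, file_glob, out_path):
    """Write every line matching `pattern` (case-insensitively) to out_path.

    Returns the number of matching lines written.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    matches = 0
    with open(out_path, 'w', encoding='utf-8') as out_file:
        for path in sorted(glob.glob(file_glob)):
            # 'rt' decompresses and decodes the file as text, line by line.
            with gzip.open(path, 'rt', encoding='utf-8', errors='replace') as gz_file:
                for line in gz_file:
                    if regex.search(line):
                        # Prefix each hit with its file name, as grep does
                        # when it is given more than one file.
                        out_file.write('%s:%s' % (path, line))
                        matches += 1
    return matches
```

A call such as `search_gz_files(r'"name": "bob"', '../../LM/DATA/file_*.gz', out_file)` would then replace the subprocess call, at the cost of doing the decompression and matching in Python.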

Tags: python file shell command subprocess

【Solution 1】:

You need to pass a file object:

process = subprocess.Popen(search_command, stdout=open(out_file, 'w'))
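
The one-liner leaves the file open until the file object is garbage-collected. A minimal sketch of the same idea with a `with` block, so the file is closed as soon as the command finishes, could look like this. The helper name `run_to_file` and the demo command are hypothetical; the demo runs the Python interpreter instead of zgrep so that it works anywhere.

```python
import subprocess
import sys


def run_to_file(command, out_path):
    """Run `command` (a list of arguments) and send its stdout to out_path.

    Returns the exit status of the command.
    """
    # Binary mode: the child process writes raw bytes to the descriptor.
    with open(out_path, 'wb') as out_file:
        return subprocess.call(command, stdout=out_file)


# Stand-in for the zgrep command, used only for demonstration.
demo_command = [sys.executable, '-c', 'print("hello")']
```

With the question's variables, the call would be `run_to_file(search_command, out_file)`; the wildcard problem discussed in the other answers still applies.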

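Quoting the manual, emphasis mine:

stdin, stdout and stderr specify the executed program's standard input, standard output and standard error file handles, respectively. Valid values are PIPE, an existing file descriptor (a positive integer), an existing file object, and None. PIPE indicates that a new pipe to the child should be created. With the default settings of None, no redirection will occur; the child's file handles will be inherited from the parent.

Combining this with LFJ's answer: using the convenience functions is recommended, and you need shell=True for the wildcard (*) to work: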

subprocess.call(' '.join(search_command), stdout=open(out_file, 'w'), shell=True)

Or, since you are using the shell anyway, you can also use shell redirection:

subprocess.call("%s > %s" % (' '.join(search_command), out_file), shell=True)
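
One caveat when joining the argument list with spaces: the shell strips the double quotes around the pattern, so zgrep would receive `name:` and `bob` as two separate arguments. The sketch below, assuming Python 3 (`shlex.quote`), quotes the pattern but leaves the glob unquoted so the shell can still expand it. The variable names are illustrative, and `shlex.split` is used only to approximate how a POSIX shell would tokenize each string.

```python
import shlex

pattern = '"name": "bob"'
file_glob = '../../LM/DATA/file_*.gz'

# Naive join: the shell would strip the double quotes and split the pattern.
naive = ' '.join(['zgrep', '-Pi', pattern, file_glob])

# Quote only the pattern; the glob stays unquoted so the shell expands it.
quoted = 'zgrep -Pi %s %s' % (shlex.quote(pattern), file_glob)

# Approximate the shell's word splitting (no glob expansion happens here).
naive_args = shlex.split(naive)
quoted_args = shlex.split(quoted)
```

The `quoted` string can then be passed to `subprocess.call(quoted, stdout=..., shell=True)`. If the output path is interpolated into a redirection, it should be quoted the same way.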

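【Discussion】:

What about process = subprocess.Popen(search_command, stdout=subprocess.PIPE); out_file = process.stdout?
What would that effectively do? The same as stdout=None?
That makes the command's output available through process.stdout; without stdout=subprocess.PIPE, process.stdout will be None. You can try it and see the difference.
I don't think the OP needs anything kept in process.stdout. As I understand it, he is trying to save the results of each run to out_file.
OK, I checked the question again; maybe you are right :)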
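
The difference discussed in these comments can be checked with a small experiment. This is only a sketch, assuming Python 3; it runs the Python interpreter as a stand-in for zgrep, and the variable names are illustrative.

```python
import subprocess
import sys

# With stdout=PIPE the parent can read the child's output from the pipe.
piped = subprocess.Popen([sys.executable, '-c', 'print("hello")'],
                         stdout=subprocess.PIPE)
output, _ = piped.communicate()  # reads until EOF and waits for exit

# Without PIPE the child inherits the parent's stdout and the Popen
# object exposes no stream at all: its stdout attribute is None.
inherited = subprocess.Popen([sys.executable, '-c', 'pass'])
inherited.wait()
stdout_without_pipe = inherited.stdout
```

Reading from the pipe holds the output in the parent's memory, which is why passing a file object is the better fit when the goal is just to fill out_file.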

【Solution 2】:

If you want to execute a shell command and get its output, try subprocess.check_output(). It is very simple, and you can easily save the output to a file.

command_output = subprocess.check_output(your_search_command, shell=True)
with open(out_file, 'a') as f:
    f.write(command_output)

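【Discussion】:

There is no need to load the entire output into memory; just write it to the file directly. Pass the file object as the stdout parameter instead.

【Solution 3】:

There are two problems:

You should pass something with a valid .fileno() method rather than a filename.
The shell expands *, but subprocess does not invoke a shell unless you ask it to. You can expand the file pattern manually with glob.glob().

Example: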

#!/usr/bin/env python
import os
from glob import glob
from subprocess import check_call

search_command = ['zgrep', '-Pi', '"name": "bob"'] 
out_path = os.path.join(out_file_path, file_name)
with open(out_path, 'wb', 0) as out_file:
    check_call(search_command + glob('../../LM/DATA/file_*.gz'), 
               stdout=out_file)
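
One edge case worth guarding against: if the pattern matches no files, `glob()` returns an empty list, and zgrep with no file operands would typically fall back to reading standard input, so the call could appear to hang. A minimal sketch of a guard follows; the helper name `build_search_command` is hypothetical.

```python
from glob import glob


def build_search_command(pattern, file_glob):
    """Return the zgrep argument list with the glob expanded in Python."""
    # sorted() gives a stable (lexicographic) order; glob() order is arbitrary.
    files = sorted(glob(file_glob))
    if not files:
        # Fail early instead of letting zgrep wait on standard input.
        raise ValueError('no files match %r' % file_glob)
    return ['zgrep', '-Pi', pattern] + files
```

The result can be passed straight to `check_call(..., stdout=out_file)` in place of `search_command + glob(...)`.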

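【Discussion】:

【Solution 4】:

My problem consisted of two parts:

The first part was also answered by @liborm.
The second part concerns the files that zgrep tries to search. When we write a command like zgrep "pattern" path/to/files/*.gz, bash automatically replaces *.gz with all the files ending in .gz. When I run the command in a subprocess, nothing replaces *.gz with the real files, so the error gzip: ../../LM/DATA/file_*.gz: No such file or directory is raised. So I solved it with: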
```
for file in os.listdir(archive_files_path):
    if file.endswith(".gz"):
        search_command.append(os.path.join(archive_files_path, file))
```

【Discussion】: