Python Popen - wait vs communicate vs CalledProcessError

【Question title】: Python Popen - wait vs communicate vs CalledProcessError
【Posted】: 2015-09-08 00:55:15
【Question】:

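Continuing from my previous question: I see that to get the error code of a process I spawned via Popen in Python I have to call either wait() or communicate() (which can be used to access the Popen stdout and stderr attributes):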

import os
import re
from subprocess import Popen, PIPE

app7z = '/path/to/7z.exe'
command = [app7z, 'a', dstFile.temp, "-y", "-r", os.path.join(src.Dir, '*')]
process = Popen(command, stdout=PIPE, startupinfo=startupinfo)
out = process.stdout
regCompressMatch = re.compile(r'Compressing\s+(.+)').match
regErrMatch = re.compile(r'Error: (.*)').match
errorLine = []
for line in out:
    if len(errorLine) or regErrMatch(line):
        errorLine.append(line)
    if regCompressMatch(line):
        pass  # update a progress bar
result = process.wait() # HERE
if result: # in the hopes that 7z returns 0 for correct execution
    dstFile.temp.remove()
    raise StateError(_("%s: Compression failed:\n%s") % (dstFile.s,
                       "\n".join(errorLine)))

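However, the docs warn that wait() can deadlock (when stdout=PIPE, which is the case here), while communicate() might overflow. So:

1. What is the proper thing to use here? Note that I do use the output.

2. How exactly should I use communicate()? Would it be:

process = Popen(command, stdout=PIPE, startupinfo=startupinfo)
out = process.communicate()[0]
# same as before...
result = process.returncode
if result: # ...

Not sure about blocking and the memory errors.

3. Is there a better/more Pythonic way of handling this? I do not think that subprocess.CalledProcessError or subprocess.check_call/check_output apply in my case - or do they?

Disclaimer: I did not write the code, I am the current maintainer, hence question 3.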

I am on Windows, Python 2.7.8, if this makes a difference.

There should be one-- and preferably only one --obvious way to do it.

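【Comments】:

Unrelated: your line-handling code might be broken, e.g., regErrMatch(line) is called only once.
@JFSebastian: Hey, thanks - I think the intent is "once there is something in errorLine, append all the remaining lines" (I was also puzzled when I saw it) - probably once there is one error, everything after it fails.

Tags: python python-2.7 error-handling popen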

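【Solution 1】:

About the deadlock: it is safe to use stdout=PIPE and wait() together iff you read from the pipe. .communicate() does the reading and calls wait() for you.
About the memory: if the output can be unlimited then you should not use .communicate(), because it accumulates all output in memory.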
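A minimal sketch of those two points, not part of the original answer. The child process is a hypothetical stand-in for 7z, started through sys.executable so the snippet runs on its own; the printed lines and the exit status 3 are made up for illustration.

```python
import sys
from subprocess import Popen, PIPE

# Hypothetical child standing in for the real command:
# prints two lines and exits with status 3.
child_code = (
    "import sys; print('Compressing a.txt'); "
    "print('Error: disk full'); sys.exit(3)"
)
command = [sys.executable, "-c", child_code]

process = Popen(command, stdout=PIPE)
# communicate() reads stdout until EOF and then waits for the child,
# so returncode is already set when it returns.
out, err = process.communicate()

lines = out.decode().splitlines()   # the whole output is held in memory here
returncode = process.returncode
# err is None because stderr was not redirected to a pipe.
```

The whole output arrives in one piece only after the child has exited, which is why this form cannot drive a progress bar and is a poor fit for unbounded output.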

What is the proper thing to use here?

To start a subprocess, read its output line by line, and wait for it to exit:

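#!/usr/bin/env python
from subprocess import Popen, PIPE

# command is the argument list from the question; handle() is your per-line callback
process = Popen(command, stdout=PIPE, bufsize=1)
with process.stdout:  # closes the pipe once the loop is done
    for line in iter(process.stdout.readline, b''):
        handle(line)
returncode = process.wait()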

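This code does not deadlock when the finite OS pipe buffer fills up, because it keeps reading from the pipe while the child is running. It also supports commands with unlimited output (as long as each individual line fits in memory).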
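To make that claim concrete, here is a small self-contained sketch (an illustration, not the original answer's code). The child is a hypothetical one-liner that writes roughly 2 MB, well above a typical OS pipe buffer, and bufsize is left at its default.

```python
import sys
from subprocess import Popen, PIPE

# Hypothetical child: prints 200000 short lines (about 2 MB in total).
child_code = "for i in range(200000): print('line %d' % i)"
process = Popen([sys.executable, "-c", child_code], stdout=PIPE)

line_count = 0
last_line = b""
with process.stdout:
    # Reading while the child runs keeps the pipe drained,
    # so the child never stays blocked on a full pipe.
    for line in iter(process.stdout.readline, b""):
        line_count += 1
        last_line = line  # only the current line is kept in memory
returncode = process.wait()  # safe: the pipe has been read to EOF
```

Calling process.wait() before the loop instead would be the deadlock-prone order: the child would block writing to the full pipe while the parent waits for it to exit.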

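iter() is used to read a line as soon as the subprocess' stdout buffer is flushed, which works around the read-ahead bug in Python 2. If you do not need to read lines as soon as they are written, without waiting for the buffer to fill or for the child process to end, a plain for line in process.stdout is enough. See "Python: read streaming input from subprocess.communicate()".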
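For readers unfamiliar with the two-argument form of iter(), a tiny sketch follows. The io.BytesIO object is only an in-memory stand-in for process.stdout, chosen so the snippet runs without a child process.

```python
import io

# In-memory stand-in for process.stdout (an assumption for illustration).
stream = io.BytesIO(b"first\nsecond\n")

# iter(callable, sentinel) calls stream.readline() repeatedly and stops
# as soon as it returns the sentinel b"", which readline() returns at EOF.
lines = list(iter(stream.readline, b""))
```

Each readline() call returns as soon as one full line is available, which is what gives the loop its "real-time" behaviour on Python 2.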

If you know that the command output fits in memory in all cases, you can get all of it at once:

#!/usr/bin/env python
from subprocess import check_output

all_output = check_output(command)

It raises CalledProcessError if the command returns a non-zero exit status. Internally, check_output() uses Popen() and .communicate().
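A short sketch of how the exception might be handled. The failing child, its message, and exit status 2 are hypothetical and only there to make the snippet runnable.

```python
import sys
from subprocess import check_output, CalledProcessError

# Hypothetical failing child: prints an error line and exits with status 2.
child_code = "import sys; print('Error: cannot open archive'); sys.exit(2)"
command = [sys.executable, "-c", child_code]

try:
    all_output = check_output(command)
    failed = False
except CalledProcessError as e:
    failed = True
    exit_status = e.returncode            # the child's exit status
    captured = e.output.decode().strip()  # stdout collected before the failure
```

In the question's code, the except branch is where the temporary file would be removed and StateError raised, with e.output supplying the error text.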

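There should be one-- and preferably only one --obvious way to do it.

subprocess.Popen() is the main API and it works in many cases. Convenience functions/methods such as Popen.communicate(), check_output(), and check_call() exist for the common use cases.

There are multiple methods and functions because there are multiple different use cases.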

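【Comments】:

OK - one thing I don't quite get is what for line in iter(process.stdout.readline, b''): buys us over for line in process.stdout: - I followed your link and I understand the syntax, but still... Also, is closing the pipe necessary? (I guess so)
@Mr_and_Mrs_D: the with-statement closes the pipe the same way it closes an ordinary file. iter() is needed to work around the read-ahead bug in Python 2; you do not need it if you do not need "real-time" output.
Great, thanks a lot - I know that with closes the pipe (:D) - I was not sure whether closing is needed. Please add the bug info to the answer: the bug runs contrary to the "one obvious way", and it is not obvious at all why a simple for line in out would not do. Thanks indeed.
I misunderstood the question about closing. Yes, it is necessary. File descriptors are a finite resource and you may run out of them (it usually does not matter for a script, but you have to be careful in server code).
@Mr_and_Mrs_D: I use bufsize=1 to document my intent to process the pipeline line by line. It also lets you avoid calling .flush() explicitly, on Python 2 and on recent Python 3 versions, when sending a line to the subprocess.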
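A hedged sketch of that last point on Python 3, where line buffering takes effect only in text mode (universal_newlines=True here). The upper-casing child is a hypothetical example, not something from the thread.

```python
import sys
from subprocess import Popen, PIPE

# Hypothetical child: echoes each input line in upper case
# and flushes its own output after every line.
child_code = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line.upper())\n"
    "    sys.stdout.flush()\n"
)
process = Popen([sys.executable, "-c", child_code],
                stdin=PIPE, stdout=PIPE,
                bufsize=1, universal_newlines=True)

process.stdin.write("hello\n")   # no explicit .flush(): the newline triggers it
reply = process.stdout.readline()

process.stdin.close()            # EOF ends the child's loop
process.stdout.close()
returncode = process.wait()
```

Note that the child still has to flush its own stdout; bufsize=1 only affects the parent's end of the pipes.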