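Piping together several subprocesses

【Question title】: Piping together several subprocesses
【Posted】: 2011-09-14 13:10:08
【Question】:

I have 5 processes p1,p2,...,p5. I want to write some data to the stdin of p1, pipe the output of p1 to the stdin of p2, and so on, and finally read the final result from the output of p5.

What I have tried so far: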

p1 = Popen(['p1'], stdin=PIPE, stdout=PIPE)
p2 = Popen(['p2'], stdin=p1.stdout, stdout=PIPE)
...
p5 = Popen(['p5'], stdin=p4.stdout, stdout=PIPE)

# write data to stdin
p1.stdin.write(indata)
p1.stdin.close()

# not sure in what order to close the pipes here, if at all

# read output
out = p5.stdout.read()
print out

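The code above just hangs, presumably because I am doing the read/write operations incorrectly.

I was able to get it working with communicate() and two processes, without supplying any input to the first process (the example from the Python docs):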

output=`dmesg | grep hda`
==>
p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close()  # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]

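But I cannot figure out how to supply input to the first process without hanging the interpreter.

I could also use a bash script for this (which I have already written and which works), but I would like to know how to achieve the same thing in Python.

So I would like to ask how to do all of this properly, and specifically in what order the read/write/close operations on the pipes should be performed.

I am working on 64-bit Linux, if that matters.

EDIT: I forgot to mention that all processes p1,..p5 consume all the input they are given, process it, write to stdout and then terminate. Hence, the next process in the pipeline should not terminate before the previous one has finished processing.

EDIT2: I know that I could also use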

command = 'bash -c "p1 | p2 | p3 | p4 | p5"'
proc = Popen(command, shell=True, stdin=PIPE, stdout=PIPE)  # pipes are needed for communicate() to send input and capture output
out, err = proc.communicate(input=indata)
print out

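But my main interest is in understanding how to chain the pipes purely in Python code.

【Question comments】:

Here is a related question: stackoverflow.com/q/295459/1858225 It appears that using an explicit temp file (as in the accepted answer here) is not necessary; however, it surprises me that there does not seem to be any straightforward and purely Pythonic way of doing this. Plumbum (mentioned in one of the answers) looks kind of nice, but is too "magical" for my taste (this is Python, not Perl!). The ways of accomplishing this entirely with subprocess (e.g. sam.nipl.net/code/python/pipeline.py, from a comment on another answer) seem prone to weird bugs.
....Actually, I just discovered the pipes module (docs.python.org/2/library/pipes.html) and added an answer to the other question accordingly. It looks far better than the other solutions.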
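
To illustrate the remark that an explicit temp file is not strictly necessary, here is a minimal sketch of one possible approach using only subprocess and a writer thread. It assumes Python 3; the names run_pipeline, UPPER and REVERSE are hypothetical and exist only for this example, and the stages are small child Python processes standing in for p1..p5. It is not taken from any of the answers.

```python
import sys
import threading
from subprocess import Popen, PIPE

# Hypothetical stages for illustration: each child reads all of stdin,
# transforms it, writes the result to stdout and then terminates.
UPPER = [sys.executable, "-c",
         "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read().upper())"]
REVERSE = [sys.executable, "-c",
           "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read()[::-1])"]


def run_pipeline(commands, indata):
    """Send `indata` (bytes) through `commands`, chained stdout -> stdin."""
    procs = []
    next_stdin = PIPE  # the first stage reads from a pipe that we write to
    for cmd in commands:
        proc = Popen(cmd, stdin=next_stdin, stdout=PIPE)
        if procs:
            # Close the parent's copy of the previous stage's stdout so that
            # stage can receive SIGPIPE if the later stage exits first.
            procs[-1].stdout.close()
        procs.append(proc)
        next_stdin = proc.stdout

    def feed():
        # Writing from a separate thread leaves the main thread free to
        # read the final output, so neither side blocks on a full pipe.
        try:
            procs[0].stdin.write(indata)
            procs[0].stdin.close()  # signals end of input to the first stage
        except BrokenPipeError:
            pass  # the first stage exited before reading everything

    writer = threading.Thread(target=feed)
    writer.start()
    out = procs[-1].stdout.read()
    writer.join()
    procs[-1].stdout.close()
    for proc in procs:
        proc.wait()
    return out


print(run_pipeline([UPPER, REVERSE], b"foobar"))  # b'RABOOF'
```

The order used here is: start every stage, close the parent's copies of the intermediate pipes, write and close the first stdin in the background, then read the last stdout to EOF before waiting on the children.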

Tags: python pipe

【Solution 1】:

Maybe this will help:

import sys
import tempfile
from subprocess import Popen, PIPE


cmd = [sys.executable, '-c', 'print raw_input()']

# Using a temp file to give input data to the subprocess instead of stdin.write to avoid deadlocks.
with tempfile.TemporaryFile() as f:
    f.write('foobar')
    f.seek(0)  # Return at the start of the file so that the subprocess p1 can read what we wrote.
    p1 = Popen(cmd, stdin=f, stdout=PIPE)

p2 = Popen(cmd, stdin=p1.stdout, stdout=PIPE)
p3 = Popen(cmd, stdin=p2.stdout, stdout=PIPE)

# No order needed.
p1.stdout.close()
p2.stdout.close()

# Using communicate() instead of stdout.read to avoid deadlocks. 
print p3.communicate()[0]

Output:

$ python test.py
foobar

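Hope this is helpful.

【Comments】:

Thanks for your solution. It is very clever and it works. If I understand correctly, there is no way to supply input without a real file descriptor? For example, using a StringIO file object will not work because it has no fileno?
@Timo: Yes, you need a real file with a fileno. Glad it helped :)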
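
A small sketch of the fileno point, assuming Python 3 (where StringIO lives in the io module). The helper name has_real_fileno is hypothetical and only for illustration: an in-memory StringIO has no underlying descriptor, while a temp file does.

```python
import io
import tempfile


def has_real_fileno(fileobj):
    """Return True if `fileobj` is backed by an OS-level file descriptor."""
    try:
        fileobj.fileno()
    except io.UnsupportedOperation:
        # In-memory streams such as io.StringIO raise this from fileno().
        return False
    return True


in_memory = io.StringIO("foobar")
print(has_real_fileno(in_memory))  # False

with tempfile.TemporaryFile() as f:
    print(has_real_fileno(f))  # True
    temp_is_real = has_real_fileno(f)
```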
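Hey, as soon as I read the question I thought, "I bet the answer will involve file I/O somehow..."
@JAB: Yes, the subprocess module doesn't give us many choices :)
Python 3 note: use print(input()) and f.write(b'foobar') (or with tempfile.TemporaryFile('w')). That said, I could not verify that it works fully in my script, because in my case I use cmd=["gsettings", "set", gsettings_schema, gsettings_key, value] twice in a row on Ubuntu to set a configuration value, and the final value is sometimes the first one and sometimes the second.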
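
For reference, a sketch of how the answer's example might look with the Python 3 changes from the note above applied (print(input()) and a bytes write). This is an adaptation under that assumption rather than code from the original answer, and it does not address the gsettings behaviour described in the comment.

```python
import sys
import tempfile
from subprocess import Popen, PIPE

# Each stage is a child Python process that echoes one line of input.
cmd = [sys.executable, "-c", "print(input())"]

# A temp file supplies the input so the parent never blocks writing to a pipe.
with tempfile.TemporaryFile() as f:
    f.write(b"foobar\n")  # the default mode is binary, so write bytes
    f.seek(0)             # rewind so p1 reads from the start of the file
    p1 = Popen(cmd, stdin=f, stdout=PIPE)

p2 = Popen(cmd, stdin=p1.stdout, stdout=PIPE)
p3 = Popen(cmd, stdin=p2.stdout, stdout=PIPE)

# Close the parent's copies of the intermediate pipes.
p1.stdout.close()
p2.stdout.close()

# communicate() reads the final output to EOF and waits for p3.
output = p3.communicate()[0]
p1.wait()
p2.wait()
print(output.decode().strip())
```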