Chaining line-by-line writing/reading of pipes in Python with subprocess: answers

【Question title】: Chaining line-by-line writing/reading of pipes in Python with subprocess
【Posted】: 2013-02-17 20:03:44
【Question】:

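I have the following code, which seems to work, for chaining pipes together with subprocess in Python while reading from and writing to them line by line (without calling communicate() up front). The code simply calls a Unix command (mycmd), reads its output, writes it to the stdin of another Unix command (next_cmd), and redirects the last command's output to a file.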

    # some unix command that uses a pipe: command "a"
    # writes to stdout and "b" reads it and writes to stdout
    mycmd = "a | b" 
    mycmd_proc = subprocess.Popen(mycmd, shell=True,
                                  stdin=sys.stdin,
                                  stdout=subprocess.PIPE,
                                  stderr=subprocess.PIPE)
    # nextCmd reads from stdin, and I'm passing it mycmd's output
    next_cmd = "nextCmd -stdin"
    output_file = open(output_filename, "w")
    next_proc = subprocess.Popen(next_cmd, shell=True,
                                  stdin=subprocess.PIPE,
                                  stdout=output_file)
    for line in iter(mycmd_proc.stdout.readline, ''):
        # do something with line
        # ...
        # write it to next command
        next_proc.stdin.write(line)
    ### If I wanted to call another command here that passes next_proc output
    ### line by line to another command, would I need
    ### to call next_proc.communicate() first?
    next_proc.communicate()
    output_file.close()

This seems to work, and it only calls communicate() once, at the end.

I'm trying to extend this code to add another command, so that you can do:

mycmd1 | mycmd2 | mycmd3 > some_file

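Meaning: line by line, read the output of mycmd1 from Python, process the line, feed it to mycmd2, read mycmd2's output, process it line by line, and feed it to mycmd3, which in turn puts its output in some_file. Is this possible, or is it bound to end in deadlock, blocking, or unflushed buffers? Note that I'm not just calling the three Unix commands as a pipeline, because I want to intervene with Python in between and post-process each command's output line by line before feeding it to the next command.

I want to avoid calling communicate() and loading all the output into memory; instead, I want to parse it line by line. Thanks.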

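【Comments】:

Have you looked at this example in the manual? docs.python.org/2/library/…
@zigg: Of course, but it doesn't answer the question. I'm not just building a pipeline; I want to read from one pipe and write to another, not merely call Unix commands that happen to form a pipeline. What matters to me is knowing what happens when a read/write blocks or a buffer goes unflushed.
My apologies; you're right. I should have read your question more carefully.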

Tags: python unix subprocess pipe

【Answer 1】:

This should handle an arbitrary number of commands:

import sys
import subprocess

def processFirst(out):
    return out

def processSecond(out):
    return out

def processThird(out):
    return out

commands = [("a|b", processFirst), ("nextCmd -stdin", processSecond), ("thirdCmd", processThird)]

previous_output = None
for cmd,process_func in commands:
    if previous_output is None:
        stdin = sys.stdin
    else:
        stdin = subprocess.PIPE
    proc = subprocess.Popen(cmd, shell=True,
                            stdin = stdin,
                            stdout = subprocess.PIPE)
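    # Hand the previous output to communicate(), which writes stdin and reads
    # stdout together; a separate stdin.write() first could block on a full pipe.
    out,err = proc.communicate(previous_output)
    out = process_func(out)
    previous_output = out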

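Just add any command you want to run, along with the function that should process its output, to the commands list. The output of the last command ends up in previous_output when the loop finishes.

To avoid any deadlock/buffering/etc. issues, you simply run each command to completion with proc.communicate(), which returns the output (rather than reading it directly as in your example). You then feed that into the next command, let it run to completion, and so on.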
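
As a rough illustration of the run-to-completion approach described above, here is a minimal Python 3 sketch. The helper `run_chain`, its `initial_input` parameter, the stand-in commands (`tr`, `sort`), and the lambda post-processors are all hypothetical and not part of the original answer. It also passes the first command's input explicitly instead of inheriting sys.stdin. Note that each stage's full output is held in memory, which is exactly the trade-off the question wants to avoid.

```python
import subprocess

def run_chain(commands, initial_input=b""):
    """Run each shell command to completion, post-processing its output
    before handing it to the next command."""
    data = initial_input
    for cmd, process_func in commands:
        # subprocess.run() uses communicate() internally, so writing the input
        # and reading the output cannot deadlock on a full pipe.
        result = subprocess.run(cmd, shell=True, input=data,
                                stdout=subprocess.PIPE, check=True)
        data = process_func(result.stdout)
    return data

# Hypothetical stand-ins for the real commands and post-processing steps.
commands = [
    ("tr a-z A-Z", lambda out: out),                 # upper-case everything
    ("sort", lambda out: out.replace(b"B", b"b")),   # then tweak sorted output
]
final_output = run_chain(commands, b"b\na\n")
```

Here `final_output` plays the role of `previous_output` in the loop above: it holds whatever the last post-processing function returned.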

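Edit: I just noticed that you don't want to use communicate() up front and that you want to react line by line. I'll revise my answer a bit to address that.

This answer gives an example of how to use select.select() to read from a pipe line by line without blocking.

Here is an example adapted to your specific case: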

import sys
import subprocess
import select
import os

class LineReader(object):
    def __init__(self, fd, process_func):
        self._fd = fd
        self._buf = ''
        self._process_func = process_func
        self.next_proc = None

    def fileno(self):
        return self._fd

    def readlines(self):
        data = os.read(self._fd, 4096)
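        if not data:
            # EOF: forward any trailing text that lacked a final newline,
            # then close the next command's stdin so it sees EOF too.
            if self.next_proc is not None:
                if self._buf:
                    self.next_proc.stdin.write("%s\n" % self._process_func(self._buf))
                self.next_proc.stdin.close()
            return None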
        self._buf += data
        if '\n' not in data:
            return []
        tmp = self._buf.split('\n')
        tmp_lines, self._buf = tmp[:-1], tmp[-1]
        lines = []
        for line in tmp_lines:
            lines.append(self._process_func(line))
            if self.next_proc is not None:
                self.next_proc.stdin.write("%s\n" % lines[-1])

        return lines

def processFirst(line):
    return line

def processSecond(line):
    return line

def processThird(line):
    return line

commands = [("a|b", processFirst), ("nextCmd -stdin", processSecond), ("thirdCmd", processThird)]

readers = []
previous_reader = None
for cmd,process_func in commands:
    if previous_reader is None:
        stdin = sys.stdin
    else:
        stdin = subprocess.PIPE
    proc = subprocess.Popen(cmd, shell=True,
                            stdin = stdin,
                            stdout = subprocess.PIPE)

    if previous_reader is not None:
        previous_reader.next_proc = proc

    previous_reader = LineReader(proc.stdout.fileno(), process_func)
    readers.append(previous_reader)


while readers:
    ready,_,_  = select.select(readers, [], [], 10.0)
    for stream in ready:
        lines = stream.readlines()
        if lines is None:
            readers.remove(stream)
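
For comparison, here is a sketch of a different technique from the select.select() loop above: in Python 3, each stage can get its own thread that reads the command's stdout line by line, post-processes the line, and writes it to the next command's stdin (or to the output file for the last stage). Because every pipe is drained by a dedicated thread, a blocked write in one stage does not stop the others from reading. The helper names `pump` and `run_pipeline`, the stand-in commands (`printf`, `cat`), and the lambdas are assumptions for illustration, and the sketch does not handle a downstream command exiting early (which would raise BrokenPipeError in a pump thread).

```python
import os
import subprocess
import tempfile
import threading

def pump(source, process_func, sink):
    """Copy lines from source to sink, applying process_func to each line."""
    with source, sink:  # closing sink delivers EOF to the downstream command
        for line in source:  # bytes, one line at a time
            sink.write(process_func(line))
            sink.flush()  # pass each line on immediately

def run_pipeline(stages, output_path):
    """Run (shell_command, process_func) stages; write the result to output_path."""
    procs = []
    for index, (cmd, _) in enumerate(stages):
        # The first command inherits this script's stdin; the rest get a pipe.
        stdin = None if index == 0 else subprocess.PIPE
        procs.append(subprocess.Popen(cmd, shell=True, stdin=stdin,
                                      stdout=subprocess.PIPE))
    # Each stage feeds the next command's stdin; the last one feeds the file.
    sinks = [proc.stdin for proc in procs[1:]] + [open(output_path, "wb")]
    threads = [threading.Thread(target=pump, args=(proc.stdout, func, sink))
               for proc, (_, func), sink in zip(procs, stages, sinks)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return [proc.wait() for proc in procs]  # exit codes, in stage order

# Hypothetical stand-ins for mycmd1 | mycmd2 | mycmd3 > some_file.
stages = [
    ("printf 'one\\ntwo\\n'", lambda line: line.upper()),
    ("cat", lambda line: b"> " + line),
    ("cat", lambda line: line),
]
out_path = os.path.join(tempfile.mkdtemp(), "some_file")
exit_codes = run_pipeline(stages, out_path)
```

Threads are used here only because they keep the example short; the trade-off is one thread per stage, whereas the select.select() loop above stays single-threaded.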

【Comments】: