【Question title】: Python subprocess call of unix tee truncates stdin when writing to stdout and logfile
【Posted】: 2015-03-31 14:47:57
【Question】:

I am trying to run a chain of existing scripts from Python using subprocess. The chain works as expected when I use this code:

p1 = subprocess.Popen(samtoolsSortArguments, stdout=subprocess.PIPE)
p2 = subprocess.Popen(samtoolsViewArguments, stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close()
p3 = subprocess.Popen(htseqCountArguments, stdin=p2.stdout, stdout=file_out)
p2.stdout.close()
p3.communicate()
file_out.close()

The output looks like this:

100000 GFF lines processed.
[bam_sort_core] merging from 2 files...
200000 GFF lines processed.
300000 GFF lines processed.
400000 GFF lines processed.
500000 GFF lines processed.
600000 GFF lines processed.
700000 GFF lines processed.
800000 GFF lines processed.
900000 GFF lines processed.
1000000 GFF lines processed.
1100000 GFF lines processed.
1200000 GFF lines processed.
1300000 GFF lines processed.
1400000 GFF lines processed.
1500000 GFF lines processed.
1600000 GFF lines processed.
1700000 GFF lines processed.
1800000 GFF lines processed.
1900000 GFF lines processed.
2000000 GFF lines processed.
2100000 GFF lines processed.
2200000 GFF lines processed.
2300000 GFF lines processed.
2400000 GFF lines processed.
2500000 GFF lines processed.
2600000 GFF lines processed.
2700000 GFF lines processed.
2764635 GFF lines processed.
100000 SAM alignment records processed.
200000 SAM alignment records processed.
300000 SAM alignment records processed.
400000 SAM alignment records processed.
500000 SAM alignment records processed.
600000 SAM alignment records processed.
700000 SAM alignment records processed.
800000 SAM alignment records processed.
900000 SAM alignment records processed.
1000000 SAM alignment records processed.
1100000 SAM alignment records processed.
1200000 SAM alignment records processed.
1300000 SAM alignment records processed.
1400000 SAM alignment records processed.
1500000 SAM alignment records processed.
1600000 SAM alignment records processed.
1700000 SAM alignment records processed.
1800000 SAM alignment records processed.
1900000 SAM alignment records processed.
2000000 SAM alignment records processed.
2100000 SAM alignment records processed.
2200000 SAM alignment records processed.
2300000 SAM alignment records processed.
2400000 SAM alignment records processed.
2500000 SAM alignment records processed.
2600000 SAM alignment records processed.
2700000 SAM alignment records processed.
2800000 SAM alignment records processed.
2900000 SAM alignment records processed.

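All of this output comes from stderr, and I would like to write it to both the terminal and a log file. To do that, I run the unix tee command as a subprocess in Python and pass it the stderr of the preceding subprocess. The code looks like this: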

p1 = subprocess.Popen(samtoolsSortArguments, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
tee = subprocess.Popen(['tee', logfile], stdin=p1.stderr)
p1.stderr.close()

p2 = subprocess.Popen(samtoolsViewArguments, stdin=p1.stdout, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
p1.stdout.close()
tee = subprocess.Popen(['tee', logfile], stdin=p2.stderr)
p2.stderr.close()

p3 = subprocess.Popen(htseqCountArguments, stdin=p2.stdout, stdout=file_out, stderr=subprocess.PIPE)
p2.stdout.close()
tee = subprocess.Popen(['tee', logfile], stdin=p3.stderr)

p3.communicate()
p3.stderr.close()
tee.communicate()
file_out.close()

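With this code, the stdout written to my file_out handle is correct. Even the stderr printed to the screen and the log file appears to contain the right information. However, the stderr output is truncated on some lines, and I can't figure out why. This is what my log file and terminal look like (they match):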

 GFF lines processed.
[bam_sort_core] merging from 2 files...
 GFF lines processed.
300000 GFF lines processed.
400000 GFF lines processed.
500000 GFF lines processed.
600000 GFF lines processed.
700000 GFF lines processed.
800000 GFF lines processed.
900000 GFF lines processed.
1000000 GFF lines processed.
1100000 GFF lines processed.
1200000 GFF lines processed.
1300000 GFF lines processed.
1400000 GFF lines processed.
1500000 GFF lines processed.
1600000 GFF lines processed.
1700000 GFF lines processed.
1800000 GFF lines processed.
1900000 GFF lines processed.
 GFF lines processed.
GFF lines processed.
FF lines processed.
F lines processed.
 lines processed.
ines processed.
700000 GFF lines processed.
2764635 GFF lines processed.
nt records processed.
 records processed.
300000 SAM alignment records processed.
cords processed.
ds processed.
processed.
essed.
d.
000000 SAM alignment records processed.
00 SAM alignment records processed.
 alignment records processed.
1500000 SAM alignment records processed.
1600000 SAM alignment records processed.
1800000 SAM alignment records processed.
1900000 SAM alignment records processed.
2000000 SAM alignment records processed.
2100000 SAM alignment records processed.
2200000 SAM alignment records processed.
2500000 SAM alignment records processed.
2600000 SAM alignment records processed.
2700000 SAM alignment records processed.
2900000 SAM alignment records processed.

Why is the output passed to tee truncated? Is this just a column shift? Is there a way to fix it, or am I simply trying to do too much with subprocess?

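Edit: Here is an SSCCE of @tdelaney's code. It reproduces the same error I get when I use it in the broader context. The example should be run from a folder containing a file named test.txt. test.txt should look like this (or anything similar, as long as some lines are "test"):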

test
not
test

Here is the toy code (make sure to change the shebang to point to your python):

#!/usr/local/bin/python2

import sys
import subprocess
import threading

logfile = "./testlog.txt"

arg1 = ["ls", "-l"]
arg2 = ["find", "-name", "test.txt"]
arg3 = ["xargs", "grep", "-i", "-n", "test"]

def log_writer(pipe, log_fp, lock):
    for line in pipe:
        with lock:
            log_fp.write(line)
            sys.stdout.write(line)

with open(logfile, 'w') as log_fp:
    lock = threading.Lock()
    threads = []
    p1 = subprocess.Popen(arg1, stdout=subprocess.PIPE)
    threads.append(threading.Thread(target=log_writer, args=(p1.stdout, log_fp, lock)))

    p2 = subprocess.Popen(arg2, stdin=p1.stdout, stdout=subprocess.PIPE)
    p1.stdout.close()
    threads.append(threading.Thread(target=log_writer, args=(p2.stdout, log_fp, lock)))

    p3 = subprocess.Popen(arg3, stdin=p2.stdout, stdout=subprocess.PIPE)
    p2.stdout.close()
    threads.append(threading.Thread(target=log_writer, args=(p3.stdout, log_fp, lock)))

    for t in threads:
        t.start()

    p3.communicate()

    for t in threads:
        t.join()

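Note: the code runs if I comment out the close() and communicate() lines. I'm a bit wary of doing that, though, because I would then run into all kinds of other problems in the broader context.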

【Comments】:

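  • If you want to read from the pipes in the same process (you are using threads in your code), don't call p1.stdout.close(), p2.stdout.close(), or p3.communicate(). Use for p in [p1, p2, p3]: p.wait() to reap the child processes. Add with pipe at the top of log_writer() to close the pipe on EOF or error. See "Python subprocess get children's output to file and terminal?"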
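
What this comment suggests can be sketched as follows. This is a minimal illustration rather than a tested fix for the pipeline in the question: it assumes Python 3, the helper name run_and_log is hypothetical, and the children are independent commands whose stderr is logged (the stdout chaining between them is left out). Each reader thread owns its pipe and closes it through with pipe, and the parent reaps the children with wait() instead of communicate().

```python
import subprocess
import sys
import threading


def log_writer(pipe, log_fp, lock):
    # 'with pipe' closes the pipe at EOF or on error, so the parent
    # never calls close() on a pipe that a thread is still reading.
    with pipe:
        for line in pipe:
            with lock:
                log_fp.write(line)
                sys.stdout.write(line)


def run_and_log(commands, logfile):
    """Run each command, copying its stderr to logfile and the terminal.

    Hypothetical helper: the name and signature are assumptions.
    """
    lock = threading.Lock()
    with open(logfile, "w") as log_fp:
        procs = [
            subprocess.Popen(cmd, stderr=subprocess.PIPE, universal_newlines=True)
            for cmd in commands
        ]
        threads = [
            threading.Thread(target=log_writer, args=(p.stderr, log_fp, lock))
            for p in procs
        ]
        for t in threads:
            t.start()
        # Reap the children with wait(); communicate() would try to read
        # the same pipes the threads are already reading.
        codes = [p.wait() for p in procs]
        # Join before leaving the 'with' block so log_fp is still open.
        for t in threads:
            t.join()
    return codes
```

The lock only serializes whole lines from different children. It does not impose any ordering between them, so lines from different processes can still interleave in the log.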

Tags: python subprocess tee


【Solution 1】:

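The problem is that you have multiple tees writing to a single file. Each one has its own file pointer and its own current offset into the file, so they overwrite bits of each other's output. One solution is to implement the log file writing in Python, using threads and a mutex.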

#!/bin/env python

import sys
import subprocess
import threading

logfile = "./testlog.txt"
file_out = open("./test.output.txt", "w")

arg1 = ["ls", "-l"]
arg2 = ["find", "-name", "test.txt"]
arg3 = ["xargs", "grep", "-i", "-n", "test"]

def log_writer(pipe, log_fp, lock):
    for line in pipe:
        with lock:
            log_fp.write(line)
            sys.stdout.write(line)

with open(logfile, 'w') as log_fp:
    lock = threading.Lock()
    threads = []
    processes = []
    p1 = subprocess.Popen(arg1, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    threads.append(threading.Thread(target=log_writer, args=(p1.stderr, log_fp, lock)))
    processes.append(p1)

    # Chain on p1's stdout; p1.stderr is consumed by its log_writer thread.
    p2 = subprocess.Popen(arg2, stdin=p1.stdout, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    p1.stdout.close()
    threads.append(threading.Thread(target=log_writer, args=(p2.stderr, log_fp, lock)))
    processes.append(p2)

    p3 = subprocess.Popen(arg3, stdin=p2.stdout, stdout=file_out, stderr=subprocess.PIPE)
    p2.stdout.close()
    threads.append(threading.Thread(target=log_writer, args=(p3.stderr, log_fp, lock)))
    processes.append(p3)

    file_out.close()

    for t in threads:
        t.start()

    for p in processes:
        p.wait()

    for t in threads:
        t.join()
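
The overwriting described in this answer can be reproduced without tee at all. The sketch below assumes Python 3, and the helper name write_interleaved is hypothetical. It opens the same file twice and alternates unbuffered writes between the two handles. In "w" mode each handle keeps its own offset, so a later write from one handle lands on top of what the other already wrote. In "a" mode every write goes to the current end of the file.

```python
def write_interleaved(path, mode):
    """Alternate unbuffered writes between two handles on the same file.

    mode is "w" (independent offsets) or "a" (append). Hypothetical helper.
    """
    # buffering=0 requires binary mode; each write() goes straight to the OS.
    a = open(path, mode + "b", buffering=0)
    b = open(path, mode + "b", buffering=0)
    try:
        a.write(b"100000 GFF lines processed.\n")
        b.write(b"[bam_sort_core]\n")  # in "w" mode this starts at offset 0
        a.write(b"200000 GFF lines processed.\n")
    finally:
        a.close()
        b.close()
    with open(path, "rb") as fp:
        return fp.read()
```

With mode "w" the second handle overwrites the start of the first line and leaves only the fragment " processed.", which resembles the truncated lines in the question. With mode "a" all three lines survive intact. This suggests that passing -a to each tee might also avoid the clobbering, provided the log file is emptied beforehand, though that variant was not tried in this thread.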

【Discussion】:

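  • Looking at this, I don't see how it writes to both the console and the log file at the same time. I'm also running it now and it doesn't seem to work, but let me play with it.
  • It looks like I forgot a line: sys.stdout.write(line). I'll add it to the example.
  • So it doesn't fail completely, because stdout is still written correctly to my output file. However, the log and the console are blank, and it throws an error: IOError: close() called during concurrent operation on the same file object. Exception in thread Thread-3: ValueError: I/O operation on closed file, and the line raising the error in that thread is log_fp.write(line). Did you test a variant of this code (just curious)?
  • Like your original post, it's an example that can't be run. If you want to post a running example, I'm happy to give it a try. If you close the logfile before joining the writer threads, you will get the error you describe. I'm surprised you don't see anything on the console.
  • Hey, I've added a toy example of your code to show the error it runs into. It just runs some basic unix commands, and for simplicity all of them go through stdout rather than through both stdout and stderr. See my edit to the question above.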
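
The ordering issue mentioned in this discussion (closing the log file before the writer threads are joined) can be reproduced in isolation. This is a minimal sketch assuming Python 3. The helper name late_writer_errors is hypothetical, and an Event stands in for a reader thread that is still busy when the file gets closed.

```python
import threading


def late_writer_errors(path):
    """Return the errors raised when a thread writes after log_fp is closed."""
    closed = threading.Event()
    errors = []

    def writer(log_fp):
        closed.wait()  # simulate a reader thread that is still running
        try:
            log_fp.write("late line\n")
        except ValueError as exc:  # I/O operation on closed file
            errors.append(exc)

    with open(path, "w") as log_fp:
        t = threading.Thread(target=writer, args=(log_fp,))
        t.start()
    # The 'with' block has exited, so log_fp is closed before join().
    closed.set()
    t.join()
    return errors
```

Moving t.join() inside the with block, after the thread's work is allowed to finish, keeps log_fp open for as long as any writer still needs it.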