【Posted】: 2015-03-31 14:47:57
【Question】:
I'm trying to use subprocess to run a chain of existing scripts from Python. With this code, the chain works as expected:
p1 = subprocess.Popen(samtoolsSortArguments, stdout=subprocess.PIPE)
p2 = subprocess.Popen(samtoolsViewArguments, stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close()
p3 = subprocess.Popen(htseqCountArguments, stdin=p2.stdout, stdout=file_out)
p2.stdout.close()
p3.communicate()
file_out.close()
The output looks like this:
100000 GFF lines processed.
[bam_sort_core] merging from 2 files...
200000 GFF lines processed.
300000 GFF lines processed.
400000 GFF lines processed.
500000 GFF lines processed.
600000 GFF lines processed.
700000 GFF lines processed.
800000 GFF lines processed.
900000 GFF lines processed.
1000000 GFF lines processed.
1100000 GFF lines processed.
1200000 GFF lines processed.
1300000 GFF lines processed.
1400000 GFF lines processed.
1500000 GFF lines processed.
1600000 GFF lines processed.
1700000 GFF lines processed.
1800000 GFF lines processed.
1900000 GFF lines processed.
2000000 GFF lines processed.
2100000 GFF lines processed.
2200000 GFF lines processed.
2300000 GFF lines processed.
2400000 GFF lines processed.
2500000 GFF lines processed.
2600000 GFF lines processed.
2700000 GFF lines processed.
2764635 GFF lines processed.
100000 SAM alignment records processed.
200000 SAM alignment records processed.
300000 SAM alignment records processed.
400000 SAM alignment records processed.
500000 SAM alignment records processed.
600000 SAM alignment records processed.
700000 SAM alignment records processed.
800000 SAM alignment records processed.
900000 SAM alignment records processed.
1000000 SAM alignment records processed.
1100000 SAM alignment records processed.
1200000 SAM alignment records processed.
1300000 SAM alignment records processed.
1400000 SAM alignment records processed.
1500000 SAM alignment records processed.
1600000 SAM alignment records processed.
1700000 SAM alignment records processed.
1800000 SAM alignment records processed.
1900000 SAM alignment records processed.
2000000 SAM alignment records processed.
2100000 SAM alignment records processed.
2200000 SAM alignment records processed.
2300000 SAM alignment records processed.
2400000 SAM alignment records processed.
2500000 SAM alignment records processed.
2600000 SAM alignment records processed.
2700000 SAM alignment records processed.
2800000 SAM alignment records processed.
2900000 SAM alignment records processed.
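All of this output comes from stderr, and I'd like to write it to both the terminal and a log file. To do that, I run the Unix tee command as a subprocess in Python and pass it the stderr of the preceding subprocess command. The code looks like this: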
p1 = subprocess.Popen(samtoolsSortArguments, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
tee = subprocess.Popen(['tee', logfile], stdin=p1.stderr)
p1.stderr.close()
p2 = subprocess.Popen(samtoolsViewArguments, stdin=p1.stdout, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
p1.stdout.close()
tee = subprocess.Popen(['tee', logfile], stdin=p2.stderr)
p2.stderr.close()
p3 = subprocess.Popen(htseqCountArguments, stdin=p2.stdout, stdout=file_out, stderr=subprocess.PIPE)
p2.stdout.close()
tee = subprocess.Popen(['tee', logfile], stdin=p3.stderr)
p3.communicate()
p3.stderr.close()
tee.communicate()
file_out.close()
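The stdout output that this code writes to my file_out handle is correct. Even the stderr printed to the screen and to the log file appears to contain the right information. However, the stderr output is truncated on some lines, and I can't figure out why. Here is what my log file and terminal look like (they match):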
GFF lines processed.
[bam_sort_core] merging from 2 files...
GFF lines processed.
300000 GFF lines processed.
400000 GFF lines processed.
500000 GFF lines processed.
600000 GFF lines processed.
700000 GFF lines processed.
800000 GFF lines processed.
900000 GFF lines processed.
1000000 GFF lines processed.
1100000 GFF lines processed.
1200000 GFF lines processed.
1300000 GFF lines processed.
1400000 GFF lines processed.
1500000 GFF lines processed.
1600000 GFF lines processed.
1700000 GFF lines processed.
1800000 GFF lines processed.
1900000 GFF lines processed.
GFF lines processed.
GFF lines processed.
FF lines processed.
F lines processed.
lines processed.
ines processed.
700000 GFF lines processed.
2764635 GFF lines processed.
nt records processed.
records processed.
300000 SAM alignment records processed.
cords processed.
ds processed.
processed.
essed.
d.
000000 SAM alignment records processed.
00 SAM alignment records processed.
alignment records processed.
1500000 SAM alignment records processed.
1600000 SAM alignment records processed.
1800000 SAM alignment records processed.
1900000 SAM alignment records processed.
2000000 SAM alignment records processed.
2100000 SAM alignment records processed.
2200000 SAM alignment records processed.
2500000 SAM alignment records processed.
2600000 SAM alignment records processed.
2700000 SAM alignment records processed.
2900000 SAM alignment records processed.
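Why is the output passed to tee being truncated? Is this just a column shift? Is there a way to fix it, or am I simply trying to do too much with subprocess?
Edit: Here is an SSCCE of @tdelaney's code. It reproduces the same error I get when I use it in the broader context. The example should be run from a folder containing a file named test.txt. test.txt should look like this (or anything similar, as long as some lines are "test"):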
test
not
test
Here is the toy code (make sure to change the shebang to point at your python):
#!/usr/local/bin/python2
import sys
import subprocess
import threading

logfile = "./testlog.txt"
arg1 = ["ls", "-l"]
arg2 = ["find", "-name", "test.txt"]
arg3 = ["xargs", "grep", "-i", "-n", "test"]

def log_writer(pipe, log_fp, lock):
    for line in pipe:
        with lock:
            log_fp.write(line)
            sys.stdout.write(line)

with open(logfile, 'w') as log_fp:
    lock = threading.Lock()
    threads = []
    p1 = subprocess.Popen(arg1, stdout=subprocess.PIPE)
    threads.append(threading.Thread(target=log_writer, args=(p1.stdout, log_fp, lock)))
    p2 = subprocess.Popen(arg2, stdin=p1.stdout, stdout=subprocess.PIPE)
    p1.stdout.close()
    threads.append(threading.Thread(target=log_writer, args=(p2.stdout, log_fp, lock)))
    p3 = subprocess.Popen(arg3, stdin=p2.stdout, stdout=subprocess.PIPE)
    p2.stdout.close()
    threads.append(threading.Thread(target=log_writer, args=(p3.stdout, log_fp, lock)))
    for t in threads:
        t.start()
    p3.communicate()
    for t in threads:
        t.join()
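Note: if I comment out the close() and communicate() lines, the code runs. I'm a little wary of doing that, though, because I would then run into all sorts of other problems in the broader context.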
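【Comments】:
- If you want to read from the pipes in the same process, don't call p1.stdout.close(), p2.stdout.close(), or p3.communicate() (your code uses threads for that). Use for p in [p1, p2, p3]: p.wait() to reap the child processes. Add with pipe at the top of log_writer() so the pipe is closed on EOF or on error. See "Python subprocess get children's output to file and terminal?"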
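The comment's advice can be sketched against the original goal of the question, which was to copy each stage's stderr to both the terminal and one log file. This is a minimal Python 3 sketch, not the commenter's own code: `run_pipeline`, `sinks`, and `terminal` are hypothetical names introduced here, and only `log_writer` comes from the question. It assumes the threads read stderr only, so the parent can still close its copy of each intermediate stdout, and it assumes the final stage's stdout is small enough to hold in memory.

```python
import subprocess
import sys
import threading


def log_writer(pipe, sinks, lock):
    """Copy every line from pipe to each sink; close the pipe on EOF or error."""
    with pipe:
        for line in pipe:
            with lock:  # one writer at a time, so lines are never interleaved
                for sink in sinks:
                    sink.write(line)
                    sink.flush()


def run_pipeline(commands, log_path, terminal=None):
    """Chain commands stdout -> stdin and tee every stderr to terminal and log.

    Hypothetical helper: returns (final_stdout_text, list_of_return_codes).
    """
    terminal = terminal if terminal is not None else sys.stderr
    lock = threading.Lock()
    procs, threads = [], []
    with open(log_path, "w") as log_fp:  # a single handle shared by all threads
        prev_stdout = None
        for cmd in commands:
            proc = subprocess.Popen(
                cmd,
                stdin=prev_stdout,
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
                text=True,  # Python 3.7+; decode pipes as text
            )
            if prev_stdout is not None:
                # The parent never reads this pipe, so its copy can be closed;
                # the child process keeps its own copy as stdin.
                prev_stdout.close()
            thread = threading.Thread(
                target=log_writer, args=(proc.stderr, (terminal, log_fp), lock)
            )
            thread.start()
            procs.append(proc)
            threads.append(thread)
            prev_stdout = proc.stdout
        with prev_stdout:
            output = prev_stdout.read()  # drain the last stage's stdout
        for thread in threads:
            thread.join()  # each thread ends when its stderr pipe hits EOF
        codes = [proc.wait() for proc in procs]  # reap every child
    return output, codes
```

Under those assumptions, a call such as `run_pipeline([samtoolsSortArguments, samtoolsViewArguments, htseqCountArguments], logfile)` would replace the three separate tee processes with one log file handle written under a lock. For a large final output, passing an open file as the last stage's stdout (as the question does with file_out) would be the more natural choice.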
Tags: python subprocess tee