[Posted]: 2016-08-10 08:16:22
[Question]:
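I'm trying to build a workflow that takes a directory of input files, runs each of them through a command-line tool, and writes the results to an output directory. It should be simple, and I've already got it working... mostly.
The problem is that whenever I give it an input *directory*, I get "Skipping '<file>' which didn't exist, or couldn't be read" errors, even though I'm 100% sure these files exist in my input directory.
However, if I modify the code slightly so that it takes a single input *file* instead of a directory, the script runs as it should and finishes perfectly.
My input files are gzipped.
Here is the script: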
```python
import argparse
import subprocess
import os

parser = argparse.ArgumentParser(description="A RNAseq pipeline for pair-end data")
parser.add_argument("-i", "--inputDir", help="A input directory containing your gzipped fastq files", required=True)
parser.add_argument("-o", "--outputDir", help="Output directory", required=True)
parser.parse_args()

### Define global variables
args = parser.parse_args()
inputDir = args.inputDir
outputDir = args.outputDir

### Grab all fastq files in input directory
fastq_directory = os.listdir("{}".format(inputDir))
fastq_files = []
for file in fastq_directory:
    fastq_files.append(file)

### Run FastQC
for file in fastq_files:
    fastqc_command = "fastqc --extract -o {} {}".format(outputDir, file)
    subprocess.check_output(['bash', '-c', fastqc_command])
```
Error:
Skipping 'KO1_R1.fastq.gz' which didn't exist, or couldn't be read
Skipping 'KO1_R2.fastq.gz' which didn't exist, or couldn't be read
Skipping 'KO2_R1.fastq.gz' which didn't exist, or couldn't be read
Skipping 'KO2_R2.fastq.gz' which didn't exist, or couldn't be read
Skipping 'KO3_R1.fastq.gz' which didn't exist, or couldn't be read
Skipping 'KO3_R2.fastq.gz' which didn't exist, or couldn't be read
Skipping 'WT1_R1.fastq.gz' which didn't exist, or couldn't be read
Skipping 'WT1_R2.fastq.gz' which didn't exist, or couldn't be read
Skipping 'WT2_R1.fastq.gz' which didn't exist, or couldn't be read
Skipping 'WT2_R2.fastq.gz' which didn't exist, or couldn't be read
Skipping 'WT3_R1.fastq.gz' which didn't exist, or couldn't be read
Skipping 'WT3_R2.fastq.gz' which didn't exist, or couldn't be read
Any suggestions?
PS: I know the script is terrible, but I'm learning :). Advice is definitely welcome, though!
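[Discussion]:
- Is fastqc being run from the same directory as the files? If not, I suspect the error is that you're only supplying the file name (i.e. KO1_R1.fastq.gz) rather than the absolute path (i.e. /Documents/name/KO1_R1.fastq.gz).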
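Following that diagnosis, here is a minimal sketch (Python 3, not tested against a real FastQC install) of how the loop could hand fastqc full paths. `os.listdir()` returns bare names such as `KO1_R1.fastq.gz`, and fastqc resolves those against its own working directory rather than `inputDir`, which would explain why passing a single file path works while the directory loop does not. The helper names `list_fastq_paths`, `build_fastqc_command` and `run_fastqc` are hypothetical, and the `.fastq.gz` filter assumes every input uses that extension.

```python
import os
import subprocess
from typing import List


def list_fastq_paths(input_dir: str) -> List[str]:
    """Return sorted absolute paths of the *.fastq.gz files in input_dir.

    os.listdir() yields bare file names, so each one is joined with the
    (absolute) input directory to get a path that is valid from any cwd.
    """
    base = os.path.abspath(input_dir)
    return sorted(
        os.path.join(base, name)
        for name in os.listdir(input_dir)
        if name.endswith(".fastq.gz")  # assumption: inputs are gzipped FASTQ
    )


def build_fastqc_command(output_dir: str, fastq_path: str) -> List[str]:
    # Same flags as the original script, but as an argument list, so no
    # shell ("bash -c") is involved and paths with spaces stay intact.
    return ["fastqc", "--extract", "-o", output_dir, fastq_path]


def run_fastqc(input_dir: str, output_dir: str) -> None:
    # Create the output directory if it does not exist yet.
    os.makedirs(output_dir, exist_ok=True)
    for path in list_fastq_paths(input_dir):
        # check=True raises CalledProcessError if fastqc exits non-zero.
        subprocess.run(build_fastqc_command(output_dir, path), check=True)
```

In the question's script, the listing and FastQC loops could then be replaced by a single call, `run_fastqc(args.inputDir, args.outputDir)`, after parsing the arguments once.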
Tags: python workflow bioinformatics