【发布时间】:2020-04-17 19:11:40
【问题描述】:
我正在使用 Snakemake 编写管道,但程序无法识别规则 stringtie。我找不到我做错了什么。我已经运行了 fastp 和 star 规则,问题是 stringtie 规则所特有的。
include:
'config.py'
rule all:
input:
expand(FASTP_DIR + "{sample}R{read_no}.fastq",sample=SAMPLES ,read_no=['1', '2']), #fastp
expand(STAR_DIR + STAR_DIR + "output/{sample}/{sample}Aligned.sortedByCoord.out.bam",sample=SAMPLES), #STAR
expand(STRINGTIE_DIR + "/{sample}/{sample}Aligned.sortedByCoord.out.gtf", sample=SAMPLES),
GTF_DIR + "path_samplesGTF.txt"
rule fastp:
input:
R1= DATA_DIR + "{sample}R1_001.fastq.gz",
R2= DATA_DIR + "{sample}R2_001.fastq.gz"
output:
R1out= FASTP_DIR + "{sample}R1.fastq",
R2out= FASTP_DIR + "{sample}R2.fastq"
params:
data_dir = DATA_DIR,
name_sample = "{sample}"
log: FASTP_LOG + "{sample}.html"
message: "Executando o programa FASTP"
run:
shell('fastp -i {input.R1} -I {input.R2} -o {output.R1out} -O {output.R2out} \
-h {log} -j {log}')
shell("find {params.data_dir} -type f -name '{params.name_sample}*' -delete ")
rule star:
input:
idx_star = IDX_DIR,
R1 = FASTP_DIR + "{sample}R1.fastq",
R2 = FASTP_DIR + "{sample}R2.fastq",
parameters = "parameters.txt",
params:
outdir = STAR_DIR + "output/{sample}/{sample}",
star_dir = STAR_DIR,
star_sample = '{sample}'
# threads: 18
output:
out = STAR_DIR + "output/{sample}/{sample}Aligned.sortedByCoord.out.bam"
#run_time = STAR + "log/star_run.time"
# log: STAR_LOG
# benchmark: BENCHMARK + "star/{sample_star}"
run:
shell("STAR --runThreadN 12 --genomeDir {input.idx_star} \
--readFilesIn {input.R1} {input.R2} --outFileNamePrefix {params.outdir}\
--parametersFiles {input.parameters} \
--quantMode TranscriptomeSAM GeneCounts \
--genomeChrBinNbits 12")
# shell("find {params.star_dir} -type f ! -name
'{params.star_sample}Aligned.sortedByCoord.out.bam' -delete")
rule stringtie:
input:
star_output = STAR_DIR + "output/{sample}/{sample}Aligned.sortedByCoord.out.bam"
output:
stringtie_output = STRINGTIE_DIR + "/{sample}/{sample}Aligned.sortedByCoord.out.gtf"
run:
shell("stringtie {input.star_output} -o {output.stringtie_output} \
-v -p 12 ")
rule grep_gtf:
input:
list_gtf = STRINGTIE_DIR
output:
paths = GTF_DIR + "path_samplesGTF.txt"
shell:
"find {input.list_gtf} | grep .gtf > {output.paths}"
这是我使用选项 dry-run (flag -n) 得到的输出
Building DAG of jobs...
Job counts:
count jobs
1 all
1 grep_gtf
2
[Fri Apr 17 15:59:24 2020]
rule grep_gtf:
input: /homelocal/boralli/workdir/pipeline_v4/STRINGTIE/
output: /homelocal/boralli/workdir/pipeline_v4/GTF/path_samplesGTF.txt
jobid: 1
find /homelocal/boralli/workdir/pipeline_v4/STRINGTIE/ | grep .gtf >
/homelocal/boralli/workdir/pipeline_v4/GTF/path_samplesGTF.txt
[Fri Apr 17 15:59:24 2020]
localrule all:
input: /homelocal/boralli/workdir/pipeline_v4/GTF/path_samplesGTF.txt
jobid: 0
Job counts:
count jobs
1 all
1 grep_gtf
2
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.
我真的不知道发生了什么。相同的管道以前工作过。
【问题讨论】:
-
根据您的全部规则日志,所有扩展都不会生成文件。您能否确认 SAMPLES 不为空(包含后为
print(SAMPLES))并且文件在之前的运行中不存在? -
谢谢你,特洛伊。在我的脚本中,我输入了一个命令来删除我的原始文件以释放空间,并且我的脚本使用原始文件获取样本的名称。现在我需要编辑一些行并再次执行。