【Question title】: Snakemake uses all samples as one input with porechop
【Posted】: 2021-07-23 16:19:23
【Question】:

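I'm trying to run porechop on multiple data files through a Snakemake workflow.

My Snakefile has three rules: the all rule, a fastqc rule and a porechop rule. The fastqc rule works well and all three of my fastq files get processed. For porechop, however, instead of running the command three times, Snakemake runs it once, passing all three files to the -i flag at the same time: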

Error in rule porechop:
    jobid: 2
    output: /ngs/prod/nanocea_project/test/prod/porechop/25022021_2_pore.fastq.gz, /ngs/prod/nanocea_project/test/prod/porechop/02062021_1_pore.fastq.gz, /ngs/prod/nanocea_project/test/prod/porechop/02062021_2_pore.fastq.gz
    conda-env: /ngs/prod/nanocea_project/test/.snakemake/conda/a72fb141b37718b7c37d9f32d597faeb
    shell:
        porechop -i /ngs/prod/nanocea_project/test/reads/25022021_2.fastq.gz /ngs/prod/nanocea_project/test/reads/02062021_1.fastq.gz /ngs/prod/nanocea_project/test/reads/02062021_2.fastq.gz -o /ngs/prod/nanocea_project/test/prod/porechop/25022021_2_pore.fastq.gz /ngs/prod/nanocea_project/test/prod/porechop/02062021_1_pore.fastq.gz /ngs/prod/nanocea_project/test/prod/porechop/02062021_2_pore.fastq.gz -t 40 --discard_middle
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

However, the program works fine when I run it on a single sample.

Here is my code:

import glob
import os

###Global Variables###

FORMATS=["zip", "html"]
DIR_FASTQ="/ngs/prod/nanocea_project/test/reads"

###FASTQ Files###

def list_samples(DIR_FASTQ):
        SAMPLES=[]
        for file in glob.glob(DIR_FASTQ+"/*.fastq.gz"):
                base=os.path.basename(file)
                sample=(base.replace('.fastq.gz', ''))
                SAMPLES.append(sample)
        return(SAMPLES)

SAMPLES=list_samples(DIR_FASTQ)

###Rules###
rule all:
        input:
                expand("/ngs/prod/nanocea_project/test/stats/fastqc/{sample}_fastqc.{ext}", sample=SAMPLES, ext=FORMATS),
                expand("/ngs/prod/nanocea_project/test/prod/porechop/{sample}_pore.fastq.gz", sample=SAMPLES)
rule fastqc:
        input:
                expand(DIR_FASTQ+"/{sample}.fastq.gz", sample=SAMPLES)
        output:
                expand("/ngs/prod/nanocea_project/test/stats/fastqc/{sample}_fastqc.{ext}", sample=SAMPLES, ext=FORMATS)
        threads:
                16
        conda:
                "envs/fastqc.yaml"
        shell:
                "fastqc {input} -o /ngs/prod/nanocea_project/test/stats/fastqc/ -t {threads}"

rule porechop:
        input:
                expand(DIR_FASTQ+"/{sample}.fastq.gz", sample=SAMPLES)
        output:
                expand("/ngs/prod/nanocea_project/test/prod/porechop/{sample}_pore.fastq.gz", sample=SAMPLES)
        threads:
                40
        conda:
                "envs/porechop.yaml"
        shell:
                "porechop -i {input} -o {output} -t {threads} --discard_middle"

Do you know what is going wrong?

Thanks!

【Discussion】:

    Tags: python snakemake


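    【Solution 1】:

    This question comes up often... If you use expand() in input: or output:, you are giving the rule a list of all the files, so Snakemake creates a single job whose {input} and {output} are those whole lists. It is the same as writing: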

    input:
        ['sample1.fastq', 'sample2.fastq', ..., 'sampleN.fastq'],
    output:
        ['sample1.pore.fastq', 'sample2.pore.fastq', ..., 'sampleN.pore.fastq'],
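To see why the rule receives every file at once, here is a minimal pure-Python sketch of the Cartesian-product behaviour of expand(). The function expand_sketch is hypothetical and only approximates Snakemake's real expand() (it ignores its extra options); the shortened paths are also an assumption, chosen for readability.

```python
from __future__ import annotations

from itertools import product


def expand_sketch(pattern: str, **wildcards: list[str]) -> list[str]:
    """Hypothetical stand-in for Snakemake's expand(): fill `pattern`
    with every combination (Cartesian product) of the wildcard values."""
    keys = list(wildcards)
    return [
        pattern.format(**dict(zip(keys, combo)))
        for combo in product(*(wildcards[k] for k in keys))
    ]


SAMPLES = ["25022021_2", "02062021_1", "02062021_2"]

# A single list holding all three paths: one porechop job would receive all of them.
outputs = expand_sketch("porechop/{sample}_pore.fastq.gz", sample=SAMPLES)
print(outputs)

# With two wildcards the product gives 3 samples x 2 extensions = 6 paths.
reports = expand_sketch("fastqc/{sample}_fastqc.{ext}", sample=SAMPLES, ext=["zip", "html"])
print(len(reports))
```

Because the result is one flat list, a rule that uses it as output: declares all of those files as products of a single job.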
    

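    To run the rule once per input/output pair, just remove the expand() so that {sample} acts as a wildcard; Snakemake then infers the sample from each file requested by rule all: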

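    rule porechop:
        # No expand(): {sample} is a wildcard, so Snakemake creates one job per sample
        input:
            DIR_FASTQ+"/{sample}.fastq.gz"
        output:
            "/ngs/prod/nanocea_project/test/prod/porechop/{sample}_pore.fastq.gz"
        threads: 40
        conda: "envs/porechop.yaml"
        shell:
            "porechop -i {input} -o {output} -t {threads} --discard_middle"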
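The following sketch models, in a deliberately simplified way, how the wildcard version yields one job per sample: each path requested by rule all is matched against the output pattern, the captured {sample} value fills in the input pattern, and one porechop command is built per match. pattern_to_regex and plan_jobs are hypothetical helpers, not Snakemake API; the ".+" wildcard regex is assumed to match Snakemake's default, and the real scheduler does much more (DAG building, skipping up-to-date files, and so on). The shortened paths are an assumption for readability.

```python
from __future__ import annotations

import re

# Shortened versions of the patterns in the corrected rule (assumption for readability).
INPUT_PATTERN = "reads/{sample}.fastq.gz"
OUTPUT_PATTERN = "porechop/{sample}_pore.fastq.gz"


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Hypothetical helper: turn each '{name}' into a named group matching '.+'."""
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        # re.split with a capture group alternates literal text and wildcard names.
        regex += re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+)"
    return re.compile(regex)


def plan_jobs(targets: list[str], threads: int = 40) -> list[str]:
    """Hypothetical helper: build one porechop command per requested output file."""
    out_re = pattern_to_regex(OUTPUT_PATTERN)
    commands = []
    for target in targets:
        match = out_re.fullmatch(target)
        if match is None:
            raise ValueError(f"no rule can produce {target}")
        # The captured wildcard values fill in the input pattern.
        source = INPUT_PATTERN.format(**match.groupdict())
        commands.append(
            f"porechop -i {source} -o {target} -t {threads} --discard_middle"
        )
    return commands


# The files that rule all requests, one path per sample.
targets = [f"porechop/{s}_pore.fastq.gz" for s in ["25022021_2", "02062021_1", "02062021_2"]]
for command in plan_jobs(targets):
    print(command)
```

Running it prints three separate porechop commands, one per sample, instead of one command with every path after -i. Keeping expand() in rule all is still correct: there it only lists the final targets that drive these per-sample jobs.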
    
    

    【Discussion】:
