【问题标题】:snakemake - define input for aggregate rule without wildcardssnakemake - 为没有通配符的聚合规则定义输入
【发布时间】:2023-03-13 04:00:01
【问题描述】:

我正在编写一个 snakemake,以通过 Nanopore 测序产生 Sars-Cov-2 变体。我写的管道是基于artic network,所以我使用artic guppyplexartic minion

我写的snakemake有以下步骤:

  1. 压缩所有条形码的所有fastq 文件(规则zipFq
  2. 使用guppyplex(规则guppyplex)执行读取过滤
  3. 调用artic minion管道(rule minion
  4. 将 stderr 和 stdout 从 qsub 移动到工作目录下的文件夹(规则 mvQsubLogs

下面是我目前写的snakemake,它有效

barcodes = ['barcode49', 'barcode50', 'barcode51']

rule all:
    input:
        expand([
            # zip fq
            "zipFastq/{barcode}/{barcode}.zip",

            # guppyplex
            "guppyplex/{barcode}/{barcode}.fastq",

            # nanopolish
            "nanopolish/{barcode}",

            # directory where the logs will be moved to    
            "logs/{barcode}"
        ], barcode = barcodes)

rule zipFq:
    input: 
        FQ = f"{FASTQ_PATH}/{{barcode}}"
    output:
        "zipFastq/{barcode}/{barcode}.zip"
    shell:
        "zip {output} {input.FQ}/*"


rule guppyplex:
    input:
        FQ = f"{FASTQ_PATH}/{{barcode}}" # FASTQ_PATH is parsed from config.yaml
    output:
        "guppyplex/{barcode}/{barcode}.fastq"
    shell:
        "/home/ngs/miniconda3/envs/artic-ncov2019/bin/artic guppyplex --skip-quality-check --min-length {MINLENGTHGUPPY} --max-length {MAXLENGTHGUPPY} --directory {input.FQ} --prefix {wildcards.barcode} --output {output}" # variables in CAPITALS are parsed from config.yaml


rule minion:
    input:
        INFQ = rules.guppyplex.output,
        FAST5 = f"{FAST5_PATH}/{{barcode}}"
    params:
        OUTDIR = "nanopolish/{barcode}"
    output:
        directory("nanopolish/{barcode}")
    shell:
        """
        mkdir {params.OUTDIR};
        cd {params.OUTDIR};
        export PATH=/home/ngs/miniconda3/envs/artic-ncov2019/bin:$PATH;
        artic minion --normalise {NANOPOLISH_NORMALISE} --threads {THREADS} --scheme-directory {PRIMERSDIR} --read-file ../../{input.INFQ} --sequencing-summary {Seq_Sum} --fast5-directory {input.FAST5}  nCoV-2019/{PRIMERVERSION} {wildcards.barcode} # variables in CAPITALS are parsed from config.yaml
        """

rule mvQsubLogs:
    input:
        # zipFQ
        rules.zipFq.output,

        # guppyplex
        rules.guppyplex.output,

        # nanopolish
        rules.minion.output
    output:
        directory("logs/{barcode}")
    shell:
        "mkdir -p {output} \n"
        "mv {LOGDIR}/{wildcards.barcode}* {output}/"

上面的snakemake有效,现在我正在尝试添加另一个规则,但这里的不同之处在于该规则是一个聚合函数,即不应为每个barcode调用它,而是在调用所有规则之后只调用一次适用于所有条码

我试图合并的规则 (catFasta) 将 cat all {barcode}.consensus.fasta (由规则 minion 生成) 合并到一个文件中,如下所示(合并到上面的 snakemake 中):

barcodes = ['barcode49', 'barcode50', 'barcode51']

rule all:
    input:
        expand([
            # zip fq
            "zipFastq/{barcode}/{barcode}.zip",

            # guppyplex
            "guppyplex/{barcode}/{barcode}.fastq",

            # nanopolish
            "nanopolish/{barcode}",
            
            # catFasta
            "catFasta/cat_consensus.fasta",

            # directory where the logs will be moved to    
            "logs/{barcode}"
        ], barcode = barcodes)

rule zipFq:
    input: 
        FQ = f"{FASTQ_PATH}/{{barcode}}"
    output:
        "zipFastq/{barcode}/{barcode}.zip"
    shell:
        "zip {output} {input.FQ}/*"


rule guppyplex:
    input:
        FQ = f"{FASTQ_PATH}/{{barcode}}" # FASTQ_PATH is parsed from config.yaml
    output:
        "guppyplex/{barcode}/{barcode}.fastq"
    shell:
        "/home/ngs/miniconda3/envs/artic-ncov2019/bin/artic guppyplex --skip-quality-check --min-length {MINLENGTHGUPPY} --max-length {MAXLENGTHGUPPY} --directory {input.FQ} --prefix {wildcards.barcode} --output {output}" # variables in CAPITALS are parsed from config.yaml


rule minion:
    input:
        INFQ = rules.guppyplex.output,
        FAST5 = f"{FAST5_PATH}/{{barcode}}"
    params:
        OUTDIR = "nanopolish/{barcode}"
    output:
        directory("nanopolish/{barcode}")
    shell:
        """
        mkdir {params.OUTDIR};
        cd {params.OUTDIR};
        export PATH=/home/ngs/miniconda3/envs/artic-ncov2019/bin:$PATH;
        artic minion --normalise {NANOPOLISH_NORMALISE} --threads {THREADS} --scheme-directory {PRIMERSDIR} --read-file ../../{input.INFQ} --sequencing-summary {Seq_Sum} --fast5-directory {input.FAST5}  nCoV-2019/{PRIMERVERSION} {wildcards.barcode} # variables in CAPITALS are parsed from config.yaml
        """

rule catFasta:
    input:
        expand("nanopolish/{barcode}/{barcode}.consensus.fasta", barcode = barcodes)
    output:
        "catFasta/cat_consensus.fasta"
    shell:
        "cat {input} > {output}"

rule mvQsubLogs:
    input:
        # zipFQ
        rules.zipFq.output,

        # guppyplex
        rules.guppyplex.output,

        # nanopolish
        rules.minion.output,

        # catFasta
        rules.catFasta.output
    output:
        directory("logs/{barcode}")
    shell:
        "mkdir -p {output} \n"
        "mv {LOGDIR}/{wildcards.barcode}* {output}/"

但是,当我用

调用snakemake时
(artic-ncov2019) ngs@bngs05b:/nexusb/SC2/ONT/scripts/SnakeMake> snakemake -np -s Snakefile_v2 --cluster "qsub -q onlybngs05b -e {LOGDIR} -o {LOGDIR} -j y" -j 5 --jobname "{wildcards.barcode}.{rule}.{jobid}" all # LOGDIR parsed from config.yaml

我明白了:

Building DAG of jobs...
MissingInputException in line 178 of /nexusb/SC2/ONT/scripts/SnakeMake/Snakefile_v2:
Missing input files for rule guppyplex:
/nexus/Gridion/20210521_Covid7/Covid7/20210521_0926_X1_FAL11796_a5b62ac2/fastq_pass/barcode49/barcode49.consensus.fasta

我觉得这不容易理解:snakemake 抱怨 /nexus/Gridion/20210521_Covid7/Covid7/20210521_0926_X1_FAL11796_a5b62ac2/fastq_pass/barcode49/barcode49.consensus.fasta/nexus/Gridion/20210521_Covid7/Covid7/20210521_0926_X1_FAL11796_a5b62ac2/fastq_pass/FASTQ_PATH,我没有在任何地方定义 f"{FASTQ_PATH}/{{barcode}}.consensus.fasta"

here 描述了一个非常相同的问题,尽管接受答案中的策略(规则 catFasta 的输入将是 expand("nanopolish/{{barcode}}/{{barcode}}.consensus.fasta"))对我不起作用。

有谁知道我该如何规避这个问题?

【问题讨论】:

    标签: snakemake


    【解决方案1】:

    失败的规则是rule guppyplex,它以{FASTQ_PATH}/{{barcode}} 的形式查找输入。

    看起来通配符{barcode} 填充了barcode49/barcode49.consensus.fasta,我认为这是因为两个原因:

    首先(也是最重要的):工作流没有找到更好的方法来生成最终输出。在rule catFasta 中,您提供了一个输入文件,该文件在您的工作流程中从未被描述为输出。 rule minion 将目录作为输出,但没有文件,并且对于生成此输入文件的工作流来说并不完全清楚。

    因此推断{barcode} 通配符必须包含它以前从未见过的.consensus.fasta。然后将此通配符移交给顶部,工作流在顶部崩溃,因为它找不到匹配的输入文件。

    第二:这个通配符的初始化。您不想要的唯一可能是因为您没有正确限制通配符。例如,您可以禁止通配符包含.(参见wildcard_constraints here

    但是,主要问题是catFasta 没有找到所需的输入。我建议将minion 的输出更改为"nanopolish/{barcode}/{barcode}.consensus.fasta",因为您已经从参数中获取了 OUTDIR,这不应该损害您的规则。

    编辑:虚拟测试示例:

    barcodes = ['barcode49', 'barcode50', 'barcode51']
    
    rule all:
        input:
            expand([
                # guppyplex
                "guppyplex/{barcode}/{barcode}.fastq",
    
                # catFasta
                "catFasta/cat_consensus.fasta",
            ], barcode = barcodes)
    
    
    rule guppyplex:
        input:
            FQ = f"fastq/{{barcode}}" # FASTQ_PATH is parsed from config.yaml
        output:
            "guppyplex/{barcode}/{barcode}.fastq"
        shell:
            "touch {output}" # variables in CAPITALS are parsed from config.yaml
    
    
    rule minion:
        input:
            INFQ = rules.guppyplex.output,
            FAST5 = f"fasta/{{barcode}}"
        params:
            OUTDIR = "nanopolish/{barcode}"
        output:
            "nanopolish/{barcode}/{barcode}.consensus.fasta"
        shell:
            """
            touch {output} && echo {wildcards.barcode} > {output}
            """
    
    rule catFasta:
        input:
            expand("nanopolish/{barcode}/{barcode}.consensus.fasta", barcode = barcodes)
        output:
            "catFasta/cat_consensus.fasta"
        shell:
            "cat {input} > {output}"
    

    【讨论】:

    • 感谢您的回复,我已经更改了minion 规则的输出(output: "nanopolish/{barcode}/{barcode}.consensus.fasta"rule all 相同)并且我收到相同的错误AttributeError: 'Wildcards' object has no attribute 'barcode'。试运行后我没有收到任何错误,我得到了我需要打印的确切命令,尽管它没有显示通配符
    • 我刚刚看到你在expand() 中写了barcode = barcodes,但我没有看到这个barcodes 列表。我想你把它命名为samples
    • 是的,抱歉打错了,实际上我希望蛇形程序遍历一个名为barcodes的简单python列表@
    • 我从试运行中得到的命令是cat nanopolish/barcode94/barcode94.consensus.fasta nanopolish/barcode95/barcode95.consensus.fasta nanopolish/barcode96/barcode96.consensus.fasta > catFasta/cat_consensus.fasta,这正是我需要的命令。错误消息是AttributeError: 'Wildcards' object has no attribute 'barcode'
    • 也许您可以将您的代码与您的答案分享,以便我看看您做了哪些不同的事情?
    猜你喜欢
    • 2020-12-15
    • 2021-06-11
    • 2021-12-03
    • 2022-08-23
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 2021-12-29
    • 2021-09-29
    相关资源
    最近更新 更多