Snakemake: how to execute downstream rules when an upstream rule fails - Answers

【Question title】: Snakemake how to execute downstream rules when an upstream rule fails
【Posted】: 2020-03-21 04:54:45
【Question description】:

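Sorry for the bad title - I couldn't work out how to explain my problem in a few words. I'm having trouble handling downstream rules in snakemake when one of the rules fails. In the example below, rule spades fails on some samples. This is expected: some of my input files have problems, spades returns an error, and the target file is not generated. That is fine until I get to rule eval_ani. There I basically want to run the rule on every successful output of rule ani, but I'm not sure how to do that, because I have effectively dropped some of my samples in rule spades. I think snakemake checkpoints might be useful, but I can't work out from the documentation how to apply them.

I would also like to know whether there is a way to re-run rule ani without re-running rule spades. Say I terminated my run early and rule ani had not run on all samples. Now I want to re-run my pipeline, but I don't want snakemake to retry all the failed spades jobs, because I already know they are of no use to me and would only waste resources. I tried -R and --allowed-rules, but neither does what I want.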

rule spades:
    input:
        read1=config["fastq_dir"]+"combined/{sample}_1_combined.fastq",
        read2=config["fastq_dir"]+"combined/{sample}_2_combined.fastq"
    output:
        contigs=config["spades_dir"]+"{sample}/contigs.fasta",
        scaffolds=config["spades_dir"]+"{sample}/scaffolds.fasta"
    log:
        config["log_dir"]+"spades/{sample}.log"
    threads: 8
    shell:
        """
        python3 {config[path_to_spades]} -1 {input.read1} -2 {input.read2} -t 16 --tmp-dir {config[temp_dir]}spades_test -o {config[spades_dir]}{wildcards.sample} --careful > {log} 2>&1
        """

rule ani:
    input:
        config["spades_dir"]+"{sample}/scaffolds.fasta"
    output:
        "fastANI_out/{sample}.txt"
    log:
        config["log_dir"]+"ani/{sample}.log"
    shell:
        """
        fastANI -q {input} --rl {config[reference_dir]}ref_list.txt -o fastANI_out/{wildcards.sample}.txt
        """

rule eval_ani:
    input:
        expand("fastANI_out/{sample}.txt", sample=samples)
    output:
        "ani_results.txt"
    log:
        # No {sample} here: log files must use the same wildcards as the output, and the output has none
        config["log_dir"]+"eval_ani/eval_ani.log"
    shell:
        """
            python3 ./bin/evaluate_ani.py {input} {output} > {log} 2>&1
        """

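【Question discussion】:

Tags: snakemake

【Solution 1】:

If I understand correctly, you want to allow spades to fail without stopping the whole pipeline, and you want to ignore the output files of the failed spades runs. To do this, you can append || true to the command that runs spades so that the non-zero exit status is caught (and snakemake does not stop). You can then analyze the spades output and write to a "flag" file whether that sample succeeded or not. The spades rule would look like this: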

rule spades:
    input:
        read1=config["fastq_dir"]+"combined/{sample}_1_combined.fastq",
        read2=config["fastq_dir"]+"combined/{sample}_2_combined.fastq"
    output:
        contigs=config["spades_dir"]+"{sample}/contigs.fasta",
        scaffolds=config["spades_dir"]+"{sample}/scaffolds.fasta",
        exit= config["spades_dir"]+'{sample}/exit.txt',
    log:
        config["log_dir"]+"spades/{sample}.log"
    threads: 8
    shell:
        """
        python3 {config[path_to_spades]} ... || true
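        # ... code that writes to {output.exit} stating whether spades succeeded or not
        # Note: Snakemake still expects every declared output to exist, so failed samples need (empty) placeholder files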
        """

For the following steps, you use the flag file '{sample}/exit.txt' to decide which spades files should be used and which should be discarded. For example:

rule ani:
    input:
        spades= config["spades_dir"]+"{sample}/scaffolds.fasta",
        exit= config["spades_dir"]+'{sample}/exit.txt',
    output:
        "fastANI_out/{sample}.txt"
    log:
        config["log_dir"]+"ani/{sample}.log"
    shell:
        """
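        # Run fastANI only for samples whose flag file contains PASS
        if grep -q 'PASS' {input.exit}; then
            fastANI -q {input.spades} --rl {config[reference_dir]}ref_list.txt -o fastANI_out/{wildcards.sample}.txt
        else
            # Failed sample: create an empty placeholder so the output exists
            touch {output}
        fi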
        """
        
rule eval_ani:
    input:
        ani= expand("fastANI_out/{sample}.txt", sample=samples),
        exit= expand(config["spades_dir"]+'{sample}/exit.txt', sample= samples),
    output:
        "ani_results.txt"
    log:
        # No {sample} here: log files must use the same wildcards as the output, and the output has none
        config["log_dir"]+"eval_ani/eval_ani.log"
    shell:
        """
        # Parse the list of files in {input.exit} to decide which files in {input.ani} should be used
        python3 ./bin/evaluate_ani.py {input} {output} > {log} 2>&1
        """

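EDIT (not tested): Instead of the shell directive, it may be better to use the run directive and python's subprocess to run the system commands that are allowed to fail. The reason is that || true returns a 0 exit code no matter what error occurred; the subprocess solution allows more precise handling of exceptions. For example: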

rule spades:
    input:
        ...
    output:
        ...
    run:
        # Assumes `import subprocess` at the top of the Snakefile
        cmd = "spades ..."
        p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        stdout, stderr = p.communicate()

        if p.returncode == 0:
            print('OK')
        else:
            # Analyze exit code and stderr and decide what to do next
            print(p.returncode)
            print(stderr.decode())
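
As a rough illustration of the "more precise handling" mentioned above, the sketch below tolerates only selected exit codes and re-raises everything else, so genuine errors still stop the pipeline. The function name `run_tolerant` and the `tolerated_codes` parameter are hypothetical; which exit codes spades actually returns for bad input is not stated here and would need to be checked.

```python
import subprocess
import sys


def run_tolerant(cmd, tolerated_codes=(1,)):
    """Run cmd; return True on success, False on a tolerated failure.

    Any other non-zero exit code is re-raised. `tolerated_codes` is an
    assumption for illustration, not a documented SPAdes behaviour.
    """
    try:
        # check=True turns a non-zero exit code into CalledProcessError
        subprocess.run(cmd, check=True, capture_output=True, text=True)
        return True
    except subprocess.CalledProcessError as err:
        if err.returncode in tolerated_codes:
            print(
                f"tolerated failure (exit {err.returncode}): {err.stderr}",
                file=sys.stderr,
            )
            return False
        # Unexpected exit code: let the error propagate
        raise
```

Unlike `|| true`, this keeps the exit code and stderr available, so the decision about what to ignore is explicit.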

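【Discussion】:

I haven't tried it yet, but it seems to make sense. Then in my rule all I could use "output.exit" as input instead of the spades output, so I would have a record of which spades jobs have already run but failed, and if I execute more of the pipeline, snakemake won't try to run them again.
I think this makes total sense @dariober. As a general rule, I guess || can be combined with any command that we are sure will not fail. I was thinking this could be exploited to run a "plan B" if the first command doesn't work. Is that correct?
@Geparada I think you are right about running a plan B, but see my edit to the answer. Basically, I think it is best to use || true sparingly to avoid ignoring genuine errors. (My subprocess solution above could probably be improved.)
This is very cool. I didn't know the subprocess library, and it looks very useful for avoiding writing scripts in bash. One last question: do you know whether objects from the global scope of the Snakefile can be accessed inside run? For example, any variable defined before rule spades? Thanks a lot for the update.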