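【Posted】: 2022-01-19 10:15:04
【Question】:
I need to concatenate certain files that live in directories created with wildcards in a Snakefile. I tried writing the following rule to concatenate all the files in those directories: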
# concatenate output per hmm
rule concatenate:
    input:
        output_{hmm}/* ,
    output:
        output_{hmm}/cat_{hmm}.txt,
    params:
        cmd='cat'
    shell:
        '{params.cmd} {input} > {output} '
It does not work and produces the following error:
"SyntaxError in line 62 of /scratch/data1/agalvez/domains/Snakefile_ecdf:
invalid syntax (Snakefile_ecdf, line 62)"
I don't know what is wrong with this rule. I suspect that using * alone is not enough, but I can't think of another way to do what I intend.
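One likely contributor to the SyntaxError, offered as a sketch rather than a confirmed diagnosis: the entries under input: and output: in a Snakefile are parsed as Python expressions, and the unquoted text output_{hmm}/* is not a valid one. The helper name is_valid_python_expression below is hypothetical and only illustrates the difference between the bare and the quoted form.

```python
import ast


def is_valid_python_expression(text):
    """Return True if `text` parses as a single Python expression."""
    try:
        ast.parse(text, mode="eval")
        return True
    except SyntaxError:
        return False


# The bare pattern from the rule above is not valid Python.
bare_ok = is_valid_python_expression("output_{hmm}/*")

# The same pattern as a string literal parses fine.
quoted_ok = is_valid_python_expression("'output_{hmm}/*'")

print(bare_ok, quoted_ok)
```

Quoting the pattern would only address the parse error. As far as I know, Snakemake does not expand * in an input pattern, so the list of files would still need to be built explicitly.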
Edit: The question may be missing some information, so I am also attaching the full Snakefile:
ARCHIVE_FILE = 'output.tar.gz'
# a single output file
OUTPUT_FILE = 'output_{hmm}/{species}_{hmm}.out'
# a single input file
INPUT_FILE = 'proteins/{species}.fasta'
# a single hmm file
HMM_FILE = 'hmm/{hmm}.hmm'
# a single cat file
CAT_FILE = 'cat/cat_{hmm}.txt'
# Build the list of input files.
INP = glob_wildcards(INPUT_FILE).species
# Build the list of hmm files.
HMM = glob_wildcards(HMM_FILE).hmm
# The list of all output files
OUT = expand(OUTPUT_FILE, species=INP, hmm=HMM)
# The list of all CAT files
CAT = expand(CAT_FILE, hmm=HMM)
# pseudo-rule that tries to build everything.
# Just add all the final outputs that you want built.
rule all:
    input: ARCHIVE_FILE
# hmmsearch
rule hmm:
    input:
        species=INPUT_FILE,
        hmm=HMM_FILE
    output:
        OUTPUT_FILE,
    params:
        cmd='hmmsearch --noali -E 99 --tblout'
    shell:
        '{params.cmd} {output} {input.hmm} {input.species} '
# concatenate output per hmm
from glob import glob
rule concatenate:
    input:
        files = glob("output_{hmm}/*"),
    output:
        CAT_FILE,
    params:
        cmd='cat'
    shell:
        '{params.cmd} {input.files} {output} '
# create an archive with all results
rule create_archive:
    input: OUT, CAT,
    output: ARCHIVE_FILE
    shell: 'tar -czvf {output} {input}'
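Two things in the concatenate rule of the full Snakefile look suspicious, and the sketch below illustrates both under stated assumptions. First, glob("output_{hmm}/*") is ordinary Python that runs when the Snakefile is read, so {hmm} is still a literal string at that point and the pattern most likely matches nothing. Second, a common alternative is an input function that receives the wildcards and returns the expected files. The names concat_inputs, the species list, and the hmm value "kinase" are hypothetical, and SimpleNamespace merely stands in for the wildcards object Snakemake would pass.

```python
import os
import tempfile
from glob import escape, glob
from types import SimpleNamespace

# Hypothetical species list; in the Snakefile this would be INP.
INP = ["human", "mouse"]
OUTPUT_FILE = "output_{hmm}/{species}_{hmm}.out"


def concat_inputs(wildcards):
    """Input function: list the per-species outputs for one hmm value."""
    return [OUTPUT_FILE.format(species=s, hmm=wildcards.hmm) for s in INP]


# 1) glob() with a literal "{hmm}" looks for a directory named "output_{hmm}".
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "output_kinase"))
    open(os.path.join(tmp, "output_kinase", "human_kinase.out"), "w").close()
    literal_matches = glob(os.path.join(escape(tmp), "output_{hmm}", "*"))

print(literal_matches)  # the real directory output_kinase is not matched

# 2) The input function resolves the wildcard for each requested cat file.
files = concat_inputs(SimpleNamespace(hmm="kinase"))
print(files)

# In the Snakefile the rule could then read (sketch, not run here):
#
# rule concatenate:
#     input:
#         concat_inputs
#     output:
#         CAT_FILE
#     shell:
#         'cat {input} > {output}'
```

Note that the shell command in the full Snakefile has no > before {output}, unlike the first version of the rule. Without the redirect, cat would treat the output path as one more file to read instead of writing to it.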
【Discussion】:
Tags: python shell glob snakemake