【Posted】: 2021-12-29 14:39:21
【Question】:
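I have read several other questions on the same topic, but I just can't solve it...
I'm trying to concatenate files when a sample has a second file, and do nothing when it doesn't. Table: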
name | path | path2
554_MO_GEM12_r070 | data/171219_NB501241_0070_AHCHYNBGX5/fastq/554_MO_GEM12_r070_S5_R1_001.fastq.gz |
693_SP_GEM12_r070 | data/171219_NB501241_0070_AHCHYNBGX5/fastq/693_SP_GEM12_r070_S21_R1_001.fastq.gz | data/200914_NB501241_0451_AHFNHMBGXG/fastq/693_MO_reseq70_r451_S1_R1_001.fastq.gz
866_MO_GEM12_r070 | data/171219_NB501241_0070_AHCHYNBGX5/fastq/866_MO_GEM12_r070_S10_R1_001.fastq.gz |
708_MO_GEM12_r070 | data/171219_NB501241_0070_AHCHYNBGX5/fastq/708_MO_GEM12_r070_S9_R1_001.fastq.gz | data/180201_NB501241_0088_AHJ2GHBGX5/fastq/708_MO_GEM12_reseq070_r088_S5_R1_001.fastq.gz
Here is the (simplified) Snakefile:
import os
import pandas as pd
import subprocess

### loading samples
SAMPLES = pd.read_csv("prueba_snakemake.csv")
SAMPLES.name = SAMPLES.name.astype(str)
SAMPLES = SAMPLES.set_index("name")

### including rules
include: "rules/testings.smk"

rule all:
    input:
        expand(["data/processed/{sample}.test"], sample=SAMPLES.index)
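With a single wildcard, the `expand()` call in `rule all` just formats the pattern once per sample name, so the targets it requests can be pictured as the plain-Python list below. This is only an illustration: `SAMPLE_NAMES` is a hypothetical hard-coded stand-in for `SAMPLES.index`.

```python
# Hypothetical stand-in for SAMPLES.index (the "name" column of the CSV).
SAMPLE_NAMES = ["554_MO_GEM12_r070", "693_SP_GEM12_r070"]

# With one wildcard, expand(["data/processed/{sample}.test"], sample=...)
# yields one formatted path per value, in the same order.
targets = [f"data/processed/{sample}.test" for sample in SAMPLE_NAMES]
print(targets)
```

Every one of these paths must be producible by `test_rule`, which in turn means every file its input function returns must either exist or be the output of some other rule.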
...and this is testings.smk:
def concatenate_fastq(sample, sample_df):
    res_file = f"data/processed/{sample}_concatenated.fastq.gz"
    if not os.path.isfile(res_file):
        cmd = f"cat {[sample_df['path']][0]} {[sample_df['path2']][0]} > {res_file}"
        subprocess.run(cmd, shell=True)
    return [res_file]

def get_fastq_files(wildcards):
    sample_df = SAMPLES.loc[wildcards.sample]
    if pd.isna(sample_df["path2"]):
        reads = [sample_df["path"]]
    else:
        reads = concatenate_fastq(wildcards.sample, sample_df)
    print(reads)
    return reads

rule test_rule:
    input:
        reads = get_fastq_files
    output:
        "data/processed/{sample}.test"
    shell:
        "touch {output}"
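The `cat` call in `concatenate_fastq` is itself a reasonable way to merge gzipped FASTQ files: the gzip format allows several compressed members in one stream, and decompressors read them back as one continuous file. The in-memory sketch below uses only Python's standard `gzip` module and two made-up FASTQ records (`read_a`, `read_b` are illustrative, not from the question) to check that byte-level concatenation round-trips.

```python
import gzip

# Two made-up FASTQ records, compressed separately (like two .fastq.gz files).
read_a = b"@r1\nACGT\n+\nIIII\n"
read_b = b"@r2\nTTGA\n+\nIIII\n"
part1 = gzip.compress(read_a)
part2 = gzip.compress(read_b)

# `cat part1 part2 > merged` is plain byte concatenation...
merged = part1 + part2

# ...and gzip.decompress reads multi-member data as a single stream.
assert gzip.decompress(merged) == read_a + read_b
print(gzip.decompress(merged).decode())
```

So the problem is not the concatenation command, but where it is executed.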
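But something isn't working:
- if no concatenation is performed, no files are generated at all (I expected some touch files);
- the new concatenated file is stored correctly in the folder, but rule all does not seem to detect it (?):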
Building DAG of jobs...
['data/RUNs/171219_NB501241_0070_AHCHYNBGX5/fastq/554_MO_GEM12_r070_S5_R1_001.fastq.gz']
['data/processed/693_SP_GEM12_r070_concatenated.fastq.gz']
MissingInputException in line 18 of snake_flow/workflows/rules/trimming.smk:
Missing input files for rule test_rule:
data/processed/693_SP_GEM12_r070_concatenated.fastq.gz
I don't think it's a wildcard problem, nor a different output path. Any idea what I'm missing? Thanks.
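Independently of the design question, `concatenate_fastq` calls `subprocess.run(cmd, shell=True)` without `check=True`, so if `cat` fails (for example because an input path is wrong or the output directory does not exist yet) Python carries on silently and the function still returns a path that may never have been written. A minimal sketch of the difference, assuming a POSIX shell; `exit 3` merely stands in for a failing `cat`:

```python
import subprocess

# Without check=True a failing command only sets returncode; no exception.
result = subprocess.run("exit 3", shell=True)
print(result.returncode)  # 3

# With check=True the same failure raises CalledProcessError.
try:
    subprocess.run("exit 3", shell=True, check=True)
except subprocess.CalledProcessError as err:
    print("command failed with code", err.returncode)
```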
【Discussion】:
-
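You are doing the concatenation inside the input function, which is only meant to return the names of files that should exist before the pipeline runs.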
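Following that comment, one common restructuring is to let the input functions only *name* files and to move the `cat` into a rule of its own, so Snakemake schedules the concatenation as a job. The sketch below keeps the input functions in plain Python so they can be tested without Snakemake. The rule name `concatenate_fastq`, the helpers `is_missing` and `get_concat_inputs`, and the `SAMPLES` dictionary (standing in for the pandas table, with `None` for an empty `path2` cell) are hypothetical; the intended Snakemake wiring is shown only in comments.

```python
import math
from types import SimpleNamespace

# Hypothetical stand-in for the CSV table; in the Snakefile this would be the
# pandas DataFrame, where an empty path2 cell is read as NaN.
SAMPLES = {
    "554_MO_GEM12_r070": {
        "path": "data/171219_NB501241_0070_AHCHYNBGX5/fastq/554_MO_GEM12_r070_S5_R1_001.fastq.gz",
        "path2": None,
    },
    "693_SP_GEM12_r070": {
        "path": "data/171219_NB501241_0070_AHCHYNBGX5/fastq/693_SP_GEM12_r070_S21_R1_001.fastq.gz",
        "path2": "data/200914_NB501241_0451_AHFNHMBGXG/fastq/693_MO_reseq70_r451_S1_R1_001.fastq.gz",
    },
}

def is_missing(value):
    """True for None, NaN or an empty string, i.e. an empty path2 cell."""
    return value is None or value == "" or (isinstance(value, float) and math.isnan(value))

def get_fastq_files(wildcards):
    """Input of test_rule: only returns file names, never creates files."""
    row = SAMPLES[wildcards.sample]
    if is_missing(row["path2"]):
        return [row["path"]]
    # Snakemake builds this file with the concatenation rule sketched below.
    return [f"data/processed/{wildcards.sample}_concatenated.fastq.gz"]

def get_concat_inputs(wildcards):
    """Input of the concatenation rule: both original files, in order."""
    row = SAMPLES[wildcards.sample]
    return [row["path"], row["path2"]]

# Intended Snakemake wiring (not executed here):
# rule concatenate_fastq:
#     input: get_concat_inputs
#     output: "data/processed/{sample}_concatenated.fastq.gz"
#     shell: "cat {input} > {output}"
#
# rule test_rule:
#     input: reads=get_fastq_files
#     output: "data/processed/{sample}.test"
#     shell: "touch {output}"

print(get_fastq_files(SimpleNamespace(sample="693_SP_GEM12_r070")))
```

With this layout, when `rule all` requests `data/processed/693_SP_GEM12_r070.test`, Snakemake sees that `test_rule` needs the concatenated file and schedules `concatenate_fastq` first, while samples without `path2` go straight to `test_rule`.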
Tags: python concatenation snakemake