Modifying Python script output with bash in Nextflow (answer)

[Question title]: Python script output modification with bash in nextflow
[Posted]: 2021-07-25 18:22:02
[Question]:

I have a Python script (make_chunk.py) that takes an input file from an input channel and prints 3 arrays.

import pandas as pd
import numpy as np
import os
import sys

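data=sys.argv[1]
df=pd.read_csv(data,sep='\t',header=None)
# column index 3 (4th column of the .bim file) holds the base-pair position
chnk_ult=df[df.columns[3]].max()

# chunk starts every 3 Mb; the ends are floats because 3e6 is a float
chnk_start=np.arange(0,chnk_ult,3000000)
chnk_end=chnk_start+3e6
# chunk numbers 1 .. len(chnk_end)-1, i.e. one fewer than there are chunks
chnk_arr=np.arange(1,len(chnk_end))
print(chnk_start, chnk_end, chnk_arr)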
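A sketch of a more bash-friendly make_chunk.py, assuming the goal is three integer arrays of equal length. NumPy's `print` output is meant for people, not parsers: arrays are wrapped in brackets, long arrays wrap across several lines, and `chnk_end` is a float array (because `3e6` is a float), so its values are printed as floats rather than integers. The version below prints one array per line as plain space-separated integers, uses 0-based chunk indices with the same length as the start/end arrays, and reads column 4 of the .bim file (the base-pair position) with plain Python instead of pandas. The function names are hypothetical.

```python
import sys

STEP = 3_000_000  # chunk width in bp, as in the question


def max_position(bim_path: str) -> int:
    """Largest base-pair position (4th column) in a whitespace-separated .bim file."""
    with open(bim_path) as fh:
        return max(int(line.split()[3]) for line in fh if line.strip())


def make_chunks(max_pos: int, step: int = STEP):
    """Integer starts, ends and 0-based chunk indices, all of the same length."""
    starts = list(range(0, max_pos, step))
    ends = [s + step for s in starts]  # integer arithmetic keeps the values integral
    indices = list(range(len(starts)))
    return starts, ends, indices


def format_lines(*arrays) -> str:
    """One array per line, values separated by single spaces (no brackets)."""
    return "\n".join(" ".join(str(v) for v in arr) for arr in arrays)


if __name__ == "__main__":
    print(format_lines(*make_chunks(max_position(sys.argv[1]))))
```

On the bash side each line can then be loaded into a real array, for example with `{ read -r -a start; read -r -a end; read -r -a chunks; } < <(make_chunk.py file.bim)`. Inside a Nextflow script string, every `$` meant for bash still has to be escaped as `\$`.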

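I want to create 3 separate bash arrays from the output above. This works in a terminal. I want to use the same commands in a Nextflow script to create those arrays, which will be used later. So far I have tried: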

process imputation {
    publishDir params.out, mode:'copy'
    input:
    tuple val(chrom),path(in_haps),path(input_bed),path(refs),path(maps) from imp_ch
    output:
    tuple("${chrom}"),path("${chrom}.*") into imputed
    script:
    def (haps,sample)=in_haps
    def (bed, bim, fam)=input_bed
    def (haplotype, legend, samples)=refs
    """
    x="\$(make_chunk.py ${bim})"
    eval \$(echo \$x | sed 's|,| |g; s|\\[|list1=(|; s|\\[|list2=(|; s|\\[|list3=(|;s|\\]|)\\n|g;')
    start="\$(echo \${list1[@]})"
    end="\$(echo \${list2[@]})"
    chunks="\$(echo \${list3[@]})"
    impute4 -g "${haps}" -h "${haplotype}" -l "${legend}" -m "${maps}" -o "${chrom}.step10.imputed.chunk\${chunks}" -no_maf_align -o_gz -int \${start[\${chunks}]} \${end[\${chunks}]} -Ne 20000 -buffer 1000 -seed 54321
    """
}
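The impute4 call above passes the whole list of chunk numbers at once, but each chunk needs its own call with a single start/end pair. The Python sketch below only builds the argument lists (it does not run impute4); the flags are copied from the command in the question, while the function name, the example file names and the 0-based chunk numbering are assumptions. In a Nextflow pipeline it is usually more idiomatic to emit one channel item per chunk and let each task run one such command.

```python
import shlex


def impute4_commands(chrom, haps, haplotype, legend, maps, starts, ends):
    """Build one impute4 argument list per chunk; flags copied from the question."""
    commands = []
    for i, (start, end) in enumerate(zip(starts, ends)):
        commands.append([
            "impute4",
            "-g", haps, "-h", haplotype, "-l", legend, "-m", maps,
            "-o", f"{chrom}.step10.imputed.chunk{i}",
            "-no_maf_align", "-o_gz",
            "-int", str(start), str(end),  # exactly one start/end pair per call
            "-Ne", "20000", "-buffer", "1000", "-seed", "54321",
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for cmd in impute4_commands("chr20", "chr20.haps", "ref.hap.gz", "ref.legend.gz",
                                "chr20.map", [0, 3000000], [3000000, 6000000]):
        print(shlex.join(cmd))
```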

For the Nextflow process above, I get the following error:

Command error: .command.sh: line 7: 0 1 2 3 4 5 6: syntax error in expression (error token is "1 2 3 4 5 6"
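The error most likely comes from the indexing, not from the sed step. `start="$(echo ${list1[@]})"` stores the elements as one plain string rather than as an array, and `chunks` is likewise a single string such as "0 1 2 3 4 5 6". Bash evaluates an array subscript as an arithmetic expression, so `${start[${chunks}]}` asks it to evaluate `0 1 2 3 4 5 6`, which is not valid arithmetic; hence "syntax error in expression". The Python sketch below reproduces the error and a fixed variant by running both snippets through bash (assumed to be on PATH); the fixed variant keeps real arrays (`read -r -a`) and indexes them one element at a time in a loop. The example values are made up.

```python
import shutil
import subprocess

# Reproduces the error: `start` is a plain string, and the whole string "0 1 2"
# is used as one arithmetic subscript.
BROKEN = r'''
list1=(0 3000000 6000000)
start="$(echo ${list1[@]})"
chunks="0 1 2"
echo ${start[${chunks}]}
'''

# Fix: keep real arrays and index them one element at a time in a loop.
FIXED = r'''
read -r -a start <<< "0 3000000 6000000"
read -r -a end <<< "3000000 6000000 9000000"
for i in "${!start[@]}"; do
    echo "$i ${start[$i]} ${end[$i]}"
done
'''


def run_bash(script: str) -> subprocess.CompletedProcess:
    """Run a snippet with bash (assumed to be on PATH) and capture its output."""
    bash = shutil.which("bash")
    if bash is None:
        raise RuntimeError("bash not found on PATH")
    return subprocess.run([bash, "-c", script], capture_output=True, text=True)


if __name__ == "__main__":
    print("broken stderr:", run_bash(BROKEN).stderr.strip())
    print(run_bash(FIXED).stdout, end="")
```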

But in a bash terminal these commands work fine. Any help with this?

[Discussion]:

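I see one program that looks like Python, and one in a language I don't recognize; at least it doesn't look like bash to me. Please state what you named the two files and how you invoke the command that produces the error.
I edited the question to specify the script name. The second one is a Nextflow script; the bash commands have been adapted to Nextflow syntax.
Please indicate which line produces the error.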

Tags: python arrays bash nextflow

[Solution 1]:

If your bim file is just a whitespace-separated file, use a Nextflow operator to split it:
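The operator code for this answer is not included here. As a separate sketch of the same idea (one channel item per chunk instead of bash arrays), a small helper script could print one tab-separated row per chunk, which a Nextflow operator such as `splitCsv(sep: '\t')` can then turn into one item per row. The helper and its output layout (chunk index, start, end) are hypothetical; it reads the base-pair position from column 4 of the .bim file, assuming whitespace-separated columns as the answer does.

```python
import csv
import io
import sys


def max_position(bim_path: str) -> int:
    """Largest base-pair position (4th column) in a whitespace-separated .bim file."""
    with open(bim_path) as fh:
        return max(int(line.split()[3]) for line in fh if line.strip())


def chunk_rows(max_pos: int, step: int = 3_000_000):
    """Yield (chunk_index, start, end) for consecutive windows starting below max_pos."""
    for i, start in enumerate(range(0, max_pos, step)):
        yield i, start, start + step


def chunk_table(max_pos: int, step: int = 3_000_000) -> str:
    """Tab-separated text with one row per chunk and no header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(chunk_rows(max_pos, step))
    return buf.getvalue()


if __name__ == "__main__":
    sys.stdout.write(chunk_table(max_position(sys.argv[1])))
```

Each row can then be combined with the per-chromosome inputs so that every task handles a single chunk, which avoids parsing arrays in bash altogether.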

[Discussion]: