【Posted】:2020-02-27 15:56:26
【Problem description】:
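I am trying to work out, for each row, whether its count exceeds a certain value, namely 30% of the total count.
Inside a for loop I compute that percentage with awk '$1=($1/100)*30' ${i}_counts > ${i}_percentage-value. The result is a single number, and the output file contains nothing but that number.
How can I run a "greater than" comparison of every row of ${i}_counts against ${i}_percentage-value?
In other words, how do I use a number stored in a file as a numeric value in a calculation?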
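A minimal sketch of the two usual ways to get a number out of a file and into an awk comparison. The file names counts.txt, threshold.txt, over_v.txt and over_two_files.txt, the temporary directory, and the four counts are all made up for illustration; they are not the files from the loop above.

```shell
set -eu

# Hypothetical scratch directory and file names, for illustration only.
workdir=$(mktemp -d)
cd "$workdir"

# Made-up counts, one per line (total is 100, so 30% is 30).
printf '%s\n' 5 15 50 30 > counts.txt

# Sum the counts and write 30% of the total to a one-number file.
awk '{ sum += $1 } END { print sum * 30 / 100 }' counts.txt > threshold.txt

# Option 1: read the number into a shell variable and pass it to awk with -v.
# awk treats a numeric-looking -v value as a number in comparisons.
threshold=$(cat threshold.txt)
awk -v t="$threshold" '$1 > t' counts.txt > over_v.txt

# Option 2: let awk read the threshold file itself. NR == FNR is true only
# while the first file is being read, so t is set before any count is tested.
awk 'NR == FNR { t = $1; next } $1 > t' threshold.txt counts.txt > over_two_files.txt
```

Both output files should end up holding only the counts strictly greater than the threshold. Either way, there is no need to repeat the threshold once per row to build a comparison column.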
Data:
data.csv (excerpt)
SampleID ASV Count
1000A ASV_1216 14
1000A ASV_12580 150
1000A ASV_12691 260
1000A ASV_135 434
1000A ASV_147 79
1000A ASV_15 287
1000A ASV_16 361
1000A ASV_184 8
1000A ASV_19 42
samples-ID-short
1000A
1000B
1000C
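So each sample ID has many ASVs, and how many can vary a lot: for example 50 ASVs for 1000A, 120 for 1000B, and so on. Each ASV_## has a count. My code sums the counts for a sample, works out the 30% value for that sample, and is meant to report which ASV_## is greater than 30%. Ultimately it should flag with a 1 the ASVs that exceed 30%.
Here is my code so far: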
for i in $(cat samplesID-short)
do
grep ${i} data.csv | cut -d , -f3 - > ${i}_count_sample
grep ${i} data.csv | cut -d , -f2 - > ${i}_ASV
awk '{ sum += $1; } END { print sum; }' ${i}_count_sample > ${i}_counts
awk '$1=($1/100)*30' ${i}_counts > ${i}_percentage-value
# I thought about repeating the numeric value down the whole column and then doing the "greater than" comparison, but the number of repetitions depends on how many ASVs each sample has, and that always differs.
wc -l ${i}_ASV > n
for (( c=1; c<=n; c++)) ; do echo ${i}_percentage-value ; done
paste <(sed 's/^[[:blank:]]*//' ${i}_ASV) ${i}_count_sample ${i}_percentage-value > ${i}_tmp;
awk 'BEGIN{OFS="\t"}{if($2 >= $3) print $1}' ${i}_tmp > ${i}_is30;
# What the output should look like:
paste <(sed 's/^[[:blank:]]*//' ${i}_ASV) ${i}_count_sample ${i}_counts ${i}_percentage-value ${i}_is30 > ${i}_summary_nh
echo -e "ASV_ID\tASV_in_sample\ttotal_ASVs_inSample\tthreshold_for_30%\tASV_over30%" | cat - ${i}_summary_nh > ${i}_summary
rm ${i}_count_sample ${i}_counts ${i}_percentage-value ${i}_ASV ${i}_summary_nh ${i}_is30
done &
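For comparison, here is a rough sketch of how the whole per-sample report might be produced in a single awk call that reads the data twice, with no temporary files. It assumes data.csv is comma-separated with a header line, as the cut -d , calls above suggest; if the file is whitespace-separated like the excerpt, the -F, option would have to go. The sample rows are three lines borrowed from the excerpt plus two made-up 1000B rows, and the output name summary.tsv and its column headers are hypothetical.

```shell
set -eu

# Hypothetical scratch directory, for illustration only.
workdir=$(mktemp -d)
cd "$workdir"

# Small stand-in for data.csv: three 1000A rows from the excerpt,
# plus two made-up 1000B rows.
cat > data.csv <<'EOF'
SampleID,ASV,Count
1000A,ASV_135,434
1000A,ASV_16,361
1000A,ASV_15,287
1000B,ASV_1,80
1000B,ASV_2,20
EOF

# The file is given twice: the first read sums counts per sample, the second
# read compares each row against 30% of its own sample's total.
awk -F, -v OFS='\t' -v pct=30 '
    BEGIN     { print "SampleID", "ASV", "Count", "Total", "Threshold", "Over30" }
    FNR == 1  { next }                      # skip the header line in both reads
    NR == FNR { total[$1] += $3; next }     # first read: per-sample totals
    {
        threshold = total[$1] * pct / 100
        print $1, $2, $3, total[$1], threshold, ($3 > threshold ? 1 : 0)
    }
' data.csv data.csv > summary.tsv
```

The last column is 1 when the count is strictly greater than 30% of the sample total and 0 otherwise. Keying the totals on the exact SampleID field also avoids the partial matches that grep ${i} can produce when one ID is a substring of another.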
【Discussion】: