Read the number of columns using awk/sed: answers

【Question title】: Read the number of columns using awk/sed
【Posted】: 2014-02-03 22:49:24
【Question description】:

I have the following test file

Kmax Event File - Text Format
1 4 1000 
65 4121 9426 12312 
56 4118 8882 12307 
1273 4188 8217 12309 
1291 4204 8233 12308 
1329 4170 8225 12303 
1341 4135 8207 12306 
63 4108 8904 12300 
60 4106 8897 12307 
731 4108 8192 12306 
...
ÿÿÿÿÿÿÿÿ

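In this file I want to delete the first two lines and apply some mathematical calculations. For example, each column i will become $i-(i-1)*number. The script that does this is the following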

#!/bin/bash

if test $1 ; then
   if [ -f $1.evnt ] ; then
      rm -f $1.dat
      sed -n '2p' $1.evnt | (read v1 v2 v3
      for filename in $1*.evnt ; do
         echo -e "Processing file $filename"
         sed '$d' < $filename > $1_tmp
         sed -i '/Kmax/d' $1_tmp
         sed -i '/^'"$v1"' '"$v2"' /d' $1_tmp
         cat $1_tmp >> $1.dat
      done
      v3=`wc -l $1.dat | awk '{print $1}' `
      echo -e "$v1 $v2 $v3" > .$1.dat
      rm -f $1_tmp)
   else
      echo -e "\a!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
      echo -e "  Event file $1.evnt doesn't exist  !!!!!!"
      echo -e "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
   fi   
else
   echo -e "\a!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
   echo -e "!!!!!  Give name for event files  !!!!!"
   echo -e "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
fi
awk '{print $1, $2-4096, $3-(2*4096), $4-(3*4096)}' $1.dat >$1_Processed.dat
rm -f $1.dat
exit 0

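The file will not always have 4 columns. Is there a way to read the number of columns, print that number, and apply these calculations?

EDIT: The idea is to have an input file (*.evnt), convert it to a *.dat or any other ASCII file (it doesn't really matter) that contains only numbers in columns, and then apply the calculation $i=$i-(i-1)*number. In addition, the number of columns should be kept in a variable that will be used by another program. For the file above, for example, number=4096 and a sample output file is the following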

65 25 1234 24
56 22 690 19
1273 92 25 21
1291 108 41 20
1329 74 33 15
1341 39 15 18
63 12 712 12
60 10 705 19
731 12 0 18

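In the console I would get the message There are 4 detectors.

Finally, a new file_processed.dat will be generated, where file is the initial name of the awk input file.

The way it should be executed is the following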

./myscript <filename>

where <filename> is the name without the extension. For example, the file is named filename.evnt, so I should use

./myscript filename

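【Comments】:

All of the above can be done in one short awk script. If you show us your expected output given the sample input file you posted, we can help you. I posted an answer as a first guess at what you're trying to do - take a look and let us know if it's right and, if not, what it needs to do differently.
@EdMorton: Please check my edited question!
OK, check my edited answer, and if it needs to do anything differently, add a comment under that answer.

Tags: bash sed awk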

【Solution 1】:

Let's start with this and see if it's close to what you want to do:

$ numdet=$( awk -v num=4096 '
    NR>2 && NF>1 {
        out = FILENAME "_processed.dat"
        for (i=1;i<=NF;i++) {
            $i = $i-(i-1)*num
        }
        nf = NF
        print > out
    }
    END {
        printf "There are %d detectors\n", nf | "cat>&2"
        print nf
    }
    ' file )

There are 4 detectors

$ cat file_processed.dat
65 25 1234 24
56 22 690 19
1273 92 25 21
1291 108 41 20
1329 74 33 15
1341 39 15 18
63 12 712 12
60 10 705 19
731 12 0 18

$ echo "$numdet"
4

Is that right?
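
The question asks for the form ./myscript <filename>, where the name is given without the .evnt extension. A minimal sketch of how the awk command above might be wrapped that way follows. The function name process_evnt, the sample file run1.evnt and the temporary directory are hypothetical and used only for illustration; the output name is passed in with -v instead of being derived from FILENAME so that the extension is dropped.

```shell
#!/bin/bash
# Sketch only: process_evnt is a hypothetical helper, not part of the answer.
process_evnt() {
    local base=$1
    if [ ! -f "$base.evnt" ]; then
        echo "Event file $base.evnt doesn't exist" >&2
        return 1
    fi
    # Skip the two header lines and any line with fewer than two fields,
    # shift column i by (i-1)*num, and write the rows to <base>_processed.dat.
    awk -v num=4096 -v out="${base}_processed.dat" '
        NR > 2 && NF > 1 {
            for (i = 1; i <= NF; i++) $i = $i - (i - 1) * num
            nf = NF
            print > out
        }
        END { print nf }   # the column count is the only thing on stdout
    ' "$base.evnt"
}

# Demo on hypothetical sample data in a temporary directory.
dir=$(mktemp -d)
printf '%s\n' 'Kmax Event File - Text Format' '1 4 1000' \
    '65 4121 9426 12312' '731 4108 8192 12306' > "$dir/run1.evnt"

numdet=$(process_evnt "$dir/run1")
processed=$(cat "$dir/run1_processed.dat")
echo "There are $numdet detectors"
echo "$processed"
rm -rf "$dir"
```

In a real script the demo part would be replaced by a call such as numdet=$(process_evnt "$1"), after which numdet can be handed to the other program.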

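【Discussion】:

It does the job! Let me understand... you set numdet as a variable that does all of this? Also, when I run it I get ./myscript: line 14: $: command not found. Note that the file to be processed does exist!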
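No, the awk script does all of that and then prints the number 4, which is then saved in the shell variable numdet. Did you by any chance include the leading $ from $ numdet=$( awk...? If so, don't - the $ is my prompt.
Ah, I see that in your script you use the old `...` construct to invoke commands (e.g. v3=`wc -l $1.dat`). That syntax is deprecated; these days you should use $(...) instead (e.g. v3=$(wc -l $1.dat)). Maybe that clarifies how my script executes the awk command and saves its standard output in the shell variable numdet, just as you save the standard output of wc in the shell variable v3.
You're right! I included the $... silly me... The problem is that the message There are ... ends up in the output file, not in the console.
Yes, because you explicitly said in your question that you wanted that: a sample output file is the following...There are 4 detectors. If that's not what you really want, then just remove or edit the obvious line of code that does it - printf "There are %d detectors\n",nf > out. If you don't know how to make the script do what you want, please update your question to clarify what you really want it to do.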
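
The exchange above turns on which stream each piece of output goes to: $(...) captures only standard output, so a message sent to standard error still reaches the console. A minimal sketch of that split, using the same | "cat 1>&2" idiom as the answer; the temporary file and its contents are hypothetical sample data.

```shell
#!/bin/bash
# Hypothetical sample data written to a temporary file for the demo.
tmp=$(mktemp)
printf '%s\n' 'Kmax Event File - Text Format' '1 4 1000' \
    '65 4121 9426 12312' '56 4118 8882 12307' > "$tmp"

# Only the bare number is printed to stdout, so only it lands in numdet.
# The human-readable message is piped to "cat 1>&2", i.e. to stderr,
# which the command substitution does not capture.
numdet=$(awk '
    NR > 2 && NF > 1 { nf = NF }
    END {
        printf "There are %d detectors\n", nf | "cat 1>&2"
        print nf
    }
' "$tmp")

echo "captured: $numdet"
rm -f "$tmp"
```

Run in a terminal, this shows There are 4 detectors (from stderr) and then captured: 4.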

【Solution 2】:

Using awk

awk 'NR<=2{next}{for (i=1;i<=NF;i++) $i=$i-(i-1)*4096}1' file

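【Discussion】:

Thank you very much for your answer! I'm trying to print the number of fields. I'm using END{print "There are %d detectors", NF}, but it gives me There are %d detectors1. So I need the correct number of fields and the right way to print it.
@Thanos - use printf, not print, for formatted output.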
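
A sketch of how the two problems in the comment above might be handled together. print does not interpret %d, so printf is needed. The stray 1 is most likely NF as seen in END, which in most awk implementations still reflects the last record read (here the one-field trailer line), so the column count is saved from a data row instead. The temporary file and the word trailer standing in for the file's last line are assumptions for the demo.

```shell
#!/bin/bash
# Hypothetical sample input; "trailer" stands in for the file's final line.
tmp=$(mktemp)
printf '%s\n' 'Kmax Event File - Text Format' '1 4 1000' \
    '65 4121 9426 12312' '56 4118 8882 12307' 'trailer' > "$tmp"

# Skip the two header lines and every single-field line, remember the
# column count of the data rows in nf, and format the message with printf.
result=$(awk -v num=4096 '
    NR <= 2 || NF < 2 { next }
    {
        for (i = 1; i <= NF; i++) $i = $i - (i - 1) * num
        nf = NF
        print
    }
    END { printf "There are %d detectors\n", nf }
' "$tmp")

echo "$result"
rm -f "$tmp"
```

This prints the two shifted rows (65 25 1234 24 and 56 22 690 19) followed by There are 4 detectors.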