【Question title】: Average Columns By Header Name
【Posted】: 2016-07-04 07:24:18
【Question description】:

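I have files with columns like this. The sample input below is only partial.

Please see the link below for the full file. Each file has only two rows.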

Gene    0.4%    0.7%    1.1%    1.4%    1.8%    2.2%    2.5%    2.9%    3.3%    3.6%    4.0%    4.3%    4.7%    5.1%    5.4%    5.8%    6.2%    6.5%    6.9%    7.2%    7.6%    8.0%    8.3%    8.7%    9.1%    9.4%    9.8%    10.1%   10.5%   10.9%   11.2%   11.6%   12.0%   12.3%   12.7%   13.0%   13.4%   13.8%   14.1%   14.5%   14.9%   15.2%   15.6%   15.9%   16.3%   16.7%   17.0%   17.4%   17.8%   18.1%   18.5%   18.8%   19.2%   19.6%   19.9%   20.3%   20.7%   21.0%   21.4%   21.7%   22.1%   22.5%   22.8%   23.2%   23.6%   23.9%   24.3%   24.6%   25.0%   25.4%   25.7%   26.1%   26.4%   26.8%   27.2%   27.5%   27.9%   28.3%   28.6%   29.0%   29.3%   29.7%   30.1%   30.4%   30.8%   31.2%   31.5%   31.9%   32.2%   32.6%   33.0%   33.3%   33.7%   34.1%   34.4%   34.8%   35.1%   35.5%   35.9%   36.2%   36.6%   37.0%   37.3%   37.7%   38.0%   38.4%   38.8%   39.1%   39.5%   39.9%   40.2%   40.6%   40.9%   41.3%   41.7%   42.0%   42.4%   42.8%   43.1%   43.5%   43.8%   44.2%   44.6%   44.9%   45.3%   45.7%   46.0%   46.4%   46.7%   47.1%   47.5%   47.8%   48.2%   48.6%   48.9%   49.3%   49.6%   50.0%   50.4%   50.7%   51.1%   51.4%   51.8%   52.2%   52.5%   52.9%   53.3%   53.6%   54.0%   54.3%   54.7%   55.1%   55.4%   55.8%   56.2%   56.5%   56.9%   57.2%   57.6%   58.0%   58.3%   58.7%   59.1%   59.4%   59.8%   60.1%   60.5%   60.9%   61.2%   61.6%   62.0%   62.3%   62.7%   63.0%   63.4%   63.8%   64.1%   64.5%   64.9%   65.2%   65.6%   65.9%   66.3%   66.7%   67.0%   67.4%   67.8%   68.1%   68.5%   68.8%   69.2%   69.6%   69.9%   70.3%   70.7%   71.0%   71.4%   71.7%   72.1%   72.5%   72.8%   73.2%   73.6%   73.9%   74.3%   74.6%   75.0%   75.4%   75.7%   76.1%   76.4%   76.8%   77.2%   77.5%   77.9%   78.3%   78.6%   79.0%   79.3%   79.7%   80.1%   80.4%   80.8%   81.2%   81.5%   81.9%   82.2%   82.6%   83.0%   83.3%   83.7%   84.1%   84.4%   84.8%   85.1%   85.5%   85.9%   86.2%   86.6%   87.0%   87.3%   87.7%   88.0%   88.4%   88.8%   89.1%   89.5%   89.9%   90.2%   
90.6%   90.9%   91.3%   91.7%   92.0%   92.4%   92.8%   93.1%   93.5%   93.8%   94.2%   94.6%   94.9%   95.3%   95.7%   96.0%   96.4%   96.7%   97.1%   97.5%   97.8%   98.2%   98.6%   98.9%   99.3%   99.6%   100.0%  0.4%    0.7%    1.1%    1.4%    1.8%    2.2%    2.5%    2.9%    3.3%    3.6%    4.0%    4.3%    4.7%    5.1%    5.4%    5.8%    6.2%    6.5%    6.9%    7.2%    7.6%    8.0%    8.3%    8.7%    9.1%    9.4%    9.8%    10.1%   10.5%   10.9%   11.2%   11.6%   12.0%   12.3%   12.7%   13.0%   13.4%   13.8%   14.1%   14.5%   14.9%   15.2%   15.6%   15.9%   16.3%   16.7%   17.0%   17.4%   17.8%   18.1%   18.5%   18.8%   19.2%   19.6%   19.9%   20.3%   20.7%   21.0%   21.4%   21.7%   22.1%   22.5%   22.8%   23.2%   23.6%   23.9%   24.3%   24.6%   25.0%   25.4%   25.7%   26.1%   26.4%   26.8%   27.2%   27.5%   27.9%   28.3%   28.6%   29.0%   29.3%   29.7%   30.1%   30.4%   30.8%   31.2%   31.5%   31.9%   32.2%   32.6%   33.0%   33.3%   33.7%   34.1%   34.4%   34.8%   35.1%   35.5%   35.9%   36.2%   36.6%   37.0%   37.3%   37.7%   38.0%   38.4%   38.8%   39.1%   39.5%   39.9%   40.2%   40.6%   40.9%   41.3%   41.7%   42.0%   42.4%   42.8%   43.1%   43.5%   43.8%   44.2%   44.6%   44.9%   45.3%   45.7%   46.0%   46.4%   46.7%   47.1%   47.5%   47.8%   48.2%   48.6%   48.9%   49.3%   49.6%   50.0%   50.4%   50.7%   51.1%   51.4%   51.8%   52.2%   52.5%   52.9%   53.3%   53.6%   54.0%   54.3%   54.7%   55.1%   55.4%   55.8%   56.2%   56.5%   56.9%   57.2%   57.6%   58.0%   58.3%   58.7%   59.1%   59.4%   59.8%   60.1%   60.5%   60.9%   61.2%   61.6%   62.0%   62.3%   62.7%   63.0%   63.4%   63.8%   64.1%   64.5%   64.9%   65.2%   65.6%   65.9%   66.3%   66.7%   67.0%   67.4%   67.8%   68.1%   68.5%   68.8%   69.2%   69.6%   69.9%   70.3%   70.7%   71.0%   71.4%   71.7%   72.1%   72.5%   72.8%   73.2%   73.6%   73.9%   74.3%   74.6%   75.0%   75.4%   75.7%   76.1%   76.4%   76.8%   77.2%   77.5%   77.9%   78.3%   78.6%   79.0%   79.3%   79.7%   80.1%   80.4%   80.8%   
81.2%   81.5%   81.9%   82.2%   82.6%   83.0%   83.3%   83.7%   84.1%   84.4%   84.8%   85.1%   85.5%   85.9%   86.2%   86.6%   87.0%   87.3%   87.7%   88.0%   88.4%   88.8%   89.1%   89.5%   89.9%   90.2%   90.6%   90.9%   91.3%   91.7%   92.0%   92.4%   92.8%   93.1%   93.5%   93.8%   94.2%   94.6%   94.9%   95.3%   95.7%   96.0%   96.4%   96.7%   97.1%   97.5%   97.8%   98.2%   98.6%   98.9%   99.3%   99.6%   100.0%

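Basically, this is what I need to do.

a. Start from the second column, here 0.4%.

b. Continue until you hit "10" in the header name. If a header name is exactly 10.0%, include that column too. If not, include only the columns before it. In this example, since we have 10.1% (column 29), we include the columns from 0.4% (column 2) through 9.8% (column 28). If column 29 were 10.0%, it would be included as well.

c. Average the values in the second row for those columns (the data is not shown here; please click this link for the full dataset: https://goo.gl/W8jND7). In this example, that is from 0.4% (column 2) through 9.8% (column 28).

d. In the output, print the first column, "Gene", followed by this average, with the column headers as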

Gene Average_10%

e. Then start checking from 10.1% (column 29) until you hit "20" in the header name. Repeat steps b through d, and print the output as

Gene Average_10% Average_20%

Repeat this until you have

Gene Average_10% Average_20% Average_30% Average_40% Average_50% Average_60% Average_70% Average_80% Average_90% Average_100%

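f. Once you reach 100%, one dataset is complete.

g. If you look closely at my column headers here, there is another run of 0.4%-100% columns after the first 100%. The input file at the link above contains 13 of these 0.4%-100% runs.

i. I have multiple files, and the headers can be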

1% 2% 3%....100%
1.5% 2.5% 3.5%....100%

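They vary from file to file. But the averaging logic (when you hit "10", "20", and so on) is always the same. The number of samples, 13, is also the same, meaning each file has 13 runs that end at 100%.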

【Question discussion】:

    Tags: linux awk average multiple-columns


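    【Solution 1】:

    I should say, this is a terrible format for this task. I don't expect anyone to come up with the final solution for you, but here is how I would approach it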

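    awk 'NR == 1 {
        gsub("%","");                 # strip the percent signs so headers compare as numbers
        for (f=2; f<=NF; f++) {
          # column f closes a bucket when the next header reaches a multiple of 10
          for (i=1; i<10; i++) 
              if ($f<10*i && $(f+1)>=10*i) print f, $f
          if ($f==100) print f, $f    # special case: 100 ends each data set
        }}' file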
    
    28 9.8
    56 19.9
    83 29.7
    111 39.9
    138 49.6
    166 59.8
    194 69.9
    221 79.7
    249 89.9
    277 100.0
    304 9.8
    332 19.9
    359 29.7
    387 39.9
    414 49.6
    442 59.8
    470 69.9
    497 79.7
    525 89.9
    553 100.0
    

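    This prints the column indices and the threshold values for verification purposes. Once the column boundaries are extracted, summing the respective columns should be straightforward. Note that by your logic 100% should never be included, but that seems wrong, so I special-cased it.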
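As a follow-up to the boundary listing, the averaging step could be sketched roughly as below. This is only a sketch under stated assumptions, not a tested solution for the linked data: the function name `bucket_avg` is made up, the data row is assumed to hold plain numbers, a new 0.4%-100% run is assumed to start whenever a header value is smaller than the previous one, and one output row per run is just one possible layout. It follows the rule as written in the question, so a header that is exactly a multiple of 10 (for example 50.0%) is counted in its own bucket, whereas the boundary test above would close that bucket one column earlier, at 49.6%.

```shell
# bucket_avg FILE  (hypothetical helper name, not from the original answer)
bucket_avg() {
  awk '
    NR == 1 {                        # header row: map each column to (run, bucket)
      gsub("%", "")
      nset = 0
      for (f = 2; f <= NF; f++) {
        v = $f + 0
        if (f == 2 || v < prev) nset++   # assumption: a drop in the header starts a new run
        b = int(v / 10)
        if (b * 10 < v) b++              # round up: 10.0 -> bucket 1, 10.1 -> bucket 2
        if (b < 1) b = 1
        set[f] = nset; bucket[f] = b; prev = v
      }
      ncol = NF
      printf "Gene"
      for (b = 1; b <= 10; b++) printf " Average_%d%%", b * 10
      print ""
      next
    }
    {                                # data rows: accumulate sums and counts per (run, bucket)
      split("", sum); split("", cnt)
      for (f = 2; f <= ncol; f++) {
        k = set[f] SUBSEP bucket[f]
        sum[k] += $f
        cnt[k]++
      }
      for (s = 1; s <= nset; s++) {  # assumption: print one output row per run
        printf "%s", $1
        for (b = 1; b <= 10; b++) {
          k = s SUBSEP b
          if (cnt[k] > 0) {
            printf " %.4g", sum[k] / cnt[k]
          } else {
            printf " NA"             # no column fell into this bucket
          }
        }
        print ""
      }
    }
  ' "$1"
}
```

It would be called as `bucket_avg file`. Keying the arrays on the run and bucket numbers means the column boundaries never need to be hard-coded, so headers such as 1% 2% 3% or 1.5% 2.5% 3.5% should go through the same code.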

    【Comments】:

    • Thanks @karakfa