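Turn a file of 3 columns into a matrix

【Question title】: Turn file of 3 columns into a matrix
【Posted】: 2017-05-15 16:35:17
【Question】:

I have a file with information in 3 columns. The first column holds the categories that will fill the top row of the matrix, and the second column holds the categories that will go in the first column of the matrix. The third column holds the values that will fill the body of the matrix. Columns 1 and 2 of the original file can be swapped; it makes no difference.

The file looks like this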

Category1   type1   +
Category1   type2   -
Category1   type3   +
Category2   type1   +
Category2   type2   +
Category2   type3   +
Category3   type1   +
Category3   type2   -
Category3   type3   -

I would like to turn it into a matrix like this

    Category1   Category2   Category3
type1   +   +   +
type2   -   +   -
type3   +   +   -

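I was thinking awk could do this; I just don't know how to get awk to do it.

【Comments】:

About the input data: are the columns tab-separated or space-separated? And how should the output look?
@Scheff Everything is tab-separated
Aha. I'll post a solution soon. (Currently it works for space-separated input and tab-separated output.)

Tags: bash shell matrix awk

【Solution 1】:

awk to the rescue!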

awk 'BEGIN {FS=OFS="\t"} 
           {col[$1]; row[$2]; val[$2,$1]=$3}
     END   {for(c in col) printf "%s", OFS c; print "";
            for(r in row)
              {printf "%s", r;
               for(c in col) printf "%s", OFS val[r,c]
               print ""}}' file

        Category1       Category2       Category3
type1   +       +       +
type2   -       +       -
type3   +       +       -
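
One caveat with `for (c in col)`: awk does not specify the order in which it visits array indices, so rows and columns may come out in a different order than they appear in the input. Below is a minimal sketch that keeps first-seen order, assuming tab-separated input. The function name `pivot_ordered` and the helper arrays `cseen`, `rseen`, `cols` and `rows` are my own choices for illustration, not part of the answer above.

```shell
# Hypothetical helper: pivot a tab-separated 3-column file, keeping
# categories and types in the order in which they first appear.
pivot_ordered() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         if (!($1 in cseen)) { cseen[$1]; cols[++nc] = $1 }  # remember column order
         if (!($2 in rseen)) { rseen[$2]; rows[++nr] = $2 }  # remember row order
         val[$2, $1] = $3
       }
       END {
         # header line: one leading OFS, then the categories
         for (j = 1; j <= nc; j++) printf "%s%s", OFS, cols[j]
         print ""
         # one line per type, cells in the same column order as the header
         for (i = 1; i <= nr; i++) {
           printf "%s", rows[i]
           for (j = 1; j <= nc; j++) printf "%s%s", OFS, val[rows[i], cols[j]]
           print ""
         }
       }' "$@"
}
# usage: pivot_ordered file
```

The cost is two extra arrays and two counters; in exchange the output order no longer depends on the awk implementation.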

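【Comments】:

I think at this point it would be better to put this into a script rather than a "one-liner".

【Solution 2】:

This is a solution based on GNU awk. I emphasize this because true multidimensional arrays (arrays of arrays, used here to get a convenient solution) are a feature specific to GNU awk.

My script table2matrix.awk: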

# collect values
{
  # category=$1 ; type=$2 ; value=$3
  if (!($1 in categories)) { categories[$1] }
  types[$2][$1] = $3
}
# output of values
END {
  # print col. header
  for (category in categories) { printf("\t%s", category); }
  print ""
  # print rows
  for (type in types) {
    printf("%s", type);
    for (category in categories) {
      printf("\t%s", types[type][category]);
    }
    print ""
  }
}
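
For comparison, the script above relies on `types[$2][$1]`, which as far as I know needs gawk 4.0 or later. Other awks only have flat arrays: a subscript written as `val[$2, $1]` is a single string key, the two parts joined with `SUBSEP`. A minimal sketch of that mechanism follows. The function name `list_cells` is made up for this example, and the trailing `sort` is only there to make the listing order predictable.

```shell
# Hypothetical helper: list (type, category, value) triples using only
# POSIX awk. val[$2, $1] is one flat array keyed by $2 SUBSEP $1.
list_cells() {
  awk '{ val[$2, $1] = $3 }
       END {
         for (key in val) {
           split(key, part, SUBSEP)   # part[1] = type, part[2] = category
           printf "%s\t%s\t%s\n", part[1], part[2], val[key]
         }
       }' "$@" | sort
}
# usage: list_cells table.txt
```

This assumes that no field contains the `SUBSEP` character itself, which is a safe bet for text data like the sample.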

Sample session:

$ cat >table.txt <<EOF
> Category1   type1   +
> Category1   type2   -
> Category1   type3   +
> Category2   type1   +
> Category2   type2   +
> Category2   type3   +
> Category3   type1   +
> Category3   type2   -
> Category3   type3   -
> EOF

$ awk -f table2matrix.awk table.txt
        Category1       Category2       Category3
type1   +       +       +
type2   -       +       -
type3   +       +       -

$ cat table.txt | sed $'s/   /\t/g' >table-tabs.txt

$ awk -f table2matrix.awk table-tabs.txt 
        Category1       Category2       Category3
type1   +       +       +
type2   -       +       -
type3   +       +       -

$ cat >table-sorted.txt <<EOF
> Category1   type1   +
> Category1   type3   +
> Category2   type1   +
> Category2   type2   +
> Category2   type3   +
> Category3   type1   +
> Category1   type2   -
> Category3   type2   -
> Category3   type3   -
> EOF

$ awk -f table2matrix.awk table-sorted.txt 
        Category1       Category2       Category3
type1   +       +       +
type2   -       +       -
type3   +       +       -

$ tac table.txt >table-reverse.txt

$ awk -f table2matrix.awk table-reverse.txt 
        Category1       Category2       Category3
type1   +       +       +
type2   -       +       -
type3   +       +       -

$ grep '+' table.txt >table-incompl.txt

$ awk -f table2matrix.awk table-incompl.txt 
        Category1       Category2       Category3
type1   +       +       +
type2           +
type3   +       +

$

table.txt is space-separated (copied and pasted from the web browser), and table-tabs.txt is table.txt with the runs of spaces replaced by tabs.
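
The `$'...'` quoting used in the sed call may not be available in every shell. As an alternative, here is a minimal sketch that does the same conversion with awk alone, assuming that no field contains whitespace of its own. The function name `spaces_to_tabs` is hypothetical.

```shell
# Hypothetical helper: rewrite whitespace-separated columns as tab-separated.
# Assigning $1 to itself makes awk rebuild the record with OFS between fields.
spaces_to_tabs() {
  awk -v OFS='\t' '{ $1 = $1 } 1' "$@"
}
# usage: spaces_to_tabs table.txt >table-tabs.txt
```

Unlike the sed version, this treats any run of spaces or tabs as one separator, whatever its length.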

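As is obvious from the script (but not from the code samples as shown in the web browser), the output is tab-separated.

After testing some variations of the original sample input, I fixed my awk script. It became a bit shorter and more similar to the other solution by karafka...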
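
One of the variations in the session, table-incompl.txt, leaves some cells empty, which is hard to spot in tab-separated output. Here is a minimal sketch that prints a placeholder instead, written for any POSIX awk. The function name `pivot_filled`, the variable `missing` and the placeholder `NA` are arbitrary choices of mine, not something the question asked for.

```shell
# Hypothetical helper: pivot like the script above, but print a placeholder
# ("NA", an arbitrary choice) where the input had no value for a cell.
pivot_filled() {
  awk -v missing="NA" '
    { cats[$1]; types[$2]; val[$2, $1] = $3 }
    END {
      for (c in cats) printf "\t%s", c
      print ""
      for (t in types) {
        printf "%s", t
        for (c in cats) {
          # (t, c) in val tests for the cell without creating it
          cell = ((t, c) in val) ? val[t, c] : missing
          printf "\t%s", cell
        }
        print ""
      }
    }' "$@"
}
# usage: pivot_filled table-incompl.txt
```

As in the original script, rows and columns come out in whatever order awk iterates over the arrays.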

【Comments】: