Eliminating unwanted output using awk and sed

【Question title】: Eliminate unwanted output using awk and sed
【Posted】: 2012-04-11 20:53:58
【Question description】:

In the output of the following command, how can I eliminate all the lines that appear before

 Owner     RepoName             CreatedDate

Edited command:

find /opt/site/ -name '.log.txt' | xargs cat | awk '{$NF=""; print $0}' | sed '1i Owner RepoName CreatedDate' | column -t

The output is

find: Filesystem loop detected; `/nfs/.snapshot/nightly.4' has the same device number and inode as a directory which is 2 levels higher in the filesystem hierarchy.
find: Filesystem loop detected; `/nfs/.snapshot/nightly.5' has the same device number and inode as a directory which is 2 levels higher in the filesystem hierarchy.
find: Filesystem loop detected; `/nfs/.snapshot/nightly.6' has the same device number and inode as a directory which is 2 levels higher in the filesystem hierarchy.
Owner     RepoName             CreatedDate
val        abc                  Fri          Mar  16  17:01:07  PDT
p1         repo_pc              Wed          Mar  21  11:34:42  PDT
New        fm                   Mon          Mar  19  00:15:51  PD

The only output I need is:

Owner     RepoName             CreatedDate
val        abc                  Fri          Mar  16  17:01:07  PDT
p1         repo_pc              Wed          Mar  21  11:34:42  PDT
New        fm                   Mon          Mar  19  00:15:51  PD

【Question comments】:

Tags: linux sed awk command

【Solution 1】:

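Those find errors are written to stderr, so they bypass your pipeline entirely. You need to redirect the errors with 2>/dev/null, although that will also stop you from seeing any other errors from the find command.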

find /opt/site/ -name '.log.txt' 2>/dev/null | xargs cat | awk '{$NF=""; print $0}' | xargs sed "/Filesystem/d" | sed '1i Owner RepoName CreatedDate' | column -t
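To see why a filter placed later in the pipeline never sees these messages, here is a minimal sketch. The noisy function is a hypothetical stand-in for find (it is not part of the original command): it writes one diagnostic to stderr and one data line to stdout, and count_find_lines is an illustrative helper that reports how many "find:" lines reach the next stage.

```shell
# Hypothetical stand-in for find: one diagnostic on stderr, one data line on stdout.
noisy() {
  echo "find: Filesystem loop detected; (diagnostic)" >&2
  echo "val abc Fri Mar 16 17:01:07 PDT"
}

# Count the lines beginning with "find:" that reach the next pipeline stage.
count_find_lines() {
  awk '/^find:/ { n++ } END { print n + 0 }'
}

# stderr discarded at the source: the pipe carries only stdout.
discarded=$(noisy 2>/dev/null | count_find_lines)

# stderr merged into stdout: only now can a later stage see (and filter) it.
merged=$(noisy 2>&1 | count_find_lines)

echo "discarded=$discarded merged=$merged"
```

With no redirection at all, the diagnostic goes straight to the terminal, which is why it shows up above the table regardless of what sed or awk do downstream.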

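In general, with a command this complex, you should break it apart when something goes wrong, so you can find the source of the problem.

Let's split this command up and see what it does:

find /opt/site/ -name '.log.txt' 2>/dev/null - finds all files named .log.txt under /opt/site/

xargs cat - concatenates the contents of all of them, one after another

awk '{$NF=""; print $0}' - removes the last column

xargs sed "/Filesystem/d" - treats each entry as a file name, and deletes every line containing Filesystem from the contents of those files.

sed '1i Owner RepoName CreatedDate' - inserts Owner RepoName CreatedDate as the first line

column -t - formats the data as a table

I suggest building the command up stage by stage, checking at each stage that the output is correct.

A couple of things about your command are surprising:

find looks for files named exactly .log.txt, rather than files with that extension.
The second xargs call turns the contents of the .log.txt files into file names.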
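The first surprise can be checked in a scratch directory. The sketch below is only an illustration: the file names and the temporary directory are made up, and the '*.log.txt' pattern is an assumption about what may have been intended, not something stated in the question.

```shell
# Throwaway directory with hypothetical file names, purely for illustration.
tmp=$(mktemp -d)
touch "$tmp/.log.txt" "$tmp/build.log.txt"

# -name '.log.txt' matches only a file whose whole name is ".log.txt".
exact=$(find "$tmp" -type f -name '.log.txt' | wc -l | tr -d ' ')

# -name '*.log.txt' matches names that end in ".log.txt", such as build.log.txt.
glob=$(find "$tmp" -type f -name '*.log.txt' | wc -l | tr -d ' ')

echo "exact=$exact glob=$glob"
rm -rf "$tmp"
```

Running each stage on its own like this, before adding the next one, is the "build it up stage by stage" approach suggested above.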

【Comments】:

@Rajeev Perhaps you are using a shell other than bash?
I used bash and the output is still the same. Ambiguous output redirect

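【Solution 2】:

You can eliminate find's error output by appending 2>/dev/null to the find part of the command, before the first pipe. [Edit: this is the best way, and I upvoted Douglas since he got here first ;)]

But if you really want to do it with sed or awk (not sure why?), you can modify your awk script to skip lines that begin with 'find:':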

awk '/^find:/ {next;} {$NF=""; print $0}'
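One caveat worth sketching: the awk rule can only skip lines that actually arrive on its standard input, and find's diagnostics go to stderr. The example below assumes stderr is merged into the pipe with 2>&1. The noisy function is a hypothetical stand-in for the whole find | xargs cat stage, and the trailing "2012" is an assumed last column for $NF="" to drop.

```shell
# Hypothetical stand-in for "find ... | xargs cat": a diagnostic on stderr,
# a record on stdout. The trailing "2012" is an assumed last column.
noisy() {
  echo "find: Filesystem loop detected; (diagnostic)" >&2
  echo "val abc Fri Mar 16 17:01:07 PDT 2012"
}

# 2>&1 feeds the diagnostic into the pipe so the awk rule can skip it.
cleaned=$(noisy 2>&1 | awk '/^find:/ { next } { $NF = ""; print $0 }')

# Brackets make the trailing space left behind by $NF="" visible.
printf '[%s]\n' "$cleaned"
```

In the real pipeline the merge would have to wrap both stages, for example { find ... | xargs cat; } 2>&1 | awk ..., because putting 2>&1 directly on find would hand the error messages to xargs as file names.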

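【Comments】:

I still get the same output with your suggestion.

【Solution 3】:

Unfortunately, it looks like you are using csh or tcsh, where it is hard to redirect standard error separately from standard output. Otherwise Douglas's answer would work. But try this: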

(find /opt/site/ -name '.log.txt' | xargs cat | awk '{$NF=""; print $0}' | sed '1i Owner RepoName CreatedDate' | column -t > output) >&/dev/null

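Note the parentheses around most of the command. Inside those parentheses is a redirection that sends standard output to a file named "output" instead of to your terminal (name it whatever you like, or replace output with /dev/tty if you really want to see it in the terminal). Outside those parentheses is a redirection that sends the remaining error messages to /dev/null.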
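The same two-level structure can be tried in a POSIX shell. This is an analogue for illustration only, not csh syntax: the noisy function is a hypothetical stand-in for the pipeline, and a temporary file plays the role of "output".

```shell
# Hypothetical stand-in for the pipeline: a diagnostic on stderr, a record on stdout.
noisy() {
  echo "find: Filesystem loop detected; (diagnostic)" >&2
  echo "val abc Fri Mar 16 17:01:07 PDT"
}

out=$(mktemp)   # plays the role of the file called "output" in the answer

# Inner redirect: stdout goes to the file. Outer redirect: whatever is left
# (here only stderr) is discarded. csh spells the outer part ">& /dev/null".
( noisy > "$out" ) 2>/dev/null

captured=$(cat "$out")
rm -f "$out"
echo "$captured"
```

Because stdout has already been sent to the file inside the parentheses, the outer redirection has only the error messages left to throw away.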

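The whole thing is a sad commentary on the longevity of terrible shells.

【Comments】:

【Solution 4】:

The following sed command should do the job (use it with an input file or at the end of a pipe):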

sed -n '/^Owner/,$ p'

Explanation:

-n             # Disable auto-print.
/^Owner/       # From a line beginning with 'Owner'...
$              # ...until end of input...
p              # print
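A quick way to try the range expression is to feed it sample text modelled on the question's output. The diagnostics are shortened and the rows simplified here, and the sketch assumes the unwanted lines really do arrive on the same stream as the table.

```shell
# Sample input modelled on the question's output (diagnostics shortened).
sample='find: Filesystem loop detected; (diagnostic 1)
find: Filesystem loop detected; (diagnostic 2)
Owner RepoName CreatedDate
val abc Fri
p1 repo_pc Wed'

# Print from the first line starting with "Owner" through the last line.
trimmed=$(printf '%s\n' "$sample" | sed -n '/^Owner/,$p')

printf '%s\n' "$trimmed"
```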

【Comments】:

【Solution 5】:

This is entirely doable with an Awk script...

#!/usr/bin/awk -f

BEGIN {
  for (i = 1; i < ARGC; i++) {
    if (ARGV[i] ~ "^--from=") {
      _from = substr(ARGV[i], 8)
      delete ARGV[i]
    }
  }

  if (!_from) {
    print "No '--from' argument provided!" > "/dev/stderr"
    # An empty pattern would match every line, so stop instead of printing everything
    exit 1
  }
}


{

  if (_flag) {
    print $0
  } else if ($0 ~ _from) {
    _flag = 1
    print $0
  }

}

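Note: the script above is adapted (trimmed down) from from-till.awk, which prints everything between the --from and --till search expressions, so the added command-line options and variable names may need adjusting for this particular use case.

...which allows using a file as input...

head-trimmer.awk --from="^Owner" file-path.txt

...or redirection, e.g. a here-document (EOF) or a pipe...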

head-trimmer.awk --from="^Owner" <<'EOF'
find: Filesystem loop detected; `/nfs/.snapshot/nightly.4' has the same device number and inode as a directory which is 2 levels higher in the filesystem hierarchy.
find: Filesystem loop detected; `/nfs/.snapshot/nightly.5' has the same device number and inode as a directory which is 2 levels higher in the filesystem hierarchy.
find: Filesystem loop detected; `/nfs/.snapshot/nightly.6' has the same device number and inode as a directory which is 2 levels higher in the filesystem hierarchy.
Owner     RepoName             CreatedDate
val        abc                  Fri          Mar  16  17:01:07  PDT
p1         repo_pc              Wed          Mar  21  11:34:42  PDT
New        fm                   Mon          Mar  19  00:15:51  PD
EOF

...and should parse things into...

Owner     RepoName             CreatedDate
val        abc                  Fri          Mar  16  17:01:07  PDT
p1         repo_pc              Wed          Mar  21  11:34:42  PDT
New        fm                   Mon          Mar  19  00:15:51  PD

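...An Awk script can be extended and/or adapted to other use cases more easily, and using it properly means unnecessary calls to other programs can be eliminated.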
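For a one-off, the same flag idea fits in a single awk invocation. This is a compact variant, not the script above: it passes the pattern with awk's standard -v option instead of parsing a custom --from argument, and the variable names from and found are illustrative.

```shell
# Sample input modelled on the question's output (diagnostics shortened).
sample='find: Filesystem loop detected; (diagnostic 1)
find: Filesystem loop detected; (diagnostic 2)
Owner RepoName CreatedDate
val abc Fri
p1 repo_pc Wed'

# Set the flag on the first matching line; a bare "found" pattern then
# prints that line and every line after it.
trimmed=$(printf '%s\n' "$sample" | awk -v from='^Owner' '$0 ~ from { found = 1 } found')

printf '%s\n' "$trimmed"
```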

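With a few more hints it should be possible to eliminate sed and column from the pipeline

BEGIN and END blocks in Awk run before and after all of the input, e.g. a whole list of files, so they are well suited to building headers and column mappings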
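As a small illustration of that ordering, the sketch below prints a header from BEGIN and a row count from END. The two sample rows and the "rows" footer are made up for the example.

```shell
rows='val abc Fri
p1 repo_pc Wed'

report=$(printf '%s\n' "$rows" | awk '
  BEGIN { print "Owner RepoName CreatedDate" }   # runs once, before any input
  { print }                                      # runs for every record
  END { print NR " rows" }                       # runs once, after all input
')

printf '%s\n' "$report"
```

A header printed from BEGIN would take over the job of the sed '1i ...' stage in the original pipeline.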

Using while and getline in Awk makes it possible to parse the output of a command...

#!/usr/bin/awk -f

BEGIN {
  for (i = 1; i < ARGC; i++) {
    if (ARGV[i] ~ "^--directory=") {
      _directory = substr(ARGV[i], 13)
      delete ARGV[i]
    }
    if (ARGV[i] ~ "^--name=") {
      _name = substr(ARGV[i], 8)
      delete ARGV[i]
    }
    # ... perhaps add other args to parse
  }

  # ... build/print header maybe

}


{

  cmd = "find " _directory " -name " _name " 2>/dev/null"
  while (( cmd | getline _line ) > 0) {
    print "_line ->", _line
    # ... do some fancy formatting, use a built-in, or another command
    #     to build desired column output from find results
  }
  close(cmd)

  # ...

}

This can be very handy when you would otherwise write a Bash script that is just a wrapper around a command with some custom parsing.
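Here is a self-contained sketch of the cmd | getline loop that can be run without any input. It makes two choices that are assumptions rather than part of the script above: the command runs from a BEGIN block, so it executes once instead of once per input record, and the directory is passed with -v dir=... instead of a custom --directory argument. The scratch directory and file names are made up.

```shell
# Scratch directory with hypothetical file names.
tmp=$(mktemp -d)
touch "$tmp/a.log.txt" "$tmp/b.log.txt"

listing=$(awk -v dir="$tmp" '
  BEGIN {
    # Build the command once; BEGIN needs no input file or stdin.
    cmd = "find \"" dir "\" -type f -name \"*.log.txt\" 2>/dev/null | sort"
    while ((cmd | getline line) > 0) {
      n = split(line, parts, "/")      # last path component is the file name
      print "file ->", parts[n]
    }
    close(cmd)
  }
')
rm -rf "$tmp"

printf '%s\n' "$listing"
```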

There are lots of handy built-in Awk functions (even more so in GAwk), e.g. split and length, and more can be added with the function keyword in an Awk script.
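A brief sketch of those three features together, using one row from the question as input. The helper join_fields is hypothetical, written only for this example; the extra parameters after the wide gap are the usual awk idiom for local variables.

```shell
summary=$(echo "val abc Fri Mar 16 17:01:07 PDT" | awk '
  # Hypothetical helper: join fields first..last of the current record.
  function join_fields(first, last,    i, out) {
    for (i = first; i <= last; i++) out = out (i > first ? " " : "") $i
    return out
  }
  {
    n = split($6, clock, ":")            # "17:01:07" -> clock[1..3]
    print length($2), n, clock[1], join_fields(3, 5)
  }
')

echo "$summary"
```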

Array/dictionary variables are also available in Awk, e.g. ...

BEGIN {
  for (i = 1; i < ARGC; i++) {
    if (ARGV[i] ~ "^--from=") {
      _custom_args["from"] = substr(ARGV[i], 8)
      delete ARGV[i]
    } else if (ARGV[i] ~ "^--till=") {
      _custom_args["till"] = substr(ARGV[i], 8)
      delete ARGV[i]
    }
  }
}


{
  # ...
}

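However (if I remember correctly) multi-dimensional arrays such as _something[0,1] are best used with care, because Awk only simulates them: the subscripts are joined into a single string key, so this is really _something["0" SUBSEP "1"], where SUBSEP defaults to "\034" (not a literal comma)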
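The key flattening can be observed directly. In this sketch the array name grid and the variable names are illustrative; it checks which string keys exist after a two-subscript assignment and then splits the key back apart.

```shell
result=$(awk '
  BEGIN {
    grid[0, 1] = "x"                       # stored under the key "0" SUBSEP "1"
    key = 0 SUBSEP 1
    hit  = (key in grid)   ? "yes" : "no"  # same key, built by hand
    miss = ("0,1" in grid) ? "yes" : "no"  # a literal comma is a different key
    n = split(key, idx, SUBSEP)            # recover the two subscripts
    print hit, miss, n, idx[1], idx[2]
  }
')

echo "$result"
```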

Printing columns as a nicely formatted table with Awk is a bit trickier, but it can also be done with printf formatting options...

#!/usr/bin/awk -f

BEGIN {
  printf("%-8s %-13s %s\n", "Owner", "RepoName", "CreatedDate")
}

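Essentially, %-8s tells Awk to reserve at least 8 characters regardless of the length of the string "Owner", %-13s reserves 13, and the - tells Awk to pad on the right/end of the string, i.e. to left-justify it.

To cut off longer strings, printf combined with a %.<n>s precision may be useful...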

#!/usr/bin/awk -f

BEGIN {
  printf("%.3s %-13s %s\n", "Owner", "RepoName", "CreatedDate")
}
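Putting the header and the data rows through the same format string keeps the columns aligned without column -t. The sketch below uses simplified three-field rows, which is an assumption for brevity; the real CreatedDate spans several fields.

```shell
table=$(printf '%s\n' "val abc Fri" "p1 repo_pc Wed" | awk '
  BEGIN { printf("%-8s %-13s %s\n", "Owner", "RepoName", "CreatedDate") }
  { printf("%-8s %-13s %s\n", $1, $2, $3) }
')

# A precision on %s truncates: at most 3 characters of "Owner" are printed.
short=$(awk 'BEGIN { printf("%.3s\n", "Owner") }')

printf '%s\n' "$table"
echo "$short"
```

The widths 8 and 13 are fixed here, so a longer owner or repository name would push its row out of alignment unless a precision is added as well, e.g. %-8.8s.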

If you get stuck, feel free to leave a comment and I'll try to offer more hints.

【Comments】: