【Question title】: sed/awk - insert space after pattern match
【Posted】: 2018-04-11 13:40:22
【Question description】:

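I'm trying to use sed to do a global find and replace in a csv file where each field is enclosed in double quotes (") and fields are separated by commas (,), but the content of some fields can itself contain ". I want to find the cases where the last character inside a field is " and insert a space after it, so that the field content ends with a space instead of ".
Note that a single line can contain several fields whose last character is ".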

For example, here is some file content (4 lines)...

"123","def","","",""
"456","seven eight "nine" ten","","",""
"789"."twenty thirty sixty "seven"","","",""
"303030","one two "three" "four"","five "six"","",""

It should become...

"123","def","","",""
"456","seven eight "nine" ten","","",""
"789"."twenty thirty sixty "seven" ","","",""
"303030","one two "three" "four" ","five "six" ","",""

That is, a space is inserted in 3 places: once on line 3 and twice on line 4.

So far I have got to:

1,$ s/[^,]"",/" ",/g

This finds every occurrence, but [^,] consumes the character before the match and the replacement does not put it back, so I get...

"123","def","","",""
"456","seven eight "nine" ten","","",""
"789"."twenty thirty sixty "seve" ","","",""
"303030","one two "three" "fou" ","five "si" ","",""

How can I get the desired output with sed? Or maybe with awk?

Thanks.

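【Question comments】:

  • According to the CSV RFC and Excel (the de facto CSV standard), unescaped double quotes inside a double-quoted field are invalid. Inside double quotes, a double quote should be escaped either by doubling it, "this is ""one"" way", or, less commonly, by preceding it with a backslash, "This is \"the other\" way". Fix whatever tool is generating the invalid CSV, or at least fix it in the output of the new tool you are asking for help writing instead of introducing yet another non-standard quirk into the CSV, and then see stackoverflow.com/q/45420535/1745001 for how to parse it with awk.
  • Unfortunately, I receive the file from a customer who is not willing to adjust the format for us as a vendor, so I have to work around the problem.
  • OK, then you should at least fix it in the output of the tool you are currently writing, so that other tools can consume it.

Tags: regex awk sed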


【Solution 1】:

You need to create a capture group and use a back-reference to it in the replacement:

sed -E 's/([^,"])""/\1" "/g' file

"123","def","","",""
"456","seven eight "nine" ten","","",""
"789"."twenty thirty sixty "seven" ","","",""
"303030","one two "three" "four" ","five "six" ","",""
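
To make the difference concrete, here is a minimal sketch that runs both the question's original substitution and the capture-group version on the fourth sample line. The variable names (line, attempt, fixed) are only illustrative, and it assumes a sed that accepts -E.

```shell
#!/bin/sh
# Fourth sample line from the question.
line='"303030","one two "three" "four"","five "six"","",""'

# Original attempt: [^,] consumes the character before "" and the
# replacement never re-emits it, so "four" and "six" lose a letter.
attempt=$(printf '%s\n' "$line" | sed 's/[^,]"",/" ",/g')

# Capture-group version: \1 puts the consumed character back.
fixed=$(printf '%s\n' "$line" | sed -E 's/([^,"])""/\1" "/g')

printf 'attempt: %s\n' "$attempt"
# attempt: "303030","one two "three" "fou" ","five "si" ","",""
printf 'fixed:   %s\n' "$fixed"
# fixed:   "303030","one two "three" "four" ","five "six" ","",""
```

Excluding " as well as , inside the bracket expression is what keeps genuinely empty fields ("") from being touched.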

To save the changes in place (keeping a backup of the original as file.bak), use:

sed -i.bak -E 's/([^,"])""/\1" "/g' file
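
As a rough illustration of what -i.bak does, the sketch below applies the same command to a throwaway copy of two sample lines and then compares the edited file with its backup. The directory and the file name sample.csv are hypothetical, chosen only for this example.

```shell
#!/bin/sh
# Work in a temporary directory so no real data is touched.
workdir=$(mktemp -d)
file="$workdir/sample.csv"   # hypothetical file name

printf '%s\n' \
  '"123","def","","",""' \
  '"303030","one two "three" "four"","five "six"","",""' > "$file"

# Edit in place; sed keeps the untouched original as sample.csv.bak.
sed -i.bak -E 's/([^,"])""/\1" "/g' "$file"

# diff exits non-zero when the files differ, which is expected here.
diff "$file.bak" "$file" || true

# Capture both versions before cleaning up.
edited=$(cat "$file")
backup=$(cat "$file.bak")
rm -rf "$workdir"
```

Only the second line should show up in the diff, since the first line has no field ending in a quote.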

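【Comments】:

  • Thanks, but this gives me "invalid reference \1 on `s' command's RHS" - I don't understand why it can't resolve the capture group.
  • Did you use sed -E as suggested?
  • I used -e (script expression) - I'm running on RHEL. -E is not a valid argument in my version.
  • Then try: sed 's/\([^,"]\)""/\1" "/g' file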
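
For a sed without -E, the last comment's basic-regex form can be wrapped as below; the group is written \( ... \) instead of ( ... ), and the rest is unchanged. Since the question also asked about awk, the sketch includes a portable awk loop as well. That awk version is not from the thread, only an assumption about one way to do it, and the function names fix_bre and fix_awk are made up for this example.

```shell
#!/bin/sh
# Basic regular expression form: escaped parentheses, no -E needed.
fix_bre() {
  sed 's/\([^,"]\)""/\1" "/g'
}

# Portable awk loop (illustrative, not from the thread): match() finds the
# next "non-comma, non-quote character followed by two quotes" and a space
# is spliced in between the two quotes.
fix_awk() {
  awk '{
    while (match($0, /[^,"]""/))
      $0 = substr($0, 1, RSTART + 1) " " substr($0, RSTART + 2)
    print
  }'
}

sample='"303030","one two "three" "four"","five "six"","",""'
printf '%s\n' "$sample" | fix_bre
printf '%s\n' "$sample" | fix_awk
```

Both calls should print the line with a space inserted after "four" and after "six". On older GNU sed releases the extended-regex flag may be available as -r instead of -E, but the basic-regex form avoids the question entirely.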