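【Posted】: 2018-04-11 13:40:22
【Question】:
I am trying to use sed to do a global find and replace in a CSV file where each field is delimited by " and separated by , but where the content of some fields can also contain ".
I want to find the cases where the last character inside a field is a " and insert a space after it, so that the field content ends with a space rather than a ".
Note that a single line can contain more than one field whose last character is a ".
For example, here is some file content (4 lines)...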
"123","def","","",""
"456","seven eight "nine" ten","","",""
"789"."twenty thirty sixty "seven"","","",""
"303030","one two "three" "four"","five "six"","",""
It should become...
"123","def","","",""
"456","seven eight "nine" ten","","",""
"789"."twenty thirty sixty "seven" ","","",""
"303030","one two "three" "four" ","five "six" ","",""
That is, a space is inserted in 3 places: once on line 3 and twice on line 4.
So far I have got as far as:
1,$ s/[^,]"",/" ",/g
This finds all the occurrences, but it does not preserve the character before the match, so I get this result...
"123","def","","",""
"456","seven eight "nine" ten","","",""
"789"."twenty thirty sixty "seve" ","","",""
"303030","one two "three" "fou" ","five "si" ","",""
How can I get the desired output with sed? Or perhaps with awk?
Thanks.
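The lost character in the attempt above comes from `[^,]` being part of the match while the replacement never puts it back. A minimal sketch of one possible fix, assuming a POSIX sed, captures that character in a group and re-emits it with `\1`. The function name `fix_trailing_quotes` is hypothetical, and the sketch only covers fields that are followed by a comma, as in the sample data.

```shell
# Hypothetical helper: reads CSV-like lines on stdin, writes the fixed lines to stdout.
fix_trailing_quotes() {
  # \([^,]\) captures the character before the doubled quote;
  # \1 writes it back, followed by the quote, a space, and the closing quote.
  sed 's/\([^,]\)"",/\1" ",/g'
}

# Demo on three of the sample lines from the question.
printf '%s\n' \
  '"123","def","","",""' \
  '"789"."twenty thirty sixty "seven"","","",""' \
  '"303030","one two "three" "four"","five "six"","",""' |
  fix_trailing_quotes
```

The `1,$` address from the original command is left out because sed already applies a command to every line by default.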
【Comments】:
-
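According to the CSV RFC and Excel (the de facto CSV standard), an unescaped double quote inside a double-quoted field is invalid. Inside double quotes, a double quote should be escaped either by doubling it, "this is ""one"" way", or, less commonly, by preceding it with a backslash, "This is \"the other\" way". Fix whatever tool is generating the invalid CSV, or at least fix it in the output of the new tool you are asking for help writing, rather than introducing yet another non-standard oddity into the CSV, and then see stackoverflow.com/q/45420535/1745001 for how to parse it with awk.
-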
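Unfortunately, I receive the file from a customer who is not willing to change the format for us as their supplier, so I have to work around it.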
-
OK, then you should at least fix it in the output of the tool you are currently writing, so that other tools can use it.