【Question title】: Regexp replacement character
【Posted】: 2021-08-08 13:29:04
【Question】:

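I create a CSV file in Go, and I have to put quotes (") around every column. I added them, but now the CSV program adds extra (double) quotes in the comment column (when the column contains a comma (,)).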

My CSV

comment_ID","post_ID","product_SKU","comment_author","author_mail","author_location","date","comment"
"100","60574","VID17","Jordi","","","2021-06-02 16:20:30",""My son likes this video, good job""
"101","60574","VID18","Scarlett,"","","2020-12-29 23:06:32",""I don't like this video, it may be better""

I need the CSV to look like this (no doubled quotes in the comment column)

comment_ID","post_ID","product_SKU","comment_author","author_mail","author_location","date","comment"
"100","60574","VID17","Jordi","","","2021-06-02 16:20:30","My son likes this video, good job"
"101","60574","VID18","Scarlett,"","","2020-12-29 23:06:32","I don't like this video, it may be better"

My Golang code

RegContent := regexp.MustCompile(`",""[A-Za-z0-9]`)
newRegexp := RegContent.ReplaceAllString(CSV_Contents, `","`)
fmt.Println("PLAY: ", newRegexp)
err = ioutil.WriteFile(path, []byte(newRegexp), 0)
if err != nil {
    fmt.Println("error: ", err)
}

Output

"son likes this video, good job" //(Missing My)
"don't like this video, it may be better" //(Missing I)

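【Comments】:

  • Both your current and your expected CSV are invalid; the first column in the header row is missing its opening double quote.
  • @Wiktor Stribiże, no, this also replaces "Scarlett,"","" => "Scarlett,"," and "Jordi","","" => "Jordi",","
  • @Axifive I see, empty fields would be affected. So the only remaining question is how to handle a "" that appears inside the comment column.
  • Your CSV also has a problem at ,"Scarlett,"": it should contain a quote after Scarlett. How did you generate that CSV? It is really bad; if possible, regenerate your csv instead of trying to patch it. Unless you are just practicing. But then I shouldn't help you too much. I would simply suggest staying away from regex.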
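As a follow-up to the suggestion to regenerate the file: the question does not show how the CSV is produced, so the following is only a sketch of one way to quote every field at write time. Go's encoding/csv Writer quotes a field only when it needs to, so this sketch builds each line by hand. The helper names quoteField and quoteRecord and the sample rows are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteField (hypothetical helper) wraps one value in double quotes and
// doubles any quote inside it, the escaping rule described in RFC 4180.
func quoteField(s string) string {
	return `"` + strings.ReplaceAll(s, `"`, `""`) + `"`
}

// quoteRecord (hypothetical helper) joins the quoted fields of one row.
func quoteRecord(fields []string) string {
	quoted := make([]string, len(fields))
	for i, f := range fields {
		quoted[i] = quoteField(f)
	}
	return strings.Join(quoted, ",")
}

func main() {
	// Sample rows, shortened from the data in the question.
	rows := [][]string{
		{"comment_ID", "comment_author", "comment"},
		{"100", "Jordi", "My son likes this video, good job"},
		{"101", "Scarlett", `She said "wow"`},
	}
	for _, r := range rows {
		fmt.Println(quoteRecord(r))
	}
}
```

Since the raw values are quoted exactly once, a comma inside a comment never triggers a second layer of quotes, and no regex cleanup is needed afterwards.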

Tags: regex string csv go


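【Solution 1】:

You can match the last column while capturing everything between the outer quotes, and use a backreference in the replacement argument of ReplaceAllString to put that part back: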

package main

import (
    "fmt"
    "regexp"
)

func main() {
    CSV_Contents := `
comment_ID","post_ID","product_SKU","comment_author","author_mail","author_location","date","comment"
"100","60574","VID17","Jordi","","","2021-06-02 16:20:30",""My son likes this video, good job""
"101","60574","VID18","Scarlett,"","","2020-12-29 23:06:32",""I don't like this video, it may be better""
`
    // Match the last column, capturing everything between the outer quotes.
    RegContent := regexp.MustCompile(`(?m),"("[^"]*(?:""[^"]*)*")"$`)
    // $1 restores the captured part without the extra pair of quotes.
    result := RegContent.ReplaceAllString(CSV_Contents, `,$1`)
    fmt.Println(result)
}

Go demo, output:

comment_ID","post_ID","product_SKU","comment_author","author_mail","author_location","date","comment"
"100","60574","VID17","Jordi","","","2021-06-02 16:20:30","My son likes this video, good job"
"101","60574","VID18","Scarlett,"","","2020-12-29 23:06:32","I don't like this video, it may be better"

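See the regex demo. Details:

  • (?m) - turns on multiline mode, so $ matches at the end of each line
  • ," - a comma and a "
  • ("[^"]*(?:""[^"]*)*") - Group 1 ($1): a ", then zero or more characters other than ", then zero or more repetitions of a "" sequence followed by zero or more non-" characters (so escaped quotes inside the comment column stay untouched), then a "
  • "$ - a " at the end of a line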

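【Comments】:

  • It works in the terminal, but when I run it on the CSV file it does not work. Thanks for your effort
  • @Melisa That means your content differs from what you posted in the question. There may be trailing whitespace at the end of the lines; regexp.MustCompile(`(?m),"("[^"]*(?:""[^"]*)*")"\s*$`) would work then, but without the exact file there is no guarantee.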
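If the file really does differ by line endings or trailing blanks, one possible approach is sketched below. It assumes the file uses Windows line endings (CRLF) and may carry trailing spaces or tabs, which is only a guess since the actual file was never posted. It also swaps `\s*` for `[ \t]*`, so the trailing-whitespace part of the match cannot swallow a newline. The helper name cleanCSV and the sample data are made up.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Variant of the pattern that tolerates spaces or tabs before the line end.
var lastColumnLoose = regexp.MustCompile(`(?m),"("[^"]*(?:""[^"]*)*")"[ \t]*$`)

// cleanCSV (hypothetical helper) converts CRLF to LF, then drops the extra
// quotes around the last field together with any trailing spaces or tabs.
func cleanCSV(content string) string {
	content = strings.ReplaceAll(content, "\r\n", "\n")
	return lastColumnLoose.ReplaceAllString(content, `,$1`)
}

func main() {
	// Assumed file content: CRLF line endings, two trailing spaces on row one.
	raw := "\"100\",\"VID17\",\"\"My son likes this video, good job\"\"  \r\n" +
		"\"101\",\"VID18\",\"\"I don't like this video, it may be better\"\"\r\n"
	fmt.Print(cleanCSV(raw))
}
```

The result uses LF line endings throughout; if the consumer of the file needs CRLF, the line endings would have to be converted back before writing.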
【Solution 2】:

You can get the described behavior with ReplaceAllStringFunc()

f := func(s string) string {
   return strings.ReplaceAll(s, `""`, `"`)
}
RegContent := regexp.MustCompile(`",""[^,].+""`)
newRegexp := RegContent.ReplaceAllStringFunc(CSV_Contents, f)
fmt.Println("PLAY: ", newRegexp)

https://play.golang.org/p/1NqTyN1hs1J

There is also an alternative using ReplaceAllString():

RegContent := regexp.MustCompile(`,""([^,].+)""`)
newRegexp := RegContent.ReplaceAllString(CSV_Contents, `,"$1"`)
fmt.Println("PLAY: ", newRegexp)

https://play.golang.org/p/tY8zGWTbLLB
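For comparison, the sketch below runs the pattern from the question and the capture-group pattern from this answer on the same line. In the question's attempt the character class [A-Za-z0-9] is part of the match, so the first character of the comment is replaced along with the quotes and never restored. The function names lossyReplace and keepingReplace and the shortened sample line are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Pattern from the question: the character class consumes one character.
	lossy = regexp.MustCompile(`",""[A-Za-z0-9]`)
	// Pattern from this answer: the comment text is captured in group 1.
	keeping = regexp.MustCompile(`,""([^,].+)""`)
)

// lossyReplace reproduces the question's attempt.
func lossyReplace(s string) string {
	return lossy.ReplaceAllString(s, `","`)
}

// keepingReplace restores the captured comment text through $1.
func keepingReplace(s string) string {
	return keeping.ReplaceAllString(s, `,"$1"`)
}

func main() {
	line := `"2021-06-02 16:20:30",""My son likes this video, good job""`
	fmt.Println(lossyReplace(line))
	// prints: "2021-06-02 16:20:30","y son likes this video, good job""
	fmt.Println(keepingReplace(line))
	// prints: "2021-06-02 16:20:30","My son likes this video, good job"
}
```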

【Comments】:

  • The alternative (ReplaceAllString) worked right away!