[Question title]: TCL Regex to match unescaped quotes in CSV
[Posted]: 2020-02-19 02:21:53
[Question description]:

I have gone through several Q&As here trying to solve a problem I have with a CSV.

Here is a CSV sample with an unescaped quote at 5":

"522","-1","12345678","12345678","Completed","","","","height 5' 5" and weight 170lbs","","","9876543","ABCD","2016-06-12T23:54:00-05:00","2016-06-12T23:59:00-05:00"

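I have used the suggestion from here: Match unescaped quotes in quoted csv. The regex (?<!^|",)"(?!,|$) works well, but only in a PHP context.

My goal is to match a single double quote (") that sits inside a field enclosed in double quotes ("), but in a Tcl context. Any suggestions? Many thanks!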

[Question comments]:

Tags: regex csv tcl quotes


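[Solution 1]:

You might want to split the line instead, so that you can also perform additional checks on the csv line. With the help of the default csv package, I would suggest something like this: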

package require csv

set input {"522","-1","12345678","12345678","Completed","","","","height 5' 5" and weight 170lbs","","","9876543","ABCD","2016-06-12T23:54:00-05:00","2016-06-12T23:59:00-05:00"}

# Quick check the line, if it's not complete...
if {![::csv::iscomplete $input]} {
    # Replace all 'quote comma quote' with null character (or some other character you are certain is
    # not in your csv line)
    set intermediate [regsub -all {","} $input \0]

    # Split on null character (or the character you picked on the previous line)
    set columns [split $intermediate \0]

    # Ensure that the csv line contains the expected number of columns
    if {[llength $columns] == 15} {
        # Replace the quotes in each element of the list of columns
        set columns [lmap x $columns {string map {{"} {}} $x}]
    } else {
        # Do further checks otherwise to see what's wrong
    }
    puts "Split data:"
    puts $columns        
} else {
    set columns [::csv::split -alternate $input]
}
# csv::join takes sepChar, then delChar, then delMode
set output [::csv::join $columns "," "\"" always]
puts "\nRejoined (if you still need it):"
puts $output

Output:

Split data:
522 -1 12345678 12345678 Completed {} {} {} {height 5' 5 and weight 170lbs} {} {} 9876543 ABCD 2016-06-12T23:54:00-05:00 2016-06-12T23:59:00-05:00

Rejoined (if you still need it):
"522","-1","12345678","12345678","Completed","","","","height 5' 5 and weight 170lbs","","","9876543","ABCD","2016-06-12T23:54:00-05:00","2016-06-12T23:59:00-05:00"

This way you have more control over what is happening, and you may spot other problems with the csv.
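The same split-and-check idea can be sketched as a small reusable proc. This version uses only core Tcl (8.6 or later, for lmap), so it runs without tcllib. The proc name `splitLooseCsvLine` and its `expected` parameter are hypothetical, introduced here for illustration. It assumes that every field is quoted and that no field contains the three-character sequence `","`.

```tcl
# Hypothetical helper: split a fully quoted CSV line that may contain
# stray double quotes inside fields. Assumes every field is quoted and
# that no field contains the three-character sequence ",".
proc splitLooseCsvLine {line expected} {
    # Turn each field boundary "," into a NUL character, then split on it.
    set columns [split [string map [list {","} \0] $line] \0]
    # Refuse lines that do not have the expected number of columns.
    if {[llength $columns] != $expected} {
        error "expected $expected columns, got [llength $columns]"
    }
    # Drop every remaining quote: the outer pair and any stray ones.
    return [lmap field $columns {string map {{"} {}} $field}]
}

set line {"a","5' 5" tall","","z"}
puts [splitLooseCsvLine $line 4]
```

Raising an error on a column-count mismatch, rather than silently continuing, keeps malformed lines from slipping through; the caller can wrap the call in catch to log and skip them.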

[Comments]:

    [Solution 2]:

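    Since the question you linked to is about removing "wild" double quotes inside CSV fields, all you need to do is turn the lookbehind into a capturing group with a negated bracket expression: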

    set a {"522","-1","12345678","12345678","Completed","","","","height 5' 5" and weight 170lbs","","","9876543","ABCD","2016-06-12T23:54:00-05:00","2016-06-12T23:59:00-05:00"}
    set result [regsub -all {([^,])"(?=[^,])} $a "\\1"]
    puts $result
    

    Online Tcl demo, output:

    "522","-1","12345678","12345678","Completed","","","","height 5' 5 and weight 170lbs","","","9876543","ABCD","2016-06-12T23:54:00-05:00","2016-06-12T23:59:00-05:00"
    

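    The ([^,])"(?=[^,]) regex matches:

    • ([^,]) - Group 1 (referred to later with \1): any character other than a comma
    • " - a double quote
    • (?=[^,]) - a positive lookahead that requires a character other than a comma immediately to the right of the current position.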
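The pattern described above can be wrapped in a small proc and tried on a shorter line. This is only a sketch: the proc name `stripStrayQuotes` is hypothetical, and it assumes every field is quoted. A stray quote that sits directly next to a comma inside a field is not matched, because it looks the same as a field boundary.

```tcl
# Hypothetical wrapper around the regsub from this answer.
proc stripStrayQuotes {line} {
    # Group 1 captures the non-comma character before the quote; the
    # replacement {\1} puts that character back and drops the quote.
    return [regsub -all {([^,])"(?=[^,])} $line {\1}]
}

puts [stripStrayQuotes {"a","5' 5" tall","","z"}]
```

Opening and closing quotes are left alone because each has a comma, or the start or end of the line, on at least one side.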

    [Comments]:
