【Question title】: How do I split apart a CSV string in Ruby?
【Posted】: 2011-04-25 09:17:19
【Question】:

Take this line from a CSV file as an example:

2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"

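I want to split it into an array. The immediate thought is to just split on commas, but some of the strings have commas in them, e.g. "Life and Living Processes, Life Processes", and these should stay as single elements in the array. Note also that there are consecutive commas in the middle with nothing between them (two empty fields); I want to get these as empty strings.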

In other words, the array I want to get is

[2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes","","",1,0,"endofline"]
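
To make the problem concrete, here is a minimal sketch of what the naive comma split does with the sample line. The variable names line and naive are illustrative only and are not part of the original question.

```ruby
# The sample line from the question, held in a single-quoted Ruby string.
line = '2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"'

# Naive approach: split on every comma, ignoring quotes.
naive = line.split(",")

puts naive.length  # one piece too many, because a quoted field is cut in two
p naive[4, 2]      # the two halves of the field that contains a comma
```

The pieces also keep their surrounding double quotes, so a plain split would need further cleanup even if no field contained a comma.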

I can think of hacky ways involving eval, but I'm hoping someone can come up with a clean regex to do it...

Cheers, Max

【Comments】:

  • This is a perfect example of why not everything that involves extracting data from a string is a job for a regular expression.

Tags: ruby regex csv split


【Answer 1】:

My preference is @steenslag's solution, but an alternative is to use String#scan with the following regular expression.

r = /(?<![^,])(?:(?!")[^,\n]*(?<!")|"[^"\n]*")(?![^,])/

If the variable str holds the string given in the example, we obtain:

puts str.scan r

which displays

2412
21
"Which of the following is not found in all cells?"
"Curriculum"
"Life and Living Processes, Life Processes"


1
0
"endofline"


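See also regex101, which provides a detailed explanation of each token of the regex. (Move the cursor across the regex.)

Ruby's regex engine performs the following operations.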

(?<![^,]) : negative lookbehind asserts current location is not preceded
            by a character other than a comma
(?:       : begin non-capture group
  (?!")   : negative lookahead asserts next char is not a double-quote
  [^,\n]* : match 0+ chars other than a comma and newline
  (?<!")  : negative lookbehind asserts preceding character is not a
            double-quote
  |       : or
  "       : match double-quote
  [^"\n]* : match 0+ chars other than double-quote and newline
  "       : match double-quote
)         : end of non-capture group
(?![^,])  : negative lookahead asserts current location is not followed
            by a character other than a comma

Note that, for a single-line string, (?<![^,]) is the same as (?<=,|^) and (?![^,]) is the same as (?=,|$).
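
The scan above leaves the double quotes on the quoted fields. As a rough sketch, they can be stripped in a second pass to get closer to the array asked for in the question. The names raw and fields are illustrative. The sketch assumes str has no trailing newline and that no field contains escaped quotes ("") or embedded newlines, which this regex does not handle.

```ruby
r = /(?<![^,])(?:(?!")[^,\n]*(?<!")|"[^"\n]*")(?![^,])/

# The sample line from the question, without a trailing newline.
str = '2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"'

# Step 1: one match per field; quoted fields still carry their quotes.
raw = str.scan(r)

# Step 2: remove one pair of surrounding double quotes where present.
fields = raw.map { |f| f.sub(/\A"(.*)"\z/, '\1') }

p fields
```

Every element is still a String here, including the numeric fields.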

【Comments】:

【Answer 2】:
    str=<<EOF
    2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"
    EOF
    require 'csv' # built in
    
    p CSV.parse(str)
    # That's it! However, empty fields appear as nil.
    # Makes sense to me, but if you insist on empty strings then do something like:
    parser = CSV.new(str)
    parser.convert{|field| field.nil? ? "" : field}
    p parser.readlines
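
If the goal is exactly the array from the question (integers for the numeric fields, empty strings for the blanks), one possible sketch combines the standard library's built-in :numeric converter with a plain map over the parsed row. The variable names are illustrative, and CSV.parse_line is used here only because the input is a single line.

```ruby
require 'csv'

# The sample line from the question.
line = '2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"'

# :numeric turns fields that look like numbers into Integer or Float;
# empty unquoted fields come back as nil, so map them to "" afterwards.
fields = CSV.parse_line(line, converters: :numeric).map { |f| f.nil? ? "" : f }

p fields
```

Mapping nil to "" after parsing avoids relying on how a particular CSV version passes nil fields to custom converters.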
    

【Comments】:

• Thanks Steenslag, that's perfect. As it happens, I don't mind the blank fields being nil. Cheers, Max

【Answer 3】:
    text=<<EOF
    2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"
    EOF
    x=[]
    text.chomp.split("\042").each_with_index do |y,i|
      i%2==0 ?  x<< y.split(",") : x<<y
    end
    print x.flatten
    

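Output (note that it contains three empty strings, one more than the two empty fields in the input, because the comma that follows the quoted field is split on as well):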

    $ ruby test.rb
    ["2412", "21", "Which of the following is not found in all cells?", "Curriculum", "Life and Living Processes, Life Processes", "", "", "", "1", "0", "endofline"]
    

【Comments】:

【Answer 4】:

This morning I stumbled across a CSV table importer project for Ruby on Rails. You may eventually find the code helpful:

GitHub TableImporter

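【Comments】:

【Answer 5】:

Edit: I failed to read the Ruby tag. The good news is that the guide explains the theory behind building a parser, even if the language specifics don't apply. Sorry.

There is a great guide here:

http://knab.ws/blog/index.php?/archives/10-CSV-file-parser-and-writer-in-C-Part-2.html

The CSV writer is here:

http://knab.ws/blog/index.php?/archives/3-CSV-file-parser-and-writer-in-C-Part-1.html

These examples cover the case of quoted literals in the CSV, which may or may not contain commas.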
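
For readers who want the same idea in Ruby, the following is a minimal sketch of a hand-written, quote-aware splitter. It is not taken from the linked guide; split_csv_line is a hypothetical helper name, and the sketch handles only a single line, with doubled quotes ("") treated as an escaped quote. For real data the built-in CSV library is the safer choice.

```ruby
# Hypothetical helper: split one CSV line into fields with a small state machine.
def split_csv_line(line)
  fields = []
  field = "".dup          # the field currently being built (mutable copy)
  in_quotes = false       # true while inside a double-quoted field
  chars = line.chomp.chars
  i = 0
  while i < chars.length
    c = chars[i]
    if in_quotes
      if c == '"'
        if chars[i + 1] == '"'
          field << '"'    # doubled quote inside quotes is a literal quote
          i += 1          # skip the second quote of the pair
        else
          in_quotes = false
        end
      else
        field << c        # commas inside quotes are ordinary characters
      end
    elsif c == '"'
      in_quotes = true
    elsif c == ','
      fields << field     # an unquoted comma ends the current field
      field = "".dup
    else
      field << c
    end
    i += 1
  end
  fields << field         # the last field has no trailing comma
end

line = '2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"'
p split_csv_line(line)
```

Empty fields come back as empty strings, and all values are Strings; converting the numeric fields is left to the caller.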

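【Comments】:

【Answer 6】:

This is not a suitable task for a regular expression. You need a CSV parser, and Ruby has one built in:

http://ruby-doc.org/stdlib/libdoc/csv/rdoc/classes/CSV.html

There is also an arguably superior third-party library:

http://fastercsv.rubyforge.org/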

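【Comments】:

• I don't think CSV can handle qualifiers?
• FasterCSV is the default CSV library in Ruby 1.9.x, and it lets you specify a quote_char, which might help him.
• What's a "qualifier"? This is a stock CSV line. No need to mess with quote_chars.
• I agree that the CSV approach is preferable, but that's not to say a regex can't be used.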
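
As a small illustration of the quote_char option mentioned in the comments: the line in the question uses ordinary double quotes and needs no such option, so the sketch below uses a made-up line quoted with single quotes instead. The sample data is hypothetical.

```ruby
require 'csv'

# Hypothetical input that uses single quotes as the quote character.
line = "2412,'Life and Living Processes, Life Processes',,1"

# quote_char tells the parser which character delimits quoted fields.
fields = CSV.parse_line(line, quote_char: "'")

p fields
```

As with the default settings, the empty unquoted field is returned as nil.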