【Question title】: Parsing with Ruby
【Posted】: 2015-04-16 04:00:29
【Question】:

I'm new to Ruby and have a school project in which I'm parsing an XML file and need to get the data that follows certain tags. I can only use core Ruby, no gems.

    pFile = File.open("myfile.mzML", "r")
    regmsLvl = "ms level\" value=\""

    pFile.each_line { |line|
      scn = line.scan(/#{regmsLvl}(\d)/)

      # what I want to do but doesn't work
      if scn == 1
        puts("Got it!")
      end

      # what I have to do to compare if == 1
      if scn != nil
        scn.each do |val|
          if val[0].to_i == 1
            puts("Got it!")
          end
        end
      end
    }

    # a sample line that I am parsing is:
    # <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1" />

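This seems silly. The output of line.scan makes scn a two-dimensional array. How can I make it a single string that is overwritten on each pass? Or how should I change the whole approach? Any suggestions are appreciated. puts(scn) prints 1, but if I do scn == 1 or scn.to_i == 1 it never enters the if. I've tried scn.pop and scn.pop.pop.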
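To illustrate the behaviour described here: `String#scan` with one capture group returns an array of arrays, so comparing the result to `1` can never succeed. The following is a minimal sketch using only core Ruby, assuming the sample line from the code above. The variable names `pattern`, `all_matches`, and `level` are made up for the illustration; `String#[]` with a regex and a group index is one possible way to get a single string per line.

```ruby
line = '<cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1" />'
pattern = /name="ms level" value="(\d+)"/

# scan collects every match; with a capture group, each match is itself an array.
all_matches = line.scan(pattern)

# String#[] with a regex and a group index returns only the captured text,
# or nil when the line does not match.
level = line[pattern, 1]

# nil.to_i is 0, so non-matching lines simply fail the comparison.
puts "Got it!" if level.to_i == 1
```

Because `level` is reassigned for each line, it behaves like the "string that is overwritten on each pass" asked about above.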

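I've added a section to show what I'm trying to do now.

I need to check the ms level: if it is 1, get the scan start time and then the binary. This is the code I'm using now.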

    xmlfile = File.new("afile.mzML")
    xmldoc = Document.new(xmlfile)

    root = xmldoc.root
    puts "Root element : " + root.attributes["xmlns"]

    xmldoc.elements.each("mzML/run/spectrumList/spectrum/cvParam") { |e|
      if e.attributes["value"].to_i == 1
        # Now I need to get start time: @
        ["mzML/run/spectrumList/spectrum/cvParam/scanList/scan/value"]
        # and then
        ["mzML/run/spectrumList/spectrum/cvParam/binaryDataArrayList/binaryDataArray/binary"]
      end
    }

<run id="ru_0" defaultInstrumentConfigurationRef="ic_0" sampleRef="sa_0" defaultSourceFileRef="sf_ru_0">
    <spectrumList count="3310" defaultDataProcessingRef="dp_sp_0">
        <spectrum id="scan=8839" index="0" defaultArrayLength="171" dataProcessingRef="dp_sp_0">
            <cvParam cvRef="MS" accession="MS:1000525" name="spectrum representation" />
            <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1" />
            <cvParam cvRef="MS" accession="MS:1000294" name="mass spectrum" />
            <cvParam cvRef="MS" accession="MS:1000130" name="positive scan" />
            <scanList count="1">
                <cvParam cvRef="MS" accession="MS:1000795" name="no combination" />
                <scan>
                    <cvParam cvRef="MS" accession="MS:1000016" name="scan start time" value="5429.47" unitAccession="UO:0000010" unitName="second" unitCvRef="UO" />
                </scan>
            </scanList>
            <binaryDataArrayList count="2">
                <binaryDataArray encodedLength="1824">
                    <cvParam cvRef="MS" accession="MS:1000514" name="m/z array" unitAccession="MS:1000040" unitName="m/z" unitCvRef="MS" />
                    <cvParam cvRef="MS" accession="MS:1000523" name="64-bit float" />
                    <cvParam cvRef="MS" accession="MS:1000576" name="no compression" />
                    <binary>AAAAQBCdgkAAAACAP6KCQAAAAAA8pIJAAAAAYAWlgkAAAABgQ6aCQAAAAGCzp4JAAAAAQEaogkAAAACgDKqCQAAAAEAgqoJAAAAAwEOqgkAAAABAWKqCQAAAAGBErIJAAAAAIOetgkAAAABAMLCCQAAAAGDlsYJAAAAA4DeygkAAAACAw7SCQAAAACBauIJAAAAAwFC6gkAAAACAYb6CQAAAAIDnwYJAAAAAwDjHgkAAAAAATMyCQAAAAADnzIJAAAAAAArOgkAAAACgTc6CQAAAAKBqzoJAAAAAQJLPgkAAAACAVNCCQAAAAAAK0oJAAAAAIF7SgkAAAADABNSCQAAAAKAx1YJAAAAAYHXXgkAAAAAg3teCQAAAAOAf2oJAAAAAICbcgkAAAAAAx92CQAAAAKA03oJAAAAAIBXigkAAAABAO+KCQAAAAKCr5YJAAAAAYMnlgkAAAADgK+aCQAAAAKDq6YJAAAAAAC3qgkAAAACgNe6CQAAAAMCA74JAAAAAANL0gkAAAAAAUfiCQAAAAOCt+YJAAAAA4O75gkAAAACAPPqCQAAAAGBq/oJAAAAAwEQCg0AAAABAKAqDQAAAAAAoDoNAAAAA4G0Og0AAAADAZhKDQAAAACCBEoNAAAAAwIQWg0AAAABAjheDQAAAAMA+GoNAAAAAQIYag0AAAAAA7RyDQAAAAEB9HYNAAAAAwIseg0AAAADgbyKDQAAAAAAPJINAAAAAgEUlg0AAAACgYCaDQAAAAOBfKoNAAAAA4DAug0AAAADAZi+DQAAAAAA0MINAAAAAoFMwg0AAAAAgMjKDQAAAACA2NINAAAAAgDk2g0AAAAAg+DyDQAAAAOAfPoNAAAAAAKU/g0AAAAAgQUKDQAAAAKBVQoNAAAAAYNRHg0AAAAAgf0qDQAAAAICZSoNAAAAAIDFQg0AAAAAgM1KDQAAAAEBjUoNAAAAAoGNUg0AAAAAAZ1aDQAAAAABqWINAAAAAYHhZg0AAAACAfl2DQAAAAEAcXoNAAAAAICpfg0AAAADgw2GDQAAAAACmZ4NAAAAAQDRog0AAAABAiWqDQAAAAAAibYNAAAAAQHpug0AAAABAEnKDQAAAAABCcoNAAAAAoHxyg0AAAACgGXaDQAAAAMBDdoNAAAAAgJR2g0AAAAAgHHqDQAAAAEBGeoNAAAAAIHh6g0AAAABAl3qDQAAAAKCkfYNAAAAAYE5+g0AAAAAAm36DQAAAAEDigYNAAAAAQGWCg0AAAABAjYKDQAAAACClgoNAAAAA4ESGg0AAAABgYIaDQAAAAMDSh4NAAAAAYCqIg0AAAADAT4qDQAAAAACCioNAAAAAwJmOg0AAAABAnZKDQAAAAKDJlINAAAAAgHGWg0AAAABgl5eDQAAAAEB4mINAAAAA4B2eg0AAAADgKKCDQAAAAGAvooNAAAAAwJakg0AAAABAUaiDQAAAAGBgqoNAAAAAIBatg0AAAADAxa6DQAAAAKCosoNAAAAAICy6g0AAAAAAbrqDQAAAAACRuoNAAAAAAMa/g0AAAACgOsCDQAAAAABzwoNAAAAAIOTCg0AAAACADcWDQAAAAGB4xoNAAAAAQOfGg0AAAAAAvceDQAAAAEBZyoNAAAAA4OnKg0AAAAAgMs6DQAAAAOC/z4NAAAAAYInUg0AAAABgftaDQAAAAODC1oNAAAAAwJXXg0AAAAAAgdiDQAAAAKA/2oNAAAAAoILag0AAAABghtyDQAAAAGCm3INAAAAAAO7cg0AAAACgr9+DQAAAAGCY4oNAAAAAgDbkg0AAAABAN+WDQAAAAKBU5oNA</binary>
                </binaryDataArray>

【Question discussion】:

    Tags: ruby parsing xml-parsing


    【Solution 1】:

    I think you're close. Assuming you can use the REXML library (it appears to be part of Ruby's standard library), you should be able to do this:

    require 'rexml/document'

    xmlfile = File.new("afile.mzML")
    xmldoc = REXML::Document.new(xmlfile)
    root = xmldoc.root

    start_time = nil
    binary = nil
    # Paths are relative to the root element and match the sample in the question,
    # whose root is <run>. If the root of your file is <mzML>, you would probably
    # need to prefix each path with "run/".
    # get the ms level (elements[] returns only the first matching element)
    ms_level = root.elements["spectrumList/spectrum/cvParam[@name='ms level']"].attributes["value"].to_i

    if ms_level == 1
      # get the scan start time
      start_time = root.elements["spectrumList/spectrum/scanList/scan/cvParam[@name='scan start time']"].attributes["value"]
      # get the binary
      binary = root.elements["spectrumList/spectrum/binaryDataArrayList/binaryDataArray/binary"].text
    end

    p start_time # => "5429.47"
    p binary # => that crazy long binary


    This REXML tutorial is helpful: http://www.germane-software.com/software/rexml/docs/tutorial.html
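The `binary` text retrieved in the code above is still encoded. The sample in the question labels the array as "64-bit float" with "no compression", and mzML stores such arrays as Base64 text. The following is a hedged sketch of decoding it with core Ruby only, assuming little-endian doubles. The `encoded` string is built in the example itself as a stand-in for the real element text, and the names `raw`, `encoded`, and `mz_values` are hypothetical.

```ruby
# Build a small stand-in for the <binary> text: two doubles, little-endian,
# then Base64 without line breaks. The real file would supply this string.
raw = [595.5, 600.25].pack("E*")
encoded = [raw].pack("m0")

# Decode: "m" reverses the Base64 step, "E*" reads little-endian 64-bit floats.
mz_values = encoded.unpack1("m").unpack("E*")

p mz_values
```

If an array is marked as compressed, or as 32-bit float, the unpack step would need to change accordingly; this sketch covers only the uncompressed 64-bit case shown in the sample.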

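    Note that I've made some assumptions: that the elements always exist, that the ms level is always an integer, and that the file structure is always the same. Those assumptions may not hold in your case, but this should be a start.

    【Discussion】: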
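If those assumptions do not hold, one option is to loop over every `spectrum` element and skip the ones that lack an ms level of 1. The following is a minimal sketch, not a definitive implementation: the inline `xml` string is a cut-down stand-in for the real file (in practice you would pass the opened file to `REXML::Document.new`), and the `results` array and its hash keys are made up for the illustration.

```ruby
require 'rexml/document'

# Cut-down stand-in for the real file; structure follows the sample in the question.
xml = <<~XML
  <run id="ru_0">
    <spectrumList count="2">
      <spectrum id="scan=1" index="0">
        <cvParam name="ms level" value="1" />
        <scanList count="1">
          <scan>
            <cvParam name="scan start time" value="5429.47" />
          </scan>
        </scanList>
        <binaryDataArrayList count="1">
          <binaryDataArray>
            <binary>AAAA</binary>
          </binaryDataArray>
        </binaryDataArrayList>
      </spectrum>
      <spectrum id="scan=2" index="1">
        <cvParam name="ms level" value="2" />
      </spectrum>
    </spectrumList>
  </run>
XML

doc = REXML::Document.new(xml)
results = []

# Visit each spectrum instead of only the first match.
doc.root.elements.each("spectrumList/spectrum") do |spectrum|
  level = spectrum.elements["cvParam[@name='ms level']"]
  # Skip spectra with no ms level element or with a level other than 1.
  next unless level && level.attributes["value"].to_i == 1

  time = spectrum.elements["scanList/scan/cvParam[@name='scan start time']"]
  binaries = spectrum.get_elements("binaryDataArrayList/binaryDataArray/binary")

  results << {
    id: spectrum.attributes["id"],
    start_time: time && time.attributes["value"], # nil if the element is missing
    binaries: binaries.map(&:text)
  }
end

results.each do |r|
  puts "#{r[:id]}: start=#{r[:start_time]}, arrays=#{r[:binaries].size}"
end
```

Querying relative to each `spectrum` keeps the start time and binary data tied to the spectrum whose ms level was checked, rather than to the first match in the whole document.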
