【Question Title】: Display Nokogiri children nodes as raw HTML instead of &lt;tag&gt;
【Posted】: 2015-10-31 07:58:32
【Question】:

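I am converting an XML table into an HTML table, and I have to rearrange the nodes along the way.

To do the conversion, I read the XML, put it into a two-dimensional array, and then build the new HTML for output.

However, some cells contain HTML tags, and after my conversion <SU> becomes &lt;su&gt;.

The XML data is: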

<BOXHD>
  <CHED H="1">Disc diameter, inches (cm)</CHED>
  <CHED H="1">One-half or more of disc covered</CHED>
  <CHED H="2">Number <SU>1</SU>
  </CHED>
  <CHED H="2">Exhaust foot <SU>3</SU>/min.</CHED>
  <CHED H="1">Disc not covered</CHED>
  <CHED H="2">Number <SU>1</SU>
  </CHED>
  <CHED H="2">Exhaust foot<SU>3</SU>/min.</CHED>
</BOXHD>

My steps for converting it to an HTML table are:

class TableCell

  attr_accessor :text, :rowspan, :colspan

  def initialize(text='')
      @text = text
      @rowspan = 1
      @colspan = 1
  end    
end
@frag = Nokogiri::HTML(xml)

# make a 2d array to store how the cells should be arranged
column = 0
prev_row = -1
@frag.xpath("boxhd/ched").each do |ched|
  row = ched.xpath("@h").first.value.to_i - 1
  if row <= prev_row
    column +=1
  end
  prev_row = row
  @data[row][column] = TableCell.new(ched.inner_html)
end  

# methods to find colspan and rowspan, put them in @data
# ... snip ...

# now build an html table
doc = Nokogiri::HTML::DocumentFragment.parse ""
Nokogiri::HTML::Builder.with(doc) do |html|
  html.table {
    @data.each do |tr|
      html.tr {
        tr.each do |th|
          next if th.nil?
          html.th(:rowspan => th.rowspan, :colspan => th.colspan).table_header th.text
        end
      }
    end
  }
end
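
The column-placement loop above can be traced without Nokogiri. The following is only a sketch: the `levels` array is copied by hand from the `H` attributes in the sample XML, and the `Hash` with a default block is an assumed stand-in for `@data`, whose initialization is not shown in the question.

```ruby
# H attribute of each CHED in document order (taken from the sample XML).
levels = [1, 1, 2, 2, 1, 2, 2]

# Assumed stand-in for @data: rows are created on demand.
data = Hash.new { |hash, row| hash[row] = [] }

column = 0
prev_row = -1
positions = levels.map do |h|
  row = h - 1
  # A header at the same or a shallower level starts a new column.
  column += 1 if row <= prev_row
  prev_row = row
  data[row][column] = "H#{h}"
  [row, column]
end

p positions
# => [[0, 0], [0, 1], [1, 1], [1, 2], [0, 3], [1, 3], [1, 4]]
```

Slots that are never assigned, such as `data[1][0]`, stay `nil`, which is why the builder loop skips `nil` cells.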

The builder code above produces the following HTML (note that the superscript tags are escaped):

<table>
    <tr>
        <th rowspan="2" colspan="1" class="table_header">Disc diameter, inches (cm)</th>
        <th rowspan="1" colspan="2" class="table_header">One-half or more of disc covered</th>
        <th rowspan="1" colspan="2" class="table_header">Disc not covered</th>
    </tr>
    <tr>
        <th rowspan="1" colspan="1" class="table_header">Number &lt;su&gt;1&lt;/su&gt; </th>
        <th rowspan="1" colspan="1" class="table_header">Exhaust foot &lt;su&gt;3&lt;/su&gt;/min.</th>
        <th rowspan="1" colspan="1" class="table_header">Number &lt;su&gt;1&lt;/su&gt;</th>
        <th rowspan="1" colspan="1" class="table_header">Exhaust foot&lt;su&gt;3&lt;/su&gt;/min.</th>
    </tr>
</table>

How do I get the raw HTML instead of the entities?

I tried each of these without success:

@data[row][column] = TableCell.new(ched.children)
@data[row][column] = TableCell.new(ched.children.to_s)
@data[row][column] = TableCell.new(ched.to_s)

【Comments】:

    Tags: ruby nokogiri html-entities


    【Solution 1】:

    This may help you understand what is happening:

    require 'nokogiri'
    
    doc = Nokogiri::XML('<root><foo></foo></root>')
    
    doc.at('foo').content = '<html><body>bar</body></html>'
    doc.to_xml # => "<?xml version=\"1.0\"?>\n<root>\n  <foo>&lt;html&gt;&lt;body&gt;bar&lt;/body&gt;&lt;/html&gt;</foo>\n</root>\n"
    
    doc.at('foo').children = '<html><body>bar</body></html>'
    doc.to_xml # => "<?xml version=\"1.0\"?>\n<root>\n  <foo>\n    <html>\n      <body>bar</body>\n    </html>\n  </foo>\n</root>\n"
    
    doc.at('foo').children = Nokogiri::XML::Document.new.create_cdata '<html><body>bar</body></html>'
    doc.to_xml # => "<?xml version=\"1.0\"?>\n<root>\n  <foo><![CDATA[<html><body>bar</body></html>]]></foo>\n</root>\n"
    

    【Comments】:

      【Solution 2】:

      I gave up on the builder and simply built the HTML as a string:

      def html_headers()
        rows = Array.new
        @data.each do |row|
          cells = Array.new
          row.each do |cell|
            next if cell.nil?
            # cell.text is interpolated as-is, so any markup in it stays raw
            cells << "<th rowspan=\"%d\" colspan=\"%d\">%s</th>" %
                       [cell.rowspan, cell.colspan, cell.text]
          end
          rows << "<tr>%s</tr>" % cells.join
        end
        rows.join
      end

      # call the method only after it has been defined
      headers = html_headers()
      
      def replace_nodes(headers)
      
        # ... snip ...
      
        @frag.xpath("boxhd").each do |old|
            puts "replacing boxhd..."
            old.replace headers
        end
      
        # ... snip ...
      
      end
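
The string-building step can be tried on its own, without Nokogiri. This is only a sketch: the `Struct` stands in for the question's `TableCell` class, the two-row `data` grid is made up for illustration, and the method takes the grid as a parameter instead of reading `@data`.

```ruby
# Stand-in for the question's TableCell class.
TableCell = Struct.new(:text, :rowspan, :colspan)

# Hypothetical grid; nil marks a slot covered by a rowspan from the row above.
data = [
  [TableCell.new('Disc diameter, inches (cm)', 2, 1),
   TableCell.new('Disc not covered', 1, 2)],
  [nil,
   TableCell.new('Number <su>1</su>', 1, 1),
   TableCell.new('Exhaust foot<su>3</su>/min.', 1, 1)]
]

def html_headers(data)
  data.map do |row|
    cells = row.compact.map do |cell|
      # No escaping happens here, so markup inside cell.text survives.
      format('<th rowspan="%d" colspan="%d">%s</th>',
             cell.rowspan, cell.colspan, cell.text)
    end
    "<tr>#{cells.join}</tr>"
  end.join
end

headers = html_headers(data)
puts headers
```

Because nothing is escaped, this is only appropriate when the cell text comes from a trusted source.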
      

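      I don't understand why, but the text I replaced the <BOXHD> tags with appears to be parsed and searchable, because I was able to change tag names that came from the data in cell.text.

      【Comments】: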
