【Posted】: 2016-08-15 06:27:39
【Question】:
I have this HTML content:
<p>This is a paragraph:</p>
<ul>
<li>
<p>point 1</p>
</li>
<li>
<p>point 2</p>
<ul>
<li>
<p>point 3</p>
</li>
<li>
<p>point 4</p>
</li>
</ul>
</li>
<li>
<p>point 5</p>
</li>
</ul>
<ul>
<li>
<p><strong>sub-head : </strong>This is a para followed by heading, This is a para followed by heading, This is a para followed by heading, This is a para followed by heading</p>
</li>
<li>
<p><strong>sub-head 2: </strong></p>
<p>This is a para followed by heading, This is a para followed by heading, This is a para followed by heading, This is a para followed by heading</p>
</li>
</ul>
I want to remove the <p> and </p> tags wherever they occur inside <li> and <td> elements. So far, this is my controller code:
# Literal substrings to replace in the sanitized HTML
nogo = {
  "<li>\n<p>" => '<li>', "</p>\n</li>" => '</li>',
  "<td>\n<p>" => '<td>', "</p>\n</td>" => '</td>',
  '<p> </p>' => '', '<ul>' => "\n<ul>", '</ul>' => "</ul>\n", '</ol>' => "</ol>\n",
  '<table>' => "\n<table width='100%' border='0' cellspacing='0' cellpadding='0' class='table table-curved'>",
  '&lt;' => '<', '&gt;' => '>', '<br>' => '', '<p></p>' => '', ' rel="nofollow"' => ''
}
c = params[:content]
# Whitelist the BASIC elements plus tables and headings; keep only href on links
bundle_out = Sanitize.fragment(c, Sanitize::Config.merge(Sanitize::Config::BASIC,
  :elements => Sanitize::Config::BASIC[:elements] + ['table', 'tbody', 'tr', 'td', 'h1', 'h2', 'h3'],
  :attributes => {'a' => ['href']})) #.split(" ").join(" ")
# One alternation of all escaped keys; gsub looks each match up in the hash
re = Regexp.new(nogo.keys.map { |x| Regexp.escape(x) }.join('|'))
@bundle_out = bundle_out.gsub(re, nogo)
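I pass the HTML content above to this code through params[:content], which is assigned to the variable c.
Below is the output, which is not what I expected. Some closing and opening p tags are still left between the opening and closing li tags: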
<p>This is a paragraph:</p>
<ul>
<li>point 1</li>
<li>point 2</p>
<ul>
<li>point 3</li>
<li>point 4</li>
</ul>
</li>
<li>point 5</li>
</ul>
<ul>
<li><strong>sub-head : </strong>This is a para followed by heading, This is a para followed by heading, This is a para followed by heading, This is a para followed by heading</li>
<li><strong>sub-head 2: </strong></p>
<p>This is a para followed by heading, This is a para followed by heading, This is a para followed by heading, This is a para followed by heading</li>
</ul>
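The leftover tags can be reproduced in isolation. The following is a minimal sketch, assuming the sanitized markup keeps one tag per line as in the input above; the shortened `html` string and the trimmed-down `nogo` hash are illustrative, not the real controller data. It suggests the cause is that the keys are literal strings: `"</p>\n</li>"` only matches a `</p>` that is immediately followed by `</li>`, so a `</p>` followed by `<ul>` or by another `<p>` is never touched.

```ruby
# Hypothetical, shortened input: one <li> whose <p> is followed by a nested <ul>.
html = "<li>\n<p>point 2</p>\n<ul>\n<li>\n<p>point 3</p>\n</li>\n</ul>\n</li>"

# Only the two <li>-related keys from the nogo hash are kept here.
nogo = { "<li>\n<p>" => '<li>', "</p>\n</li>" => '</li>' }
re = Regexp.new(nogo.keys.map { |x| Regexp.escape(x) }.join('|'))

# The </p> after "point 2" is followed by <ul>, not </li>, so no key matches it.
result = html.gsub(re, nogo)
puts result
```

The inner item ("point 3") is cleaned because its `</p>` sits directly before `</li>`, while the outer one keeps its dangling `</p>`, which matches the pattern in the output above.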
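My goal is simple: I just want to remove all p tags inside li and td tags, but I can't get it to work correctly. Any help is appreciated.
I want to do this with a regular expression. I know that regex is not the right way to parse HTML content.
【Comments】: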
-
Use a parser, not a regular expression.
-
I suggest you use the Nokogiri gem.
-
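If you know this isn't the right approach, why do it this way? I don't mean that as an offense; I'm asking for clarification. Unless you can make a very convincing case that a parser isn't the right solution, that is probably the only answer you will get.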
-
Have you read the famous post about why parsing HTML with regex is harmful?
-
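If you know that using regex for this is not recommended, why ask? See stackoverflow.com/q/1732348/128421. Asking us how to do it becomes a waste of effort, because no matter how much work we or you put in, regex ultimately cannot do the job you want. That is not a good use of time or energy. Also, please read codeblog.jonskeet.uk/2010/08/29/writing-the-perfect-question and catb.org/esr/faqs/smart-questions.html. They will help you improve the way you ask questions. Using correct grammar and putting effort into the question pays off.
Tags: ruby regex ruby-on-rails-4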