【Question title】: Match Pattern and Replace in HTML tags
【Posted】: 2023-04-11 04:21:01
【Question】:
    </tr>
<tr class='htmllist_tr' style="background-color:yellow" ><td class='htmllist_td' >INDX01</td>
<td class='htmllist_td_nbr' >964.87</td>
<td class='htmllist_td_nbr' >95.13</td>
<td class='htmllist_td' >NehaA9.86</td>
</tr>
<tr class='htmllist_tr' ><td class='htmllist_td' >UNDOTBS1</td>
<td class='htmllist_td_nbr' >156.25</td>
<td class='htmllist_td_nbr' >8</td>
<td class='htmllist_td' >NehaA5.12</td>
</tr>

I want to find NehaA between the <tr></tr> tags and then change

`<tr class='htmllist_tr'>`

or

`<tr class='htmllist_tr' style="background-color:yellow">`

to

`<tr class='htmllist_tr' style="background-color:red">`

I tried

sed -e "/NehaA/ s/\'<tr class='htmllist_tr'>\'/\'<tr class='htmllist_tr' style="background-color:red">\'/ ;" 2932_TABLE2.txt

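It did not work, please help.

【Comments】:

  • Doing this in awk/sed is not the best idea. Why not use Python + Beautiful Soup? In Perl you could use (I think, I have never used it) HTML::Parser.
  • HTML, like XML, is structured data. You cannot treat it as a plain text file. There are many modules available that will parse your HTML and let you modify it.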
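
A small sketch of why the sed attempt in the question cannot match, assuming the row layout shown there: sed tests an address against one line at a time, and NehaA sits on a `<td>` line, never on the `<tr>` line. `sample_row` is a hypothetical helper that only prints one row copied from the question.

```shell
#!/bin/sh
# sample_row is a hypothetical helper: it prints one row copied from the question.
sample_row() {
  printf '%s\n' \
    "<tr class='htmllist_tr' ><td class='htmllist_td' >UNDOTBS1</td>" \
    "<td class='htmllist_td_nbr' >156.25</td>" \
    "<td class='htmllist_td_nbr' >8</td>" \
    "<td class='htmllist_td' >NehaA5.12</td>" \
    "</tr>"
}

# Lines selected by the /NehaA/ address: only the <td> line.
sample_row | sed -n '/NehaA/p'

# Lines that match /NehaA/ and also contain "<tr": none, so a
# substitution on the <tr> tag guarded by /NehaA/ never fires.
sample_row | sed -n -e '/NehaA/{' -e '/<tr/p' -e '}'
```

The command in the question also has quoting problems: the inner double quotes end the shell string early, and the input has a space before `>` that the pattern does not allow for.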

Tags: linux perl awk sed


【Solution 1】:

If you do not get a usable answer based on an HTML parser, try this:

$ awk -v RS='</tr>\\s*' '/Neha/{ORS=RT; sub(/<tr[^>]+>/,""); print "<tr class=\047htmllist_tr\047 style=\"background-color:red\">" $0}' file
<tr class='htmllist_tr' style="background-color:red"><td class='htmllist_td' >INDX01</td>
<td class='htmllist_td_nbr' >964.87</td>
<td class='htmllist_td_nbr' >95.13</td>
<td class='htmllist_td' >NehaA9.86</td>
</tr>
<tr class='htmllist_tr' style="background-color:red"><td class='htmllist_td' >UNDOTBS1</td>
<td class='htmllist_td_nbr' >156.25</td>
<td class='htmllist_td_nbr' >8</td>
<td class='htmllist_td' >NehaA5.12</td>
</tr>

This uses GNU awk for the multi-character RS and for RT.
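
If GNU awk is not available, or the non-matching rows have to be kept in the output, a similar idea can be sketched in POSIX awk by buffering each row line by line. This is only a sketch under assumptions: every row starts on a line beginning with `<tr`, attribute values contain no `>`, and `highlight_rows`, `marker` and `colour` are illustrative names that do not come from the answer above.

```shell
#!/bin/sh
# highlight_rows [FILE...]: hypothetical helper, not from the original answer.
# Prints every input line; rows containing the marker get a red background.
highlight_rows() {
  awk -v marker="NehaA" -v colour="red" '
    # Print the buffered row, recolouring its <tr> tag if the marker was seen.
    function emit_row(   i, style) {
      if (n == 0) return
      if (hit) {
        style = "style=\"background-color:" colour "\""
        if (buf[1] ~ /^<tr[^>]*style="[^"]*"/) {
          sub(/style="[^"]*"/, style, buf[1])     # replace the existing style
        } else {
          sub(/ *>/, " " style ">", buf[1])       # add a style attribute
        }
      }
      for (i = 1; i <= n; i++) print buf[i]
      n = 0; hit = 0
    }
    # A new row starts; the previous one may lack a closing tag.
    /^<tr[ >]/ { emit_row(); inrow = 1 }
    inrow {
      buf[++n] = $0
      if (index($0, marker)) hit = 1
      if (/<\/tr>/) { emit_row(); inrow = 0 }
      next
    }
    { print }              # lines outside any row pass through unchanged
    END { emit_row() }
  ' "$@"
}
```

It can be called as `highlight_rows file`, or fed from a pipe. Because a row is flushed whenever the next `<tr` line appears, a header row without a closing `</tr>` is printed unchanged instead of being merged into the first data row.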

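【Comments】:

  • Thanks a lot. It only shows the records that contain NehaA.
  • There is also no line break between </tr> and the next <tr class>.
  • It is very important that the sample input/output you post truly reflects your real input/output, so that when we test a potential solution we can see at a glance whether it does what you want. I don't know what you mean by "there is no line break in </tr> and next <tr class>": are you saying your input is wrong, my output is wrong, or something else? Please edit your question to show sample input/output that demonstrates the problem.
【Solution 2】:

Here is how I would do it with HTML::TreeBuilder. The code is self-explanatory. I suggest reading the documentation, since parsing HTML using regex is not recommended.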

#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $str = <<'HTML'
<html>
<head>
</head>
<body>
<table>
<tr class='htmllist_tr' style="background-color:yellow" >
<td class='htmllist_td' >INDX01</td>
<td class='htmllist_td_nbr' >964.87</td>
<td class='htmllist_td_nbr' >95.13</td>
<td class='htmllist_td' >NehaA9.86</td>
</tr>
<tr class='htmllist_tr' >
<td class='htmllist_td' >UNDOTBS1</td>
<td class='htmllist_td_nbr' >156.25</td>
<td class='htmllist_td_nbr' >8</td>
<td class='htmllist_td' >NehaA5.12</td>
</tr>
</table>
</body>
</html>
HTML
;


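# Parse the HTML string into an element tree.
my $root = HTML::TreeBuilder->new_from_content($str);

# Collect every <tr> element in the document.
my @rows = $root->find_by_tag_name('tr');

foreach my $row (@rows) {
    # Only handle rows carrying class "htmllist_tr".
    if ($row->find_by_attribute("class", "htmllist_tr")) {
        # Gather the contents of the row's <td class="htmllist_td"> cells.
        my @tds      = $row->look_down(_tag => 'td', class => 'htmllist_td');
        my @children = map { $_->content_list } @tds;

        # Set (or overwrite) the row's style when any cell mentions NehaA.
        if (grep(/NehaA/, @children)) {
            $row->attr('style', 'background-color:red');
        }
    }
}

# Serialise the modified tree, indenting with two spaces.
print $root->as_HTML(undef, "  ");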

【Comments】:

  • Thanks Arunesh. I have never used HTML::TreeBuilder; I will try this.
【Solution 3】:

@ED .. sorry for the confusion .. this is the original file

<table  class='htmllist'>
<tr class='htmllist_tr' ><th class='htmllist_th' >TABLESPACE<br>NAME</th>
<th class='htmllist_th' >ALLOCATED<br>SPACE<br>GB</th>
<th class='htmllist_th' >CURRENT<br>FREE<br>SPACE<br>GB</th>
<th class='htmllist_th' >CURRENT<br>FREE<br>SPACE<br>PCT</th>
<tr class='htmllist_tr' style="background-color:yellow" ><td class='htmllist_td' >INDX01</td>
<td class='htmllist_td_nbr' >964.87</td>
<td class='htmllist_td_nbr' >95.78</td>
<td class='htmllist_td' >NehaA9.93</td>
</tr>
<tr class='htmllist_tr' ><td class='htmllist_td' >TEMP</td>
<td class='htmllist_td_nbr' >125</td>
<td class='htmllist_td_nbr' >124.63</td>
<td class='htmllist_td_nbr' >99.7</td>
</tr>
<tr class='htmllist_tr' ><td class='htmllist_td' >TEMP_EDDDATA</td>
<td class='htmllist_td_nbr' >205.99</td>
<td class='htmllist_td_nbr' >198.52</td>
<td class='htmllist_td_nbr' >96.37</td>
</tr>
<tr class='htmllist_tr' ><td class='htmllist_td' >UNDOTBS1</td>
<td class='htmllist_td_nbr' >156.25</td>
<td class='htmllist_td_nbr' >22.85</td>
<td class='htmllist_td' >NehaA14.62</td>
</tr>
</table>

I want output like this

<table  class='htmllist'>
<tr class='htmllist_tr' ><th class='htmllist_th' >TABLESPACE<br>NAME</th>
<th class='htmllist_th' >ALLOCATED<br>SPACE<br>GB</th>
<th class='htmllist_th' >CURRENT<br>FREE<br>SPACE<br>GB</th>
<th class='htmllist_th' >CURRENT<br>FREE<br>SPACE<br>PCT</th>
<tr class='htmllist_tr' style="background-color:red" ><td class='htmllist_td' >INDX01</td>
<td class='htmllist_td_nbr' >964.87</td>
<td class='htmllist_td_nbr' >95.78</td>
<td class='htmllist_td' >NehaA9.93</td>
</tr>
<tr class='htmllist_tr' ><td class='htmllist_td' >TEMP</td>
<td class='htmllist_td_nbr' >125</td>
<td class='htmllist_td_nbr' >124.63</td>
<td class='htmllist_td_nbr' >99.7</td>
</tr>
<tr class='htmllist_tr' ><td class='htmllist_td' >TEMP_EDDDATA</td>
<td class='htmllist_td_nbr' >205.99</td>
<td class='htmllist_td_nbr' >198.52</td>
<td class='htmllist_td_nbr' >96.37</td>
</tr>
<tr class='htmllist_tr' style="background-color:red"><td class='htmllist_td' >UNDOTBS1</td>
<td class='htmllist_td_nbr' >156.25</td>
<td class='htmllist_td_nbr' >22.85</td>
<td class='htmllist_td' >NehaA14.62</td>
</tr>
</table>

But when I use this

awk -v RS='</tr>\\s*' '/Neha/{ORS=RT; sub(/<tr[^>]+>/,""); print "<tr class=\047htmllist_tr\047 style=\"background-color:red\">" $0}' text.txt

it gives me output like this

<tr class='htmllist_tr' style="background-color:red"><table  class='htmllist'>
<th class='htmllist_th' >TABLESPACE<br>NAME</th>
<th class='htmllist_th' >ALLOCATED<br>SPACE<br>GB</th>
<th class='htmllist_th' >CURRENT<br>FREE<br>SPACE<br>GB</th>
<th class='htmllist_th' >CURRENT<br>FREE<br>SPACE<br>PCT</th>
<tr class='htmllist_tr' style="background-color:yellow" ><td class='htmllist_td' >INDX01</td>
<td class='htmllist_td_nbr' >964.87</td>
<td class='htmllist_td_nbr' >95.78</td>
<td class='htmllist_td' >NehaA9.93</td>
</tr><tr class='htmllist_tr' style="background-color:red">
<td class='htmllist_td' >UNDOTBS1</td>
<td class='htmllist_td_nbr' >156.25</td>
<td class='htmllist_td_nbr' >22.85</td>
<td class='htmllist_td' >NehaA14.62</td>
</tr>

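Let me know if this makes sense

【Comments】:

  • Please add these details to your question rather than posting them as an answer. Second, you can try HTML::TreeBuilder; it will give you the expected output. Parsing HTML with regex is not good practice; it will break somewhere. Let me know if you have any questions.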