Answers: Get the tag value of the nth occurrence of a tag in XML using sed

【Question title】: Get TagValue of nth occurrence of a Tag in XML using sed
【Posted】: 2015-01-01 22:51:18
【Question description】:

My XML:

<?xml version="1.0" encoding="UTF-8" ?>
<Attributes>
   <Attribute>123</Attribute>
   <Attribute>959595</Attribute>
   <Attribute>1233</Attribute>
   <Attribute>jiji</Attribute>
</Attributes>

I need to use sed to get the tag value of the second occurrence of the Attribute tag, i.e. 959595.

I used this command:

sed -n ':a;$!{N;ba};s#\(<Attribute\)\(.*\)\(</Attribute>\)#\1#2#\2#p' file

but selecting the second occurrence of pattern one and printing the value of pattern two does not work.

I don't know whether my approach is right. Please correct my command.

【Question comments】:

I would rather use an XML-aware command-line tool, e.g. stackoverflow.com/questions/91791/…

Tags: sed

【Solution 1】:

The proper way to do this is:

$ xmllint --xpath '/Attributes/Attribute[2]/text()' file.xml

Notes

xmllint ships with libxml2.
The "2" selects the second matching element (XPath positions are 1-based).
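
To pick an arbitrary occurrence rather than a hard-coded 2, the index can be passed in from a shell variable. This is a minimal sketch, assuming an xmllint build that supports --xpath; the function name nth_attribute_xpath is hypothetical and not part of xmllint.

```shell
# Hypothetical helper (not part of xmllint): print the text of the n-th
# <Attribute> child of <Attributes>. XPath positions start at 1.
nth_attribute_xpath() {
  n=$1      # which occurrence to print
  file=$2   # path to the XML file
  # Double quotes let the shell expand $n inside the XPath expression.
  xmllint --xpath "/Attributes/Attribute[$n]/text()" "$file"
}

# Usage: nth_attribute_xpath 2 file.xml
```

The double quotes matter here: with single quotes the shell would pass the literal text $n to xmllint.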

【Discussion】:

【Solution 2】:

sed -n '/<Attributes>/,\#</Attributes># {
  /<Attribute>/ {
     H;g
     s#.*<Attribute>\(.*\)</Attribute>.*#\1#
     t found
     }
   b
:found
   p;q
   }' YourFile

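Assuming, as in your sample, that there is only one Attribute to find, this sed returns only the first one. (If the XML content is just like your sample, the /<Attributes>/,\#</Attributes># selection is not needed.)
This is the POSIX version, so use --posix on GNU sed.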
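
Since the question asks for the second occurrence, here is one possible way to make sed count matches itself. It is not the answerer's method: it is a sketch that uses the hold space as a tally, adding one x per matching line, and it assumes one <Attribute> element per line as in the question's sample. The function name nth_attribute is hypothetical.

```shell
# Hypothetical helper: print the value of the n-th <Attribute> line.
# Assumes each <Attribute> element sits on its own line.
# How it works: the hold space holds a tally of "x" characters. On every
# matching line the tally is swapped in, extended by one x, and compared
# with the wanted count. On a hit the line is swapped back, stripped of
# its tags, printed, and sed quits.
nth_attribute() {
  n=$1
  file=$2
  sed -n '/<Attribute>/{
    x
    s/^/x/
    /^x\{'"$n"'\}$/{
      x
      s#.*<Attribute>\(.*\)</Attribute>.*#\1#p
      q
    }
    x
  }' "$file"
}

# Usage: nth_attribute 2 YourFile
```

Lines that do not contain <Attribute> never touch the tally, so unrelated elements between two entries are not counted.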

【Discussion】:

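If you add an n you get the next entry (which I think is what the OP wants): sed -n '/<Attributes>/,\#</Attributes># { /<Attribute>/{ H;g;n; s#.*<Attribute>\(.*\)</Attribute>.*#\1# t found } b :found p;q }' attrib.txt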
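Not the same. n loads the next line into the working buffer unconditionally (unless it is the last line), and that line could be something other than <Attribute>, which is why I filter and use the hold buffer.
I tested it on my system and it works: because you have already put the /<Attribute>/ address on the statement, you probably don't need much else.
Test with <Attribute>1</Attribute><OtherTag>2</OtherTag><Attribute>3</Attribute>, or simply with <Attributes><Attribute>1</Attribute></Attributes> as content (with a new line between all tags).
I used the test data from the OP, who wants the second Attribute entry rather than the first.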
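
The disagreement in these comments can be checked with a small test file like the one suggested above. The sketch below is an illustration, not either commenter's exact command: it compares an n-based variant with a filter-then-index variant, using a throwaway temp file and the hypothetical variable names with_n and filtered.

```shell
# Test file in the spirit of the comment above: an unrelated element sits
# between the two <Attribute> entries.
demo=$(mktemp)
cat > "$demo" <<'EOF'
<Attributes>
<Attribute>1</Attribute>
<OtherTag>2</OtherTag>
<Attribute>3</Attribute>
</Attributes>
EOF

# Variant using n: after the first <Attribute> line, n loads whatever line
# comes next (here the <OtherTag> line), so the substitution does not match
# and that line is printed unchanged.
with_n=$(sed -n '/<Attribute>/{
  n
  s#.*<Attribute>\(.*\)</Attribute>.*#\1#
  p
  q
}' "$demo")

# Filter first, then index: only <Attribute> lines are counted.
filtered=$(sed -n 's#.*<Attribute>\(.*\)</Attribute>.*#\1#p' "$demo" | sed -n '2p')

printf 'with n:   %s\n' "$with_n"
printf 'filtered: %s\n' "$filtered"
rm -f "$demo"
```

On the OP's sample both variants give the same answer, because the second Attribute line directly follows the first. They only diverge once other elements are interleaved.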

【Solution 3】:

This sed prints all Attribute entries inside the Attributes block, then takes the second entry and strips the tags:

sed -n '/<Attributes>/,\#</Attributes>#{/<Attribute>/p}' attrib.txt | sed -n '2p' | sed 's#</Attribute>##;s/<Attribute>//'

Output: 
   959595

Alternatively, without pipes, a single sed command can go to the second entry, strip the Attribute tags, and then quit:

sed -n '/<Attributes>/,\#</Attributes>#{/<Attribute>/{n;s#.*<Attribute>\(.*\)</Attribute>.*#\1#;p;q};}' attrib.txt

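Or, if the number of Attribute entries varies, you can make it more intuitive by extracting all the values first and then using sed to print the attribute at the desired position: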

sed -n '/<Attributes>/,\#</Attributes>#{/<Attribute>/{s#</Attribute>##;s#<Attribute>##;p}}' attrib.txt | sed -n '2p'

You can change the 2 at the end to whichever attribute value you want to display, or take several values, e.g. sed -n '2p;3p' or sed -n '1,2p'

【Discussion】:

【Solution 4】:
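
The extract-then-index idea can be wrapped so the position comes from a variable. This is a minimal sketch under the same assumption as above (one <Attribute> element per line); the function name attribute_values is hypothetical, and unlike the command above it uses a single substitution that also drops leading indentation.

```shell
# Hypothetical helper: print every <Attribute> value inside the
# <Attributes> block, one per line, without tags or indentation.
attribute_values() {
  sed -n '/<Attributes>/,\#</Attributes>#{
    s#.*<Attribute>\(.*\)</Attribute>.*#\1#p
  }' "$1"
}

# Usage, with n holding the 1-based position to show:
#   n=2
#   attribute_values attrib.txt | sed -n "${n}p"
#   attribute_values attrib.txt | sed -n '1,2p'
```

Keeping extraction and selection in separate steps means the selecting sed only ever sees values, so its line numbers correspond directly to attribute positions.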

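I would also go the xmllint XPath way. However, there seem to be two versions available. According to the man page at https://linux.die.net/man/1/xmllint, there is no xpath option; it is called "pattern" instead.

Following that documentation, your call would be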

$ xmllint --pattern '/Attributes/Attribute[2]/text()' file.xml

I suggest checking your local man page to see which one to use.

【Discussion】: