【Posted】: 2014-11-13 19:56:21
【Question】:
I have an XML file similar to the following:
<?xml version="1.0" encoding="UTF-8"?>
<OnlineCommentary>
<doc docid="cnn_210085_comment002" articleURL="http://www.cnn.com/News.asp?NewsID=210085" date="10/07/2010" time="00:21" subtitle="Is Justin Bieber getting special treatment?" author="Zorro75">
<seg id="1"> They are the same thing. Let's shoot them both. </seg>
</doc>
<doc docid="cnn_210092_comment004" articleURL="http://www.cnn.com/News.asp?NewsID=210092" date="06/04/2010" time="17:07" subtitle="Dear Chicago, we love you despite it all" author="MRL1313">
<seg id="1"> We can't wait for you to move back either. </seg>
<seg id="2"> You seem quite uptight. </seg>
<seg id="3"> Does your wife (who is also your sister) not give it up any more? </seg>
</doc>
</OnlineCommentary>
I want to run a command on this file that extracts only the text between the opening tag <seg ...> and the closing tag </seg>.
I tried:
sed -n 's:.*<seg id="1">\(.*\)</seg>.*:\1:p' XML-file.xml > output.txt
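My questions are:
-- How can I print the content of every <seg id="*">? My command only prints the content of the first tag (<seg id="1">).
-- Is there a way to print, for example, <seg id="1">, <seg id="2">, and <seg id="3"> from the same <doc> on a single line, while a <doc> that contains only <seg id="1"> is printed on a line of its own?
【Discussion】: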
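The sed command in the question matches only id="1" because that value is hard-coded in the pattern, and sed works line by line, so it has no notion of which <seg> elements belong to the same <doc>. One possible approach, swapping sed for an XML parser, is sketched below in Python using the standard library's xml.etree.ElementTree. The helper name seg_lines and the inline SAMPLE string are assumptions made for illustration; SAMPLE is the question's data with the <doc> attributes shortened to docid.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the sample from the question (only docid kept on <doc>).
SAMPLE = """<OnlineCommentary>
<doc docid="cnn_210085_comment002">
<seg id="1"> They are the same thing. Let's shoot them both. </seg>
</doc>
<doc docid="cnn_210092_comment004">
<seg id="1"> We can't wait for you to move back either. </seg>
<seg id="2"> You seem quite uptight. </seg>
<seg id="3"> Does your wife (who is also your sister) not give it up any more? </seg>
</doc>
</OnlineCommentary>"""


def seg_lines(root):
    """Return one string per <doc>: the text of all its <seg> children, joined by spaces."""
    lines = []
    for doc in root.iter("doc"):
        # seg.text is None for an empty element, so fall back to "".
        texts = [(seg.text or "").strip() for seg in doc.findall("seg")]
        lines.append(" ".join(texts))
    return lines


root = ET.fromstring(SAMPLE)
for line in seg_lines(root):
    print(line)
```

To run this against the real file instead of the inline sample, replace the ET.fromstring(SAMPLE) call with ET.parse("XML-file.xml").getroot() and redirect the output to output.txt as in the original command. Each <doc> produces exactly one output line, which covers both the single-<seg> and the multi-<seg> case.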