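【Posted】: 2016-12-12 16:37:51
【Question】:
Can someone help me write a for loop that iterates over all of these zone nodes and gets the text of only the zone whose class is Lead?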
<zones count="13">
<zone type="RECT" flags="4099" class="Headline" num="1">
<zrect unit="pix">0,1097,2173,1303</zrect>
<ztext type="XML" textformat="XML">
<REGION>
<PARAGRAPH>
<LINE>
<WORD Rect="27,933,272,1067">ma</WORD>
<BLANK/>
<WORD Rect="325,933,820,1096">ekdum</WORD>
<BLANK/>
<WORD Rect="877,933,982,1065">gyani</WORD>
<BLANK/>
<WORD Rect="1040,933,1829,1096">chu</WORD>
<BLANK/>
</LINE>
</PARAGRAPH>
</REGION>
</ztext>
<source/>
</zone>
<zone type="RECT" flags="4099" class="Author" num="2">
<zrect unit="pix">0,1326,324,1372</zrect>
<ztext type="XML" textformat="XML">
<REGION>
<PARAGRAPH>
<LINE>
<WORD Rect="4,1126,44,1158">By</WORD>
<BLANK/>
<WORD Rect="54,1126,131,1151">Sano</WORD>
<BLANK/>
<WORD Rect="145,1126,272,1151">shrest</WORD>
<BLANK/>
</LINE>
</PARAGRAPH>
</REGION>
</ztext>
<source/>
</zone>
<zone type="RECT" flags="4099" class="Lead" num="3">
<zrect unit="pix">0,1384,475,1584</zrect>
<ztext type="XML" textformat="XML">
<REGION>
<PARAGRAPH>
<LINE>
<WORD Rect="5,1174,42,1192">Dherai</WORD>
<BLANK/>
<WORD Rect="55,1178,118,1198">years</WORD>
<BLANK/>
<WORD Rect="130,1178,166,1192">dekhin</WORD>
<BLANK/>
<WORD Rect="179,1174,263,1192">gadi</WORD>
<BLANK/>
<WORD Rect="277,1174,331,1192">banaune</WORD>
<BLANK/>
<WORD Rect="344,1174,399,1192">manche</WORD>
<BLANK/>
</LINE>
<LINE>
<WORD Rect="4,1203,91,1226">haru</WORD>
<BLANK/>
<WORD Rect="115,1203,147,1221">mehanat</WORD>
<BLANK/>
<WORD Rect="172,1207,218,1221">gardai</WORD>
<BLANK/>
<WORD Rect="241,1203,399,1226">chan</WORD>
<BLANK/>
</LINE>
<LINE>
<WORD Rect="3,1236,63,1255">ramro</WORD>
<BLANK/>
<WORD Rect="80,1233,102,1250">gadi</WORD>
<BLANK/>
<WORD Rect="119,1231,214,1255">nirman</WORD>
<BLANK/>
<WORD Rect="232,1231,323,1254">garna</WORD>
<BLANK/>
<WORD Rect="341,1236,400,1250">lai</WORD>
<BLANK/>
</LINE>
</PARAGRAPH>
</REGION>
</ztext>
<source/>
</zone>
<zone type="RECT" flags="4099" class="Paragraph" num="4">
<zrect unit="pix">0,1596,478,2249</zrect>
<ztext type="XML" textformat="XML">
<REGION>
<PARAGRAPH>
<LINE>
<WORD Rect="28,1352,74,1366">Ramro</WORD>
<BLANK/>
<WORD Rect="82,1356,114,1366">gadi</WORD>
<BLANK/>
<WORD Rect="122,1356,151,1369">are,</WORD>
<BLANK/>
<WORD Rect="158,1352,179,1366">for</WORD>
<BLANK/>
<WORD Rect="186,1356,196,1366">a</WORD>
<BLANK/>
<WORD Rect="202,1352,254,1369">variety</WORD>
<BLANK/>
<WORD Rect="262,1352,274,1366">of</WORD>
<BLANK/>
<WORD Rect="283,1356,348,1368">reasons,</WORD>
<BLANK/>
<WORD Rect="356,1352,400,1369">ramro</WORD>
<BLANK/>
</LINE>
</PARAGRAPH>
</REGION>
</ztext>
<source/>
</zone>
I am able to get the text of all zones, but not specifically of the zone with the attribute class="Lead".
【Discussion】:
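A minimal sketch of the requested for loop, assuming Python's standard-library `xml.etree.ElementTree` (the question does not say which language is being used). The sample is a trimmed copy of two zones from the XML above, wrapped in a `<zones>` root. The helper names `zone_text` and `lead_texts` are hypothetical. It also assumes that each `<BLANK/>` marks a space between words and that each `<LINE>` should become one output line.

```python
import xml.etree.ElementTree as ET

# Trimmed sample based on the question's XML: one Author zone and one Lead zone.
SAMPLE = """<zones>
<zone type="RECT" flags="4099" class="Author" num="2">
<ztext type="XML" textformat="XML"><REGION><PARAGRAPH>
<LINE><WORD Rect="4,1126,44,1158">By</WORD><BLANK/><WORD Rect="54,1126,131,1151">Sano</WORD><BLANK/></LINE>
</PARAGRAPH></REGION></ztext><source/>
</zone>
<zone type="RECT" flags="4099" class="Lead" num="3">
<ztext type="XML" textformat="XML"><REGION><PARAGRAPH>
<LINE><WORD Rect="5,1174,42,1192">Dherai</WORD><BLANK/><WORD Rect="55,1178,118,1198">years</WORD><BLANK/></LINE>
<LINE><WORD Rect="4,1203,91,1226">haru</WORD><BLANK/><WORD Rect="115,1203,147,1221">mehanat</WORD><BLANK/></LINE>
</PARAGRAPH></REGION></ztext><source/>
</zone>
</zones>"""


def zone_text(zone):
    """Hypothetical helper: join each LINE's WORD texts with spaces
    (assumes <BLANK/> means a space), one output line per LINE."""
    lines = []
    for line in zone.iter("LINE"):
        words = [w.text for w in line.iter("WORD") if w.text]
        lines.append(" ".join(words))
    return "\n".join(lines)


def lead_texts(root):
    """Hypothetical helper: loop over every <zone> and keep only those
    whose class attribute equals 'Lead', whatever their num is."""
    texts = []
    for zone in root.iter("zone"):
        if zone.get("class") == "Lead":
            texts.append(zone_text(zone))
    return texts


root = ET.fromstring(SAMPLE)
for text in lead_texts(root):
    print(text)
```

Because the loop checks `zone.get("class")` rather than the zone's position or `num`, it keeps working when the Lead zone appears at a different place in another file.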
-
What are you using to parse the XML?
-
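I'm using XPath. But I can't rely on the node's position, because the structure changes from one XML file to another. For example, the Lead class is at num=3 here, but in other XML files it can be at num=1.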
-
According to w3schools, if you use
//zone[@class='Lead'] you will get all zones whose class is Lead. You can then loop over them to get the text you need.
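The comment's XPath predicate can be sketched with Python's standard-library `xml.etree.ElementTree`, assuming that is the parser in use. ElementTree supports only a subset of XPath and expects a relative path, so the expression is written as `.//zone[@class='Lead']` instead of `//zone[@class='Lead']`. (With lxml, the absolute form can be passed to an element's `xpath()` method unchanged.) The two documents below are hypothetical, heavily trimmed inputs in which the Lead zone sits at `num="3"` and `num="1"` respectively, and `lead_words` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical trimmed documents: Lead is num="3" in DOC_A and num="1" in DOC_B.
DOC_A = """<zones>
<zone class="Headline" num="1"><ztext><REGION><PARAGRAPH>
<LINE><WORD>ma</WORD><BLANK/><WORD>ekdum</WORD><BLANK/></LINE>
</PARAGRAPH></REGION></ztext></zone>
<zone class="Lead" num="3"><ztext><REGION><PARAGRAPH>
<LINE><WORD>ramro</WORD><BLANK/><WORD>gadi</WORD><BLANK/></LINE>
</PARAGRAPH></REGION></ztext></zone>
</zones>"""

DOC_B = """<zones>
<zone class="Lead" num="1"><ztext><REGION><PARAGRAPH>
<LINE><WORD>Dherai</WORD><BLANK/><WORD>years</WORD><BLANK/></LINE>
</PARAGRAPH></REGION></ztext></zone>
</zones>"""


def lead_words(xml_string):
    """Hypothetical helper: return the WORD texts of every zone with class='Lead'."""
    root = ET.fromstring(xml_string)
    words = []
    # ".//" matches zones at any depth; [@class='Lead'] filters on the attribute,
    # so the zone's position or num value does not matter.
    for zone in root.findall(".//zone[@class='Lead']"):
        words.extend(w.text for w in zone.findall(".//WORD") if w.text)
    return words


print(lead_words(DOC_A))  # ['ramro', 'gadi']
print(lead_words(DOC_B))  # ['Dherai', 'years']
```

Selecting by attribute value rather than by index is what addresses the concern that the Lead zone moves between files.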
Tags: xml for-loop xml-parsing