【Question title】: XPath help needed on XML output
【Posted】: 2018-06-26 10:13:49
【Question】:

I am trying to use XPath to get the DataTable headers.

My output should be:

ItemNum|Item|ResultCode|Status|ExtBackLinks|RefDomains|AnalysisResUnitsCost|ACRank|ItemType|IndexedURLs|GetTopBackLinksAnalysisResUnitsCost|DownloadBacklinksAnalysisResUnitsCost|DownloadRefDomainBacklinksAnalysisResUnitsCost|RefIPs|RefSubNets|RefDomainsEDU|ExtBackLinksEDU|RefDomainsGOV|ExtBackLinksGOV|RefDomainsEDU_Exact|ExtBackLinksEDU_Exact|RefDomainsGOV_Exact|ExtBackLinksGOV_Exact|CrawledFlag|LastCrawlDate|LastCrawlResult|RedirectFlag|FinalRedirectResult|OutDomainsExternal|OutLinksExternal|OutLinksInternal|OutLinksPages|LastSeen|Title|RedirectTo|Language|LanguageDesc|LanguageConfidence|LanguagePageRatios|LanguageTotalPages|RefLanguage|RefLanguageDesc|RefLanguageConfidence|RefLanguagePageRatios|RefLanguageTotalPages|CrawledURLs|RootDomainIPAddress|TotalNonUniqueLinks|NonUniqueLinkTypeHomepages|NonUniqueLinkTypeIndirect|NonUniqueLinkTypeDeleted|NonUniqueLinkTypeNoFollow|NonUniqueLinkTypeProtocolHTTPS|NonUniqueLinkTypeFrame|NonUniqueLinkTypeImageLink|NonUniqueLinkTypeRedirect|NonUniqueLinkTypeTextLink|RefDomainTypeLive|RefDomainTypeFollow|RefDomainTypeHomepageLink|RefDomainTypeDirect|RefDomainTypeProtocolHTTPS|CitationFlow|TrustFlow|TrustMetric|TopicalTrustFlow_Topic_0|TopicalTrustFlow_Value_0|TopicalTrustFlow_Topic_1|TopicalTrustFlow_Value_1|TopicalTrustFlow_Topic_2|TopicalTrustFlow_Value_2

This is the raw XML:

<Result Code="OK" ErrorMessage="" FullError="">
<GlobalVars FirstBackLinkDate="2012-09-21" IndexBuildDate="2018-05-24 19:47:18" IndexType="0" MostRecentBackLinkDate="2018-04-23" QueriedRootDomains="1" QueriedSubDomains="0" QueriedURLs="0" QueriedURLsMayExist="0" ServerBuild="2018-06-11 13:52:01" ServerName="BRUNO28" ServerVersion="1.0.6736.23160" UniqueIndexID="20180524194718-HISTORICAL"/>
<DataTables Count="1">
<DataTable Name="Results" RowsCount="1" Headers="ItemNum|Item|ResultCode|Status|ExtBackLinks|RefDomains|AnalysisResUnitsCost|ACRank|ItemType|IndexedURLs|GetTopBackLinksAnalysisResUnitsCost|DownloadBacklinksAnalysisResUnitsCost|DownloadRefDomainBacklinksAnalysisResUnitsCost|RefIPs|RefSubNets|RefDomainsEDU|ExtBackLinksEDU|RefDomainsGOV|ExtBackLinksGOV|RefDomainsEDU_Exact|ExtBackLinksEDU_Exact|RefDomainsGOV_Exact|ExtBackLinksGOV_Exact|CrawledFlag|LastCrawlDate|LastCrawlResult|RedirectFlag|FinalRedirectResult|OutDomainsExternal|OutLinksExternal|OutLinksInternal|OutLinksPages|LastSeen|Title|RedirectTo|Language|LanguageDesc|LanguageConfidence|LanguagePageRatios|LanguageTotalPages|RefLanguage|RefLanguageDesc|RefLanguageConfidence|RefLanguagePageRatios|RefLanguageTotalPages|CrawledURLs|RootDomainIPAddress|TotalNonUniqueLinks|NonUniqueLinkTypeHomepages|NonUniqueLinkTypeIndirect|NonUniqueLinkTypeDeleted|NonUniqueLinkTypeNoFollow|NonUniqueLinkTypeProtocolHTTPS|NonUniqueLinkTypeFrame|NonUniqueLinkTypeImageLink|NonUniqueLinkTypeRedirect|NonUniqueLinkTypeTextLink|RefDomainTypeLive|RefDomainTypeFollow|RefDomainTypeHomepageLink|RefDomainTypeDirect|RefDomainTypeProtocolHTTPS|CitationFlow|TrustFlow|TrustMetric|TopicalTrustFlow_Topic_0|TopicalTrustFlow_Value_0|TopicalTrustFlow_Topic_1|TopicalTrustFlow_Value_1|TopicalTrustFlow_Topic_2|TopicalTrustFlow_Value_2" MaxTopicsRootDomain="30" MaxTopicsSubDomain="20" MaxTopicsURL="10" TopicsCount="3">
<Row>
0|nu.nl|OK|Found|508322106|165344|508322106|-1|1|4149991|5000|512472097|3356880|59147|26204|233|3613|43|308|73|1757|4|12|False| | |True| |5|10|44|1722150| |NU - Het laatste nieuws het eerst op NU.nl|https://www.nu.nl/|nl|Dutch/Flemish|92|99.9|482980|nl,en,de|Dutch/Flemish,English,German|87,93,58|96.5,3.1,0.1|76319583|1915923|52.85.201.19|611833777|15034990|53120677|444371798|95283418|52384870|388104|53497551|5655999|552292123|102171|115787|21952|150164|49554|76|70|70|News/Breaking News|69|Sports/Resources|45|Arts/Radio|43
</Row>
</DataTable>
</DataTables>
</Result>

When I use this XPath expression in Google Sheets:

=importxml("http://enterprise.majesticseo.com/api_command?privatekey=xxx&accessToken=xxx&cmd=GetIndexItemInfo&item0=nu.nl&items=1","//DataTable")

I get the Row result. That is great, but I also need the header names in the first row of the sheet.

【Comments】:

  • The output you mention in your question can be obtained with the following XPath expression: //DataTable/@Headers
  • Thank you, it works!

Tags: xml xpath google-sheets


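【Solution 1】:

A short introduction to XPath :-)

With //DataTable you get the complete node of every <DataTable> found anywhere in the XML (no namespaces are involved here).
As a rule of thumb, it is best to be as specific as possible (prefer /Result/DataTables/DataTable). But that is not the answer to your question...

Imagine an XML like this: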

<root>
  <innerNode attr="1"><a>Some a content</a><b>Some b content</b></innerNode>
  <innerNode attr="2"><a>aaa</a><b>bbb</b></innerNode>
</root>

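With /root/innerNode you get every <innerNode> with all of its content.

With /root/innerNode[(b/text())[1]="bbb"] you get only the one <innerNode> whose <b> has the text() "bbb".

With /root/innerNode[@attr="1"] you get the <innerNode> whose attribute attr has the value "1".

All three XPath samples return complete nodes, including child nodes, attributes and so on.

If you want only the value of an attribute, you have to ask for it explicitly: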

(/root/innerNode/@attr)[2] 

... returns the attribute value of the second <innerNode> (strictly speaking, the second occurrence of attr in the document)

/root/innerNode[(b/text())[1]="Some b content"]/@attr

... returns the attribute value of the <innerNode> whose <b> has a text() of "Some b content"

Back to your question

You want to read the attribute Headers of the <DataTable> element located below /Result/DataTables. Simply use
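
The difference between selecting a node and selecting an attribute value can be tried out on the sample XML above. The following is a minimal sketch in Python, assuming the standard library module xml.etree.ElementTree. That module implements only a subset of XPath: it cannot return attribute nodes and has no text() step, so the child-text predicate is written as [b='...'] and attribute values are read with .get(). The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<root>
  <innerNode attr="1"><a>Some a content</a><b>Some b content</b></innerNode>
  <innerNode attr="2"><a>aaa</a><b>bbb</b></innerNode>
</root>"""

# fromstring() returns the <root> element, so paths below are relative to it.
root = ET.fromstring(SAMPLE)

# Equivalent of /root/innerNode: every <innerNode>, as complete elements.
nodes = root.findall("innerNode")

# Equivalent of /root/innerNode[@attr="1"]: filter on an attribute value.
attr_one = root.findall("innerNode[@attr='1']")

# Roughly /root/innerNode[(b/text())[1]="bbb"]: filter on the text of child <b>.
with_bbb = root.findall("innerNode[b='bbb']")

# Roughly (/root/innerNode/@attr)[2]: the attribute value of the second element.
second_attr = nodes[1].get("attr")

# Roughly /root/innerNode[(b/text())[1]="Some b content"]/@attr
attr_of_some_b = root.find("innerNode[b='Some b content']").get("attr")

print(len(nodes), attr_one[0].get("attr"), with_bbb[0].get("attr"))
print(second_attr, attr_of_some_b)
```

The first three results are elements that still carry their children and attributes; only the last two are plain strings.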

/Result/DataTables/DataTable/@Headers
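
Outside Google Sheets, the same lookup can be sketched in Python. This is only an illustration under stated assumptions: it uses the standard library xml.etree.ElementTree (which cannot evaluate @Headers directly, hence the .get() call), and the XML is a shortened stand-in for the real response, trimmed to four columns. Splitting on "|" and pairing headers with the row values are additions that the original answer does not cover.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the API response (only four of the columns).
XML = """<Result Code="OK" ErrorMessage="" FullError="">
  <DataTables Count="1">
    <DataTable Name="Results" RowsCount="1" Headers="ItemNum|Item|ResultCode|Status">
      <Row>
0|nu.nl|OK|Found
      </Row>
    </DataTable>
  </DataTables>
</Result>"""

# fromstring() returns <Result>, so the path is relative to it.
root = ET.fromstring(XML)
table = root.find("DataTables/DataTable")

# Counterpart of /Result/DataTables/DataTable/@Headers
headers = table.get("Headers").split("|")

# The row uses the same "|" separator; strip the surrounding whitespace first.
row = table.find("Row").text.strip().split("|")

# Pair each header with its value for convenient lookup.
record = dict(zip(headers, row))
print(headers)
print(record["Item"])
```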

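【Comments】:

  • I'd like to ask the XML ninja a question. Regarding stackoverflow.com/questions/51030750/… is there a risk that the XML parse approach changes the sequence?
  • @JohnCappelletti Commented there :-)
  • Wow, thanks for your help and the explanation. This is very useful, and it works!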