"Imported content is empty." error when scraping with ImportXML in GSheets

【Posted】: 2019-04-02 21:06:04
【Question】:

I need to scrape the image source URLs from a catalog's linked web pages into a column in Google Sheets.

I thought the IMPORTXML function would be the simplest solution, but every time I get the #N/A "Imported content is empty." error.

I also tried using this extension to define the XPath, but I still get the same error.

Here is the part of the page source that contains the image source URL:

<div class="centerer" id="rbt-gallery-img-1">
  <i class="spinner">
    <span></span>
  </i>
  <img data-lazy="//i.example.com/01.jpg" border="0"/>
</div>

So I want to put the value "i.example.com/01.jpg" in B2, and the URLs of the further images in the adjacent cells.

The formula I'm using is:

=IMPORTXML(A2,"//img[@class='centerer']/@data-lazy")

I also tried "spinner" instead of "centerer", with the same result.

【Comments】:

webapps.stackexchange.com/a/126329/186471

Tags: xpath web-scraping google-sheets

【Answer 1】:

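The class="centerer" attribute belongs to the <div>, not to the <img>, so //img[@class='centerer'] matches nothing. Select the <img> inside that <div> instead. The following XPath 1.0 expression returns the string i.example.com/01.jpg: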

substring-after(//div[@class='centerer']/img/@data-lazy,'//')
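To check what this expression evaluates to, here is a minimal Python sketch run against the snippet from the question, using the standard-library xml.etree.ElementTree. Its path syntax is only a limited subset of XPath, so substring-after() is emulated by a small hypothetical helper (substring_after), and the second gallery <div> (rbt-gallery-img-2, 02.jpg) is invented for illustration. The sketch also shows an XPath 1.0 rule that matters here: when a node-set is passed to a string function, only its first node in document order is used.

```python
import xml.etree.ElementTree as ET

# Markup from the question, wrapped in <body> so ElementTree has one root.
# The second gallery <div> is a hypothetical extra image for illustration.
HTML = """<body>
<div class="centerer" id="rbt-gallery-img-1">
  <i class="spinner"><span></span></i>
  <img data-lazy="//i.example.com/01.jpg" border="0"/>
</div>
<div class="centerer" id="rbt-gallery-img-2">
  <i class="spinner"><span></span></i>
  <img data-lazy="//i.example.com/02.jpg" border="0"/>
</div>
</body>"""


def substring_after(s: str, sep: str) -> str:
    """Emulates XPath 1.0 substring-after(): text after the first `sep`, or '' if absent."""
    return s.partition(sep)[2]


def first_image_url(html: str) -> str:
    """Emulates substring-after(//div[@class='centerer']/img/@data-lazy, '//')."""
    root = ET.fromstring(html)
    # Collect the data-lazy attribute values in document order,
    # like the node-set //div[@class='centerer']/img/@data-lazy.
    values = [
        img.get("data-lazy")
        for img in root.findall(".//div[@class='centerer']/img")
        if img.get("data-lazy") is not None
    ]
    # XPath 1.0 converts a node-set to the string value of its first node;
    # an empty node-set becomes the empty string.
    first = values[0] if values else ""
    return substring_after(first, "//")


print(first_image_url(HTML))  # i.example.com/01.jpg
```

Because of that rule, the substring-after() version yields at most one URL per formula even when the page has several gallery images; the plain path in the next expression keeps every match.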

If you don't need to strip the leading //, you can simply use

//div[@class='centerer']/img/@data-lazy
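One way to see why the question's formula came back empty is to run the question's paths and this one against the same snippet. This minimal sketch again uses Python's xml.etree.ElementTree, whose limited XPath subset happens to cover these expressions; data_lazy_values is a hypothetical helper. It only tests the static markup shown in the question, not what Google's fetcher receives from the live page.

```python
import xml.etree.ElementTree as ET

# Markup from the question, wrapped in <body> so ElementTree has one root.
HTML = """<body>
<div class="centerer" id="rbt-gallery-img-1">
  <i class="spinner"><span></span></i>
  <img data-lazy="//i.example.com/01.jpg" border="0"/>
</div>
</body>"""

root = ET.fromstring(HTML)


def data_lazy_values(path):
    """Return the data-lazy value of every element matched by an ElementTree path."""
    return [el.get("data-lazy") for el in root.findall(path)
            if el.get("data-lazy") is not None]


# Question's expression: an <img> whose own class is 'centerer' (none exists).
asked = data_lazy_values(".//img[@class='centerer']")
# Question's variant: an <img> with class 'spinner' (that class is on <i>).
spinner = data_lazy_values(".//img[@class='spinner']")
# Answer's expression: the <img> child of <div class='centerer'>.
fixed = data_lazy_values(".//div[@class='centerer']/img")

print(asked)    # []
print(spinner)  # []
print(fixed)    # ['//i.example.com/01.jpg']
```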

So, in the first case, the Google Sheets formula could be

=IMPORTXML(A2,"substring-after(//div[@class='centerer']/img/@data-lazy,'//')")

and in the second case

=IMPORTXML(A2,"//div[@class='centerer']/img/@data-lazy")

【Comments】:

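Thank you very much for your answer! Unfortunately, both formulas return errors: the first "Imported Xml could not be parsed", the second "Imported content is empty".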