Scraping bylines from a web page using ImportXML in Google Sheets

【Question Title】: Scraping bylines from web page using ImportXML in Google Sheets
【Posted】: 2021-03-03 17:43:00
【Question】:

I'm trying to extract the author name from an article. I'm currently using =IMPORTXML(G2,"//*[@class='author-details']")

When I do this, it creates 4 cells below that contain the word "By", and I can't get rid of them.

I'm very new to code. What am I doing wrong?

Example attached: https://docs.google.com/spreadsheets/d/1Mi1D5G1-_gNsQwVQ6I_ealDqcWixKA2p-hFqJpjlGt4/edit?usp=sharing

【Comments】:

Tags: google-sheets google-sheets-formula spreadsheet

【Solution 1】:

You can use:

=index(IMPORTXML(G2,"//*[@class='author-details']"),1,2)

This shows only the first row of the second column of what IMPORTXML returns, which is the information you're after.
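To make the row and column selection concrete, here is a minimal Python sketch of what INDEX(range, 1, 2) does to a two-dimensional result. The grid contents and the helper name sheet_index are hypothetical, made up for illustration; the real cells depend on the page being scraped.

```python
from typing import List


def sheet_index(grid: List[List[str]], row: int, column: int) -> str:
    """Mimic Sheets' INDEX(range, row, column), which counts from 1."""
    return grid[row - 1][column - 1]


# Hypothetical stand-in for what IMPORTXML might spill into the sheet:
# several rows, with a stray "By" in the first column.
imported = [
    ["By", "By Jane Doe @janedoe Example News"],
    ["By", ""],
    ["By", ""],
    ["By", ""],
]

# Row 1, column 2 is the cell that holds the full byline text.
byline = sheet_index(imported, 1, 2)
print(byline)
```

Because INDEX returns a single cell rather than the whole array, the extra "By" cells no longer spill into the sheet.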

Edit:

Also, since you pointed out that you want the author name: if all bylines follow the format "By FIRST LAST @TwitterHandle Affiliation", you can use this to get just the author's name:

=trim(split(right(index(IMPORTXML(G2,"//*[@class='author-details']"),1,2),len(index(IMPORTXML(G2,"//*[@class='author-details']"),1,2))-3),"@",true,true))

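It may look like voodoo, but paste it in and it works. It removes the first 3 characters ("By" plus the space after it), splits the text at the "@" sign, and keeps only the text to the left of it, which is the name.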
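The same three steps can be sketched in Python to show what each part of the formula contributes. This is only an illustration of the string logic, not something Sheets runs; the function name extract_author and the sample byline are assumptions, and it relies on the byline always starting with "By " as described above.

```python
def extract_author(byline: str) -> str:
    """Pull the author name out of 'By FIRST LAST @Handle Affiliation'."""
    # RIGHT(text, LEN(text) - 3): drop the leading "By " (3 characters).
    without_prefix = byline[3:]
    # SPLIT(text, "@"): cut at the "@" sign and keep the first piece.
    left_of_at = without_prefix.split("@")[0]
    # TRIM(...): remove the space left over before the "@".
    return left_of_at.strip()


# Hypothetical byline in the assumed format.
sample = "By Jane Doe @janedoe Example News"
print(extract_author(sample))  # prints "Jane Doe"
```

If a byline has no "@" in it, this sketch returns everything after "By ", so the affiliation would stay attached to the name.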

【Comments】:

Thank you so much! This is exactly what I needed.