Web Scraping News Articles: Answers

【Question title】: Web Scraping News Articles
【Posted】: 2020-03-27 11:57:25
【Question description】:

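I am having trouble scraping news article titles and article descriptions from the following website: https://www.hrdive.com/. The code I tried does not work. Can someone help me fix it so that it works?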

for i in data.xpath("//li[@class='row feed__item']"):
    title = i.xpath('//h3/a/text()')
    article = i.xpath('//p[@class="feed__description"]/text()')
    print(title, article)

【Question comments】:

Can you be more specific than "having a problem" and "not working"?
When I use this form of the code, it only gives me empty brackets.

Tags: python web-scraping

【Solution 1】:

The elements you are targeting are still nested inside several tags (div > h3 > a), so you need // to find them.

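for i in data.xpath("//li[@class='row feed__item']"):
    # The leading dot makes each search relative to the current <li>;
    # a bare // would search the whole document on every iteration.
    title = i.xpath('.//h3/a/text()')
    article = i.xpath('.//p[@class="feed__description"]/text()')
    print(title, article)

Note the double slash // at the start of each inner expression. Inside the loop it is written as .// so that the search starts from the current li rather than from the document root.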
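To make the role of the leading slashes concrete, here is a minimal sketch that compares three prefixes on a small inline page. It assumes lxml is installed and that `data` is an lxml element tree, which the question does not show. The `SAMPLE` markup is hypothetical and only imitates the class names used above, so the real site may be structured differently.

```python
from lxml import html

# Hypothetical markup imitating the feed structure discussed above;
# the real page may differ.
SAMPLE = """
<html><body><ul>
  <li class="row feed__item">
    <div><h3><a href="#">First headline</a></h3></div>
    <p class="feed__description">First summary.</p>
  </li>
  <li class="row feed__item">
    <div><h3><a href="#">Second headline</a></h3></div>
    <p class="feed__description">Second summary.</p>
  </li>
</ul></body></html>
"""

data = html.fromstring(SAMPLE)
items = data.xpath("//li[@class='row feed__item']")

# A single leading '/' starts at the document root and expects the root
# element itself to be <h3>, so nothing matches.
single_slash = [li.xpath("/h3/a/text()") for li in items]

# '//' also starts at the document root, even when called on an <li>,
# so every item gets the titles of all items.
absolute = [li.xpath("//h3/a/text()") for li in items]

# './/' starts at the current <li>, so each item gets only its own title.
relative = [li.xpath(".//h3/a/text()") for li in items]

for li in items:
    title = li.xpath(".//h3/a/text()")
    article = li.xpath(".//p[@class='feed__description']/text()")
    print(title, article)
```

Comparing `single_slash`, `absolute`, and `relative` shows why empty lists can appear and why the relative form is usually what a per-item loop wants.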

Tip:

You can test your XPath in the browser console. In your case, for example, go to https://www.hrdive.com/, open Inspect, switch to the Console tab, and use $x:

$x("//li[@class='row feed__item']//p[@class='feed__description']/text()")

// or

$x("//li[@class='row feed__item']//p[@class='feed__description']")[0].innerText

【Comments】:

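Thank you very much. Now I just need to get rid of the spacing.
@TimothyJefferson Glad it worked. I would appreciate it if you marked it as the answer :)
@TimothyJefferson I think you just need to click the up arrow. I haven't posted a question on SO yet, so I don't really know how, haha
I clicked the up arrow, but I think it only shows once 5 people have voted.
@TimothyJefferson I just found this: stackoverflow.com/help/someone-answers. If it doesn't work, no worries :D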
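
The thread does not say how the spacing mentioned in the first comment was eventually removed. One possible approach, sketched here with a hypothetical helper named `clean` and made-up sample values, is to join the text fragments that the XPath query returns and collapse every run of whitespace into a single space:

```python
def clean(fragments):
    """Join text fragments and collapse runs of whitespace into single spaces."""
    return " ".join(" ".join(fragments).split())


# Hypothetical values of the kind an xpath('.../text()') call might return.
title = ["\n      Example headline\n    "]
article = ["\n   Example   summary text.\n", "  "]

print(clean(title), "|", clean(article))
```

Because `str.split()` with no arguments splits on any whitespace and drops empty pieces, the helper handles leading newlines, trailing spaces, and whitespace-only fragments in one step.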