How to extract news articles from the HTML links of anchor tags

【Question title】: How to extract news articles from the HTML links of anchor tags
【Posted】: 2020-06-11 06:02:17
【Question description】:

Can anyone help me extract the news text available in the following tags?

<a href="tigrinya/news-50612332.html" class="faux-block-link__overlay-link" tabindex="-1" aria-hidden="true"> ሕሉፍ ወልፊ ሞባይል፡ ንመንእሰያት ራዕዲ ከምዝፈጥረሎም ተገሊጹ</a>
<a href="tigrinya/news-50605565.html" class="title-link">
  <h3 class="title-link__title">
    <span class="title-link__title-text">ሃገራዊ ቦርድ መረጻ ኢትዮጵያ ንብልጽግና ፓርቲ ኣይመዝገብኩዎን ኢሉ</span>
  </h3>
</a>

【Question discussion】:

Can you show me your Scrapy code?

Tags: javascript python html scrapy summarization

【Solution 1】:

Use the Python BeautifulSoup library to parse the HTML data.

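from bs4 import BeautifulSoup

# The two anchor tags from the question
data = """
<a href="tigrinya/news-50612332.html" class="faux-block-link__overlay-link" tabindex="-1" aria-hidden="true"> ሕሉፍ ወልፊ ሞባይል፡ ንመንእሰያት ራዕዲ ከምዝፈጥረሎም ተገሊጹ</a>
<a href="tigrinya/news-50605565.html" class="title-link"><h3 class="title-link__title"><span class="title-link__title-text">ሃገራዊ ቦርድ መረጻ ኢትዮጵያ ንብልጽግና ፓርቲ ኣይመዝገብኩዎን ኢሉ</span></h3></a>
"""

soup = BeautifulSoup(data, 'lxml')

# Print the headline text held in the span of the second link
print(soup.find('span', {'class': 'title-link__title-text'}).text)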
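The snippet above returns only the first matching span. As a minimal sketch, the same idea can be extended to list every anchor together with its link target. The helper name `collect_links` and the shortened sample headlines are assumptions made for illustration, and the built-in `html.parser` backend is used here so that `lxml` does not need to be installed.

```python
from bs4 import BeautifulSoup

# Markup modelled on the question; headline text is shortened for brevity.
SAMPLE_HTML = """
<a href="tigrinya/news-50612332.html" class="faux-block-link__overlay-link">ሕሉፍ ወልፊ ሞባይል</a>
<a href="tigrinya/news-50605565.html" class="title-link">
  <h3 class="title-link__title">
    <span class="title-link__title-text">ሃገራዊ ቦርድ መረጻ</span>
  </h3>
</a>
"""


def collect_links(html):
    """Return a (href, visible text) pair for every <a> that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a["href"], a.get_text(strip=True))
        for a in soup.find_all("a", href=True)
    ]


links = collect_links(SAMPLE_HTML)
for href, title in links:
    print(href, "->", title)
```

Because `get_text` gathers the text of all nested elements, this works for both link styles in the question, whether the headline sits directly inside the anchor or inside a nested span.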

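【Discussion】:

Thank you for the prompt answer; my question may have been unclear. To be clear, what I want is to extract the actual news paragraphs, because this only gives me the title under the span tag.
The code I am using is "html = urllib.request.urlopen('file:///G:/My%20Web%20Sites/BBC%20Tigrigna/www.bbc.com/tigrinya.html'); soup = BeautifulSoup(html); data = soup.findAll(text=True)".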
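
One possible way to get the article paragraphs rather than the headlines is to follow each anchor's href and read the paragraph elements of the linked page. The sketch below is not from the thread: the function names `article_urls`, `article_paragraphs`, and `scrape` are hypothetical, and it assumes that the linked pages keep their body text in `<p>` elements and that the relative hrefs resolve against the index page's location, neither of which the thread confirms.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

from bs4 import BeautifulSoup


def article_urls(index_html, base_url):
    """Resolve every anchor href on the index page against base_url."""
    soup = BeautifulSoup(index_html, "html.parser")
    urls = []
    for a in soup.find_all("a", href=True):
        url = urljoin(base_url, a["href"])
        if url not in urls:  # the same story may be linked more than once
            urls.append(url)
    return urls


def article_paragraphs(article_html):
    """Return the non-empty text of each <p> element (assumed article body)."""
    soup = BeautifulSoup(article_html, "html.parser")
    texts = (p.get_text(" ", strip=True) for p in soup.find_all("p"))
    return [t for t in texts if t]


def scrape(index_url):
    """Fetch the index page, then each linked page, and collect paragraphs."""
    index_html = urlopen(index_url).read()
    return {
        url: article_paragraphs(urlopen(url).read())
        for url in article_urls(index_html, index_url)
    }
```

Calling `scrape` with the `file:///` address of the mirrored index page would return a dictionary mapping each linked page to its list of paragraphs. If the pages wrap the article body in a specific container, narrowing the `find_all("p")` search to that container would cut out navigation and footer text.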