【Posted】: 2019-06-30 13:10:18
【Question】:
<div class="book-cover-image">
<img alt="NOT IN MY BACKYARD – Solid Waste Mgmt in Indian Cities" class="img-responsive" src="https://cdn.downtoearth.org.in/library/medium/2016-05-23/0.42611000_1463993925_book-cover.jpg" title="NOT IN MY BACKYARD – Solid Waste Mgmt in Indian Cities"/>
</div>
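I need to extract the title value from every div tag like this one. What is the best way to do this? Please advise.
I am trying to get the titles of all the books listed on this page.
Here is what I have tried so far: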
import requests
from bs4 import BeautifulSoup as bs

url1 = "https://www.downtoearth.org.in/books"
page1 = requests.get(url1, verify=False)
# print(page1.content)
soup1 = bs(page1.content, 'html.parser')
class_names = soup1.find_all('div', {'class': 'book-cover-image'})
for class_name in class_names:
    title_text = class_name.text
    print(class_name)
    print(title_text)
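A likely reason the attempt prints nothing useful is that the title is stored as an attribute of the img tag, not as text inside the div, so `.text` on the div is effectively empty. The following is a minimal sketch of one possible approach, not a definitive answer: it selects each img nested in a `book-cover-image` div and reads its `title` attribute. The names `extract_titles` and `sample_html` are hypothetical, and the sketch assumes the live page uses the same markup as the snippet in the question.

```python
from bs4 import BeautifulSoup


def extract_titles(html):
    """Return the title attribute of every <img> inside div.book-cover-image."""
    soup = BeautifulSoup(html, "html.parser")
    # CSS selector: <img> tags that carry a title attribute and are
    # descendants of a div with class "book-cover-image".
    images = soup.select("div.book-cover-image img[title]")
    return [img["title"] for img in images]


# Sample input copied from the question (hypothetical variable name).
sample_html = """
<div class="book-cover-image">
<img alt="NOT IN MY BACKYARD – Solid Waste Mgmt in Indian Cities" class="img-responsive" src="https://cdn.downtoearth.org.in/library/medium/2016-05-23/0.42611000_1463993925_book-cover.jpg" title="NOT IN MY BACKYARD – Solid Waste Mgmt in Indian Cities"/>
</div>
"""

for title in extract_titles(sample_html):
    print(title)
```

To run this against the real page, the same function could be given `page1.content` from the `requests.get` call in the attempt above. Whether the site still serves this markup is an assumption.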
【Discussion】:
-
Please add sample input and the desired output.
-
cdn.downtoearth.org.in/library/medium/2016-05-23/… " title="NOT IN MY BACKYARD – Solid Waste Mgmt in Indian Cities"/>
-
The output should be the title: NOT IN MY BACKYARD – Solid Waste Mgmt in Indian Cities
-
What have you tried so far?
-
url1 ="downtoearth.org.in/books" page1 = requests.get(url1, verify=False) #print(page1.content) soup1= bs(page1.content, 'html.parser') class_names = soup1.find_all('div',{'class':'book-cover-image'} ) for class_name in class_names: title_text = class_name.text print(class_name) #print(title_text)
Tags: python html text beautifulsoup tags