【Posted】: 2020-01-31 19:26:07
【Question】:
I only want to output the text of the outer li tags.
from bs4 import BeautifulSoup
html = BeautifulSoup("""
<ul>
<li><a href="#">B2B Marketing</a>
<ul>
<li><a href="offerings/b2bmarketing/outboundai.php"> Campagin </a></li>
<li><b>Inbound AI </b>Enrich inbound leads</a></li>
</ul>
</li>
<li>Marketing Data Analysis
<ul>
<li><a href="offerings/marketingdataanalysis/event360ai.php"><b>Event 360 AI </b></a></li>
</ul>
</li>
<li class="drop-down"><a href="#">Enrichment API</a>
</li>
</ul>
""", "html.parser")  # name the parser explicitly to avoid the "no parser specified" warning
print([i.text.strip() for i in html.find_all('li')])
The output contains the full text of every li, including the nested ones:
['B2B Marketing\n\n Campagin \nInbound AI Enrich inbound leads', 'Campagin', 'Inbound AI Enrich inbound leads', 'Marketing Data Analysis\n \nEvent 360 AI', 'Event 360 AI', 'Enrichment API\n\nAPI Technographics, Firmographics, Intent data', 'API Technographics, Firmographics, Intent data']
But the output should be:
[
'B2B Marketing : Campagin, Enrich inbound leads',
'Marketing Data Analysis : Event 360 AI',
'Enrichment API'
]
Please help me solve this.
【Comments】:
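- But you aren't interested only in the text of the outer li elements; the output you are asking for is also a function of the contents of the li elements in the nested lists.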
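Following up on that comment, here is a minimal sketch of one way to build "label : nested items" strings, not a definitive answer. It assumes the top-level items are the direct li children of the first ul, and that a label is whatever text sits in an li outside its nested ul. The names SAMPLE, own_label and summarize are made up for illustration. SAMPLE is a copy of the markup from the question with the stray closing a tag removed, so the result does not depend on how the parser recovers from it. Note that this keeps the bold "Inbound AI" prefix; dropping it, as in the requested output, would need an extra rule that the question does not spell out.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Copy of the question's markup, minus the unmatched </a> (an assumption made for this sketch).
SAMPLE = """
<ul>
<li><a href="#">B2B Marketing</a>
<ul>
<li><a href="offerings/b2bmarketing/outboundai.php"> Campagin </a></li>
<li><b>Inbound AI </b>Enrich inbound leads</li>
</ul>
</li>
<li>Marketing Data Analysis
<ul>
<li><a href="offerings/marketingdataanalysis/event360ai.php"><b>Event 360 AI </b></a></li>
</ul>
</li>
<li class="drop-down"><a href="#">Enrichment API</a>
</li>
</ul>
"""


def own_label(li):
    """Return the text of an li, skipping any nested ul it contains."""
    parts = []
    for child in li.children:
        if isinstance(child, Tag) and child.name == "ul":
            continue  # the nested list is handled separately
        parts.append(child.get_text() if isinstance(child, Tag) else str(child))
    # Collapse runs of whitespace and newlines into single spaces.
    return " ".join("".join(parts).split())


def summarize(markup):
    """Build one 'label : item, item' string per top-level li."""
    soup = BeautifulSoup(markup, "html.parser")
    rows = []
    # recursive=False restricts the search to direct children of the outer ul.
    for li in soup.ul.find_all("li", recursive=False):
        label = own_label(li)
        nested = [" ".join(sub.get_text().split()) for sub in li.find_all("li")]
        rows.append(f"{label} : {', '.join(nested)}" if nested else label)
    return rows


print(summarize(SAMPLE))
```

With the sample above this prints ['B2B Marketing : Campagin, Inbound AI Enrich inbound leads', 'Marketing Data Analysis : Event 360 AI', 'Enrichment API'].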
Tags: python web-scraping beautifulsoup python-requests