【Posted】: 2014-11-09 09:51:02
【Question】:
I am using Python and BeautifulSoup for web scraping.
I need to scrape the following markup:
<li class="review-rating">
<h5 class="review-rating__title">Location:</h5>
<span class="review-rating__score">5</span>
<h5 class="review-rating__title">Value:</h5>
<span class="review-rating__score">3</span>
<h5 class="review-rating__title">Facilities:</h5>
<span class="review-rating__score">4</span>
<h5 class="review-rating__title">Service:</h5>
<span class="review-rating__score">4</span>
<h5 class="review-rating__title">Cleanliness:</h5>
<span class="review-rating__score">5</span>
</li>
I have actually managed to scrape this markup with the following code:
for scores_of_this_customer in tt.select('li.review-rating'):
    print(scores_of_this_customer.select('h5.review-rating__title')[0].text + " " + scores_of_this_customer.select('span.review-rating__score')[0].text)
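But this only prints Location: 5
I would like a way to print all of these scores using a loop.
I know I can print the other scores by indexing them as [1], [2], and so on, but I don't want to write five print statements.
PS:
This code worked for me.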
# find() returns the first matching <li>, or None if there is none
rating = tt.find("li", {"class": "review-rating"})
if rating:
    keys = rating.find_all("h5", {"class": "review-rating__title"})
    values = rating.find_all("span", {"class": "review-rating__score"})
    # The title text already ends with a colon, so a space is enough
    for key, value in zip(keys, values):
        print(key.text + " " + value.text)
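For anyone who wants to try this outside the original page, here is a minimal self-contained sketch of the same pairing idea. It uses select() as in the first snippet and pairs titles with scores through zip(), once per li.review-rating block, so it also covers pages with several reviews. The HTML constant and the extract_scores helper are made up for illustration, and the int() conversion assumes every score is a whole number, as in the sample markup.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (shortened to three ratings).
HTML = """
<li class="review-rating">
<h5 class="review-rating__title">Location:</h5>
<span class="review-rating__score">5</span>
<h5 class="review-rating__title">Value:</h5>
<span class="review-rating__score">3</span>
<h5 class="review-rating__title">Facilities:</h5>
<span class="review-rating__score">4</span>
</li>
"""


def extract_scores(soup):
    """Hypothetical helper: one {title: score} dict per li.review-rating block."""
    reviews = []
    for block in soup.select("li.review-rating"):
        titles = block.select("h5.review-rating__title")
        scores = block.select("span.review-rating__score")
        # zip() pairs the n-th title with the n-th score in document order.
        # Assumption: scores are integers; the trailing colon is dropped.
        reviews.append({
            t.get_text(strip=True).rstrip(":"): int(s.get_text(strip=True))
            for t, s in zip(titles, scores)
        })
    return reviews


tt = BeautifulSoup(HTML, "html.parser")
for review in extract_scores(tt):
    for title, score in review.items():
        print(title, score)
```

Note that zip() stops at the shorter list, so if a title were ever missing its score the remaining pairs could shift or be dropped silently; checking that len(titles) == len(scores) first would guard against that.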
【Discussion】:
Tags: python python-3.x beautifulsoup