【Posted】: 2017-01-07 12:01:39
【Question】:
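I want to show the 10 best hotels from the users' point of view. Suppose the user enters "pool": I then have to count how often the keyword "pool" occurs in the user reviews on tripadvisor and display the top 10 hotel names ranked by that count. To do this, I am currently scraping all the reviews for hotels in Dubai, and then I will match the keyword and display the top 10 hotel names. But scraping the hotel reviews takes far too long. What can I do? Is there any way to get the reviews other than scraping? Here is the code I use to scrape the reviews: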
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

BASE = 'https://www.tripadvisor.com'
LIST_URL = BASE + '/Hotels-g295424-oa{}-Dubai_Emirate_of_Dubai-Hotels.html#EATERY_LIST_CONTENTS'

# Fetch the first listing page to find out how many pages there are
r = requests.get(LIST_URL.format(0))
soup = BeautifulSoup(r.text, "html.parser")
last_offset = 30  # fall back to one page if no "last" link is found
for link in soup.find_all('a', class_='last'):
    page_number = link.get('data-page-number')
    last_offset = int(page_number) * 30
    print('last offset:', last_offset)

# Walk every listing page (30 hotels per page)
for offset in range(0, last_offset, 30):
    print('--- page offset:', offset, '---')
    r = requests.get(LIST_URL.format(offset))
    soup = BeautifulSoup(r.text, "html.parser")
    for link in soup.find_all('a', class_='property_title'):
        # One extra request per hotel page; this is the slow part
        iurl = urljoin(BASE, link.get('href'))
        r = requests.get(iurl)
        hotel_soup = BeautifulSoup(r.content, "lxml")
        # look for the partial entry of the review
        resultsoup = hotel_soup.find_all("p", {"class": "partial_entry"})
        for review in resultsoup:
            review_list = review.get_text()
            print(review_list)
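
For reference, the counting and ranking step I have in mind once the reviews are collected would look roughly like the sketch below. The function name `top_hotels`, the `reviews_by_hotel` dictionary (hotel name mapped to a list of review texts), and the sample data are only placeholders for illustration; the scraper above does not build this structure yet. Matching is case-insensitive and whole-word, so "pool" would not count "carpool" or "pools".

```python
import re
from collections import Counter


def top_hotels(reviews_by_hotel, keyword, n=10):
    """Rank hotels by how often `keyword` appears in their reviews."""
    # Whole-word, case-insensitive match; re.escape guards against
    # regex metacharacters in whatever the user typed.
    pattern = re.compile(r'\b' + re.escape(keyword) + r'\b', re.IGNORECASE)
    counts = Counter()
    for hotel, reviews in reviews_by_hotel.items():
        counts[hotel] = sum(len(pattern.findall(text)) for text in reviews)
    # most_common returns (hotel, count) pairs, highest count first
    return counts.most_common(n)


# Made-up sample data, only to show the expected input shape
sample = {
    'Hotel A': ['Great pool and a nice bar.',
                'The pool was clean. Pool towels provided.'],
    'Hotel B': ['Nice rooms, no pool though.'],
    'Hotel C': ['Close to the metro.'],
}
print(top_hotels(sample, 'pool', n=2))
```

This part is cheap; nearly all the time goes into the one request per hotel page in the scraping loop.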
【Discussion】:
Tags: python-3.x web-scraping beautifulsoup tripadvisor