【Question title】: Scraping Google App All Reviews using Selenium and Python
【Posted】: 2022-01-16 04:21:18
【Question】:

I want to scrape all reviews of a particular app from the Google Play Store. I have prepared the following script:

# App Reviews Scraper
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

from bs4 import BeautifulSoup

url = "https://play.google.com/store/apps/details?id=com.android.chrome&hl=en&showAllReviews=true"

# make request
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get(url)
SCROLL_PAUSE_TIME = 5

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")
time.sleep(SCROLL_PAUSE_TIME)

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")

    if new_height == last_height:
        break
    last_height = new_height

# Get everything inside the <html> tag, including JavaScript-rendered content
html = driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML")
soup = BeautifulSoup(html, 'html.parser')

reviewer = []
date = []

# reviewer name
for span in soup.find_all("span", class_="X43Kjb"):
    reviewer.append(span.text)

# review date
for span in soup.find_all("span", class_="p2TkOb"):
    date.append(span.text)

print(len(reviewer))
print(len(date))

However, it always shows only 203. There are 35,474,218 reviews. So how can I download all of the reviews?

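【Comments】:

  • If you watch the page's full behavior, you will see that a Show More link also appears from time to time, and you have to check for it.

Tags: python selenium web-scraping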


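【Solution 1】:
wait = WebDriverWait(driver, 1)

# Inside the scrolling loop: click "Show More" whenever it appears
try:
    wait.until(EC.element_to_be_clickable((By.XPATH, "//span[text()='Show More']"))).click()
except TimeoutException:
    pass  # no button within 1 second; carry on with the loop

Just add this inside the infinite-scroll loop to check for the Show More element.

Imports:

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC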
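
To show where this check sits relative to the question's scrolling loop, here is a minimal sketch. The function name `load_all_reviews` and its `pause` and `max_rounds` parameters are made up for illustration and are not part of the original answer. It looks for the button with `find_elements` rather than `WebDriverWait`, so a missing button yields an empty list instead of a timeout, and it only stops when the page height is unchanged and no button was clicked in that round.

```python
import time

# Locator taken from the answer; it depends on the page layout and language.
SHOW_MORE_XPATH = "//span[text()='Show More']"


def load_all_reviews(driver, pause=5.0, max_rounds=1000):
    """Scroll until the page stops growing, clicking 'Show More' when present.

    Hypothetical helper. Returns the number of scroll rounds performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for round_no in range(1, max_rounds + 1):
        # Scroll to the bottom and give the page time to load more reviews.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)

        # "xpath" is the string value of selenium's By.XPATH.
        buttons = driver.find_elements("xpath", SHOW_MORE_XPATH)
        clicked = False
        if buttons:
            try:
                buttons[0].click()
                clicked = True
                time.sleep(pause)
            except Exception:
                pass  # button not clickable right now; treat it as absent

        new_height = driver.execute_script("return document.body.scrollHeight")
        # Stop only when nothing grew and there was no button to click.
        if new_height == last_height and not clicked:
            return round_no
        last_height = new_height
    return max_rounds
```

Calling `load_all_reviews(driver)` after `driver.get(url)` would replace the `while True` loop in the question. The `max_rounds` cap is an added safeguard, since with tens of millions of reviews the loop may otherwise run for a very long time.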

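【Discussion】:

  • @Arundeep Thank you very much, it works fine. I am now facing another problem: sometimes an error is shown in the browser and it gets stuck (it cannot scroll down but waits indefinitely). If I close the browser manually, no reviews are shown. If an error occurs, is there any option to close the browser automatically and still show the reviews collected up to that point?
  • driver.close() closes the browser, and you can move the element scraping into the scroll-down loop.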
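
One way to read the suggestion above is sketched below: harvest the reviews after every scroll round, so that an error leaves the last good snapshot in memory, and close the browser in a `finally` block. The names `scrape_with_cleanup`, `scroll_once`, and `extract` are hypothetical and stand in for the question's scrolling and BeautifulSoup parsing code. The sketch calls `driver.quit()`, which ends the whole WebDriver session, instead of `driver.close()`, which closes only the current window.

```python
def scrape_with_cleanup(driver, scroll_once, extract, max_rounds=1000):
    """Scroll and harvest in rounds; keep what was collected if an error occurs.

    All three helpers are hypothetical:
      scroll_once(driver) -> bool   True while more content may still load
      extract(driver)     -> iterable of (reviewer, date) pairs on the page
    """
    reviews = []
    try:
        for _ in range(max_rounds):
            more = scroll_once(driver)
            # Re-read the whole page each round; the previous snapshot
            # survives if this call (or the next scroll) raises.
            reviews = list(extract(driver))
            if not more:
                break
    except Exception as exc:
        print(f"Stopped early: {exc!r}")
    finally:
        # Always end the browser session, whether or not an error occurred.
        driver.quit()
    return reviews
```

Re-parsing the full page every round is wasteful for long runs, but it keeps the sketch simple. Note that `max_rounds` only bounds the number of rounds; it does not help if a single Selenium call itself blocks.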