Scraper not returning results from dynamic webpage

【Question title】: Scraper not returning results from dynamic webpage
【Posted】: 2020-10-29 05:07:25
【Question】:

I am trying to scrape all the update notes from https://store.steampowered.com/newshub/app/1145360. I identified the update notes by the class "eventcalendar_CalendarRow_398u2" and wrote the following code:

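updatenotes = soup.find_all("div", attrs={"class": "eventcalendar_CalendarRow_398u2"})
for updatenote in updatenotes:
    # The loop body was omitted in the original post; printing the text is a placeholder
    print(updatenote.get_text(strip=True))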

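But when I try to scrape, it returns no results, which I think is due to the dynamic nature of the site. I am using Selenium to scroll all the way down before I start scraping, but it does not work. Can anyone help?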

【Question discussion】:

Tags: python selenium beautifulsoup steam scrape

【Solution 1】:

Try the following:

import time

from selenium import webdriver
from selenium.webdriver.common.by import By

# Assumes a local Chrome driver; the original answer did not show the driver setup
driver = webdriver.Chrome()
driver.get('https://store.steampowered.com/newshub/app/1145360')
scroll_pause_time = 1
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Read the rows currently present in the DOM before scrolling further
    updatenotes = driver.find_elements(By.CSS_SELECTOR, "div.eventcalendar_CalendarRow_398u2")
    print(len(updatenotes))
    for updatenote in updatenotes:
        print(updatenote.text)
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait for the page to load
    time.sleep(scroll_pause_time)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        # If the heights are the same, no new content was loaded, so exit the loop
        break
    last_height = new_height

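【Discussion】:

Thanks, that's cool, but it seems that when I try to scrape pages with many entries (for other games), it only returns the values at the bottom of the page rather than the whole page. I think scrolling is a bit clunky on this page, and maybe that is the problem. Any ideas?
OK, try this.
So this is the code that scrolls to the bottom, right? The problem is that once you scroll down to the bottom and then scrape, the code only picks up data from the bottom of the page, not from the top.
If I don't scroll, I only get data from the top. So basically it looks like the scraper can only scrape what is rendered near the visible screen, which is odd.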
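
The behaviour described in these comments would be consistent with a page that keeps only the rows near the viewport in the DOM. That is an assumption about this page, not something confirmed in the thread. If it holds, one possible workaround is to scroll in small steps and accumulate what is visible after each step, rather than scraping once at the end. The sketch below is not from the original answer: `collect_while_scrolling` and `scrape_update_notes` are hypothetical names, the selector is the one from the question, and the one-second pause is a guess that may be too short for slow loads.

```python
from typing import Callable, Iterable, List


def collect_while_scrolling(
    read_visible: Callable[[], Iterable[str]],
    scroll_step: Callable[[], bool],
    max_rounds: int = 1000,
) -> List[str]:
    """Read the visible rows before every scroll step and keep each text once.

    read_visible returns the texts of the rows currently in the DOM.
    scroll_step scrolls one step and returns False when it could not move.
    Rows are de-duplicated by text, so two identical notes count as one.
    """
    seen = set()
    ordered = []
    for _ in range(max_rounds):
        for text in read_visible():
            if text and text not in seen:
                seen.add(text)
                ordered.append(text)
        if not scroll_step():
            break
    return ordered


def scrape_update_notes(url: str, pause: float = 1.0) -> List[str]:
    """Hypothetical Selenium wiring; assumes the window is the scroll container."""
    import time

    from selenium import webdriver
    from selenium.common.exceptions import StaleElementReferenceException
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        time.sleep(pause)

        def read_visible() -> List[str]:
            rows = driver.find_elements(
                By.CSS_SELECTOR, "div.eventcalendar_CalendarRow_398u2"
            )
            texts = []
            for row in rows:
                try:
                    texts.append(row.text)
                except StaleElementReferenceException:
                    # The row was removed from the DOM while we were reading it
                    continue
            return texts

        def scroll_step() -> bool:
            before = driver.execute_script("return window.pageYOffset;")
            driver.execute_script("window.scrollBy(0, window.innerHeight);")
            time.sleep(pause)
            after = driver.execute_script("return window.pageYOffset;")
            # No movement is treated as the end of the page
            return after > before

        return collect_while_scrolling(read_visible, scroll_step)
    finally:
        driver.quit()
```

Keeping the accumulation loop separate from the browser calls means it can be checked with stand-in functions before pointing it at the real page, for example `scrape_update_notes("https://store.steampowered.com/newshub/app/1145360")`. If the page loads more content only after a delay, `scroll_step` may report the end too early, so a longer pause or a few retries might be needed.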