【Posted】: 2020-10-12 08:59:23
【Question】:
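So I'm working on a small project in which I scrape Yahoo Finance news for a specific company and run some data analysis on it to see how news sentiment affects stock performance. I'm trying to keep scrolling and scraping until the page stops loading new items, but I can't get the scraper past the first scroll.
I'm using Selenium for this. I've looked everywhere for help, but the news results are loaded incrementally each time the page scrolls down, which seems to make things more complicated.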
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Web scraper for an infinite-scrolling page
url = "https://finance.yahoo.com/quote/company/press-releases?p=company"

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get(url)
time.sleep(2)  # allow 2 seconds for the web page to open

# Pause after each scroll; lazily loaded news may need more than 0.5 s to arrive
SCROLL_PAUSE_TIME = 2

last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Scroll to the bottom to trigger loading of the next batch of articles
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(SCROLL_PAUSE_TIME)

    # Stop once the page height no longer grows (nothing new was loaded)
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

##### Extract Article Titles #####
titles = []
soup = BeautifulSoup(driver.page_source, "html.parser")
for t in soup.find_all(class_="Cf"):
    a_tag = t.find("a", class_="Fw(b)")
    if a_tag:
        titles.append(a_tag.text)

driver.quit()
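The scrolling loop above exits the first time `document.body.scrollHeight` is unchanged between two checks, so if the next batch of news takes longer than `SCROLL_PAUSE_TIME` to render, scraping stops after the first scroll. Below is a minimal sketch of a more tolerant loop. It assumes that seeing several consecutive unchanged heights is an acceptable stopping rule. `scroll_until_stable`, `max_stalls` and `max_rounds` are hypothetical names, and the scroll and height functions are passed in so the logic can be tried without a browser.

```python
import time
from typing import Callable


def scroll_until_stable(
    scroll_to_bottom: Callable[[], None],
    get_height: Callable[[], int],
    pause: float = 2.0,
    max_stalls: int = 3,
    max_rounds: int = 200,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Scroll until the page height stops growing; return the rounds used.

    Gives up only after `max_stalls` consecutive checks without growth,
    so a batch that loads slower than `pause` does not end the loop early.
    `max_rounds` caps the total number of scrolls as a safety net.
    """
    last_height = get_height()
    stalls = 0
    rounds = 0
    while stalls < max_stalls and rounds < max_rounds:
        scroll_to_bottom()  # trigger loading of the next batch
        sleep(pause)        # give the new items time to render
        rounds += 1
        new_height = get_height()
        if new_height > last_height:
            last_height = new_height
            stalls = 0      # page grew: reset the stall counter
        else:
            stalls += 1     # no growth yet: count a stall and try again
    return rounds
```

With Selenium, the two callables would be `lambda: driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")` and `lambda: driver.execute_script("return document.body.scrollHeight")`. The page source can then be parsed exactly as in the question once the function returns.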
【Discussion】:
Tags: python selenium web-scraping sentiment-analysis yahoo-finance