【Question Title】: Trying to extract a paragraph using Beautiful Soup
【Posted】: 2022-11-30 05:22:47
【Question Description】:
from selenium import webdriver
import time
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup


options = webdriver.ChromeOptions()
options.add_argument("--no-sandbox")
options.add_argument("--disable-gpu")
options.add_argument("--window-size=1920x1080")
options.add_argument("--disable-extensions")
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
                    
URL = 'https://www.askgamblers.com/online-casinos/countries/uk'
driver.get(URL)
time.sleep(2)
urls= []
page_links =driver.find_elements(By.XPATH, "//div[@class='card__desc']//a[starts-with(@href, '/online')]")
for link in page_links:
    href=link.get_attribute("href")
    urls.append(href)
    #print(href)

for url in urls:
    driver.get(url)
    time.sleep(1)
    try:
        review=WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//a[@class='review-main__show']")))
        review.click()
    except:
        pass
    
    soup = BeautifulSoup(driver.page_source,"lxml")

    try:
        paragraph=soup.select_one("h2:-soup-contains('Virtual Games')").nextSibling.textContent
        print(paragraph)
    except:
        print('empty')
        pass

Detail:

I am trying to extract these paragraphs, but the code gives me none. The full paragraph only becomes visible after you click "Read more". This is one of the page links: https://www.askgamblers.com/online-casinos/reviews/mr-play-casino . These are the full paragraphs that I cannot extract.

【Question Discussion】:

    Tags: python web-scraping beautifulsoup


    【Solution 1】:
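The `except` block hides the real error: `soup.select_one(...).nextSibling.textContent` raises `AttributeError`, because `textContent` is a JavaScript DOM property that does not exist in BeautifulSoup; the bs4 equivalent is `.get_text()` (or `.text`). In addition, `.nextSibling` often returns the whitespace text node between the `<h2>` and the next tag rather than the `<p>` itself, so `.find_next_sibling()` is the safer call. A minimal sketch of the fix, run against a simplified stand-in for the review page's markup (the real page structure may differ):

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the review page: a heading followed by its paragraph.
html = """
<div class="review-text">
  <h2>Virtual Games</h2>
  <p>mr.play casino offers a wide range of slots.</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# :-soup-contains matches elements whose text contains the given substring.
heading = soup.select_one("h2:-soup-contains('Virtual Games')")

# find_next_sibling() skips whitespace text nodes and returns the next tag;
# get_text() (not .textContent) extracts its text content.
paragraph = heading.find_next_sibling().get_text(strip=True)
print(paragraph)
```

In your loop, only the last line needs to change, e.g. `soup.select_one("h2:-soup-contains('Virtual Games')").find_next_sibling().get_text(strip=True)`. It is also worth replacing the bare `except: print('empty')` with `except AttributeError`, so that other failures are not silently swallowed.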