【Question title】: Python 3.5 - How to scrape a JavaScript-rendered page
【Posted】: 2018-08-09 23:12:48
【Question】:

I am trying to extract a JavaScript-rendered table using Python 3 and WebDriver.

My code is as follows:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

# Create a new instance of the Firefox driver
driver = webdriver.Firefox()
driver.get("http://esploracolfis.sns.it/EsploraCoLFIS/#!0:t=L&l=1;1:r=T")
driver.refresh()
# Wait for the dynamically loaded elements to show up
WebDriverWait(driver, 10).until(
    EC.visibility_of_element_located((By.TAG_NAME, "table")))

# And grab the page HTML source
html = driver.page_source
driver.quit()
print(html)

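Now, when I print the page source, the JavaScript-rendered content is missing from the output. How can I extract the table I want (the table's entire HTML code)?

Many thanks.

【Question comments】:

    Tags: javascript python selenium web-scraping webdriver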


    【Solution 1】:

    To solve your problem, I used the BeautifulSoup library to parse the page source.

    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.by import By
    
    import bs4
    
    # Create a new instance of the Firefox driver
    driver = webdriver.Firefox()
    driver.get("http://esploracolfis.sns.it/EsploraCoLFIS/#!0:t=L&l=1;1:r=T")
    driver.refresh()
    # Wait for the dynamically loaded elements to show up
    WebDriverWait(driver, 10).until(
        EC.visibility_of_element_located((By.TAG_NAME, "table")))
    
    # And grab the page HTML source
    html = driver.page_source
    
    # Parse the HTML with BeautifulSoup (the 'lxml' parser needs the lxml package installed)
    bs4_html = bs4.BeautifulSoup(html, 'lxml')
    
    # find_all returns a list of every <table> element on the page
    table = bs4_html.find_all('table')
    
    driver.quit()
    
    print(table)
    

    The console output is far too long to include here.
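Since the raw table HTML is long, it may be easier to reduce it to rows of cell text before printing. The following is a minimal sketch, not part of the original answer: it swaps in the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra packages. The names `TableRows` and `table_rows`, and the `sample` string standing in for `driver.page_source`, are hypothetical. It assumes simple, non-nested tables.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of every <tr> as a list of strings."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left unclosed


def table_rows(html):
    """Return every table row in `html` as a list of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample standing in for driver.page_source
sample = """
<table>
  <tr><th>Name</th><th>Year</th></tr>
  <tr><td>Alpha</td><td> 1901 </td></tr>
</table>
"""

for row in table_rows(sample):
    print(row)
```

In the script above, `table_rows(html)` could be called on the `html` variable right after `driver.page_source` is read. Rows from all tables on the page end up in one flat list, so a page with several tables may need extra handling.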

    Hope this helps!

    【Comments】:
