Python - Web Scraping a Page

【Question Title】: Python - WebScraping a page
【Posted】: 2021-03-18 02:31:13
【Question】:

My code is supposed to go to a website, go through 2 pages, and print out all the titles and URLs/hrefs in each row.

Currently, my code gets to both pages fine, but it only prints the first title on each page instead of the title of every row.

The page does have some JavaScript; I suspect that might be why it doesn't show any links/URLs/hrefs in each row? Ideally I'd like to print each row's URL.

from selenium import webdriver
import time

driver = webdriver.Chrome()

for x in range (1,3):
    driver.get(f'https://www.abstractsonline.com/pp8/#!/9325/presentations/endometrial/{x}')
    time.sleep(3)
    page_source = driver.page_source
    eachrow=driver.find_elements_by_xpath("//li[@class='result clearfix']")
    for item in eachrow:
        title=driver.find_element_by_xpath("//span[@class='bodyTitle']").text
        print(title)

【Question Discussion】:

I don't know if this helps, but check out the Beautiful Soup web scraper tutorial on Real Python.
BeautifulSoup doesn't seem to be able to parse this page.
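One likely reason a plain HTTP fetch plus BeautifulSoup finds nothing (an inference from the URL shape, not something confirmed in this thread): everything after the # in these page URLs is a fragment, which browsers keep locally and never send to the server. The site's JavaScript reads that fragment and then loads the results, so a client such as requests only ever receives the /pp8/ shell. The minimal sketch below uses the standard library's urllib.parse.urldefrag to split the question's two page URLs; the PAGE_URLS name is just for illustration.

```python
from urllib.parse import urldefrag

# The two page URLs the question's loop visits (x = 1, 2).
PAGE_URLS = [
    f"https://www.abstractsonline.com/pp8/#!/9325/presentations/endometrial/{x}"
    for x in range(1, 3)
]

for url in PAGE_URLS:
    # urldefrag splits off everything after '#'; only the first part is requested over HTTP.
    requested, route = urldefrag(url)
    print(requested, "->", route)
```

Both pages resolve to the same requested resource, https://www.abstractsonline.com/pp8/, and differ only in the fragment, which is why a browser-driven tool like Selenium is needed here rather than a plain HTML download.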

Tags: python selenium web-scraping beautifulsoup

【Solution 1】:

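You're using driver inside the for loop, which means you are searching the entire page, so you will always get the same element.

You want to search from each item instead: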

for item in eachrow:
    # The leading "." makes the XPath relative to this row, not the whole page
    title = item.find_element_by_xpath(".//span[@class='bodyTitle']").text
    print(title)
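To see why the leading dot matters without launching a browser, here is a minimal sketch using the standard library's xml.etree.ElementTree on a hypothetical, simplified fragment (the markup, titles, and function names are made up for illustration; the real page is rendered by JavaScript and may differ). ElementTree's limited XPath support does not model Selenium's document-wide //, so the bug is imitated by searching from the document root on every iteration, which is effectively what driver.find_element_by_xpath("//span[...]") does.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: two result rows shaped like the ones the
# question targets (class names taken from the question's XPath).
SAMPLE = """
<ul>
  <li class="result clearfix"><span class="bodyTitle">Title A</span></li>
  <li class="result clearfix"><span class="bodyTitle">Title B</span></li>
</ul>
"""

ROW = ".//li[@class='result clearfix']"
TITLE = ".//span[@class='bodyTitle']"


def titles_searching_root(root):
    """Imitates the question's bug: every lookup starts from the whole document."""
    rows = root.findall(ROW)
    return [root.find(TITLE).text for _ in rows]


def titles_searching_rows(root):
    """Imitates the fix: each lookup is scoped to the current row."""
    return [row.find(TITLE).text for row in root.findall(ROW)]


root = ET.fromstring(SAMPLE.strip())
print(titles_searching_root(root))  # ['Title A', 'Title A']
print(titles_searching_rows(root))  # ['Title A', 'Title B']
```

In Selenium terms, row.find(TITLE) corresponds to item.find_element_by_xpath(".//span[@class='bodyTitle']"). Later Selenium 4 releases removed the find_element_by_* helpers; there the same call is item.find_element(By.XPATH, ".//span[@class='bodyTitle']"), with By imported from selenium.webdriver.common.by.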

Also, there is no "URL" in those rows: when you click a row, its data-id attribute is used in the request.

<h1 class="name" data-id="1989" data-key="">

A request is then sent to https://www.abstractsonline.com/oe3/Program/9325/Presentation/694
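A minimal sketch of collecting each row's data-id and turning it into a request URL, again using xml.etree.ElementTree on hypothetical markup. The h1 element and its attributes come from the answer; the surrounding li structure, the second row's id "2001", and the helper names are invented for illustration. The URL template copies the request shown above, and it is an ASSUMPTION that the data-id fills its last path segment: the answer's own example (data-id 1989, URL ending in 694) does not show the two values matching, so check the browser's network tab before relying on this.

```python
import xml.etree.ElementTree as ET

# Hypothetical row markup; only the <h1 class="name" data-id=...> element is from the answer.
SAMPLE = """
<ul>
  <li class="result clearfix"><h1 class="name" data-id="1989" data-key="">Title A</h1></li>
  <li class="result clearfix"><h1 class="name" data-id="2001" data-key="">Title B</h1></li>
</ul>
"""

# ASSUMPTION: the row's data-id is the final path segment of the request URL.
REQUEST_TEMPLATE = "https://www.abstractsonline.com/oe3/Program/9325/Presentation/{}"


def row_ids(root):
    """Collect the data-id attribute from each row's <h1 class="name">."""
    ids = []
    for row in root.findall(".//li[@class='result clearfix']"):
        heading = row.find(".//h1[@class='name']")
        if heading is not None and heading.get("data-id"):
            ids.append(heading.get("data-id"))
    return ids


def request_url(data_id):
    """Build the (assumed) request URL for one row."""
    return REQUEST_TEMPLATE.format(data_id)


root = ET.fromstring(SAMPLE.strip())
for data_id in row_ids(root):
    print(data_id, request_url(data_id))
```

Inside the question's Selenium loop, and assuming the heading sits inside each row, the equivalent lookup would be item.find_element_by_xpath(".//h1[@class='name']").get_attribute("data-id").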

【Discussion】:

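Hi, thanks for your reply. I tried the line using item.find instead of driver.find, and I got blank prints as output. Did you get anything printed on your end?
Thanks, interesting that the dot narrows the search.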