【Question title】: How to scrape a canvas tag and why is it not visible in my browser?
【Posted】: 2025-12-18 19:55:01
【Question】:

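In the attached image, the highlighted HTML is the relevant markup and the red circle marks the part that needs to be scraped. The phone number is inside a canvas tag. I tried scraping the tag, but it returns "Your browser does not support the HTML5 canvas tag."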

https://www.mudah.my/malaysia/cars-for-sale/audi?o=1

This is a link to the car listings whose contact details need to be scraped. Any suggestions on how to solve this are appreciated.

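from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# driver: an existing WebDriver; car_links: listing URLs collected earlier
for link in car_links:
    print('link: ', link)
    driver.get(link)

    try:
        dealer_name = driver.find_element(By.XPATH, '/html/body/div[1]/div[6]/div/div[2]/div[1]/div[4]/div/div[1]/div[2]/div[1]/a').text
        print(dealer_name)
        try:
            # Scroll the button that reveals the phone number into view, then click it
            button_xpath = '/html/body/div[1]/div[6]/div/div[2]/div[1]/div[5]/button[2]'
            driver.execute_script("arguments[0].scrollIntoView(true);", WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, button_xpath))))
            button1 = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, button_xpath)))
            button1.click()
            # This returns "Your browser does not support the HTML5 canvas tag." instead of the number
            phone = driver.find_element(By.ID, 'phone-image').text
            print(phone)
        except Exception:
            print('No phone no')
    except Exception:
        pass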

【Question discussion】:

    Tags: python html selenium web-scraping


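    【Solution 1】:

    You are trying to read the element's text immediately after clicking the previous element, but the phone number takes some time to appear. Add an explicit wait before reading it, like this: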

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    wait = WebDriverWait(driver, 20)
    
    button1.click()
    phone = wait.until(EC.visibility_of_element_located((By.ID, 'phone-image'))).text
    
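Waiting only helps if the number eventually appears as text. A `<canvas>` element draws its content as pixels, and the only text inside the tag is fallback content for browsers without canvas support, which is what the question's scrape returned. A minimal sketch of one alternative, assuming the number is drawn directly onto the canvas with id `phone-image` (taken from the question's code, not verified against the site): ask the browser for the canvas bitmap through the standard `HTMLCanvasElement.toDataURL()` method and decode the base64 data URL it returns. The helper names `decode_data_url` and `canvas_png_bytes` are hypothetical, and `toDataURL()` throws a security error if the canvas has been tainted by cross-origin images.

```python
import base64


def decode_data_url(data_url):
    """Decode a base64 data: URL (such as canvas.toDataURL() returns) into bytes."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data: URL")
    return base64.b64decode(payload)


def canvas_png_bytes(driver, canvas_element):
    """Export a <canvas> element as PNG bytes via the browser.

    `driver` is a Selenium WebDriver and `canvas_element` a located canvas
    WebElement, e.g. driver.find_element(By.ID, "phone-image") (assumed id).
    """
    # The element is passed to the script as arguments[0]
    data_url = driver.execute_script(
        "return arguments[0].toDataURL('image/png');", canvas_element
    )
    return decode_data_url(data_url)
```

The resulting PNG still contains the number only as an image, so an OCR step would be needed to turn it into text; reading the data embedded in the page, as Solution 2 does, avoids that step.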

    【Discussion】:

      【Solution 2】:

      The phone number is stored in the page as JSON. To get it, you can:

      import json
      import requests
      from bs4 import BeautifulSoup
      
      url = "https://www.mudah.my/Audi+RS6+4+0+AVANT+TFSI+QUATTRO+Unreg+2016-87091288.htm"
      
      soup = BeautifulSoup(requests.get(url).content, "html.parser")
      data = json.loads(soup.select_one("#__NEXT_DATA__").contents[0])
      # uncomment this to print all data:
      # print(json.dumps(data, indent=4))
      ad_id = soup.select_one("[gravity-itemid]")["gravity-itemid"]
      
      ad_data = data["props"]["initialState"]["adDetails"]["byID"][ad_id]
      
      print("Phone:", ad_data["attributes"]["phone"])
      

      Prints:

      Phone: 0183888798
      

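      【Discussion】:

      • Any idea how to do this in Selenium?
      • @trialaccount You can load the page with Selenium and then pass the page source to BeautifulSoup. The rest of the script stays the same.
      • I'm finding it hard to implement.
      • I made the following change to soup: soup = BeautifulSoup(driver, "html.parser")
      • But I get an error: TypeError: object of type 'WebDriver' has no len()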
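The `TypeError` comes from passing the `WebDriver` object itself to `BeautifulSoup`, which expects markup such as a string. `driver.page_source` returns the current page's HTML as a string, so `BeautifulSoup(driver.page_source, "html.parser")` is the change the earlier comment describes. The sketch below performs the same extraction as Solution 2 on such a string, using the standard library's `html.parser` as an alternative to BeautifulSoup; the JSON path and the `gravity-itemid` attribute are copied from Solution 2 and may break if the site changes. `NextDataParser` and `phone_from_page_source` are hypothetical names.

```python
import json
from html.parser import HTMLParser


class NextDataParser(HTMLParser):
    """Collects the __NEXT_DATA__ JSON text and the first gravity-itemid value."""

    def __init__(self):
        super().__init__()
        self.next_data = None  # raw JSON text of <script id="__NEXT_DATA__">
        self.ad_id = None      # first gravity-itemid attribute seen
        self._in_next_data = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.ad_id is None and "gravity-itemid" in attrs:
            self.ad_id = attrs["gravity-itemid"]
        if tag == "script" and attrs.get("id") == "__NEXT_DATA__":
            self._in_next_data = True

    def handle_data(self, data):
        # Script content may arrive in several chunks
        if self._in_next_data:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_next_data:
            self._in_next_data = False
            self.next_data = "".join(self._chunks)


def phone_from_page_source(html):
    """Return the phone number from a listing page's HTML (e.g. driver.page_source)."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    if parser.next_data is None or parser.ad_id is None:
        raise ValueError("__NEXT_DATA__ script or gravity-itemid not found")
    data = json.loads(parser.next_data)
    # Same path as Solution 2; site-specific and subject to change
    ad_data = data["props"]["initialState"]["adDetails"]["byID"][parser.ad_id]
    return ad_data["attributes"]["phone"]
```

With Selenium, call `phone_from_page_source(driver.page_source)` after `driver.get(link)`.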