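【Posted】: 2021-04-13 21:41:31
【Question】:
I am having trouble extracting the phone number after clicking the "llamar" button. So far I have used XPath selectors with Selenium and also tried Beautiful Soup to extract the number, but neither has worked. With Selenium and an XPath selector I usually get an invalid selector error, and with BS4 I get AttributeError: 'NoneType' object has no attribute 'text'. I hope you can help!
Here is the code I tried: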
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import pandas as pd
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from selenium.common.exceptions import NoSuchElementException
from selenium.common.exceptions import UnexpectedAlertPresentException
url = 'https://www.milanuncios.com/venta-de-pisos-en-malaga-malaga/portada-alta-carlos-de-haya-carranque-386352344.htm'
path = r'C:\Users\WL-133\anaconda3\Lib\site-packages\selenium\webdriver\chrome\chromedriver.exe'
path1 = r'C:\Users\WL-133\anaconda3\Lib\site-packages\selenium\webdriver\firefox'
# driver = webdriver.Chrome(path)
options = Options()
driver = webdriver.Chrome(path)
driver.get(url)
a = []
mah_div = driver.page_source
soup = BeautifulSoup(mah_div, features='lxml')
cookie_button = '//*[@id="sui-TcfFirstLayerModal"]/div/div/footer/div/button[2]'
btn_press = driver.find_element_by_xpath(cookie_button)
btn_press.click()
llam_button = '//*[@id="ad-detail-contact"]/a[2]'
llam_press = driver.find_element_by_xpath(llam_button)
llam_press.click()
time.sleep(10)
for item in soup.find_all("div", {"class": "contenido"}):
    a.append(item.find("div", {"class": "plaincontenido"}).text)
print(a)
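A note on the snippet above: the soup is built from driver.page_source before either button is clicked, so it can only contain what the page held at load time, and item.find(...) returns None whenever a "contenido" div has no "plaincontenido" child, which is what raises the AttributeError. A minimal sketch of one way to reorder the steps follows. It assumes the asker's class names and XPath are right for the page (not verified here); fetch_html_after_click, extract_texts and the wait_for_css parameter are hypothetical names introduced for illustration.

```python
from bs4 import BeautifulSoup


def extract_texts(html):
    """Collect the text of every div.plaincontenido nested in a div.contenido."""
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    for item in soup.find_all("div", {"class": "contenido"}):
        inner = item.find("div", {"class": "plaincontenido"})
        # find() returns None when there is no match, so guard before using it.
        if inner is not None:
            texts.append(inner.get_text(strip=True))
    return texts


def fetch_html_after_click(driver, button_xpath, wait_for_css, timeout=10):
    """Click a button, wait for new content, then return the fresh page source."""
    # Imported lazily so extract_texts() can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)
    # Explicit wait instead of time.sleep(): proceed as soon as the button is clickable.
    wait.until(EC.element_to_be_clickable((By.XPATH, button_xpath))).click()
    # Wait until the element expected to appear after the click is in the DOM.
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, wait_for_css)))
    # Read page_source only now, so it reflects the page after the click.
    return driver.page_source


# Stand-alone demonstration on a hypothetical HTML fragment (not the real page).
sample = (
    '<div class="contenido"><div class="plaincontenido"> 600 123 456 </div></div>'
    '<div class="contenido"></div>'
)
print(extract_texts(sample))
```

With a live driver the two functions would be chained as html = fetch_html_after_click(driver, '//*[@id="ad-detail-contact"]/a[2]', "div.contenido") followed by extract_texts(html). Whether the number actually appears under those class names after the click is an assumption carried over from the question.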
【Discussion】:
- Use soup.select_one("script[type='application/ld+json']:contains('Product')").get_text(strip=True) to parse the relevant script tag, then dig the value of description, which contains the phone number, out of the result.
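The "dig out" step in the comment above could look roughly like the sketch below. It takes the text returned by get_text(strip=True) and uses only the standard library. The sample JSON, the function name phones_from_json_ld and the phone pattern (nine digits starting with 6, 7, 8 or 9, optionally grouped in threes) are assumptions made for illustration and are not taken from the real page. Note also that recent Soup Sieve releases deprecate :contains in favour of :-soup-contains, so the selector may need that spelling.

```python
import json
import re

# Assumed pattern: a 9-digit number starting with 6-9, optionally grouped 3-3-3.
PHONE_RE = re.compile(r"(?<!\d)[6789]\d{2}[ .-]?\d{3}[ .-]?\d{3}(?!\d)")


def phones_from_json_ld(script_text):
    """Return phone-like numbers found in the description of Product entries."""
    data = json.loads(script_text)
    # A JSON-LD script may hold a single object or a list of objects.
    items = data if isinstance(data, list) else [data]
    phones = []
    for item in items:
        # Only plain "@type": "Product" objects are handled in this sketch.
        if not isinstance(item, dict) or item.get("@type") != "Product":
            continue
        description = item.get("description") or ""
        for match in PHONE_RE.findall(description):
            # Normalise "600 123 456" to "600123456".
            phones.append(re.sub(r"\D", "", match))
    return phones


# Hypothetical script content standing in for the real page's JSON-LD.
sample_json_ld = (
    '{"@context": "https://schema.org", "@type": "Product", '
    '"name": "Piso en Carranque", '
    '"description": "Piso reformado. Llamar al 600 123 456."}'
)
print(phones_from_json_ld(sample_json_ld))
```

This route avoids clicking the button at all, provided the listing really embeds the number in its description, as the comment suggests.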
Tags: python selenium web-scraping beautifulsoup