【Question Title】: How to extract the telephone number from the webpage?
【Posted】: 2019-08-13 18:05:33
【Question Description】:

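I want to scrape some telephone numbers from a website, but I keep getting the wrong output and I don't understand why.

I am running the code in Jupyter: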

from selenium import webdriver

url = 'https://www.europages.co.uk/KIDDYSTORES/00000003902113-191369001.html'
tel = []

# Setup webdriver
driver = webdriver.Chrome('.\\chromedriver.exe')
driver.set_page_load_timeout(10)
driver.get(url)

driver.execute_script("window.scrollTo(0, 720)") 
button = driver.find_elements_by_xpath('//*[@id="content"]/aside/div/div[1]/h3')[0]
# //*[@id="content"]/aside/div/div[1]/h3
button.click()
if len(driver.find_elements_by_xpath('//*[@id="content"]/aside/div/div[1]/div[6]/ul/li/div/a'))!=0:      
    button = driver.find_elements_by_xpath('//*[@id="content"]/aside/div/div[1]/div[6]/ul/li/div/a')[0]
elif len(driver.find_elements_by_xpath('//*[@id="content"]/aside/div/div[1]/div[7]/ul/li/div[1]/a'))!=0:
    button = driver.find_elements_by_xpath('//*[@id="content"]/aside/div/div[1]/div[7]/ul/li/div[1]/a')[0]
elif len(driver.find_elements_by_xpath('//*[@id="content"]/aside/div/div[1]/div[6]/ul/li/div[1]/a'))!=0:
    button = driver.find_elements_by_xpath('//*[@id="content"]/aside/div/div[1]/div[6]/ul/li/div[1]/a')[0]
button.click()


print(driver.find_element_by_xpath('//*[@id="content"]/aside/div/div[1]/div[6]/ul/li/div/div').get_attribute('innerHTML'))
driver.find_element_by_xpath('//*[@id="content"]/aside/div/div[1]/div[6]/ul/li/div/div').get_attribute('innerHTML')

The above is the code I'm running. I expect to get '+33 141 57 22 81', but the actual output is '\n\t\t\t\t\t\t\t'.
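The output '\n\t\t\t\t\t\t\t' contains only newlines and tabs, which suggests the element was found but the number had not yet been written into it when it was read. A minimal sketch for telling such a placeholder apart from a real number, assuming numbers in the '+33 141 57 22 81' format; the helper name `extract_phone` and its regular expression are hypothetical, chosen only for illustration:

```python
import re

# Hypothetical pattern for numbers like "+33 141 57 22 81":
# a leading "+", then groups of digits separated by single spaces.
PHONE_RE = re.compile(r"\+\d+(?: \d+)*")


def extract_phone(inner_html):
    """Return the first phone-like string in inner_html, or None.

    A whitespace-only value (newlines and tabs, as printed in the
    question) contains no match, so it yields None.
    """
    match = PHONE_RE.search(inner_html)
    return match.group(0) if match else None


print(extract_phone("\n\t\t\t\t\t\t\t"))            # None
print(extract_phone("\n\t\t+33 141 57 22 81\n\t"))  # +33 141 57 22 81
```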

print(driver.find_element_by_class_name('team-sh-tel').get_attribute('innerHTML'))

However, when I run this line on its own in a separate Jupyter cell, it prints the desired phone number.
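That the same line succeeds when run later in its own cell points to a timing issue: by the time the second cell runs, the page has had time to insert the number. Explicit waits address this by polling a condition until it holds. The pure-Python sketch below imitates that polling loop against a simulated element; `wait_until` and `read_inner_html` are hypothetical names, and this is a simplified stand-in for the idea, not Selenium's WebDriverWait implementation:

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() until it returns a truthy value and return that value.

    Raises TimeoutError once `timeout` seconds have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


# Simulated element: innerHTML stays whitespace-only for the first three
# reads, as if a script had not yet inserted the number after the click.
reads = {"count": 0}


def read_inner_html():
    reads["count"] += 1
    return "\n\t\t+33 141 57 22 81" if reads["count"] > 3 else "\n\t\t\t"


# Strip the value so a whitespace-only placeholder counts as "not ready".
phone = wait_until(lambda: read_inner_html().strip(), timeout=2, poll_interval=0.01)
print(phone)  # +33 141 57 22 81
```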

【Question Discussion】:

    Tags: python selenium xpath css-selectors webdriverwait


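    【Solution 1】:

    To extract the phone number from https://www.europages.co.uk/KIDDYSTORES/00000003902113-191369001.html, you need to use WebDriverWait with element_to_be_clickable() before clicking the phone link, and with visibility_of_element_located() before reading the number. You can use either of the following Locator Strategies: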

    • Code block using CSS_SELECTOR:

      from selenium.webdriver.common.by import By
      from selenium.webdriver.support.ui import WebDriverWait
      from selenium.webdriver.support import expected_conditions as EC

      driver.get("https://www.europages.co.uk/KIDDYSTORES/00000003902113-191369001.html")
      driver.execute_script("return arguments[0].scrollIntoView(true);", WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.click-tel.icon.icon-telephone"))))
      WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div.click-tel.icon.icon-telephone"))).click()
      print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CLASS_NAME, "info-tel-num"))).get_attribute("innerHTML"))
      
    • Code block using XPATH:

      from selenium.webdriver.common.by import By
      from selenium.webdriver.support.ui import WebDriverWait
      from selenium.webdriver.support import expected_conditions as EC

      driver.get("https://www.europages.co.uk/KIDDYSTORES/00000003902113-191369001.html")
      driver.execute_script("return arguments[0].scrollIntoView(true);", WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='click-tel icon icon-telephone']//a[text()='See phone number']"))))
      WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[@class='click-tel icon icon-telephone']//a[text()='See phone number']"))).click()
      print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='click-tel icon icon-telephone']//a[text()='See phone number']//following::div[1]"))).get_attribute("innerHTML"))
      
    • Console output:

      +33 141 57 22 81
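The waits above check visibility and clickability; a visible element is not necessarily one whose number has already been filled in. A possible refinement, sketched here under the assumption that the number ends up in the element's innerHTML, is a custom condition that keeps waiting while innerHTML is whitespace-only. `inner_html_not_blank` is a hypothetical name, not part of Selenium; it relies only on WebDriverWait.until calling its argument with the driver and returning the first truthy result:

```python
class inner_html_not_blank:
    """Hypothetical custom wait condition (not part of Selenium).

    Called with the driver: returns False while the element's innerHTML is
    empty or whitespace-only, otherwise returns the stripped innerHTML.
    """

    def __init__(self, locator):
        # locator is a (by, value) pair, e.g. (By.CLASS_NAME, "info-tel-num")
        self.locator = locator

    def __call__(self, driver):
        element = driver.find_element(*self.locator)
        html = element.get_attribute("innerHTML") or ""
        # An empty string after stripping means "keep waiting".
        return html.strip() or False
```

With Selenium it would be passed to WebDriverWait like a built-in condition, e.g. `WebDriverWait(driver, 20).until(inner_html_not_blank((By.CLASS_NAME, "info-tel-num")))`, which then returns the stripped number itself rather than the element. WebDriverWait ignores NoSuchElementException by default, so the condition does not catch it.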
      

    【Discussion】:
