[Posted]: 2017-09-20 07:56:13
[Problem description]:
I am trying to collect product ASINs from the Amazon.in website. I have code that opens a WebDriver, searches for a product name, and navigates to the first page of results. It is able to collect data from the first page only — how do I move to the next page to collect the same data? Here is my code:
import time
import json
import re
import numpy as np
from bs4 import BeautifulSoup
from selenium import webdriver
import urllib.request
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.keys import Keys
import pandas as pd
temp = []

def init_driver():
    driver = webdriver.Chrome(executable_path="C:\\Users\\Desktop\\chromedriver")
    driver.wait = WebDriverWait(driver, 10)
    return driver

def get_asin(driver):
    driver.get("https://www.amazon.in")
    print('Getting the URL')
    HTML = driver.page_source
    search_button = driver.find_element_by_id("twotabsearchtextbox")
    search_button.send_keys("Mobiles")
    select_button = driver.find_element_by_class_name("nav-input")
    select_button.click()
    HTML1 = driver.page_source
    soup = BeautifulSoup(HTML1, "html.parser")
    styles = soup.find_all('li')
    #print(styles)
    #print(type(styles))
    ASIN = []
    for link in styles:
        if link.has_attr('data-asin'):
            ASIN.append(link['data-asin'])
    return ASIN
    #print(ASIN)

if __name__ == "__main__":
    driver = init_driver()
    ASIN_NO = get_asin(driver)
    #time.sleep(3)
    #print('opening search page')
    #for i in range(0, len(ASIN_NO)):
    #    scrape(driver, ASIN_NO[i])
    print(ASIN_NO)
    time.sleep(5)
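The per-page extraction in `get_asin` boils down to collecting the `data-asin` attribute from every `<li>` tag in the page source. A minimal, browser-free sketch of just that parsing step, using a made-up HTML fragment in place of a real results page:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment mimicking one page of search results.
html = """
<ul>
  <li data-asin="B01ASIN001">Phone A</li>
  <li data-asin="B01ASIN002">Phone B</li>
  <li>Not a product row</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
# Keep only <li> tags that actually carry a data-asin attribute.
asins = [li["data-asin"] for li in soup.find_all("li") if li.has_attr("data-asin")]
print(asins)  # ['B01ASIN001', 'B01ASIN002']
```

The same list comprehension can replace the explicit loop in `get_asin`, and can be re-run on `driver.page_source` after each click of the next-page control.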
I have tried the following two approaches, both of which raise errors:
select_button = driver.find_element_by_id('pagnNextString')
select_button.click()
Exception in the log:
WebDriverException: Message: unknown error: Element ... is not clickable at point (778, 606). Other element would receive the click:
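"Not clickable at point" usually means the element is covered by something else (a banner, an overlay) or is outside the viewport when Selenium tries the native click. One common workaround is to scroll the element into view and click it via JavaScript instead — a hedged sketch (the helper name `js_click` is my own, not from the question):

```python
def js_click(driver, element):
    """Scroll the element into view, then click it through JavaScript,
    bypassing the 'other element would receive the click' check."""
    driver.execute_script("arguments[0].scrollIntoView(true);", element)
    driver.execute_script("arguments[0].click();", element)
```

Usage would be `js_click(driver, driver.find_element_by_id('pagnNextString'))` after the element has been located.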
select_button = driver.find_element_by_class_name('srSprite pagnNextArrow')
select_button.click()
InvalidSelectorException: Message: invalid selector: Compound class names not permitted
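`find_element_by_class_name` accepts exactly one class; a compound class string like `srSprite pagnNextArrow` must be expressed as a CSS selector instead, e.g. `driver.find_element_by_css_selector(".srSprite.pagnNextArrow")`. A tiny helper showing the conversion (the function name is illustrative, not a Selenium API):

```python
def compound_class_to_css(classes):
    """Turn a space-separated compound class string into a CSS selector
    that matches elements carrying all of those classes."""
    return "." + ".".join(classes.split())

print(compound_class_to_css("srSprite pagnNextArrow"))  # .srSprite.pagnNextArrow
```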
Please suggest the correct approach. Thanks in advance.
[Comments]:
Tags: python-3.x selenium-webdriver web-scraping beautifulsoup amazon