Posted: 2021-08-14 23:32:19
Problem description:
I'm trying to scrape all the jobs on the page, without success. I've tried different approaches, but none of them worked. After opening and scraping the first job, the script crashes, and I don't know what I need to do to move on to the next job. Can anyone help me get this working? Thanks in advance. I had to shorten the code because the site wouldn't let me post all of it (too much code).
# Part 1
from selenium import webdriver
import pandas as pd
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager

options = Options()
driver = webdriver.Chrome(ChromeDriverManager().install(), options=options)
df = pd.DataFrame(columns=["Title", "Description", "Job-type", "Skills"])

for i in range(25):
    driver.get('https://www.reed.co.uk/jobs/care-jobs?pageno=' + str(i))
    jobs = []
    driver.implicitly_wait(20)
    for job in driver.find_elements_by_xpath('//*[@id="content"]/div[1]/div[3]'):
        soup = BeautifulSoup(job.get_attribute('innerHTML'), 'html.parser')
        # Accept the cookie banner before interacting with the page
        element = WebDriverWait(driver, 50).until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, "#onetrust-accept-btn-handler")))
        element.click()
        try:
            title = soup.find("h3", class_="title").text.replace("\n", "").strip()
            print(title)
        except:
            title = 'None'
        # Open the job's detail page
        sum_div = job.find_element_by_css_selector('#jobSection42826858 > div.row > div > header > h3 > a')
        sum_div.click()
        driver.implicitly_wait(2)
        try:
            job_desc = driver.find_element_by_css_selector('#content > div > div.col-xs-12.col-sm-12.col-md-12 > article > div > div.branded-job-details--container > div.branded-job--content > div.branded-job--description-container > div').text
            #print(job_desc)
        except:
            job_desc = 'None'
        try:
            job_type = driver.find_element_by_xpath('//*[@id="content"]/div/div[2]/article/div/div[2]/div[3]/div[2]/div/div/div[3]/div[3]/span').text
            #print(job_type)
        except:
            job_type = 'None'
        try:
            job_skills = driver.find_element_by_xpath('//*[@id="content"]/div/div[2]/article/div/div[2]/div[3]/div[6]/div[2]/ul').text
            #print(job_skills)
        except:
            job_skills = 'None'
        # Return to the listing page and record the scraped fields
        driver.back()
        driver.implicitly_wait(2)
        df = df.append({'Title': title, "Description": job_desc, 'Job-type': job_type, 'Skills': job_skills}, ignore_index=True)

df.to_csv(r"C:\Users\Desktop\Python\newreed.csv", index=False)
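For reference, the crash after the first job is most likely caused by the hardcoded selector '#jobSection42826858 > div.row > div > header > h3 > a', which matches exactly one specific job card, combined with the outer XPath '//*[@id="content"]/div[1]/div[3]', which selects a single container rather than one element per job. Below is a minimal sketch of a more robust pattern: collect every job URL from the listing page first, then visit each URL directly. The 'h3.title a' selector and the starting page pageno=1 are assumptions inferred from the original code's selectors, not verified against the live reed.co.uk markup.

# A minimal sketch, not a drop-in replacement: the selectors marked as
# assumptions below must be checked against the live page.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd

driver = webdriver.Chrome(ChromeDriverManager().install())
rows = []

for page in range(1, 26):  # assumption: pageno starts at 1
    driver.get('https://www.reed.co.uk/jobs/care-jobs?pageno=' + str(page))

    # The cookie banner appears once per session, so click it only if present.
    try:
        WebDriverWait(driver, 5).until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, "#onetrust-accept-btn-handler"))
        ).click()
    except Exception:
        pass

    # Gather all job URLs before navigating away, so no element reference
    # goes stale. 'h3.title a' is an assumption based on the original
    # 'header > h3 > a' path.
    links = [a.get_attribute('href')
             for a in driver.find_elements_by_css_selector('h3.title a')]

    for url in links:
        driver.get(url)  # open each job directly; no clicking, no driver.back()
        try:
            title = driver.find_element_by_tag_name('h1').text.strip()
        except Exception:
            title = 'None'
        rows.append({'Title': title, 'URL': url})

df = pd.DataFrame(rows)
df.to_csv('newreed.csv', index=False)

Building the rows in a plain list and creating the DataFrame once at the end is also cheaper than calling df.append inside the loop, which copies the whole frame on every call.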
Discussion:
- Why the driver.back()? Is it really needed? At first glance it seems redundant. Do you have any debugging output?
- I only put driver.back() in to take me back to the main page; the problem is the same with or without it.
Tags: javascript python selenium web-scraping beautifulsoup