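【Posted】: 2021-04-07 18:10:04
【Question】:
I'm a newbie to Python and to coding in general, so please keep replies simple. I don't know much of the terminology, so plain language would help, haha.
I am trying to scrape a website using code that has worked on other sites, but it doesn't work on this one. It says 'NoneType' object has no attribute 'absolute_links', and I don't know why. I have tried several different classes and sections for the `jobs` selector string, which I believe are correct because they contain the a/hrefs I need. Can anyone tell me where I went wrong and how to fix it? Here is the error: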
for item in jobs.absolute_links:
AttributeError: 'NoneType' object has no attribute 'absolute_links'
Here is my code. I have removed most of the category lists so it isn't as long.
from requests_html import HTMLSession
import re
import pandas as pd
url = 'https://jobs.zalando.com/en/jobs/1621977-maintenance-shift-leader-in-intralogistics/?gh_src=22377bdd1us'
departmentcategories = {
    "android": "Software Development",
    "social media": "Marketing",
    "content ": "Marketing",
    "sales": "Sales",
    "ecommerce": "Ecommerce",
}
languagecategories = {
    " and ": "English",
    " und ": "German",
    " et ": "French",
    " y ": "Spanish",
    " e ": "Spanish",
    "German": "German",
    "Italian": "Italian",
    "French": "French",
    "Spanish": "Spanish",
    "Dutch": "Dutch",
}
experiencecategories = {
    "senior": "Mid Senior Level",
    "Junior": "Entry Level",
    "VP ": "Executive",
    "Director": "Director",
    "Head of ": "Mid Senior Level",
}
s = HTMLSession()
r = s.get(url)
r.html.render(sleep=1)
jobs = r.html.find('ul.cards-container', first=True)
#Section for reviewing department, language, categories
def get_department_categories(department):
    depcats = []
    for k, v in departmentcategories.items():
        if re.search(k, department, re.IGNORECASE):
            depcats.append(v)
    return depcats

def get_language_categories(language):
    langcats = []
    for k, v in languagecategories.items():
        if re.search(k, language, re.IGNORECASE):
            langcats.append(v)
    return langcats

def get_experience_categories(experience):
    expcats = []
    for k, v in experiencecategories.items():
        if re.search(k, experience, re.IGNORECASE):
            expcats.append(v)
    return expcats
#Section for job title, city, and country
jobtitles=[]
cities=[]
countries=[]
departments=[]
experiencelevels=[]
jobpostlinks=[]
languages=[]
urllinks=[]
for item in jobs.absolute_links:
    r = s.get(item)
    urllinks.append(item)
    job_title = r.html.xpath('//*[@id="root"]/div/div[2]/div[3]/div[1]/div[1]/h1', first=True).text
    jobtitles.append(job_title)
    city = r.html.xpath('/html/body/div[1]/div/div[2]/div[3]/div[2]/div[2]/div/div[1]/div[1]', first=True).text
    cities.append(city)
    country = r.html.xpath('/html/body/div[1]/div/div[2]/div[3]/div[2]/div[2]/div/div[1]/div[1]', first=True).text
    if country == ('Berlin, Germany'):
        country = 'Germany'
    countries.append(country)
    #Section for the department, languages, and experience level
    #Department section and job title
    department = r.html.xpath('/html/body/div[1]/div/div[2]/div[3]/div[2]/div[2]/div/div[1]/div[5]', first=True).text and r.html.xpath('//*[@id="root"]/div/div[2]/div[3]/div[1]/div[1]/h1', first=True).text
    department_cats = get_department_categories(department)
    departments.append(department_cats)
    #What we're looking for section
    language = r.html.xpath('/html/body/div[1]/div/div[2]/div[3]/div[2]/div[1]/div/ul[2]', first=True).text
    language_cats = get_language_categories(language)
    languages.append(language_cats)
    #experience section
    experience = r.html.xpath('/html/body/div[1]/div/div[2]/div[3]/div[2]/div[2]/div/div[1]/div[4]', first=True).text and r.html.xpath('//*[@id="job"]/div[1]/div[1]/div[1]/h1', first=True).text
    experience_cats = get_experience_categories(experience)
    experiencelevels.append(experience_cats)
    print("-"*10)
    print(job_title, city, country, "Zalando", ", ".join(department_cats), ", ".join(experience_cats), ", ".join(language_cats), "Fashion", item)
df = pd.DataFrame({'Job Title':jobtitles, 'City':cities, 'Country':countries, 'Department Tags':departments, 'Language Tags':languages, 'Experience Tags':experiencelevels, 'Link':urllinks})
df.to_csv("zalando.csv", encoding='utf-8')
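The three category helpers in the code above all follow the same pattern: loop over a dictionary and collect the label for every keyword found in the text. A minimal sketch of one shared helper is shown here. The name `match_categories` is hypothetical, not part of the original code, and two choices in it are assumptions rather than the original behavior: `re.escape` makes each keyword match as literal text, and a label is added only once even if several keywords map to it.

```python
import re


def match_categories(text, categories):
    """Return the labels whose keyword occurs in text (case-insensitive)."""
    labels = []
    for keyword, label in categories.items():
        # re.escape treats the keyword as literal text, not as a regex pattern.
        found = re.search(re.escape(keyword), text, re.IGNORECASE)
        # Assumption: skip labels already collected, so no duplicates appear.
        if found and label not in labels:
            labels.append(label)
    return labels


# Shortened copy of the experience dictionary from the question.
experiencecategories = {
    "senior": "Mid Senior Level",
    "Junior": "Entry Level",
    "Head of ": "Mid Senior Level",
}

print(match_categories("Senior Engineer, Head of Platform", experiencecategories))
```

With a helper like this, the same function could be called with `departmentcategories`, `languagecategories`, or `experiencecategories` instead of keeping three near-identical copies.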
【Comments】:
-
Which line of code gives you this error? Please post the full error message.
-
for item in jobs.absolute_links: AttributeError: 'NoneType' object has no attribute 'absolute_links'. I have also added it to the description.
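As far as I know, the error message means that `r.html.find('ul.cards-container', first=True)` found nothing on the page and returned `None`, so there is no `absolute_links` to read. One possible cause, which is only a guess from the URL, is that the address points to a single job posting rather than a page that lists jobs. A minimal sketch of checking for `None` before looping is shown here. The helper `links_or_empty` and the `FakeElement` class are hypothetical stand-ins so the example runs without any network access.

```python
def links_or_empty(element):
    """Return element.absolute_links, or an empty set when element is None."""
    if element is None:
        # Nothing matched the selector, so there are no links to loop over.
        return set()
    return element.absolute_links


class FakeElement:
    """Hypothetical stand-in for a matched element, for demonstration only."""
    absolute_links = {"https://example.com/job/1"}


# No match: the loop body simply never runs instead of raising AttributeError.
for link in links_or_empty(None):
    print(link)

# A match: the links are returned unchanged.
for link in links_or_empty(FakeElement()):
    print(link)
```

In the original script this would look like `for item in links_or_empty(jobs):`. It would stop the crash, but an empty result would still suggest that the selector or the URL needs another look.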
Tags: pandas web-scraping attributeerror python-requests-html