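[Posted]: 2020-05-25 19:25:57
[Question]:
I'm trying to scrape people's public profiles to find the most common skills for certain roles. I can extract the email, company, name, position, and so on, but I can't get the skills. I'm using Selector from parsel. I've tried many approaches, but I'm evidently targeting the wrong class, and I probably should loop over the skills. Here is my code so far: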
from time import sleep

from parsel import Selector

# _DRIVER_CHROME is assumed to be a Selenium Chrome WebDriver created elsewhere.


def linkedin_scrape(linkedin_urls):
    profiles = []
    for url in linkedin_urls:
        _DRIVER_CHROME.get(url)
        sleep(5)  # fixed wait so the page has time to render
        selector = Selector(text=_DRIVER_CHROME.page_source)
        # Use xpath to extract the exact class containing the profile name
        name = selector.xpath('//*[starts-with(@class, "inline")]/text()').extract_first()
        if name:
            name = name.strip()
        # Use xpath to extract the exact class containing the profile position
        position = selector.xpath('//*[starts-with(@class, "mt1")]/text()').extract_first()
        if position:
            position = position.strip()
            # Keep the part before " at "; partition() leaves the string
            # unchanged when " at " is absent (find() would return -1 and
            # the slice would drop the last character)
            position = position.partition(' at ')[0]
        # Use xpath to extract the exact class containing the profile company
        company = selector.xpath('//*[starts-with(@class, "text-align-left")]/text()').extract_first()
        if company:
            company = company.strip()
        # Use xpath to extract skills (extract_first() returns at most one
        # text node, so this can never yield the full list of skills)
        skills = selector.xpath('//*[starts-with(@class, "pv-skill")]/text()').extract_first()
        if skills:
            skills = skills.strip()
        profiles.append([name, position, company, url])
        print(f'{len(profiles)}: {name}, {position}, {company}, {url}, {skills}')
    return profiles
[Comments]:
- Can you share a sample of the HTML for the skills?
Tags: python selenium webdriver parsel