【Question title】: 'NoneType' object has no attribute 'find_all' error using Beautiful Soup
【Posted】: 2020-11-09 16:35:29
【Question】:

I am trying to scrape the following:

My goal is to read every job title on this page - https://www.cvbankas.lt/?miestas=Vilnius&padalinys%5B%5D=&keyw=python

What I have tried:

import requests
from bs4 import BeautifulSoup

URL = 'https://www.cvbankas.lt/?miestas=Vilnius&padalinys%5B%5D=&keyw=python'
page = requests.get(URL).text
soup = BeautifulSoup(page, 'html.parser')
results = soup.find(id='ResultsContainer')
 
# Look for Python jobs
python_jobs = results.find_all("div", string=lambda t: "python" in t.lower())

for p_job in python_jobs:
    link = p_job.find("h3")["href"]
    print(p_job.text.strip())
    print(f"Apply here: {link}\n")

But I get the following error:

AttributeError: 'NoneType' object has no attribute 'find_all'

How can I read all the titles?

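【Comments】:

  • Are you sure there is an element with the id 'ResultsContainer' on that page?
  • When I open devtools on the URL you specified, I can't even find the ID ResultsContainer - it only appears in your code.
  • It should be the 'main_container' id.
  • I'm also not sure that div is the right tag name in results.find_all("div", string=lambda t: "python" in t.lower()). A better choice might be article.

Tags: python beautifulsoup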


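【Solution 1】:

The problem is that the page has no tag with id="ResultsContainer", so soup.find() returns None. You can instead search for all <h3> tags whose text contains "Python", then take the parent <a> tag to get the URL: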

import requests
from bs4 import BeautifulSoup


URL = 'https://www.cvbankas.lt/?miestas=Vilnius&padalinys%5B%5D=&keyw=python'
page = requests.get(URL).text
soup = BeautifulSoup(page, 'html.parser')

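# `string=` is the current name of the deprecated `text=` argument;
# the None check skips headings that do not contain a single string.
results = soup.find_all('h3', string=lambda t: t is not None and 'python' in t.lower())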
for r in results:
    print(r.text)
    print(r.find_parent('a')['href'])
    print('-' * 80)

Prints:

Senior Python Developer
https://www.cvbankas.lt/senior-python-developer-vilniuje/1-6719819
--------------------------------------------------------------------------------
Full Stack Engineer (React + Python)
https://www.cvbankas.lt/full-stack-engineer-react-python-vilniuje/1-6665723
--------------------------------------------------------------------------------
Python programuotojas (Mid-Senior)
https://www.cvbankas.lt/python-programuotojas-mid-senior-vilniuje/1-6693547
--------------------------------------------------------------------------------
Python Developer
https://www.cvbankas.lt/python-developer-vilniuje/1-6604883
--------------------------------------------------------------------------------
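
The same idea can be wrapped in a small helper that works on any HTML string, which makes it easy to try without hitting the site. This is only a sketch: the sample markup, the example.com URLs, and the name find_jobs are made up for illustration, and the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page structure may differ.
SAMPLE_HTML = """
<a href="https://example.com/jobs/1"><h3 class="list_h3">Senior Python Developer</h3></a>
<a href="https://example.com/jobs/2"><h3 class="list_h3">Data Analyst</h3></a>
<h3 class="list_h3">Python Intern</h3>
<a href="https://example.com/jobs/3"><h3 class="list_h3"><span>QA</span> engineer</h3></a>
"""


def find_jobs(html, keyword):
    """Return (title, url) pairs for <h3> headings whose text contains keyword."""
    soup = BeautifulSoup(html, "html.parser")
    needle = keyword.lower()
    jobs = []
    for heading in soup.find_all("h3"):
        # get_text() also works when the heading contains nested tags
        title = heading.get_text(" ", strip=True)
        if needle not in title.lower():
            continue
        link = heading.find_parent("a")  # None if the heading is not inside a link
        url = link.get("href") if link is not None else None
        jobs.append((title, url))
    return jobs


for title, url in find_jobs(SAMPLE_HTML, "python"):
    print(title, url)
```

To run it against the live page, pass requests.get(URL).text as the html argument. A heading that is not wrapped in a link yields None for the URL instead of raising.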

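【Discussion】:

    【Solution 2】:

    Your problem is that no element on the page has the id "ResultsContainer".

    However, based on the structure of the page, you can use a CSS selector to get all the titles directly: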

    import requests
    from bs4 import BeautifulSoup
    
    URL = 'https://www.cvbankas.lt/?miestas=Vilnius&padalinys%5B%5D=&keyw=python'
    page = requests.get(URL).text
    soup = BeautifulSoup(page, 'html.parser')
    results = soup.select("div.list_cell > .list_h3")
    for i in results:
        print(i.text)
    

    Result:

    Data Engineer
    Data Analyst
    VYRESNYSIS INŽINIERIUS STRATEGIJOS IR TYRIMŲ SKYRIUJE
    Senior Python Developer
    Full Stack Engineer (React + Python)
    DevOps Engineer
    Linux Systems Automation Engineer
    Big Data Developer
    Big Data Devops Engineer
    Python programuotojas (Mid-Senior)
    DATA SCIENTIST
    DEVOPS INŽINIERIAUS (e-commerce platformos produktų optimizavimas užsienio rinkoms)
    LINUX Sistemų administratorius (-ė)
    QA engineer
    Blockchain Developer
    Backend Software Engineer
    FW/HW Quality Assurance Engineer
    Software developer in Test
    Python Developer
    Senior Backend Engineer
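
Note that this selector returns every listing on the results page, not only titles that mention Python. If only matching titles are wanted, one option is to filter the selected headings by keyword afterwards. The sketch below assumes the list_cell and list_h3 class names from the answer; the sample markup and the name select_titles are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the class names used in the answer.
SAMPLE_HTML = """
<div class="list_cell"><h3 class="list_h3">Data Engineer</h3></div>
<div class="list_cell"><h3 class="list_h3">Senior Python Developer</h3></div>
<div class="list_cell"><h3 class="list_h3">Python programuotojas (Mid-Senior)</h3></div>
<div class="other"><h3 class="list_h3">Python in a sidebar</h3></div>
"""


def select_titles(html, keyword=None):
    """Return titles matched by the CSS selector, optionally filtered by keyword."""
    soup = BeautifulSoup(html, "html.parser")
    titles = [tag.get_text(strip=True)
              for tag in soup.select("div.list_cell > .list_h3")]
    if keyword is None:
        return titles
    needle = keyword.lower()
    # Case-insensitive substring match on the visible title text
    return [title for title in titles if needle in title.lower()]


print(select_titles(SAMPLE_HTML))
print(select_titles(SAMPLE_HTML, "python"))
```

The child combinator (>) keeps the match limited to headings directly inside a div.list_cell, so the heading in the unrelated div is ignored.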
    

    【Discussion】:

      【Solution 3】:

      Here is my code:

      import requests
      from bs4 import BeautifulSoup
      URL = 'https://www.cvbankas.lt/?miestas=Vilnius&padalinys%5B%5D=&keyw=python'
      page = requests.get(URL).text
      soup = BeautifulSoup(page, 'html.parser')
      h3_tags = soup.find_all("h3", {"class": "list_h3"})
      for x in h3_tags:
          if "Python" in x.text:
              print(x.text)
              print(x.find_parent('a')['href'])
              print()
      
      

      The output is:

      Senior Python Developer
      https://www.cvbankas.lt/senior-python-developer-vilniuje/1-6719819
      
      Full Stack Engineer (React + Python)
      https://www.cvbankas.lt/full-stack-engineer-react-python-vilniuje/1-6665723
      
      Python programuotojas (Mid-Senior)
      https://www.cvbankas.lt/python-programuotojas-mid-senior-vilniuje/1-6693547
      
      Python Developer
      https://www.cvbankas.lt/python-developer-vilniuje/1-6604883
      

      【Discussion】:

        【Solution 4】:

        Replace your code with the following:

        import requests
        from lxml import etree
        from bs4 import BeautifulSoup
        
        URL = 'https://www.cvbankas.lt/?miestas=Vilnius&padalinys%5B%5D=&keyw=python'
        page = requests.get(URL).text
        soup = BeautifulSoup(page, 'html.parser')
        
        dom = etree.HTML(str(soup))
        elements = dom.xpath('//h3[@class="list_h3"]')
        for element in elements:
            print(element.text)
        

        【Discussion】:

          【Solution 5】:

          The call soup.find(id='ResultsContainer') finds no matching element, so it returns None.

          On this line: python_jobs = results.find_all("div", string=lambda t: "python" in t.lower()), the value of results is therefore None.

          None has no find_all method, which is what the error says. (AttributeError: 'NoneType' object has no attribute 'find_all')
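
A common way to avoid this error is to check the result of find() before calling anything on it. Here is a minimal sketch, using made-up inline HTML and a hypothetical helper named titles_in:

```python
from bs4 import BeautifulSoup

# Hypothetical inline markup, used here instead of the live page.
HTML = '<div id="main_container"><h3>Python Developer</h3></div>'


def titles_in(html, container_id):
    """Return the <h3> texts inside the element with the given id, or [] if absent."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(id=container_id)
    if container is None:
        # find() signals "no match" by returning None rather than raising
        return []
    return [h3.get_text(strip=True) for h3 in container.find_all("h3")]


print(titles_in(HTML, "ResultsContainer"))  # prints []
print(titles_in(HTML, "main_container"))    # prints ['Python Developer']
```

Returning an empty list is just one choice; raising a descriptive exception may suit a scraper better, since a missing container usually means the page layout has changed.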

          【Discussion】:

          • If you are not providing an answer, please use the comments section for suggestions.