【Question title】: BeautifulSoup returning a TypeError: object of type 'NoneType' has no len()
【Posted】: 2020-11-06 10:25:49
【Question】:

I'm using BeautifulSoup to scrape a page and collect links from every div on it, but it gives me this error:

Traceback (most recent call last):
  File "C:\Users\intel\Desktop\One page\test.py", line 16, in <module>
    soup = BeautifulSoup(div.html,'html5lib')
  File "C:\Users\intel\AppData\Local\Programs\Python\Python38\lib\site-packages\bs4\__init__.py", line 287, in __init__
    elif len(markup) <= 256 and (
TypeError: object of type 'NoneType' has no len()

Here is my code:

from bs4 import BeautifulSoup
import requests as req

resp = req.get('https://medium.com/@daranept27')

x = resp.text

soup = BeautifulSoup(x, "lxml")
 
divs = soup.find_all("div")
#print(divs)

lst = []

for div in divs:
    soup = BeautifulSoup(div.html,'html5lib')
    div_tag = soup.find()
    try:
        title = div_tag.section.div.h1.a['href']
        if title not in lst: lst.append(title)
    except:
        pass

print("\n".join(lst))

    Tags: python python-3.x web-scraping beautifulsoup python-requests


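    【Solution 1】:

    div.html does not return the markup of the div. Attribute access on a Tag looks for a child tag with that name, and a div has no <html> child, so div.html is None and BeautifulSoup(None, ...) fails at len(markup). Convert the div to a string with str(div) instead. Full code: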

    from bs4 import BeautifulSoup
    import requests as req
    
    resp = req.get('https://medium.com/@daranept27')
    
    x = resp.text
    
    soup = BeautifulSoup(x, "lxml")
    
    divs = soup.find_all("div")
    # print(divs)
    
    lst = []
    
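    for div in divs:
        # str(div) serializes the Tag to markup; div.html would be None here
        soup = BeautifulSoup(str(div), 'html5lib')
        div_tag = soup.find()
        try:
            title = div_tag.section.div.h1.a['href']
        except (AttributeError, TypeError, KeyError):
            # No section/div/h1/a chain, or the link has no href: skip this div
            continue
        if title not in lst:
            lst.append(title)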
    
    print("\n".join(lst))
    

    Output:

    /read-rosy/if-the-whole-world-is-compelled-to-forget-everything-cde200c0ad98
    /wordsmith-library/seven-days-between-life-and-death-dffb639fb245
    /an-idea/have-you-ever-encountered-a-fake-friend-if-so-try-these-simple-tips-to-overcome-it-d8473d755ab8
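
As a side note, re-parsing every div is not strictly required. The sketch below is one possible alternative, not the approach from the answer above: it first shows that div.html is None on a Tag, then collects the links from a single parse with a CSS selector. The SAMPLE markup and the extract_links helper are hypothetical, made up for illustration, and the selector is looser than the original first-match chain, so it may pick up extra links on a real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the fetched page (not real Medium HTML).
SAMPLE = """
<div><section><div><h1><a href="/a/first">First</a></h1></div></section></div>
<div><section><div><h1><a href="/b/second">Second</a></h1></div></section></div>
<div><section><div><h1><a href="/a/first">First again</a></h1></div></section></div>
"""


def extract_links(html):
    """Return unique hrefs of <a> tags nested under section > ... > h1, in page order."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # Descendant selector: any <a> with an href inside an <h1>,
    # inside a <div>, inside a <section>.
    for a in soup.select("section div h1 a[href]"):
        href = a["href"]
        if href not in links:
            links.append(href)
    return links


# Why the original code failed: tag.html searches for a child <html> tag.
first_div = BeautifulSoup(SAMPLE, "html.parser").find("div")
missing = first_div.html  # a <div> has no <html> child, so this is None

links = extract_links(SAMPLE)
print(missing)
print("\n".join(links))
```

The built-in "html.parser" is used here only to keep the sketch free of extra dependencies; "lxml" or "html5lib" should work the same way for this markup.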
    
