【Question title】: Why do I get an error when trying to access the first two columns in an HTML table?
【Posted】: 2025-11-25 18:50:01
【Question description】:
import requests
from bs4 import BeautifulSoup

wiki = "https://en.wikipedia.org/wiki/List_of_Pixar_films"
website_url = requests.get(wiki).text
soup = BeautifulSoup(website_url, 'lxml')
table_class = "wikitable plainrowheaders sortable"
my_table = soup.find('table', {'class': table_class})


Film = []
release = []

for row in my_table.find_all('i')[0:]:
    Film_cell = row.find_all('a')[0]
    Film.append(Film_cell.text)
print(Film)

for row in my_table.find_all('td')[0:]:
    release = row.find_all('span')[:1]
    release.append(release.text)
print(release)

Output:

['Toy Story', "A Bug's Life", 'Toy Story 2', 'Monsters, Inc.',
'Finding Nemo', 'The Incredibles', 'Cars', 'Ratatouille', 'WALL-E',
'Up', 'Toy Story 3', 'Cars 2', 'Brave', 'Monsters University', 'Inside Out',
'The Good Dinosaur', 'Finding Dory', 'Cars 3', 'Coco', 'Incredibles 2',
'Toy Story 4', 'Onward', 'Soul', 'Luca', 'Turning Red', 'Lightyear']
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-223-6481bc092354> in <module>
      7 for row in my_table.find_all('td')[0:]:
      8     release = row.find_all('span')[:1]
----> 9     release.append(release.text)
     10 print(release)

AttributeError: 'list' object has no attribute 'text'

【Question comments】:

    Tags: python list web-scraping beautifulsoup


    【Solution 1】:
    for row in my_table.find_all('td')[0:]:
        release= row.find_all('span')[:1]
        release.append(release.text)
    print(release)
    
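    • my_table.find_all('td')[0:] is the same as my_table.find_all('td'); the [0:] slice changes nothing.
    • row.find_all('span')[:1] is a list, and a list has no .text attribute. You probably meant row.find_all('span')[0].
    • release = row.find_all('span')[:1] rebinds release, overwriting the result list on every iteration. Use a different variable for the spans.

    To get the first two columns, excluding the index column: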
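The three points above can be reproduced on a small inline document. This is only a minimal sketch, not the Wikipedia page: the HTML string and the names `html`, `cells`, `first_spans`, and `no_spans` are made up for illustration, and `html.parser` is assumed as the parser so that nothing beyond Beautiful Soup is required.

```python
from bs4 import BeautifulSoup

# Hypothetical two-cell table: only the first cell contains a <span>.
html = (
    "<table><tr>"
    "<td><span>November 22, 1995</span></td>"
    "<td>no span here</td>"
    "</tr></table>"
)
soup = BeautifulSoup(html, "html.parser")
cells = soup.find_all("td")

# A trailing [0:] slice keeps every element, so it changes nothing.
same_length = len(cells[0:]) == len(cells)

# Slicing returns a list, even when it holds at most one element.
first_spans = cells[0].find_all("span")[:1]
no_spans = cells[1].find_all("span")[:1]

print(same_length)                   # whether [0:] kept every cell
print(hasattr(first_spans, "text"))  # whether the list itself has .text
print(first_spans[0].text)           # the Tag inside the list does have .text
print(no_spans)                      # empty for a cell without a <span>
```

Because `no_spans` is empty, indexing it with `[0]` would raise an `IndexError`, which is why the cell has to be checked before its text is read.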

    release = []
    for row in my_table.find_all('td'):
        span = row.find_all('span')
        if span:
            release.append(span[0].text)
    print(release)
    
    [('Toy Story', 'November 22, 1995'), ("A Bug's Life", 'November 25, 1998'), ('Toy Story 2', 'November 24, 1999'), ('Monsters, Inc.', 'November 2, 2001'), ('Finding Nemo', 'May 30, 2003'), ('The Incredibles', 'November 5, 2004'), ('Cars', 'June 9, 2006'), ('Ratatouille', 'June 29, 2007'), ('WALL-E', 'June 27, 2008'), ('Up', 'May 29, 2009'), ('Toy Story 3', 'June 18, 2010'), ('Cars 2', 'June 24, 2011'), ('Brave', 'June 22, 2012'), ('Monsters University', 'June 21, 2013'), ('Inside Out', 'June 19, 2015'), ('The Good Dinosaur', 'November 25, 2015'), ('Finding Dory', 'June 17, 2016'), ('Cars 3', 'June 16, 2017'), ('Coco', 'November 22, 2017'), ('Incredibles 2', 'June 15, 2018'), ('Toy Story 4', 'June 21, 2019'), ('Onward', 'March 6, 2020'), ('Soul', 'December 25, 2020'), ('Luca', 'June 18, 2021'), ('Turning Red[1]', 'March 11, 2022[5]'), ('Lightyear[2]', 'June 17, 2022[5]'), ('TBA', 'June 16, 2023[8]'), ('TBA', 'March 1, 2024[4]'), ('TBA', 'June 14, 2024[4]')]
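The loop shown in this answer collects release dates only, whereas the output pairs each title with its date. One possible way to build such pairs is to walk the table row by row and read the first two cells of each row. The sketch below is an assumption, not the answerer's code: the inline HTML is a simplified stand-in for the real table (whose markup may differ), and the names `html`, `pairs`, `title`, and `date` are illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the table; the real page's markup may differ.
html = """
<table class="wikitable">
  <tr><th>Film</th><th>Release date</th><th>Director</th></tr>
  <tr><th><i><a>Toy Story</a></i></th><td><span>November 22, 1995</span></td><td>John Lasseter</td></tr>
  <tr><th><i><a>A Bug's Life</a></i></th><td><span>November 25, 1998</span></td><td>John Lasseter</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "wikitable"})

pairs = []
for tr in table.find_all("tr")[1:]:        # skip the header row
    cells = tr.find_all(["th", "td"])      # header and data cells, in document order
    if len(cells) >= 2:                    # ignore rows with fewer than two cells
        title = cells[0].get_text(strip=True)
        date = cells[1].get_text(strip=True)
        pairs.append((title, date))

print(pairs)
```

Reading both values from the same row keeps each title aligned with its date, which two independent loops over `i` and `td` elements do not guarantee.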
    

    【Comments】:

      【Solution 2】:

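      The line release = row.find_all('span')[:1] produces a list, and a list has no 'text' attribute. You need to index into it to reach the element that does, i.e. release.append(release[0].text) instead of release.append(release.text).

      That alone would still raise an "index out of range" error (IndexError), because many of the lists produced in your loop are empty.

      Modify the code as follows: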

      import requests
      from bs4 import BeautifulSoup
      wiki = "https://en.wikipedia.org/wiki/List_of_Pixar_films"
      website_url = requests.get(wiki).text
      soup = BeautifulSoup(website_url,'lxml')
      table_class = "wikitable plainrowheaders sortable"
      my_table = soup.find('table',{'class':table_class})
      
      
      Film = []
      release = []
      
      for row in my_table.find_all('i')[0:]:
          Film_cell = row.find_all('a')[0]
          Film.append(Film_cell.text)
      print(Film)
      
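      # Collect the text of the first <span> in each cell, skipping cells without one.
      new_list = []
      for row in my_table.find_all('td'):
          spans = row.find_all('span')[:1]  # separate name, so no result list is overwritten
          if len(spans) > 0:
              new_list.append(spans[0].text)
      print(new_list)  # print once, after the loop has finished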
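An alternative to slicing and checking the length is `find`, which returns the first matching tag or `None` when there is none. The sketch below shows that variant on a made-up inline table; the HTML string and the names `html`, `dates`, and `span` are assumptions for illustration, and `html.parser` is used instead of `lxml` so that no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical table: every second cell has no <span>.
html = (
    "<table>"
    "<tr><td><span>November 22, 1995</span></td><td>John Lasseter</td></tr>"
    "<tr><td><span>November 25, 1998</span></td><td>John Lasseter</td></tr>"
    "</table>"
)
soup = BeautifulSoup(html, "html.parser")

dates = []
for cell in soup.find_all("td"):
    span = cell.find("span")   # first <span> in the cell, or None
    if span is not None:       # skip cells without a <span>
        dates.append(span.text)

print(dates)
```

Checking for `None` plays the same role as the `len(...) > 0` test: it avoids both the `AttributeError` and the `IndexError` discussed in this answer.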
      

      【Comments】:
