【Question Title】: AttributeError: 'list' object has no attribute 'h3' (Beautifulsoup)
【Posted】: 2019-10-17 02:37:31
【Question】:

I'm a beginner at web scraping. I'm following this tutorial (https://www.dataquest.io/blog/web-scraping-beautifulsoup/) to extract movie data, and I think my definition of first_movie is wrong!

Here is the code:

  from requests import get
  from bs4 import BeautifulSoup

  first_movie =[]

  url = 'http://www.imdb.com/search/title?release_date=2017&sort=num_votes,desc&page=1'
  response = get(url)
  html_soup = BeautifulSoup(response.text, 'html.parser')
  type(html_soup)

  movie_containers = html_soup.find_all('div', class_ = 'lister-item mode-advanced')

  first_name = first_movie.h3.a.text

I get this error:

Traceback (most recent call last):
File "mov1.py", line 13, in <module>
first_name = first_movie.h3.a.text
AttributeError: 'list' object has no attribute 'h3'

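【Discussion】:

  • What are you trying to do with h3?
  • @Jeppe That won't work, because first_movie has no elements; it's an empty list.
  • @MatiasCicero Sorry, I misread. html_soup.find_all returns a list, and each element of that list may contain an h3, e.g. movie_containers[0].h3.a.text. See the documentation.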
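To illustrate the point made in these comments, here is a minimal offline sketch. The HTML string is a made-up, simplified stand-in for two IMDb result containers (the live page markup may differ). It reproduces the error with an empty list, then shows that indexing the result of `find_all()` gives a `Tag` that does support `.h3.a.text`.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for the IMDb search results page.
SAMPLE_HTML = """
<div class="lister-item mode-advanced">
  <h3 class="lister-item-header"><span class="lister-item-index">1.</span><a href="/title/1">Logan</a></h3>
</div>
<div class="lister-item mode-advanced">
  <h3 class="lister-item-header"><span class="lister-item-index">2.</span><a href="/title/2">Wonder Woman</a></h3>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all() returns a ResultSet, which is a list subclass, not a single Tag.
movie_containers = soup.find_all("div", class_="lister-item mode-advanced")

# An empty list (as in the question) has no .h3 attribute.
error_message = None
first_movie = []
try:
    first_movie.h3
except AttributeError as exc:
    error_message = str(exc)

# Indexing the ResultSet yields a Tag, and a Tag does support .h3.a.text.
first_movie = movie_containers[0]
first_name = first_movie.h3.a.text
print(first_name)
```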

Tags: python html web-scraping beautifulsoup


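【Solution 1】:

A nice short selector that uses the adjacent sibling combinator (+) to grab the a tag immediately following the element with class lister-item-index: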

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://www.imdb.com/search/title?release_date=2017&sort=num_votes,desc&page=1')
soup = bs(r.content, 'lxml')
titles = [item.text for item in soup.select('.lister-item-index + a')]
print(titles)
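The selector `.lister-item-index + a` matches an `<a>` that is the very next element sibling of an element with class `lister-item-index`. Here is an offline sketch of how that behaves, using a made-up fragment modeled on the 2019 IMDb markup (an assumption; the live page may have changed) and `html.parser`, so `lxml` is not required:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: two result headers plus an unrelated link.
SAMPLE_HTML = """
<h3 class="lister-item-header">
  <span class="lister-item-index">1.</span>
  <a href="/title/1">Logan</a>
  <span class="lister-item-year">(2017)</span>
</h3>
<h3 class="lister-item-header">
  <span class="lister-item-index">2.</span>
  <a href="/title/2">Wonder Woman</a>
</h3>
<p><a href="/other">Not a title</a></p>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# "+" is the adjacent sibling combinator: only an <a> directly after a
# .lister-item-index element (ignoring whitespace text) is matched.
titles = [a.get_text(strip=True) for a in soup.select(".lister-item-index + a")]
print(titles)
```

The unrelated link inside `<p>` is not selected because it has no `.lister-item-index` sibling in front of it.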


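    【Solution 2】:

    first_movie is never assigned a scraped element (it stays an empty list); assign the result to first_movie instead of movie_containers. Use find() to select the first matching element: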

    first_movie = html_soup.find('div', class_ = 'lister-item mode-advanced')
    first_name = first_movie.h3.a.text
    

    Or use find_all() with an index:

    first_movie = html_soup.find_all('div', class_ = 'lister-item mode-advanced')[0]
    first_name = first_movie.h3.a.text
    


      【Solution 3】:

      Try the following code:

      import requests
      from bs4 import BeautifulSoup
      url = 'https://www.imdb.com/search/title?release_date=2017&sort=num_votes,desc&page=1'
      r = requests.get(url, headers = {'User-Agent' : 'Mozilla/5.0'})
      soup = BeautifulSoup(r.content, 'html.parser')
      items=soup.find_all('h3',class_='lister-item-header')
      for item in items:
          print(item.find('a').text)
      

      Output:

      Logan
      Wonder Woman
      Guardians of the Galaxy: Vol. 2
      Thor: Ragnarok
      Dunkirk
      Star Wars: Episode VIII - The Last Jedi
      Spider-Man: Homecoming
      Get Out
      Blade Runner 2049
      Baby Driver
      It
      Three Billboards Outside Ebbing, Missouri
      Justice League
      The Shape of Water
      John Wick: Chapter 2
      Coco
      Jumanji: Welcome to the Jungle
      Beauty and the Beast
      Kong: Skull Island
      Kingsman: The Golden Circle
      Pirates of the Caribbean: Salazar's Revenge
      Alien: Covenant
      13 Reasons Why
      War for the Planet of the Apes
      The Greatest Showman
      Life
      Fast & Furious 8
      Murder on the Orient Express
      Lady Bird
      Ghost in the Shell
      King Arthur: Legend of the Sword
      Wind River
      The Hitman's Bodyguard
      Mother!
      The Mummy
      Call Me by Your Name
      Atomic Blonde
      The Punisher
      Bright
      I, Tonya
      Valerian and the City of a Thousand Planets
      Baywatch
      Darkest Hour
      American Made
      La Casa de Papel
      Mindhunter
      Transformers: The Last Knight
      The Handmaid's Tale
      The Lego Batman Movie
      The Disaster Artist
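Sending a browser-like `User-Agent`, as this answer does, is a common workaround when a site responds differently to the default `python-requests` agent. Below is a sketch that separates fetching from parsing so the parsing step can be checked offline. The function names and the 10-second timeout are illustrative assumptions, and the class name `lister-item-header` reflects the 2019 page markup, which may since have changed:

```python
import requests
from bs4 import BeautifulSoup

URL = "https://www.imdb.com/search/title?release_date=2017&sort=num_votes,desc&page=1"
HEADERS = {"User-Agent": "Mozilla/5.0"}


def fetch_html(url: str = URL) -> str:
    """Download the page (hypothetical helper; needs network access)."""
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return response.text


def parse_titles(html: str) -> list[str]:
    """Extract the link text from every <h3 class="lister-item-header">."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for header in soup.find_all("h3", class_="lister-item-header"):
        link = header.find("a")
        if link is not None:  # skip headers without a link rather than crashing
            titles.append(link.get_text(strip=True))
    return titles
```

Usage: `parse_titles(fetch_html())` returns the titles as a list instead of printing them, which makes it easy to test the parser against saved HTML.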
      


        【Solution 4】:

        find_all always returns a list (a ResultSet), even when only one element matches.

        Replace this line of your code:

        first_name = first_movie.h3.a.text

        with a loop over movie_containers:

        for movie in movie_containers:
          print(movie.find("h3").find("a").text)
        

        Output (truncated):

        Valerian and the City of a Thousand Planets
        Baywatch
        Darkest Hour
        American Made
        La Casa de Papel
        Mindhunter
        Transformers: The Last Knight
        The Handmaid's Tale
        The Lego Batman Movie
        The Disaster Artist
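Chaining `find("h3").find("a")` raises `AttributeError` on `None` if any container lacks an `<h3>` or `<a>`. Here is a variant of the same loop that uses `select_one("h3 a")` and skips incomplete containers; the sample markup is made up for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the middle container has no title link.
SAMPLE_HTML = """
<div class="lister-item mode-advanced"><h3><a>Baywatch</a></h3></div>
<div class="lister-item mode-advanced"><p>no title here</p></div>
<div class="lister-item mode-advanced"><h3><a>Darkest Hour</a></h3></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
movie_containers = soup.find_all("div", class_="lister-item mode-advanced")

names = []
for movie in movie_containers:
    # select_one() returns the first element matching the CSS selector, or None.
    link = movie.select_one("h3 a")
    if link is None:
        continue  # this container has no title link; skip it
    names.append(link.get_text(strip=True))

print(names)
```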
        
