TypeError: 'NoneType' object is not subscriptable, web scraping in Python: answers

【Question title】: TypeError: 'NoneType' object is not subscriptable, web scraping in Python
【Posted】: 2017-09-17 05:31:25
【Question description】:

This code searches a web page for a movie and is meant to print the first title in the search results.

from urllib.request import urlopen
import urllib
from bs4 import BeautifulSoup
import requests
import pprint

def infopelicula(nombrepelicula):
    my_url='http://www.imdb.com/find?ref_=nv_sr_fn&q='+nombrepelicula+'&s=tt'
    rprincipal = requests.get(my_url)
    soup= BeautifulSoup(rprincipal.content, 'html.parser')
    title = soup.findAll("td", class_="result_text")
    for name in title:
        titulo = name.parent.find("a", href=True)
        print (name.text)[0]

It does work, but an error is raised when the title is printed. For example:

>>> infopelicula("Harry Potter Chamber")
Harry Potter and the Chamber of Secrets (2002) 
Traceback (most recent call last):
  File "<pyshell#49>", line 1, in <module>
    infopelicula("Harry Potter Chamber")
  File "xxxx", line 14, in infopelicula
    print (name.text)[0]
TypeError: 'NoneType' object is not subscriptable

【Question discussion】:

Tags: python web-scraping python-3.5

【Solution 1】:

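In Python 3.5, print is a function that returns None, and None (as the error message states) cannot be subscripted. The line print (name.text)[0] is parsed as print(name.text)[0]: the text is printed first, and then [0] is applied to the None that print returned.

Perhaps you meant print(name.text[0])?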
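
A minimal sketch of why the original line fails, using a hard-coded string as a stand-in for the scraped name.text (the sample value is only illustrative):

```python
text = "Harry Potter and the Chamber of Secrets (2002)"  # stand-in for name.text

result = print(text)      # prints the title; print() itself returns None
print(result is None)     # True

try:
    print(text)[0]        # same as (print(text))[0]: indexes the returned None
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # 'NoneType' object is not subscriptable

first_char = text[0]      # index the string first, then print the result
print(first_char)         # H
```

Moving the subscript inside the parentheses removes the error, but it selects a single character of the string rather than a whole title.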

【Discussion】:

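I think name.text makes more sense. name.text[0] would print only the first character of the name.
I tried both: print(name.text[0]) prints nothing, and name.text prints all of the titles, but I only want the first one.
Now I am using name.text[1], and it prints the first letter of each title :/
What is your expected output? Just the title?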
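
The confusion in these comments comes from two different kinds of indexing. A minimal sketch, using a hard-coded list of strings in place of the scraped cells; the leading space in each string is an assumption, chosen because it would explain why name.text[0] seemed to print nothing while name.text[1] printed the first letter:

```python
# Stand-ins for name.text of each result cell (the leading space is an assumption).
texts = [
    " Harry Potter and the Chamber of Secrets (2002) ",
    " Harry Potter and the Sorcerer's Stone (2001) ",
]

# Indexing a string selects one character from every title.
first_chars = [t[0] for t in texts]    # a space for each title
second_chars = [t[1] for t in texts]   # the first letter of each title

# Indexing the list selects one whole title; strip() drops the padding.
first_title = texts[0].strip()
print(first_title)
```

To print only the first search result, the index has to be applied to the collection of results, not to the text of each result.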

【Solution 2】:

How about this:

import requests
from bs4 import BeautifulSoup

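def infopelicula():
    # Search URL with the query hard-coded for this example
    my_url = 'http://www.imdb.com/find?ref_=nv_sr_fn&q="Harry Potter Chamber"&s=tt'
    soup = BeautifulSoup(requests.get(my_url).text, 'lxml')
    # Each search result sits in a <td class="result_text"> cell
    for name in soup.find_all("td", class_="result_text"):
        # The first <a> with text inside the cell holds the title
        title = name.find_all("a", text=True)[0]
        print(title.text)

infopelicula()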

Partial output:

Harry Potter and the Sorcerer's Stone
Harry Potter and the Goblet of Fire
Harry Potter and the Half-Blood Prince
Harry Potter and the Deathly Hallows: Part 2

To get only the first title:

import requests
from bs4 import BeautifulSoup

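def infopelicula():
    # Search URL with the query hard-coded for this example
    my_url = 'http://www.imdb.com/find?ref_=nv_sr_fn&q="Harry Potter Chamber"&s=tt'
    soup = BeautifulSoup(requests.get(my_url).text, 'lxml')
    # The [:1] slice keeps only the first result cell
    for name in soup.find_all("td", class_="result_text")[:1]:
        # The first <a> with text inside the cell holds the title
        title = name.find_all("a", text=True)[0]
        print(title.text)

infopelicula()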

Output:

Harry Potter and the Chamber of Secrets
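
Both versions in this answer hard-code the search phrase in the URL, whereas the function in the question took it as an argument. A minimal sketch of building the same URL from an argument with urllib.parse.urlencode, so that spaces are escaped; build_search_url is a hypothetical helper name, and the parameter names are copied from the URL used in the answer:

```python
from urllib.parse import urlencode

def build_search_url(movie_name):
    """Return the search URL for movie_name (hypothetical helper)."""
    # Parameter names and fixed values are taken from the URL in the answer.
    params = {"ref_": "nv_sr_fn", "q": movie_name, "s": "tt"}
    # urlencode escapes each value, turning spaces into '+'.
    return "http://www.imdb.com/find?" + urlencode(params)

url = build_search_url("Harry Potter Chamber")
print(url)
```

The resulting string could then be passed to requests.get in place of my_url.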
