Posted: 2020-02-24 09:37:00
Problem description:
I am building a scraper for movie critics' reviews. I wrote this code in a Jupyter Notebook, and I want to scrape the critic-review section of each page.
I tried
soup.find('div', 'reporter_line')
and
soup.find('dl', 'p_review')
but neither works:
AttributeError: 'NoneType' object has no attribute 'find_all'
How can I fix this code so that it scrapes that text?
from bs4 import BeautifulSoup
from urllib.request import urlopen
from urllib.request import urljoin
import pandas as pd
import requests
import re
#url_base = 'https://movie.naver.com/movie/bi/mi/pointWriteFormList.nhn?code=25917&type=after&page=1'
base_url = 'https://movie.naver.com/movie/bi/mi/basic.nhn?code=' #movie title
pages =['158191','47384','52745','91391','179482','38466','141259','182205','56447',
'86023','88426','66025','130903','120157','132998','97693','121051','158112',
'93728','99752','37247','37838','105249','61698','73476','49480','34210',
'74893','113312','122133','35937','114139','134772','88253','37919','45914',
'144314','75413','171755','37262','35938','116532','68435','154449',
'41585','47701','34570','145162','157297','179461','42809','104467','144578','66002',
'142625','137952','86888','64950','180402','164151','134895','52545','130966','129050',
'79557','50932','164173','70276','44456','129051','74522','122984','37929',
'124025','167697','85579','38452','146459','45232','76016','123519','46532','163533',
'146544','174903','63537','25917','108225','164102','136686','93028','63061',
'54411','161984','106522','53158','179158','88295','52548','52498','109906','39379',
'48227','130786','177374','69270','34324','124041','38888','34197','73344',
'125805','118922','81891','35939','31606','67769','130720','136007','34190','99724',
'120165','62727','48742','98149','142803','39715','30791','36019','159805']
df = pd.DataFrame()
for n in pages:
    # Create url
    url = base_url + n
    res = requests.get(url)
    soup = BeautifulSoup(res.text, "html.parser")
    title = soup.find('h3', 'h_movie')
    for a in title.find_all('a'):
        #print(a.text)
        title = a.text
    rname = soup.find('div', 'reporter_line')
    for a in rname.find_all(('a')['href']):
        #print(a.text)
        rname = a.text
    rreview = soup.find('p', 'tx_report')
    data = {'title': [title], 'rname': [rname], 'rreview': [rreview]}
df.to_csv('./reviewr.csv', sep=',', encoding='utf-8-sig')
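One thing worth noting about the loop above: it builds a data dict on every pass but never adds it to df, so the csv that gets written is empty. A minimal sketch (with placeholder rows standing in for real scraped values) of collecting one row per page and writing the file once at the end:

```python
import pandas as pd

# Placeholder rows; in the real scraper each tuple would come from one page.
scraped = [('Movie A', 'Critic A', 'Good'),
           ('Movie B', 'Critic B', 'Bad')]

rows = []
for title, rname, rreview in scraped:
    # Accumulate plain dicts instead of rebuilding a DataFrame per page.
    rows.append({'title': title, 'rname': rname, 'rreview': rreview})

# Build the DataFrame once, then write the csv once, outside the loop.
df = pd.DataFrame(rows)
df.to_csv('./reviewr.csv', sep=',', encoding='utf-8-sig', index=False)
```
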
Comments:
- If I remember correctly, Beautiful Soup's find uses these arguments: find('dl', attrs={'class': 'p_review'}) (see the docs)
- title or rname is None, hence the "AttributeError: 'NoneType' object has no attribute 'find_all'". Make sure rname/title is not None before calling find_all.
- I tried what you suggested, but it didn't work. There is no error, but rname is not saved in the csv file.
- I tried opening the url, but there is no dl or other element with the p_review class or reporter_line.
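The None-check suggested in the comments can be sketched like this, with a small hypothetical HTML snippet standing in for the Naver page (find() returns None when nothing matches, so the guard has to come before any find_all() call):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: has a reporter_line div, but no dl.p_review element.
html = '<div class="reporter_line"><a href="#">Critic A</a></div>'
soup = BeautifulSoup(html, "html.parser")

rname_div = soup.find('div', attrs={'class': 'reporter_line'})
if rname_div is not None:
    names = [a.text for a in rname_div.find_all('a')]
else:
    names = []  # page has no reporter section; skip it instead of crashing

# This lookup matches nothing on the sample, so it returns None rather
# than raising -- the AttributeError only appears if you call find_all on it.
missing = soup.find('dl', attrs={'class': 'p_review'})
```
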
Tags: python web-scraping beautifulsoup web-crawler