[Posted]: 2020-12-13 20:04:36
[Problem Description]:
See the attached screenshot of the Excel/CSV file for what the data looks like.
- This is what I have written so far to analyze reviews from IMDB. First, it fetches reviews from the IMDB website (top 250 movies).
- Then it collects the movie links and review links, extracts the text from each review, and stores it in a dictionary in the format movie_name: movie reviews.
- In the last step I can print Movie_Name: movie reviews to the console, but when I write to a CSV file it either raises an error or writes incorrect data to the CSV file.
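For concreteness, the movie_name: reviews dictionary described above would look something like this (the title and review strings here are placeholders, not real scraped data):

```python
# Hypothetical example of the movie_name -> reviews mapping described above
review_dict = {
    "The Shawshank Redemption": [
        "One of the finest films ever made...",
        "A timeless story of hope.",
    ],
}
# Each key is a movie title; each value is a list of review strings.
for name, reviews in review_dict.items():
    print(name, "has", len(reviews), "reviews")
```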
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl
import csv
import requests
import re
import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')
nltk.download('punkt')
from nltk.tokenize import word_tokenize
# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'}
url = input('Enter - ')
while True:
    try:
        page = requests.get(url, headers=headers)
        soup = BeautifulSoup(page.content, "html.parser")
        container = soup.find_all('td', class_='titleColumn')
        break
    except requests.exceptions.RequestException:
        print("Please enter a valid url:")
        url = input('Enter - ')
def movies_list():
    movie_names = []
    movies = container[:100]  # here we take the top 100 movies from the chart
    for movie in movies:
        name = movie.find('a').text
        movie_names.append(name)
    return movie_names
    #print(movie_names)
def movie_links_list():
    movie_links = []
    movies = container[:100]
    for movie in movies:
        tag = movie.find('a')
        link = tag.get('href', None)
        movie_links.append(link)
    for i in range(len(movie_links)):
        movie_links[i] = 'https://www.imdb.com/' + movie_links[i]
    return movie_links
def review_link_list(movie_links):
    review_links = []
    for movie_link in movie_links:
        title_pos = movie_link.find('title')
        nxt_slash = movie_link.find('/', title_pos)
        nxt2_slash = movie_link.find('/', nxt_slash + 1)
        review_link = movie_link[:title_pos - 1] + movie_link[title_pos:nxt2_slash + 1] + "reviews?ref_=tt_urv"
        review_links.append(review_link)
    return review_links
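To see what review_link_list does to a single URL, here is a standalone trace of the same slicing logic (the movie URL below is a made-up example; note the double slash that results from prepending 'https://www.imdb.com/' to an href that already starts with '/'):

```python
def to_review_link(movie_link):
    # Same slicing logic as review_link_list, applied to one URL
    title_pos = movie_link.find('title')
    nxt_slash = movie_link.find('/', title_pos)
    nxt2_slash = movie_link.find('/', nxt_slash + 1)
    # [:title_pos - 1] drops the duplicated slash just before 'title'
    return (movie_link[:title_pos - 1]
            + movie_link[title_pos:nxt2_slash + 1]
            + "reviews?ref_=tt_urv")

link = 'https://www.imdb.com//title/tt0111161/?pf_rd_m=example'
print(to_review_link(link))
# → https://www.imdb.com/title/tt0111161/reviews?ref_=tt_urv
```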
def get_reviews(review_links):
    movie_names = movies_list()
    review_dict = {}
    for i in range(len(review_links)):
        movie_name = movie_names[i]
        movie_reviews = []
        review_page = requests.get(review_links[i], headers=headers)
        soup = BeautifulSoup(review_page.content, "html.parser")
        tag = soup.find_all('div', class_='content')  # find_all returns a list
        top_50 = tag[:50]
        for j in top_50:
            try:
                review = j.select('div.show-more__control')[0].text
            except IndexError:
                continue
            movie_reviews.append(review)
        review_dict[movie_name] = movie_reviews
    return review_dict
movies = movies_list()
reviews_dict = get_reviews(review_link_list(movie_links_list()))

file = "abc.csv"
with open(file, 'w', newline='', encoding='utf-8') as csvfile:
    csvwriter = csv.writer(csvfile)
    for i in range(len(movies)):
        Name = movies[i]
        try:
            Review = reviews_dict[Name]
            csvwriter.writerow([Name] + Review)
        except KeyError:
            # wrap in a list, or writerow splits the string into characters
            csvwriter.writerow([Name, "Review does not exist"])
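The "incorrect data" symptom is a common pitfall here: csv.writer.writerow iterates its argument, so passing a bare string writes one cell per character instead of one cell containing the string. A minimal, self-contained demonstration (using an in-memory buffer instead of a file):

```python
import csv
import io

# writerow iterates its argument; a bare string is split into characters.
buf = io.StringIO()
w = csv.writer(buf)
w.writerow("Review does not exist")    # wrong: one cell per character
w.writerow(["Review does not exist"])  # right: one cell with the full string
print(buf.getvalue())

# A sketch of a corrected writing loop, assuming reviews_dict is the
# movie_name -> list-of-reviews mapping built above:
# with open("abc.csv", "w", newline="", encoding="utf-8") as csvfile:
#     writer = csv.writer(csvfile)
#     for name, reviews in reviews_dict.items():
#         writer.writerow([name] + reviews)
```

Opening the file with newline='' is also recommended by the csv module docs to avoid blank lines on Windows.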
[Discussion]:
- What error are you getting? For now, try editing csv.writer(csvfile) to csv.writer(csvfile, delimiter=',') and see if that solves the problem.
- In that case it gives: TypeError: 'delimeter' is an invalid keyword argument for this function
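The TypeError quoted above comes from misspelling the keyword as "delimeter"; the correct spelling is delimiter, and ',' is already csv.writer's default. A quick sketch with the correct spelling (writing to an in-memory buffer for illustration):

```python
import csv
import io

buf = io.StringIO()
# correct spelling; delimiter=',' is the default, so it can also be omitted
writer = csv.writer(buf, delimiter=',')
writer.writerow(["name", "review"])
print(buf.getvalue())
# → name,review
```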
Tags: python python-3.x csv jupyter-notebook character-encoding