【Question Title】: Python: write to csv file from Dict
【Posted】: 2020-12-13 20:04:36
【Question Description】:

See the attached Excel file screenshot: the data in the csv file looks as in the image.

  1. This is what I have written so far to analyze reviews from IMDB. First, it fetches the reviews from the imdb website (top 250 movies).
  2. Then it gets the movie links and the review links, extracts the text from the reviews, and stores it in a dict, in movie_name: movie reviews format.
  3. In the last step I can print Movie_Name: movie reviews on the console. But when I write to the CSV file, it either gives an error or writes incorrect data to the CSV file.
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl
import csv
import requests
import re
import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')
nltk.download('punkt')
from nltk.tokenize import word_tokenize

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'}


url = input('Enter - ')
while (True):
    try:
        page = requests.get(url, headers = headers)
        soup = BeautifulSoup(page.content, "html.parser")
        container = soup.find_all('td', class_ = 'titleColumn')
        break
    except:
        print("Please enter a valid url:")
        url = input('Enter - ')

def movies_list():
    movie_names = []
    movies = container[:100]  # take the top 100 movies
    for movie in movies:
        name = movie.find('a').text
        movie_names.append(name)
    return movie_names
#print(movie_names)


def movie_links_list():
    movie_links = []
    movies = container[:100]
    for movie in movies:
        tag = movie.find('a')
        link = tag.get('href', None)
        movie_links.append(link)
    for i in range(len(movie_links)):
            movie_links[i] = 'https://www.imdb.com/'+ movie_links[i]
    return movie_links

def review_link_list(movie_links):
    review_links = []
    for movie_link in movie_links:
        title_pos = movie_link.find('title')
        nxt_slash = movie_link.find('/', title_pos)
        nxt2_slash = movie_link.find('/', nxt_slash+1)
        review_link = movie_link[:title_pos-1] + movie_link[title_pos:nxt2_slash+1] + "reviews?ref_=tt_urv"
        review_links.append(review_link)
    return review_links


def get_reviews(review_links):
    movie_names=movies_list()
    review_dict={}
    for i in range(len(review_links)):
        movie_name=movie_names[i]
        movie_reviews=[]
        review_page = requests.get(review_links[i], headers = headers)
        soup = BeautifulSoup(review_page.content, "html.parser")
        tag = soup.find_all('div', class_ = 'content') #find_all to return a list
        top_50= tag[:50]
        for j in top_50:
            try:
                review=j.select('div.show-more__control')[0].text
            except:
                continue
            movie_reviews.append(review)
        review_dict[movie_name]=movie_reviews
    return review_dict

file= "abc.csv"
with open(file ,'w') as csvfile:
    for i in range(len(movies)):
        csvwriter = csv.writer(csvfile)
        Name=movies[i]
        Review = reviews_dict[Name]
        try:
            csvwriter.writerow(Review)
        except:
            csvwriter.writerow("Review does not exist")

【Question Discussion】:

  • What error are you getting? For now, try editing csv.writer(csvfile) to csv.writer(csvfile, delimiter=','), and see whether that solves the problem.
  • In that case it gives TypeError: 'delimeter' is an invalid keyword argument for this function
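For reference, the TypeError in the comment above comes from a misspelling: the csv module expects `delimiter`, not `delimeter`. A minimal sketch (writing to an in-memory buffer instead of a file) showing that the correctly spelled keyword is accepted:

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf, delimiter=',')  # note the spelling: delimiter
writer.writerow(["movie", "review"])
# buf.getvalue() is now 'movie,review\r\n'
```

Passing the misspelled keyword (`delimeter=','`) raises exactly the TypeError quoted in the comment.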

Tags: python python-3.x csv jupyter-notebook character-encoding


【Solution 1】:

You need to open the file and write a list containing the data:

import csv
dict = {"mykey":10}

with open("mydata.csv", 'a') as file:
    writer = csv.writer(file)
    for key, value in dict.items():
        data = [key, value]
        writer.writerow(data)

In the csv file "mydata.csv" you will get

mykey,10

Using 'a' as the mode argument to open lets you append data to the file, so you don't overwrite the old data.

【Discussion】:

  • This method works for a few entries. But in my data len(dict) is 100; each dict key is a movie_name and each dict value is a list of 25 reviews of that movie. So in total there are 100 lists, about 1 million words. Writing this data causes data loss and unwanted line breaks.
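One way around the data loss and stray line breaks described in the comment above is to create a single writer and emit one row per (movie, review) pair; when writing to a real file, open it with newline='' so the csv module controls the line endings, and the writer will quote any commas or newlines embedded in a review. A minimal sketch, using a small hypothetical review_dict in place of the real scraped data:

```python
import csv
import io

# Hypothetical stand-in for the scraped review_dict: movie name -> list of reviews
review_dict = {
    "The Shawshank Redemption": ["Great film.", "A review with an\nembedded newline."],
    "The Godfather": [],
}

# For a real file, use: open("abc.csv", "w", newline="", encoding="utf-8")
# newline="" stops the csv module's \r\n line endings from being doubled on Windows.
buf = io.StringIO()
writer = csv.writer(buf)              # create the writer once, outside the loop
writer.writerow(["movie", "review"])  # header row
for name, reviews in review_dict.items():
    if not reviews:
        writer.writerow([name, "Review does not exist"])
    for review in reviews:
        writer.writerow([name, review])  # one row per review; embedded commas/newlines are quoted
```

Reading the result back with csv.reader reconstructs each review intact, embedded newlines included, because the writer quoted them.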