Pulling reddit comments using python PRAW and creating a dataframe with the results

【Question title】: Pulling reddit comments using python PRAW and creating a dataframe with the results
【Posted】: 2018-01-21 01:23:44
【Question description】:

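I'm looking to pull all of the comments from a reddit post and ultimately get the author name, comment, and upvotes into a dataframe. I'm fairly new to programming, so I'm having a tough time.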

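Right now I'm using PRAW to pull the stickied post, and I'm trying to use a for loop to iterate through the comments and build a list of dictionaries containing the author and the comment. For some reason it only adds one author/comment dictionary to the list and repeats it. Here's what I have: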

import praw
import pandas as pd
import pprint

reddit = praw.Reddit(xxx)
sub = reddit.subreddit('ethtrader')
hot_python = sub.hot(limit=1)



for submission in hot_python:
    if submission.stickied:
        print('Title: {}, ups: {}, downs: {}'.format(submission.title, submission.ups, submission.downs))
        post = {}
        postlist = []
        submission.comments.replace_more(limit=0)
        for comment in submission.comments:
            post['Author'] = comment.author
            post['Comment'] = comment.body
            postlist.append(post)

Any ideas? Apologies for the ugly code, I'm a novice. Thanks!

【Question comments】:

Tags: python pandas dataframe reddit praw

【Solution 1】:

for submission in hot_python:
    if submission.stickied:
        print('Title: {}, ups: {}, downs: {}'.format(submission.title, submission.ups, submission.downs))
        postlist = []
        # Drop the "load more comments" placeholders so only real comments remain
        submission.comments.replace_more(limit=0)
        for comment in submission.comments:
            post = {}  # put this here: a new dict for every comment
            post['Author'] = comment.author
            post['Comment'] = comment.body
            postlist.append(post)

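You should create a new post dict inside the for loop. When you append post to the list, you are actually appending a reference to that dict, not a copy. On the next iteration you overwrite the same dict with new data, and the change is visible through every reference to it. The list you end up with is just a list of references to one and the same dict.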
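The aliasing described above can be reproduced without touching the reddit API at all. This is a minimal sketch, not the asker's actual data: `fake_comments`, `rows_shared_dict` and `rows_fresh_dict` are hypothetical names, and the `SimpleNamespace` objects only stand in for PRAW comments, assuming they expose `author`, `body` and `score`.

```python
from types import SimpleNamespace

# Stand-ins for PRAW comment objects (hypothetical data, for illustration only).
fake_comments = [
    SimpleNamespace(author="alice", body="first comment", score=3),
    SimpleNamespace(author="bob", body="second comment", score=7),
]


def rows_shared_dict(comments):
    """Buggy version: one dict is created once and appended repeatedly."""
    post = {}
    rows = []
    for comment in comments:
        post["Author"] = str(comment.author)
        post["Comment"] = comment.body
        rows.append(post)  # appends a reference to the same dict each time
    return rows


def rows_fresh_dict(comments):
    """Fixed version: a new dict is created on every iteration."""
    rows = []
    for comment in comments:
        post = {
            "Author": str(comment.author),
            "Comment": comment.body,
            "Upvotes": comment.score,
        }
        rows.append(post)
    return rows


buggy = rows_shared_dict(fake_comments)
fixed = rows_fresh_dict(fake_comments)

print(buggy[0] is buggy[1])              # True: both entries are the same object
print([row["Author"] for row in buggy])  # ['bob', 'bob']
print([row["Author"] for row in fixed])  # ['alice', 'bob']
```

With a list of independent dicts like `fixed`, passing it to `pd.DataFrame(fixed)` should give one row per comment with Author, Comment and Upvotes columns, assuming pandas is installed. Converting the author with `str()` is a choice made here so the column holds plain names rather than author objects.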

【Comments】:

Thank you so much! Simple fix, and it makes a lot of sense now. Appreciate it :D