【Question title】: Pulling reddit comments using python PRAW and creating a dataframe with the results
【Posted】: 2018-01-21 01:23:44
【Question description】:

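I'm looking to pull all the comments from a reddit post and ultimately get the author name, comment, and upvotes into a dataframe. I'm pretty new to programming, so I'm having a tough time.

Right now I'm using PRAW to pull the stickied post's comments, and I'm trying to loop over the comments with a for loop and build a list of dictionaries holding the author and the comment. For some reason it only adds the first author/comment dictionary pair to the list and repeats it. Here's what I have: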

import praw
import pandas as pd
import pprint

reddit = praw.Reddit(xxx)
sub = reddit.subreddit('ethtrader')
hot_python = sub.hot(limit=1)



for submission in hot_python:
    if submission.stickied:
        print('Title: {}, ups: {}, downs: {}'.format(submission.title, submission.ups, submission.downs))
        post = {}
        postlist = []
        submission.comments.replace_more(limit=0)
        for comment in submission.comments:
            post['Author'] = comment.author
            post['Comment'] = comment.body
            postlist.append(post)

Any ideas? Apologies for the ugly code, I'm a newbie. Thanks!

【Question comments】:

    Tags: python pandas dataframe reddit praw


    【Solution 1】:
    for submission in hot_python:
        if submission.stickied:
            print('Title: {}, ups: {}, downs: {}'.format(submission.title, submission.ups, submission.downs))
            postlist = []
            submission.comments.replace_more(limit=0)  # drop the "load more comments" placeholders
            for comment in submission.comments:
                post = {}  # put this here: a fresh dict for every comment
                post['Author'] = comment.author
                post['Comment'] = comment.body
                postlist.append(post)
    

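    You should create a new post dict inside the for loop. When you append it to the list, you are really appending a reference to the post dict; when you then write new data into that same dict, the change shows up through every reference to it. The list you end up with is just a list of references to one and the same dict.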
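
To make the reference behaviour concrete, here is a minimal sketch that runs without PRAW or reddit credentials. The `comments` list is a hypothetical stand-in built from `SimpleNamespace` objects; the `score` field and the `Upvotes` key are assumptions added for illustration and are not part of the code above.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for PRAW comment objects (assumption: only
# .author, .body and .score are read, mirroring the loop in the answer).
comments = [
    SimpleNamespace(author="alice", body="first comment", score=3),
    SimpleNamespace(author="bob", body="second comment", score=7),
]

# Pattern from the question: one dict created outside the loop.
shared = {}
buggy_rows = []
for comment in comments:
    shared["Author"] = comment.author
    shared["Comment"] = comment.body
    buggy_rows.append(shared)  # appends another reference to the same dict

# Pattern from the answer: a new dict for every comment.
fixed_rows = []
for comment in comments:
    row = {
        "Author": str(comment.author),
        "Comment": comment.body,
        "Upvotes": comment.score,  # assumed field, for illustration only
    }
    fixed_rows.append(row)

print(buggy_rows)  # both entries are the same object, so they look identical
print(fixed_rows)  # one distinct dict per comment
```

In the buggy list every entry is the same object, so all of them show whatever values were written last. A list of distinct dicts such as `fixed_rows` can be handed to `pd.DataFrame(fixed_rows)` to get one row per comment, which is the dataframe the question is after.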

    【Comments】:

    • Thank you so much! A simple fix, and it makes a lot of sense now. Appreciate it :D