使用 Python 'for loop' 的结果创建 pandas 数据框答案

【问题标题】：Create pandas dataframe with results from Python 'for loop'使用 Python 'for loop' 的结果创建 pandas 数据框
【发布时间】：2019-09-27 19:16:52
【问题描述】：

Python 新手，尝试从这个 for 循环创建一个简单的 pandas 数据框。循环（1）遍历书的每一章（章节）并按句子标记，然后（2）获取每个句子的极性分数并将每个添加到字典中（'sentiments'），然后（3）得到一个平均值对于每章中的所有句子。输出是一个包含每章 4 个分数的字典。

我需要创建一个包含 28 行（每章 1 个）和 4 列（每个字典中每个分数 1 个）的数据框。完成此任务的最简单方法是什么？

from nltk import tokenize
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer 

chapters = [ainulindale,valaquenta,ch1,ch2,ch3,ch4,ch5,ch6,ch7,ch8,ch9,ch10,ch11,ch12,ch13,ch14,ch15,ch16,ch17,
            ch18,ch19,ch20,ch21,ch22,ch23,ch24,akallabeth,rings]

analyzer = SentimentIntensityAnalyzer()

for chapter in chapters:
    sentence_list = tokenize.sent_tokenize(chapter)
    sentiments = {'compound': 0.0, 'neg': 0.0, 'neu': 0.0, 'pos': 0.0}

    for sentence in sentence_list:
        vs = analyzer.polarity_scores(sentence)
        sentiments['compound'] += vs['compound']
        sentiments['neg'] += vs['neg']
        sentiments['neu'] += vs['neu']
        sentiments['pos'] += vs['pos']

    sentiments['compound'] = sentiments['compound'] / len(sentence_list)
    sentiments['neg'] = sentiments['neg'] / len(sentence_list)
    sentiments['neu'] = sentiments['neu'] / len(sentence_list)
    sentiments['pos'] = sentiments['pos'] / len(sentence_list)

    print(sentiments)

打印语句的输出如下所示：

{'compound': 0.221757281553398, 'neg': 0.041514563106796104, 'neu': 0.8682621359223304, 'pos': 0.09024271844660196}
{'compound': 0.09577214285714292, 'neg': 0.06266428571428569, 'neu': 0.842964285714286, 'pos': 0.09440000000000001}
{'compound': 0.05855809523809526, 'neg': 0.06347619047619049, 'neu': 0.8621809523809518, 'pos': 0.07440000000000001}
{'compound': 0.1280093023255814, 'neg': 0.037604651162790693, 'neu': 0.8903488372093022, 'pos': 0.0720813953488372}
{'compound': -0.008434615384615398, 'neg': 0.07703076923076925, 'neu': 0.8496076923076921, 'pos': 0.07333846153846156}
{'compound': 0.20025294117647055, 'neg': 0.027411764705882358, 'neu': 0.910294117647059, 'pos': 0.06223529411764705}
{'compound': 0.24236, 'neg': 0.020013333333333327, 'neu': 0.9022666666666667, 'pos': 0.07770666666666666}
{'compound': 0.25085555555555544, 'neg': 0.056074074074074075, 'neu': 0.8129444444444446, 'pos': 0.1309814814814815}
{'compound': 0.02056170212765958, 'neg': 0.0704255319148936, 'neu': 0.8526382978723408, 'pos': 0.07694680851063829}
{'compound': -0.13621911764705882, 'neg': 0.09723529411764704, 'neu': 0.8521323529411767, 'pos': 0.05060294117647059}
{'compound': -0.07011322957198443, 'neg': 0.09842801556420237, 'neu': 0.8354124513618679, 'pos': 0.06617898832684826}
{'compound': 0.13921688311688318, 'neg': 0.04997402597402598, 'neu': 0.8669610389610388, 'pos': 0.083012987012987}
{'compound': 0.019619718309859153, 'neg': 0.08153521126760564, 'neu': 0.848169014084507, 'pos': 0.0702394366197183}
{'compound': 0.20739687499999998, 'neg': 0.04675, 'neu': 0.86025, 'pos': 0.09300000000000003}
{'compound': 0.05655333333333335, 'neg': 0.07552000000000003, 'neu': 0.8370933333333335, 'pos': 0.08737333333333329}
{'compound': 0.1834313253012048, 'neg': 0.03204819277108433, 'neu': 0.8945903614457832, 'pos': 0.07337349397590363}
{'compound': -0.058446464646464656, 'neg': 0.0901919191919192, 'neu': 0.8533737373737375, 'pos': 0.056434343434343434}
{'compound': 0.049436129032258073, 'neg': 0.06221935483870969, 'neu': 0.863077419354839, 'pos': 0.07469032258064519}
{'compound': 0.10077664233576646, 'neg': 0.053270072992700715, 'neu': 0.8727883211678833, 'pos': 0.07395620437956206}
{'compound': -0.09540880503144653, 'neg': 0.09535849056603773, 'neu': 0.8386918238993711, 'pos': 0.0659622641509434}
{'compound': -0.058940259740259765, 'neg': 0.08786363636363642, 'neu': 0.844915584415584, 'pos': 0.06720995670995672}
{'compound': -0.09371438356164379, 'neg': 0.09126712328767121, 'neu': 0.8470547945205481, 'pos': 0.06167808219178085}
{'compound': -0.10401964636542241, 'neg': 0.09612770137524558, 'neu': 0.8361139489194496, 'pos': 0.06777799607072695}
{'compound': -0.046306122448979595, 'neg': 0.07844217687074834, 'neu': 0.8614761904761906, 'pos': 0.06008163265306123}
{'compound': 0.05695540540540539, 'neg': 0.06936486486486487, 'neu': 0.8577702702702703, 'pos': 0.07287837837837836}
{'compound': -0.015284375000000006, 'neg': 0.07314843749999998, 'neu': 0.8589296875000001, 'pos': 0.06794531250000001}
{'compound': 0.05184410112359551, 'neg': 0.0851095505617977, 'neu': 0.82794382022472, 'pos': 0.08693258426966298}
{'compound': 0.023425435540069702, 'neg': 0.06889895470383278, 'neu': 0.8573484320557486, 'pos': 0.07374564459930318}

【问题讨论】：

只需创建一个字典列表，然后将其转换为 pandas 数据框。此链接可能会有所帮助：stackoverflow.com/questions/20638006/…

标签： python pandas loops for-loop nltk

【解决方案1】：

通过添加两行代码来创建dicts 的list，如下所示。
从sentiments_list 创建一个dataframe

import pandas as pd

sentiments_list = list()  # add this line

for chapter in chapters:
    sentence_list = tokenize.sent_tokenize(chapter)
    sentiments = {'compound': 0.0, 'neg': 0.0, 'neu': 0.0, 'pos': 0.0}

    for sentence in sentence_list:
        vs = analyzer.polarity_scores(sentence)
        sentiments['compound'] += vs['compound']
        sentiments['neg'] += vs['neg']
        sentiments['neu'] += vs['neu']
        sentiments['pos'] += vs['pos']

    sentiments['compound'] = sentiments['compound'] / len(sentence_list)
    sentiments['neg'] = sentiments['neg'] / len(sentence_list)
    sentiments['neu'] = sentiments['neu'] / len(sentence_list)
    sentiments['pos'] = sentiments['pos'] / len(sentence_list)

    sentiments_list.append(sentiments)  # add this line

df = pd.DataFrame(sentiments_list)  # add this line

`df` 输出：

 compound       neg       neu       pos
 0.221757  0.041515  0.868262  0.090243
 0.095772  0.062664  0.842964  0.094400
 0.058558  0.063476  0.862181  0.074400
 0.128009  0.037605  0.890349  0.072081
-0.008435  0.077031  0.849608  0.073338
 0.200253  0.027412  0.910294  0.062235

【讨论】：

这应该是公认的答案。只是想指出pd.DataFrame.from_dict 也应该有效。如果，无论出于何种原因，你有一个字符串表示而不是一个列表，pd.DataFrame.from_records。

df 输出：

`df` 输出：