[Posted]: 2019-09-27 19:16:52
[Question]:
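I'm new to Python and trying to create a simple pandas DataFrame from this for loop. The loop (1) iterates over each chapter of a book (chapters) and tokenizes it into sentences, then (2) gets the polarity scores for each sentence and adds each one to a dictionary ('sentiments'), then (3) averages the scores over all sentences in the chapter. The output is one dictionary of 4 scores per chapter.
I need to create a DataFrame with 28 rows (one per chapter) and 4 columns (one per score in each dictionary). What is the simplest way to do this?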
from nltk import tokenize
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

chapters = [ainulindale, valaquenta, ch1, ch2, ch3, ch4, ch5, ch6, ch7, ch8, ch9, ch10, ch11, ch12, ch13, ch14, ch15, ch16, ch17,
            ch18, ch19, ch20, ch21, ch22, ch23, ch24, akallabeth, rings]
analyzer = SentimentIntensityAnalyzer()

for chapter in chapters:
    sentence_list = tokenize.sent_tokenize(chapter)
    sentiments = {'compound': 0.0, 'neg': 0.0, 'neu': 0.0, 'pos': 0.0}
    # Sum the polarity scores of every sentence in the chapter
    for sentence in sentence_list:
        vs = analyzer.polarity_scores(sentence)
        sentiments['compound'] += vs['compound']
        sentiments['neg'] += vs['neg']
        sentiments['neu'] += vs['neu']
        sentiments['pos'] += vs['pos']
    # Divide by the sentence count to get the chapter average
    sentiments['compound'] = sentiments['compound'] / len(sentence_list)
    sentiments['neg'] = sentiments['neg'] / len(sentence_list)
    sentiments['neu'] = sentiments['neu'] / len(sentence_list)
    sentiments['pos'] = sentiments['pos'] / len(sentence_list)
    print(sentiments)
The output of the print statement looks like this:
{'compound': 0.221757281553398, 'neg': 0.041514563106796104, 'neu': 0.8682621359223304, 'pos': 0.09024271844660196}
{'compound': 0.09577214285714292, 'neg': 0.06266428571428569, 'neu': 0.842964285714286, 'pos': 0.09440000000000001}
{'compound': 0.05855809523809526, 'neg': 0.06347619047619049, 'neu': 0.8621809523809518, 'pos': 0.07440000000000001}
{'compound': 0.1280093023255814, 'neg': 0.037604651162790693, 'neu': 0.8903488372093022, 'pos': 0.0720813953488372}
{'compound': -0.008434615384615398, 'neg': 0.07703076923076925, 'neu': 0.8496076923076921, 'pos': 0.07333846153846156}
{'compound': 0.20025294117647055, 'neg': 0.027411764705882358, 'neu': 0.910294117647059, 'pos': 0.06223529411764705}
{'compound': 0.24236, 'neg': 0.020013333333333327, 'neu': 0.9022666666666667, 'pos': 0.07770666666666666}
{'compound': 0.25085555555555544, 'neg': 0.056074074074074075, 'neu': 0.8129444444444446, 'pos': 0.1309814814814815}
{'compound': 0.02056170212765958, 'neg': 0.0704255319148936, 'neu': 0.8526382978723408, 'pos': 0.07694680851063829}
{'compound': -0.13621911764705882, 'neg': 0.09723529411764704, 'neu': 0.8521323529411767, 'pos': 0.05060294117647059}
{'compound': -0.07011322957198443, 'neg': 0.09842801556420237, 'neu': 0.8354124513618679, 'pos': 0.06617898832684826}
{'compound': 0.13921688311688318, 'neg': 0.04997402597402598, 'neu': 0.8669610389610388, 'pos': 0.083012987012987}
{'compound': 0.019619718309859153, 'neg': 0.08153521126760564, 'neu': 0.848169014084507, 'pos': 0.0702394366197183}
{'compound': 0.20739687499999998, 'neg': 0.04675, 'neu': 0.86025, 'pos': 0.09300000000000003}
{'compound': 0.05655333333333335, 'neg': 0.07552000000000003, 'neu': 0.8370933333333335, 'pos': 0.08737333333333329}
{'compound': 0.1834313253012048, 'neg': 0.03204819277108433, 'neu': 0.8945903614457832, 'pos': 0.07337349397590363}
{'compound': -0.058446464646464656, 'neg': 0.0901919191919192, 'neu': 0.8533737373737375, 'pos': 0.056434343434343434}
{'compound': 0.049436129032258073, 'neg': 0.06221935483870969, 'neu': 0.863077419354839, 'pos': 0.07469032258064519}
{'compound': 0.10077664233576646, 'neg': 0.053270072992700715, 'neu': 0.8727883211678833, 'pos': 0.07395620437956206}
{'compound': -0.09540880503144653, 'neg': 0.09535849056603773, 'neu': 0.8386918238993711, 'pos': 0.0659622641509434}
{'compound': -0.058940259740259765, 'neg': 0.08786363636363642, 'neu': 0.844915584415584, 'pos': 0.06720995670995672}
{'compound': -0.09371438356164379, 'neg': 0.09126712328767121, 'neu': 0.8470547945205481, 'pos': 0.06167808219178085}
{'compound': -0.10401964636542241, 'neg': 0.09612770137524558, 'neu': 0.8361139489194496, 'pos': 0.06777799607072695}
{'compound': -0.046306122448979595, 'neg': 0.07844217687074834, 'neu': 0.8614761904761906, 'pos': 0.06008163265306123}
{'compound': 0.05695540540540539, 'neg': 0.06936486486486487, 'neu': 0.8577702702702703, 'pos': 0.07287837837837836}
{'compound': -0.015284375000000006, 'neg': 0.07314843749999998, 'neu': 0.8589296875000001, 'pos': 0.06794531250000001}
{'compound': 0.05184410112359551, 'neg': 0.0851095505617977, 'neu': 0.82794382022472, 'pos': 0.08693258426966298}
{'compound': 0.023425435540069702, 'neg': 0.06889895470383278, 'neu': 0.8573484320557486, 'pos': 0.07374564459930318}
[Comments]:
- Just create a list of dictionaries, then convert it to a pandas DataFrame. This link may help: stackoverflow.com/questions/20638006/…
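A minimal sketch of the list-of-dictionaries approach, in case it helps. It assumes pandas is installed. The names `chapter_scores`, `rows`, and `df` are made up for illustration, and the three dictionaries are just the first three printed results reused as stand-in data so the snippet runs without the book text:

```python
import pandas as pd

# Stand-in data (assumption): the first three per-chapter averages printed
# in the question. In the real loop these would be the `sentiments` dicts.
chapter_scores = [
    {'compound': 0.221757281553398, 'neg': 0.041514563106796104,
     'neu': 0.8682621359223304, 'pos': 0.09024271844660196},
    {'compound': 0.09577214285714292, 'neg': 0.06266428571428569,
     'neu': 0.842964285714286, 'pos': 0.09440000000000001},
    {'compound': 0.05855809523809526, 'neg': 0.06347619047619049,
     'neu': 0.8621809523809518, 'pos': 0.07440000000000001},
]

rows = []  # one dict per chapter
for sentiments in chapter_scores:
    # In the question's loop, this append would take the place of print(sentiments)
    rows.append(sentiments)

# Each dict becomes one row; the dict keys become the column names
df = pd.DataFrame(rows, columns=['compound', 'neg', 'neu', 'pos'])
print(df.shape)  # (3, 4) for the stand-in data
```

In the original loop, `sentiments` is a fresh dictionary on every iteration, so appending it directly should be safe. With all 28 chapters appended, the result would presumably have 28 rows and 4 columns.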
Tags: python pandas loops for-loop nltk