【Posted】: 2018-02-07 04:31:53
【Question】:
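I'm new to Python. I'm trying to write a program that reads a text file and searches it for certain groups of words, which I predefine by reading them from a CSV. For example, if I want my own definition of "positive" that includes words such as "excited", "happy", and "optimistic", the CSV would contain those words. I know the code below is messy. The txt file I'm reading contains 7 occurrences of the three "positive" test words I read from the CSV, but the result prints as 25. I think it is returning a character count rather than a word count. Code: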
import csv
import string
import re
from collections import Counter
remove = dict.fromkeys(map(ord, '\n' + string.punctuation))
# Read the .txt file to analyze.
with open("test.txt", "r") as f:
    textanalysis = f.read()
    textresult = textanalysis.lower().translate(remove).split()
# Read the CSV list of terms.
with open("positivetest.csv", "r") as senti_file:
    reader = csv.reader(senti_file)
    positivelist = list(reader)
# Convert term list into flat chain.
from itertools import chain
newposlist = list(chain.from_iterable(positivelist))
# Convert chain list into string.
posstring = ' '.join(str(e) for e in newposlist)
posstring2 = posstring.split(' ')
posstring3 = ', '.join('"{}"'.format(word) for word in posstring2)
# Count number of words as defined in list category
def positive(str):
    counts = dict()
    for word in posstring3:
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1
    total = sum(counts.values())
    return total
# Print result; will write to CSV eventually
print("Positive: ", positive(textresult))
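A likely explanation for the unexpected total: `posstring3` is a single string, so `for word in posstring3` visits one character at a time, and the function ends up returning the length of that string. The `str` argument (the tokenized text) is never used. Separately, mapping `'\n'` to nothing in `remove` may glue the last word of one line to the first word of the next. Below is a minimal sketch of a word-level count under those assumptions. The helper names `tokenize` and `count_terms` and the sample data are hypothetical and not from the post; the input is inlined here instead of being read from `test.txt` and `positivetest.csv`.

```python
import string
from collections import Counter

# Strip punctuation only; newlines are left alone because split()
# already treats them as whitespace.
REMOVE = dict.fromkeys(map(ord, string.punctuation))


def tokenize(text):
    """Lowercase the text, drop punctuation, and split on whitespace."""
    return text.lower().translate(REMOVE).split()


def count_terms(words, terms):
    """Return a Counter of how often each listed term occurs in words."""
    wanted = set(terms)  # set membership tests are fast
    return Counter(w for w in words if w in wanted)


# Hypothetical stand-ins for the contents of test.txt and positivetest.csv.
sample_text = "Happy and excited! So happy, so optimistic. Not sad."
positive_terms = ["excited", "happy", "optimistic"]

counts = count_terms(tokenize(sample_text), positive_terms)
print("Positive:", sum(counts.values()))  # prints: Positive: 4
```

With real files, the flattened list from `chain.from_iterable(positivelist)` could be passed directly as `terms`, which would make the `posstring`, `posstring2`, and `posstring3` steps unnecessary.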
【Comments】:
-
Some sample text might help...
Tags: python csv text sentiment-analysis