[Posted]: 2021-05-29 03:14:14
[Question]:
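I'm using a large database as input. I tried two different approaches and got the same result with both: every iteration of the loop prints its own separate one-row table, always labelled as the first row (index 0).
I'm not sure what I'm doing wrong here. Any help would be greatly appreciated.
My code: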
import json

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# files (list of JSON file names) and catcal_dir (their directory, assumed to end
# with a path separator) are defined elsewhere.
def cal_score(search_word):
    for file in files:
        with open(catcal_dir + file, "r") as infile:
            content = json.load(infile)
        if search_word in content["Convo"]:
            convo_content = content["Convo"]
            vectorizer = TfidfVectorizer(stop_words='english', ngram_range=(1, 3), lowercase=True)
            tfidf_print = vectorizer.fit_transform([convo_content])
            # get_feature_names() in scikit-learn < 1.0 (removed in 1.2)
            feature_names = vectorizer.get_feature_names_out()
            feature_index = tfidf_print[0, :].nonzero()[1]
            tfidf_scores = zip(feature_index, [tfidf_print[0, x] for x in feature_index])
            data = {}
            for word, score in [(feature_names[i], score) for (i, score) in tfidf_scores]:
                if search_word == word:
                    data['Score'] = score
                    data['Date'] = content['Date']
                    data['Term'] = word
            # A new one-row DataFrame with index 0 is created and printed on every iteration
            df = pd.DataFrame(data, columns=['Date', 'Score', 'Term'], index=[0])
            print(df)

cal_score('nekko')
The output I get:
Date Score Term
0 May 16, 1797 0.002463 nekko
Date Score Term
0 March 04, 1809 0.005918 nekko
Date Score Term
0 July 09, 1812 0.019306 nekko
Date Score Term
0 March 04, 1813 0.006175 nekko
Date Score Term
0 July 23, 1813 0.008521 nekko
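Each table above is labelled `0` because a fresh one-row DataFrame with `index=[0]` is built and printed inside the loop. The pandas-only sketch below contrasts that pattern with collecting the rows first and building one DataFrame at the end. It uses the dates and scores from the output above as hard-coded stand-ins for the real JSON files (an assumption for illustration only).

```python
import pandas as pd

# Rows copied from the output above; the real code reads them from JSON files.
matches = [
    ("May 16, 1797", 0.002463),
    ("March 04, 1809", 0.005918),
    ("July 09, 1812", 0.019306),
    ("March 04, 1813", 0.006175),
    ("July 23, 1813", 0.008521),
]
columns = ["Date", "Score", "Term"]

# Pattern from the question: a new one-row DataFrame per iteration, always index=[0].
per_iteration = [
    pd.DataFrame({"Date": date, "Score": score, "Term": "nekko"}, columns=columns, index=[0])
    for date, score in matches
]

# Alternative: collect plain dicts inside the loop, then build a single DataFrame once.
rows = []
for date, score in matches:
    rows.append({"Date": date, "Score": score, "Term": "nekko"})
combined = pd.DataFrame(rows, columns=columns)
print(combined)  # the default RangeIndex numbers the rows 0 to 4
```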
The output I want:
Date Score Term
0 May 16, 1797 0.002463 nekko
1 March 04, 1809 0.005918 nekko
2 July 09, 1812 0.019306 nekko
3 March 04, 1813 0.006175 nekko
4 July 23, 1813 0.008521 nekko
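One possible way to get that single table is to have the function return a DataFrame instead of printing inside the loop. This is a sketch, not the only fix: the signature `cal_score(search_word, files, catcal_dir)` is hypothetical (in the question both are globals), and the term's score is looked up directly through the fitted vectorizer's `vocabulary_` mapping rather than by looping over every nonzero feature. `search_word` is assumed to be lowercase, since `lowercase=True` lowercases the text.

```python
import json
import os

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer


def cal_score(search_word, files, catcal_dir):
    """Return one DataFrame with a row per file whose "Convo" text contains search_word.

    Hypothetical signature: files and catcal_dir are globals in the question and are
    passed in explicitly here. search_word is assumed to be lowercase.
    """
    columns = ["Date", "Score", "Term"]
    rows = []  # one dict per matching file; nothing is printed inside the loop
    for file in files:
        with open(os.path.join(catcal_dir, file), "r") as infile:
            content = json.load(infile)
        convo = content["Convo"]
        if search_word not in convo.lower():
            continue  # cheap substring pre-check before fitting a vectorizer
        vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 3), lowercase=True)
        tfidf = vectorizer.fit_transform([convo])
        # vocabulary_ maps each term to its column in the TF-IDF matrix,
        # replacing the loop over all nonzero features.
        col = vectorizer.vocabulary_.get(search_word)
        if col is None:
            continue  # matched only inside a longer word, or dropped as a stop word
        rows.append({"Date": content["Date"], "Score": tfidf[0, col], "Term": search_word})
    # Build the DataFrame once; the default RangeIndex numbers the rows 0, 1, 2, ...
    return pd.DataFrame(rows, columns=columns)
```

Usage would then be `print(cal_score("nekko", files, catcal_dir))`. Because each vectorizer here is fitted on a single document, every term gets the same IDF (1.0 with the default `smooth_idf=True`), so the score is effectively an L2-normalised term frequency; IDF only has an effect when one vectorizer is fitted on all the conversations together.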
Thanks.
[Discussion]:
Tags: python pandas dataframe tfidfvectorizer