[Posted]: 2021-08-04 12:17:34
[Question]:
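I would like to better understand how TfidfVectorizer works. I don't understand how to use follow-up methods such as get_feature_names.
Here is a reproducible example of my problem: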
from sklearn.feature_extraction.text import TfidfVectorizer
text = ['It was a queer, sultry summer', 'the summer they electrocuted the Rosenbergs',
        'and I didn’t know what I was doing in New York', 'I m stupid about executions',
        'The idea of being electrocuted makes me sick',
        'and that’s all there was to read about in the papers',
        'goggle eyed headlines staring up at me on every street corner and at the fusty',
        'peanut-selling mouth of every subway', 'It had nothing to do with me',
        'but I couldn’t help wondering what it would be like',
        'being burned alive all along your nerves']
tfidf_vect = TfidfVectorizer(max_df=0.7,
                             min_df=0.01,
                             use_idf=True,
                             ngram_range=(1, 2))
tfidf_mat = tfidf_vect.fit_transform(text)
print(tfidf_mat)
features = tfidf_vect.get_feature_names()
print(features)
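In this example, my understanding is that the object tfidf_vect holds all the parameters I want TfidfVectorizer to use, and that I then apply it to text to get the result in the object tfidf_mat.
What I don't understand is why, to extract additional information about my tf-idf analysis, I call the method on the object tfidf_vect rather than on tfidf_mat.
How does the call tfidf_vect.get_feature_names() know that it applies to text, when text appears nowhere in its definition?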
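One way to picture what is going on: fitting stores what was learned from text on the vectorizer object itself, while the returned matrix holds only numbers. The sketch below is a toy illustration of that pattern, not scikit-learn's implementation. ToyCountVectorizer is a hypothetical class written for this example; it counts whitespace-separated tokens and does no tf-idf weighting.

```python
class ToyCountVectorizer:
    """Hypothetical, simplified stand-in for a scikit-learn vectorizer."""

    def __init__(self, lowercase=True):
        # Only parameters are stored here; nothing has been learned yet.
        self.lowercase = lowercase

    def fit_transform(self, docs):
        tokenized = [(d.lower() if self.lowercase else d).split() for d in docs]
        # The learned vocabulary is saved on the vectorizer itself.
        terms = sorted({tok for doc in tokenized for tok in doc})
        self.vocabulary_ = {term: idx for idx, term in enumerate(terms)}
        # The returned matrix contains counts only, with no record of the terms.
        matrix = [[0] * len(terms) for _ in tokenized]
        for row, doc in zip(matrix, tokenized):
            for tok in doc:
                row[self.vocabulary_[tok]] += 1
        return matrix

    def get_feature_names(self):
        # Reads the state saved during fit_transform; no text argument is needed.
        return sorted(self.vocabulary_, key=self.vocabulary_.get)


toy_vect = ToyCountVectorizer()
toy_mat = toy_vect.fit_transform(["the summer", "the papers"])
print(toy_mat)                       # [[0, 1, 1], [1, 0, 1]]
print(toy_vect.get_feature_names())  # ['papers', 'summer', 'the']
```

In scikit-learn the same idea applies: after fit_transform, the fitted TfidfVectorizer exposes the learned term-to-column mapping as its vocabulary_ attribute, and get_feature_names reads from that. Note that get_feature_names was deprecated in scikit-learn 1.0 and removed in 1.2; on newer versions the equivalent call is get_feature_names_out().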
[Comments]:
Tags: python scikit-learn tf-idf tfidfvectorizer