【Question Title】: What does the "support" column mean in the result of the function "term_stats()" from the package "tm" in R, and how is it different from "count"?
【Posted】: 2025-11-28 22:00:02
【Question】:

Running the following script produces the result shown below:

library(tm)      # VectorSource(), VCorpus()
library(corpus)  # term_stats()

a <- c("Your work is going to fill a large part of your life, and the only way to be truly satisfied is to do what you believe is great work. And the only way to do great work is to love what you do. If you haven't found it yet, keep looking. Don't settle. As with all matters of the heart, you'll know when you find it. - Steve Jobs")
a_source <- VectorSource(a)
a_corpus <- VCorpus(a_source)
term_stats(a_corpus)

       term    count   support
    1  .         5       1
    2  to        5       1
    3  is        4       1
    4  you       4       1
    5  ,         3       1

【Comments】:

Tags: r nlp tm


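【Solution 1】:

support is the number of documents in which a term appears; count is the total number of times it appears across all documents. With a single document, support is therefore 1 for every term. If you are computing tf-idf, you need both.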
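To make the two columns concrete, here is a minimal base-R sketch that computes them by hand. The three toy documents, the whitespace tokenizer, and the `log(N / support)` form of idf are assumptions chosen for illustration; `term_stats()` uses its own tokenizer, so its numbers on real text can differ.

```r
# Three toy documents (hypothetical data, not from the question)
docs <- c("to be or not to be", "to do is to be", "do be do")

# Naive tokenizer: lower-case, then split on whitespace
tokens <- strsplit(tolower(docs), "\\s+")

# count: total occurrences of each term over all documents
count <- table(unlist(tokens))

# support: number of documents containing the term at least once,
# so each document contributes each of its terms only once
support <- table(unlist(lapply(tokens, unique)))

terms <- names(count)
stats <- data.frame(
  term    = terms,
  count   = as.integer(count[terms]),
  support = as.integer(support[terms]),
  stringsAsFactors = FALSE
)

# One common idf variant (an assumption): log(number of docs / support)
stats$idf <- log(length(docs) / stats$support)

print(stats)
```

In this sketch "to" occurs four times but only in two of the three documents, while "be" also occurs four times and appears in all three, so the two terms share a count but differ in support.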

library(tm)      # VectorSource(), VCorpus()
library(corpus)  # term_stats() comes from corpus, not tm

txt <- c("Your work is going to fill a large part of your life, 
       and the only way to be truly satisfied is to do what you
        believe is great work. 
       And the only way to do great work is to love what you do. 
       If you haven't found it yet, keep looking. Don't settle. 
       As with all matters of the heart, you'll know when you find it. 
       - Steve Jobs")

term_stats(VCorpus(VectorSource(txt)))[1:5,]

term count support
.        5       1
to       5       1
is       4       1


#Split txt into 4 docs
txt_df <- data.frame( txt = c(
"Your work is going to fill a large part of your life, 
 and the only way to be truly satisfied is to do what you 
 believe is great work." , 
 "And the only way to do great work is to love what you do." , 
 "If you haven't found it yet, keep looking. Don't settle." , 
 "As with all matters of the heart, you'll know when you find it. - 
 Steve Jobs"))

term_stats(VCorpus(VectorSource(txt_df$txt)))[1:6,]

term count support
.        5       4
you      4       4
,        3       3
the      3       3
to       5       2
is       4       2

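By default, the rows are sorted by support in descending order, with ties broken by count. This is why "to" (count 5, support 2) appears below "the" (count 3, support 3) in the output above.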
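If you would rather rank terms by count, the returned data frame can be reordered with base R. The sketch below does not call `term_stats()`; it rebuilds the six rows printed above by hand (the variable names `stats` and `by_count` are illustrative) and sorts them with `order()`.

```r
# The six rows from the four-document output above, typed in by hand
stats <- data.frame(
  term    = c(".", "you", ",", "the", "to", "is"),
  count   = c(5, 4, 3, 3, 5, 4),
  support = c(4, 4, 3, 3, 2, 2),
  stringsAsFactors = FALSE
)

# Sort by count (descending), breaking ties by support (descending)
by_count <- stats[order(-stats$count, -stats$support), ]

print(by_count)
```

The same `order()` call should work on the object returned by `term_stats()`, since it has the same three columns.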

【Comments】:

  • term_stats comes from corpus, not tm; you could simplify this example to library(corpus); text <- "..."; term_stats(text)
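The simplification suggested in this comment can be sketched as follows, assuming the corpus package is installed. The `requireNamespace()` guard and the variable names are additions for illustration; the four strings are the documents from the answer, each written on a single line.

```r
# Four documents passed directly as a character vector; no tm objects needed
text <- c(
  "Your work is going to fill a large part of your life, and the only way to be truly satisfied is to do what you believe is great work.",
  "And the only way to do great work is to love what you do.",
  "If you haven't found it yet, keep looking. Don't settle.",
  "As with all matters of the heart, you'll know when you find it. - Steve Jobs"
)

# Run only if corpus is available (guard added for this sketch)
if (requireNamespace("corpus", quietly = TRUE)) {
  stats <- corpus::term_stats(text)
  print(head(stats))
} else {
  message("Package 'corpus' is not installed; try install.packages('corpus').")
}
```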