【Posted】: 2016-05-30 13:11:09
【Question】:
I have a set of documents:
documents = c("She had toast for breakfast",
"The coffee this morning was excellent",
"For lunch let's all have pancakes",
"Later in the day, there will be more talks",
"The talks on the first day were great",
"The second day should have good presentations too")
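I want to remove the stopwords from this set of documents. I have already removed punctuation and converted the text to lower case, using: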
documents = tolower(documents) #make it lower case
documents = gsub('[[:punct:]]', '', documents) #remove punctuation
First I convert the vector to a Corpus object:
documents <- Corpus(VectorSource(documents))
Then I try to remove the stopwords:
documents = tm_map(documents, removeWords, stopwords('english')) #remove stopwords
But that last line produces the following error:
THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC() to debug.
This has already been asked here, but no answer was given. What does this error mean?
Edit
Yes, I am using the tm package.
Here is the output of sessionInfo():
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
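Since the error message mentions a forked process, one way to narrow things down might be to run the stopword step in plain base R, without tm_map, and see whether the error still appears. The following is only a minimal sketch under stated assumptions: stop_words is a short, hand-picked list for illustration (not the full stopwords('english') list from tm), and remove_stopwords is a hypothetical helper name, not a tm function.

```r
# Hypothetical, abbreviated stop-word list for illustration only;
# tm's stopwords('english') is considerably longer.
stop_words <- c("she", "had", "for", "the", "this", "was", "in", "on", "were")

# Illustrative helper (not part of tm): delete whole-word matches,
# then tidy up the whitespace that the deletions leave behind.
remove_stopwords <- function(x, words) {
  pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")
  x <- gsub(pattern, "", x, perl = TRUE)   # drop stop words on word boundaries
  x <- gsub("[[:space:]]+", " ", x)        # collapse runs of whitespace
  gsub("^ | $", "", x)                     # strip leading/trailing space
}

# Two of the documents from above, already lower-cased and without punctuation.
docs <- c("she had toast for breakfast",
          "the coffee this morning was excellent")

cleaned <- remove_stopwords(docs, stop_words)
print(cleaned)
```

If this runs cleanly while the tm_map call still fails, that would suggest the problem lies in how tm_map is executed on this platform rather than in the stopword removal itself.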
【Comments】:
Tags: r tm topic-modeling