【Question title】: What's making the text lowercase in this corpus, and how can I turn it uppercase?
【Posted】: 2019-07-28 13:47:51
【Question description】:

I'm trying to build a word cloud in R, but it only returns lowercase text.

library(readxl)
library(tm)
library(wordcloud)
library(RColorBrewer)

sheet <- read_excel('list_products.xls', skip = 4)
products <- c(sheet$Cod)
products <- Corpus(VectorSource(products))
c_words <- brewer.pal(8, 'Set2')
wordcloud(products, min.freq = 10, max.words = 30, scale = c(7,1), colors = c_words)

I tried putting the following line before the wordcloud call, but it didn't work:

products <- tm_map(products, content_transformer(toupper))

What is making the text lowercase, and what should I do to turn it uppercase?

【Question discussion】:

    Tags: r uppercase corpus word-cloud


    【Answer 1】:

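    Well, as you can see here: Make all words uppercase in Wordcloud in R, when you run TermDocumentMatrix(CORPUS) the words are converted to lowercase by default. Indeed, if you inspect the source of wordcloud (for example with trace(wordcloud, edit = TRUE)), you will see that when the freq argument is not supplied it runs tdm <- tm::TermDocumentMatrix(corpus). That call lowercases every term, so any toupper() you applied to the corpus beforehand is undone.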
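To see this default in isolation, here is a minimal sketch (assuming the tm package is installed; the two documents are made-up sample text) that builds the same kind of matrix with and without the tolower control option and prints the resulting terms:

```r
library(tm)

# Made-up sample documents written entirely in uppercase
docs <- c("HELLO WORLD", "HELLO AGAIN")
corpus <- Corpus(VectorSource(docs))

# Default control list: tolower is TRUE, so the terms come back lowercase
tdm_default <- TermDocumentMatrix(corpus)
print(Terms(tdm_default))

# Turning tolower off keeps the original (uppercase) form of the terms
tdm_upper <- TermDocumentMatrix(corpus, control = list(tolower = FALSE))
print(Terms(tdm_upper))
```

The first print shows lowercase terms even though the input was uppercase; the second keeps them as written, which is what both options below rely on.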

    You have two options to fix this. The first is to pass the words and their frequencies instead of the corpus:

    library(tm)
    library(wordcloud)
    library(RColorBrewer)

    filePath <- "http://www.sthda.com/sthda/RDoc/example-files/martin-luther-king-i-have-a-dream-speech.txt" # I'm using this text because no reproducible example was provided
    text <- readLines(filePath)
    products <- Corpus(VectorSource(text))
    products <- tm_map(products, content_transformer(toupper))
    c_words <- brewer.pal(8, 'Set2')
    # tolower = FALSE stops TermDocumentMatrix from lowercasing the terms
    tdm <- tm::TermDocumentMatrix(products, control = list(tolower = FALSE))
    freq_corpus <- slam::row_sums(tdm)
    # Passing words + frequencies means wordcloud never builds its own (lowercasing) matrix
    wordcloud(names(freq_corpus), freq_corpus, min.freq = 10, max.words = 30, scale = c(7,1), colors = c_words)
    

    This produces a word cloud in which the words stay uppercase.
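One side effect of the first option as written: no stop words are removed, and the list returned by tm::stopwords() is lowercase while removeWords matches case-sensitively, so words such as "THE" and "AND" would survive even if you tried. A hedged sketch of a helper that uppercases the stop word list as well (the name upper_word_freqs and the sample sentences are hypothetical; it assumes tm and its dependency slam are installed):

```r
library(tm)

# Hypothetical helper: word frequencies that keep the uppercase form.
# The English stop word list is uppercased too, so "THE", "AND", ... are dropped.
upper_word_freqs <- function(text) {
  corpus <- Corpus(VectorSource(toupper(text)))
  corpus <- tm_map(corpus, removePunctuation)
  corpus <- tm_map(corpus, removeWords, toupper(stopwords("english")))
  # tolower = FALSE keeps the terms uppercase in the matrix
  tdm <- TermDocumentMatrix(corpus, control = list(tolower = FALSE))
  sort(slam::row_sums(tdm), decreasing = TRUE)
}

freqs <- upper_word_freqs(c("The dream and the hope", "A dream of freedom"))
print(freqs)
```

The returned named vector can be passed on just like freq_corpus above, e.g. wordcloud(names(freqs), freqs, min.freq = 1, colors = c_words).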

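    The second option is to modify wordcloud itself:

    First run trace(wordcloud, edit = TRUE), then in the editor that opens replace line 21 (tdm <- tm::TermDocumentMatrix(corpus)) with:

    tdm <- tm::TermDocumentMatrix(corpus, control = list(tolower = FALSE))

    Save the change and run: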

    library(tm)
    library(wordcloud)
    library(RColorBrewer)

    filePath <- "http://www.sthda.com/sthda/RDoc/example-files/martin-luther-king-i-have-a-dream-speech.txt"
    text <- readLines(filePath)
    products <- Corpus(VectorSource(text))
    products <- tm_map(products, content_transformer(toupper))
    c_words <- brewer.pal(8, 'Set2')
    # Pass the corpus itself: the edited wordcloud now builds its matrix with tolower = FALSE
    wordcloud(products, min.freq = 10, max.words = 30, scale = c(7,1), colors = c_words)
    

    You get a similar word cloud, again with the words kept in uppercase.

    【Comments】:

    • Thanks for responding with such clear instructions. Sorry for not providing enough to reproduce my code.