【问题标题】:To output dtm from a reactive function and use in another reactive function从反应函数输出 dtm 并在另一个反应函数中使用
【发布时间】:2018-08-20 21:24:25
【问题描述】:

我有一个反应函数,其中创建了一个文档术语矩阵。我需要在另一个反应函数中使用 dtm。

我已经使用了从一个反应到另一个函数的数据框 测试()$数据框。以类似的方式使用 dtm 是否合适?

  make_tree <- reactive ({
    validate(
      need((input$text != "") || (!is.null(input$file)),
           "Please give me some text to work upon!"
      )
    )
    # If text input is not empty then get the corpus
    # else load text from the text file uploaded.
    # case of both text box and file uploader being empty was
    # covered by the above validate function.
    if (nchar(input$text) > 0){
      docs <- Corpus(VectorSource(input$text))
    }
    else if (!is.null(input$file)){
      filenames <- input$file$datapath
    }

    filenames<-input$file$datapath
    #read files into a character vector
    files <- lapply(filenames,readLines)

    #create corpus from vector
    docs <- Corpus(VectorSource(files))

    #inspect a particular document in corpus - data import check
    #writeLines(as.character(docs[[1]]))


    #start preprocessing
    #Transform to lower case
    docs <-tm_map(docs,content_transformer(tolower))


    #remove potentially problematic symbols
    toSpace <- content_transformer(function(x, pattern) { return (gsub(pattern, " ", x))})
    docs <- tm_map(docs, toSpace, "-")
    docs <- tm_map(docs, toSpace, "'")
    docs <- tm_map(docs, toSpace, "`")
    docs <- tm_map(docs, toSpace, ":")

    # replace other symbols and junk as necessary
    #remove punctuation
    docs <- tm_map(docs, removePunctuation)
    #Strip digits
    docs <- tm_map(docs, removeNumbers)
    #remove stopwords
    docs <- tm_map(docs, removeWords, stopwords("english"))
    #remove whitespace
    docs <- tm_map(docs, stripWhitespace)

    #Check if replacements have been done
    #writeLines(as.character(docs[[1]]))

    #Stem document
    docs <- tm_map(docs,stemDocument)

    myStopwords1 <- c("can", "say","one","way","use",
                      "also","however","tell","will",
                      "much","need","take","tend","even",
                      "like","particular","rather","said",
                      "get","well","make","ask","come","end",
                      "first","two","help","often","may",
                      "might","see","something","thing","point",
                      "post","look","right","now","think","'ve ",
                      "'re ","another","put","set","new","good",
                      "want","sure","kind","yes,","day","etc",
                      "quit","since","attempt","lack","seen","aware",
                      "little","ever","moreover","though","found","able",
                      "enough","far","earlier","away","achieve","draw",
                      "last","never","brief","bit","entire","brief",
                      "great","lot")
    docs <- tm_map(docs, removeWords, myStopwords1)


    #Create document-term matrix
    dtm <- DocumentTermMatrix(docs)

    #convert rownames to filenames
    rownames(dtm) <- input$file$datapath

    #collapse matrix by summing over columns
    freq <- colSums(as.matrix(dtm))

    #length should be total number of terms
    # length(freq)

    #create sort order (descending)
    ord <- order(freq,decreasing=TRUE)

    #List all terms in decreasing order of freq and write to disk

    sortedWordFreq<- as.data.frame(freq[ord])
    sortedWordFreq$words <- row.names(sortedWordFreq)
    colnames(sortedWordFreq) <- c("freq","words")
    row.names(sortedWordFreq)<- NULL
    sortedWordFreq<-sortedWordFreq[,c(2,1)]

    #Set parameters for Gibbs sampling
    burnin <- 4000
    iter <- 2000
    thin <- 500
    seed <-list(2003,5,63,100001,765)
    nstart <- 5
    best <- TRUE

    #Number of topics
    numberOfTopics <- 11

    #Run LDA using Gibbs sampling
    ldaOut <-LDA(dtm,numberOfTopics, method="Gibbs", control=list(nstart=nstart, seed = seed, best=best, burnin = burnin, iter = iter, thin=thin))

    #write out results
    #docs to topics - use only if there are multiple text files and you want to assign a topic to each doc
    ldaOut.topics <- as.matrix(topics(ldaOut))

    #top x terms in each topic
    topNtemsInTopic <- 5
    ldaOut.terms <- as.matrix(terms(ldaOut,topNtemsInTopic))

    ##write.csv(freq[ord],"word_freq.csv")
    wordfreq <-sortedWordFreq

    topicwords<-data.frame(ldaOut.terms)

    test <- data.frame(words=unlist(topicwords, use.names = T))

    test<-setDT(test, keep.rownames = TRUE)[]

    topicwords<-merge(x = test, y = wordfreq, by = "words", all.x = TRUE)

    topicwords$topic<-substr(topicwords$rn,0,(stri_length(topicwords$rn)-1))

    tree<-subset(topicwords[,c(4,1,3)])

    png("treemap.png", width = 3, height = 3, units = "in", res = 500)
    w <- treemap(tree, #Your data frame object
                 index=c("topic","words"),  #A list of your categorical variables
                 vSize = "freq",  #This is your quantitative variable
                 type="index", #Type sets the organization and color scheme of your treemap
                 palette = "RdYlGn",  #Select your color palette from the RColorBrewer presets or make your own.
                 title="Topics", #Customize your title
                 fontsize.title = 5 #Change the font size of the title

    )
    dev.off()
    list(dtm=dtm)
    filename <- "treemap.png"

  })

我尝试在 make_tree 反应函数中使用 list 输出 dtm,并在下面的反应函数中使用。

    topic_wordcloud<-observe({
  req(input$file)
  if(input$number != "NONE"){

  numberOfTopics <- input$number

  #Run LDA using Gibbs sampling
  dtm<-make_tree$dtm
  ldaOut <-LDA(dtm,numberOfTopics, method="Gibbs", control=list(nstart=nstart, seed = seed, best=best, burnin = burnin, iter = iter, thin=thin))

  #write out results
  #docs to topics - use only if there are multiple text files and you want to assign a topic to each doc
  ldaOut.topics <- as.matrix(topics(ldaOut))

  for (i in 1:input$number){
    topic <- i
    df <- data.frame(term = ldaOut@terms, p = exp(ldaOut@beta[i,]))
    head(df[order(-df$p),])

    mypath<-file.path("D:","Dropbox (eClerx Services Ltd.)","kaveri.malviya","My Documents","TM_Demo (2)","TM_Demo" ,paste("topic_",i, ".jpg", sep = ""))

    jpeg(file=mypath)

    wordcloud(words = df$term,
              freq = df$p,
              max.words = 100,
              random.order = FALSE,
              rot.per = 0.35,
              colors=brewer.pal(8, "Dark2"))
    dev.off()
  }
}

})

input$number是用户想要的topic数量。下面是Ui.R的部分

    tabPanel(title = "Topic WordClouds",
                                             box(width = 12,
                                                 height = 600,
                                                 background = NULL,
                                                 solidHeader = TRUE,
                                                 status = "primary",
                                                 title = tags$strong("Topic WordClouds"),
                                                 selectInput("number","Select Number of Topics",choices=c("NONE",seq(1,20,by=1)),multiple=FALSE,selected="NONE")
#                                                  imageOutput("treemap")
                                                ) 
                                             )

如果我按照上述步骤操作,则会出现以下错误 警告:错误:$ 运算符对原子向量无效

【问题讨论】:

  • 我不明白为什么不这样做。寻求帮助时,您应该包含一个简单的reproducible example,其中包含可用于测试和验证可能解决方案的示例输入和所需输出。

标签: r shiny


【解决方案1】:

如果您使用make_tree &lt;- reactive({}),那么您需要从该反应式表达式块中返回一个对象。默认情况下,R 返回块中的最后一个值。在你的情况下,最后一行是

filename <- "treemap.png"

所以他们继续;你的函数返回的东西是"treemap.png"。你需要切换这条线所以

list(dtm=dtm)

是块中的最后一行。

【讨论】:

  • 感谢 Flick。我之前已经尝试过您的解决方案,但还有其他问题现在已经解决。感谢您的帮助。
猜你喜欢
  • 1970-01-01
  • 1970-01-01
  • 1970-01-01
  • 1970-01-01
  • 1970-01-01
  • 2016-09-12
  • 1970-01-01
  • 2017-06-21
  • 2014-09-06
相关资源
最近更新 更多