【Posted】:2017-06-21 21:59:37
【Question】:
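I split a text document into n chunks and store each chunk in a list. Each chunk is converted to a set of words, and a cosine similarity function is then applied between one of the chunks and another, shorter text, which is also converted to a set before being passed to the function. I need some way to pass every chunk to the function so that it is compared with the second set, and I wonder whether one of the apply family of functions could do the job instead of a loop. Storing each result in a vector would also save some time.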
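To make the comparison concrete, here is a minimal base-R sketch of cosine similarity between two texts treated as bags of words. It assumes simple letter-only tokenisation and raw term counts; `cosine_words` is a hypothetical helper for illustration, not the `lsa` pipeline used in the question's own code.

```r
# Hypothetical helper: cosine similarity between two texts, using raw
# term counts over their combined vocabulary (base R only).
cosine_words <- function(a, b) {
  tokenize <- function(txt) {
    words <- unlist(strsplit(tolower(txt), "[^a-z]+"))
    words[nzchar(words)]  # drop empty tokens left by the split
  }
  wa <- tokenize(a)
  wb <- tokenize(b)
  vocab <- union(wa, wb)
  # Count every vocabulary term in each text
  va <- as.numeric(table(factor(wa, levels = vocab)))
  vb <- as.numeric(table(factor(wb, levels = vocab)))
  sum(va * vb) / (sqrt(sum(va^2)) * sqrt(sum(vb^2)))
}

cosine_words("traffic diversions hit sales", "traffic changes hit drivers")
# 0.5 (two shared words out of four in each text)
```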
Here is what I am using (part of the code comes from this):
# library() attaches one package per call
library(data.table)
library(qdap)
library(sets)
library(lsa)
s <- c("employees businesses san gwann admitted sales taken hit after traffic diversions implemented without notice vjal ir - rihan over weekend.",
"also complained werent consulted diversion blocked vehicles driving centre san gwann via roundabout forks san gwann industrial estate, church forced motorists take detour around block instead.",
"barriers erected roundabout exit, after youtube video cars disregarding signage passing through roundabout regardless went viral.",
"planned temporary diversion, brace san gwann influx cars set pass through during works kappara junction project.",
"usually really busy weekend, our sales lower round, corner store worker maria abela admitted maltatoday.")
c <- "tm dont break whats broken. only queues developing, pass here every morning never experienced such mess notwithstanding tm officials directing traffic. hope report congestion happening area. lc tm tried pro - active hope admit recent traffic changes working."
calculateCosine <- function(setX, setY){
  require(qdap)
  y <- c(unlist(as.character(tolower(setY))))
  x <- c(unlist(strsplit(as.character(tolower(setX)), split = ", ")))
  diffLength <- length(y) - length(x)
  x <- bag_o_words(x)
  # pad x with empty strings; rep() adds nothing when diffLength <= 0,
  # whereas 1:diffLength would count backwards and still pad
  x <- c(x, rep("", max(diffLength, 0)))
  # write both sets to files in a fresh temp directory and calculate cosine similarity
  td <- tempfile()
  dir.create(td)
  write(y, file = paste(td, "Dy", sep = "/"))
  write(x, file = paste(td, "Dx", sep = "/"))
  data(stopwords_en)  # stop-word list shipped with lsa
  myMatrix <- textmatrix(td, stopwords = stopwords_en, minWordLength = 3)
  similCosine <- as.numeric(round(cosine(myMatrix[,1], myMatrix[,2]), 3))
  return(similCosine)
}
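n <- 3
# integer division: 5 %/% 3 is 1, so each chunk holds a single sentence here
chunkSize <- length(s) %/% n  # renamed from max to avoid masking base::max
x <- seq_along(s)
d1 <- split(s, ceiling(x/chunkSize))
res <- c()
for(i in seq_along(d1)){
  # d1[[i]] extracts the character vector; d1[i] would hand paste() a one-element list
  val <- calculateCosine(as.set(paste(d1[[i]], collapse = " ")), as.set(c))
  res <- c(res, val)
}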
For brevity, can the loop be replaced with one of the apply functions? Any ideas or comments would be appreciated. Thanks.
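One possible way to drop the explicit loop is `vapply`, which calls a function on every chunk and returns a numeric vector directly, so `res` does not have to be grown by hand. The following is only a sketch: `score_chunk` is a hypothetical stand-in (fraction of shared words, not cosine) so that the snippet runs without qdap or lsa, the texts are shortened placeholders, and the chunk size is rounded up so that five sentences give three chunks.

```r
# Hypothetical stand-in for calculateCosine(): fraction of the chunk's
# distinct words that also occur in the reference text.
score_chunk <- function(chunk_text, ref_text) {
  words <- function(txt) unique(unlist(strsplit(tolower(txt), "[^a-z]+")))
  cw <- setdiff(words(chunk_text), "")
  rw <- setdiff(words(ref_text), "")
  length(intersect(cw, rw)) / length(cw)
}

# Shortened placeholder data in the same shape as s and c in the question
s <- c("sales taken hit after traffic diversions",
       "vehicles blocked from the centre",
       "barriers erected at the roundabout exit",
       "planned temporary diversion during works",
       "sales lower this weekend")
ref <- "recent traffic changes hit sales"

n <- 3
# Assumption: chunk size rounded up (the question uses %/% instead)
chunk_id <- ceiling(seq_along(s) / ceiling(length(s) / n))
d1 <- split(s, chunk_id)

# vapply visits every chunk and checks that each result is a single number
res <- vapply(d1,
              function(chunk) score_chunk(paste(chunk, collapse = " "), ref),
              numeric(1))
res
```

With the original function, the equivalent call would be along the lines of `vapply(d1, function(chunk) calculateCosine(as.set(paste(chunk, collapse = " ")), as.set(c)), numeric(1))`. `sapply` would also work, but it does not check the type of each result.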
【Comments】:
-
Edited. Thanks for pointing that out.