[Question title]: Use apply family between each element in a list and another set in R
[Posted]: 2017-06-21 21:59:37
[Question]:

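I have split a text document into n chunks and stored each chunk in a list. Each chunk is converted to a set of words, and a cosine similarity function is then applied between one chunk and another, shorter text, which is also converted to a set before being passed to the function. I need to pass every chunk to the function for comparison with the second set, and I am wondering whether one of the apply family of functions can do the job instead of a loop. Storing each result in a vector would also save some time.

Here is what I am using (part of the code comes from this):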

# library() attaches one package per call
library("data.table")
library("qdap")
library("sets")
library("lsa")

s <- c("employees businesses san gwann admitted sales taken hit after traffic diversions implemented without notice vjal ir - rihan over weekend.", 
"also complained werent consulted diversion blocked vehicles driving centre      san gwann via roundabout forks san gwann industrial estate, church forced   motorists take detour around block instead.", 
"barriers erected roundabout exit, after youtube video cars disregarding signage passing through roundabout regardless went viral.", 
"planned temporary diversion, brace san gwann influx cars set pass through during works kappara junction project.", 
"usually really busy weekend, our sales lower round, corner store worker maria abela admitted maltatoday.")

c <- "tm dont break whats broken. only queues developing, pass here every morning never experienced such mess notwithstanding tm officials directing traffic. hope report congestion happening area. lc tm tried pro - active hope admit recent traffic changes working."


calculateCosine <- function(setX, setY){
  require(qdap)
  # Assumption: td is not defined anywhere in the post; a fresh temporary
  # directory is created here so that textmatrix() only sees the two files
  td <- tempfile()
  dir.create(td)
  y <- c(unlist(as.character(tolower(setY))))
  x <- c(unlist(strsplit(as.character(tolower(setX)), split = ", ")))
  diffLength <- length(y) - length(x)
  x <- bag_o_words(x)
  # pad x with empty strings, one per iteration
  for(pad in 1 : diffLength){
    x <- c(x, "")
  }
  # write both sets to temp files and calculate cosine similarity
  write(y, file=paste(td, "Dy", sep="/"))
  write(x, file=paste(td, "Dx", sep="/"))
  myMatrix = textmatrix(td, stopwords=stopwords_en, minWordLength = 3)
  similCosine <- as.numeric(round(cosine(myMatrix[,1], myMatrix[,2]), 3))
  return(similCosine)
}

n <- 3
max <- length(s)%/%n
x <- seq_along(s)
d1 <- split(s, ceiling(x/max))
res <- c()
for(i in 1 : length(d1)){
  val <- calculateCosine(as.set(paste(d1[i], sep = " ", collapse = " ")), as.set(c))
  res <- c(res, val)
}

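For brevity, can the loop be replaced with one of the apply functions? Any ideas or comments would be much appreciated. Thanks.

[Comments]:

  • Edited. Thanks for pointing that out.

Tags: r loops apply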


[Answer 1]:

Consider replacing the two for loops with rep and sapply:

Inside calculateCosine

# ORIGINAL CODE
x <- bag_o_words(x)
for(pad in 1 : diffLength){
  x <- c(x, "")
  }

# ADJUSTED CODE
x <- bag_o_words(x)
x <- c(x, rep("", diffLength))     

# OR ONE LINE
x <- c(bag_o_words(x), rep("", diffLength))
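
The loop and the rep version can be compared side by side with a minimal sketch. The helpers `pad_loop` and `pad_rep` and the sample vector `words` are hypothetical names introduced only for this illustration; they are not part of the original code. The sketch also shows why `1 : diffLength` is fragile: when the difference is 0, `1:0` evaluates to `c(1, 0)`, so the loop would still run twice, whereas `rep("", 0)` adds nothing.

```r
# Hypothetical helpers for illustration only (not from the original post).
# Loop-based padding, written with seq_len() so that k = 0 means "no iterations".
pad_loop <- function(x, k) {
  for (pad in seq_len(k)) {
    x <- c(x, "")
  }
  x
}

# Vectorised padding with rep(), as suggested in the answer.
pad_rep <- function(x, k) {
  c(x, rep("", k))
}

words <- c("traffic", "sales")

# Both approaches give the same padded vector for a positive k.
same_result <- identical(pad_loop(words, 3), pad_rep(words, 3))

# rep("", 0) is an empty character vector, so nothing is appended.
unchanged_length <- length(pad_rep(words, 0))

# 1:0 counts down, which is why a plain 1:k loop misbehaves for k = 0.
countdown_length <- length(1:0)
```

Note that `rep("", k)` still fails for a negative `k`, so the difference has to be non-negative before it is passed in.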

Outside calculateCosine (change to lapply if you need a list returned instead of a vector/matrix)

# ORIGINAL CODE
res <- c()
for(i in 1 : length(d1)){
  val <- calculateCosine(as.set(paste(d1[i], sep = " ", collapse = " ")), as.set(c))
  res <- c(res, val)
}

# ADJUSTED CODE
res <- sapply(d1, function(i) {
  calculateCosine(as.set(paste(i, sep = " ", collapse = " ")), as.set(c))
})
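
The same pattern can be tried without qdap, sets, or lsa installed. The sketch below is an assumption-laden stand-in: `cosine_words` is a hypothetical base-R cosine over raw word counts, not the `calculateCosine` function from the question (it applies no stop-word filtering or minimum word length), and `chunks` and `reference` are made-up sample data. It uses `vapply`, which behaves like `sapply` here but requires each result to be a single number.

```r
# Hypothetical stand-in for calculateCosine(): cosine similarity between
# the word-count vectors of two strings, using base R only.
cosine_words <- function(a, b) {
  wa <- strsplit(tolower(a), "[^a-z]+")[[1]]
  wb <- strsplit(tolower(b), "[^a-z]+")[[1]]
  wa <- wa[nzchar(wa)]
  wb <- wb[nzchar(wb)]
  vocab <- union(wa, wb)
  # Count each vocabulary word in both texts, in the same order.
  ca <- as.numeric(table(factor(wa, levels = vocab)))
  cb <- as.numeric(table(factor(wb, levels = vocab)))
  sum(ca * cb) / sqrt(sum(ca^2) * sum(cb^2))
}

# Made-up sample data shaped like the output of split(): a named list of chunks.
chunks <- list(`1` = c("traffic diversions hit sales", "sales lower"),
               `2` = "cars pass through roundabout")
reference <- "traffic changes hit sales"

# One similarity per chunk; numeric(1) makes vapply stop with an error
# if the function ever returns anything other than a single number.
res <- vapply(chunks,
              function(chunk) cosine_words(paste(chunk, collapse = " "), reference),
              numeric(1))
```

The result is a named numeric vector with one entry per chunk, so no `res <- c(res, val)` accumulation is needed.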

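[Comments]:

  • Thank you very much. It works perfectly until diffLength becomes negative, that is, when setY is the shorter one. I modified the calculateCosine function to pad setY instead of setX. Thanks again.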
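
One way to avoid the negative-length case altogether is to pad whichever vector is shorter. The following is only a sketch under that assumption; `pad_both` is a hypothetical helper and is not the modification the commenter actually made.

```r
# Hypothetical helper (not from the original thread): pad the shorter of two
# character vectors with empty strings so that both end up the same length.
pad_both <- function(x, y) {
  n <- max(length(x), length(y))
  # n - length(.) is never negative, so rep() always gets a valid count.
  list(x = c(x, rep("", n - length(x))),
       y = c(y, rep("", n - length(y))))
}

short_y <- pad_both(c("san", "gwann", "traffic"), c("traffic"))
short_x <- pad_both(c("sales"), c("traffic", "changes"))
```

Because the target length is the maximum of the two, the same call works regardless of which argument is shorter.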