【Question title】: Function works perfectly fine outside ddply but throws an error inside ddply
【Posted】: 2016-02-29 13:10:14
【Question description】:

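I am trying to perform a row-wise operation on a data frame, by group. When I run the function for a single group, it works perfectly fine. However, when I put the function inside ddply to run it for all groups, it throws an error: argument is of length zero.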

The function, run on its own on the data frame 'test':

for (i in 1:(nrow(test) - 5)) {

  if (i <= 5) {
    test[i, "MPPALERT"] <- 0    
  }

  FIRSTMPP <- test[i, "TAGMPPSEARCHCOUNT"]
  LASTMPP <- test[i+5, "TAGMPPSEARCHCOUNT"]

  if ((LASTMPP - FIRSTMPP) >= 10) {
    test[i+5, "MPPALERT"] <- 1    
  } else {
    test[i+5, "MPPALERT"] <- 0    
  }

}

The same function inside ddply throws this error:

Error in if (LASTMPP - FIRSTMPP >= 10) { : argument is of length zero

Here is the ddply code:

mpp_fn <- function(x) {  

  for (i in 1:(nrow(x) - 5)) {

    if (i <= 5) {
      x[i, "MPPALERT"] <- 0    
    }

    FIRSTMPP <- x[i, "TAGMPPSEARCHCOUNT"]
    LASTMPP <- x[i+5, "TAGMPPSEARCHCOUNT"]

    if (LASTMPP - FIRSTMPP >= 10) {
      x[i+5, "MPPALERT"] <- 1    
    } else {
      x[i+5, "MPPALERT"] <- 0    
    }

  }

}

result <- ddply(data, c("SHELTERID", "INVERTERID"), mpp_fn(x))

In the code above, the values of FIRSTMPP and LASTMPP resolve to NULL, hence the error. But why does this happen when it works perfectly fine outside ddply?

Update: here is the output of dput(data):

structure(list(SHELTERID = c("SH02", "SH02", "SH02", "SH02", 
"SH02", "SH02", "SH02", "SH02", "SH02", "SH02", "SH02", "SH02", 
"SH02", "SH02", "SH02", "SH02", "SH02", "SH02", "SH02", "SH02", 
"SH02", "SH02", "SH02", "SH02", "SH02", "SH02", "SH02", "SH02", 
"SH02", "SH02", "SH02", "SH02", "SH02", "SH02", "SH02", "SH02", 
"SH03", "SH03", "SH03", "SH03", "SH03", "SH03", "SH03", "SH03", 
"SH03", "SH03", "SH03", "SH03", "SH03", "SH03", "SH03", "SH03", 
"SH03", "SH03", "SH03", "SH03", "SH03", "SH03", "SH03", "SH03", 
"SH03", "SH03", "SH03", "SH03", "SH03", "SH03", "SH03", "SH03", 
"SH03", "SH03", "SH03", "SH03"), INVERTERID = c("I1", "I1", "I1", 
"I1", "I1", "I1", "I1", "I1", "I1", "I1", "I1", "I1", "I1", "I1", 
"I1", "I1", "I1", "I1", "I2", "I2", "I2", "I2", "I2", "I2", "I2", 
"I2", "I2", "I2", "I2", "I2", "I2", "I2", "I2", "I2", "I2", "I2", 
"I1", "I1", "I1", "I1", "I1", "I1", "I1", "I1", "I1", "I1", "I1", 
"I1", "I1", "I1", "I1", "I1", "I1", "I1", "I2", "I2", "I2", "I2", 
"I2", "I2", "I2", "I2", "I2", "I2", "I2", "I2", "I2", "I2", "I2", 
"I2", "I2", "I2"), TAGMPPSEARCHCOUNT = c(0, 0, 0, 0, 0, 0, 0, 
2, 0, 0, 3, 0, 0, 3, 0, 0, 3, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 3, 0, 3, 0, 0, 4, 0, 0, 4, 0, 4, 0, 0, 5, 0, 0, 
5, 0)), .Names = c("SHELTERID", "INVERTERID", "TAGMPPSEARCHCOUNT"
), row.names = c(350L, 351L, 352L, 353L, 354L, 355L, 356L, 357L, 
358L, 359L, 360L, 361L, 362L, 363L, 364L, 365L, 366L, 367L, 494L, 
495L, 496L, 497L, 498L, 499L, 500L, 501L, 502L, 503L, 504L, 505L, 
506L, 507L, 508L, 509L, 510L, 511L, 638L, 639L, 640L, 641L, 642L, 
643L, 644L, 645L, 646L, 647L, 648L, 649L, 650L, 651L, 652L, 653L, 
654L, 655L, 782L, 783L, 784L, 785L, 786L, 787L, 788L, 789L, 790L, 
791L, 792L, 793L, 794L, 795L, 796L, 797L, 798L, 799L), class = "data.frame")

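【Comments】:

  • You can provide a reproducible example by posting the output of dput(data). Please don't use the full data, only a minimal subset of it.
  • Sure, @Thierry. Below is a small subset of the data.
  • Please add the output of dput(data), not just data. The output of dput(data) makes it easy to copy-paste your data into our R session.
  • Thanks, @Thierry. I have just added the output of dput(data).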

Tags: r plyr


【Solution 1】:

Here is a dplyr solution. It does not need an explicit loop.

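library(dplyr)

data %>% 
  group_by(SHELTERID, INVERTERID) %>% 
  mutate(
    # value 5 rows earlier within the same group (NA for the first 5 rows)
    First = lag(TAGMPPSEARCHCOUNT, 5), 
    # first 5 rows get 0; otherwise 1 when the count rose by at least 10
    MPPALERT = ifelse(
      is.na(First),
      0,
      ifelse(
        TAGMPPSEARCHCOUNT - First >= 10, 
        1, 
        0
      )
    )
  )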
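
If you would rather keep the loop, two things in the question's code look worth changing. First, `mpp_fn(x)` in the ddply call invokes the function immediately on whatever `x` happens to be in the calling environment, whereas ddply expects the function itself (`mpp_fn`). Second, the function body ends with a `for` loop, which evaluates to NULL, so the modified data frame is never returned. The following is a minimal sketch under those assumptions, not a tested drop-in replacement. The `toy` data frame and the names `first_mpp`, `last_mpp` and `pieces` are made up for illustration, and base R `split`/`lapply` stands in for ddply so the example runs without plyr.

```r
# Loop-based version that returns the data frame it modifies.
mpp_fn <- function(x) {
  n <- nrow(x)
  x$MPPALERT <- rep(0, n)            # default 0; also covers the first 5 rows
  if (n > 5) {                       # guard: 1:(n - 5) would misbehave for n <= 5
    for (i in seq_len(n - 5)) {
      first_mpp <- x[i, "TAGMPPSEARCHCOUNT"]
      last_mpp  <- x[i + 5, "TAGMPPSEARCHCOUNT"]
      if (last_mpp - first_mpp >= 10) {
        x[i + 5, "MPPALERT"] <- 1
      }
    }
  }
  x                                  # return the result explicitly
}

# Hypothetical toy data: two groups of 8 rows each.
toy <- data.frame(
  SHELTERID = rep(c("SH02", "SH03"), each = 8),
  INVERTERID = "I1",
  TAGMPPSEARCHCOUNT = c(0, 0, 0, 0, 0, 12, 0, 0,
                        0, 1, 2, 3, 4, 5, 11, 20),
  stringsAsFactors = FALSE
)

# Base R split-apply-combine, one call of mpp_fn per group.
pieces <- split(toy, list(toy$SHELTERID, toy$INVERTERID), drop = TRUE)
result <- do.call(rbind, lapply(pieces, mpp_fn))

# With plyr loaded, the equivalent call passes the function, not a call to it:
# result <- ddply(toy, c("SHELTERID", "INVERTERID"), mpp_fn)
```

The `n > 5` guard is an addition: with five or fewer rows, `1:(nrow(x) - 5)` counts downward (for example `1:0`), which would index rows that do not exist.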
