【Question title】: Select i-th element if a condition occurs with for loop
【Posted】: 2019-11-30 13:32:03
【Question】:

I have a data frame (df) like this:

Rif   dd    A   A   A   A   A   B   B   B   B   B   C   C   C   C   C
a1    10    5   8   10  2   6   9   6   5   7   9   1   5   6   4   5
b1    20    12  7   1   5   9   10  5   3   8   7   3   6   1   9   8
c1    100   11  6   8   1   14  1   11  9   3   6   10  8   13  8   4
d1    70    4   3   7   8   11  19  2   6   7   1   20  18  7   10  7

I have a vector:

rif <- c(0, 15, 50, 90, 110)

I want to add a column to df such that, if dd(i) >= rif(i-1) & dd(i) < rif(i), the value from the matching A column is selected, like this:

Rif   dd    A   A   A   A   A   B   B   B   B   B   C   C   C   C   C  V1
a1    10    5   8   10  2   6   9   6   5   7   9   1   5   6   4   5  8 
b1    20    12  7   1   5   9   10  5   3   8   7   3   6   1   9   8  1
c1    100   11  6   8   1   14  1   11  9   3   6   10  8   13  8   4  14
d1    70    4   3   7   8   11  19  2   6   7   1   20  18  7   10  7  8

The same should be done for the B and C columns, giving V2 and V3.

ref <- c(0, 15, 50, 90, 110)

for (i in 2:length(ref)) {
  for (j in 1:nrow(df)) {
    if (df$dd >= ref[i-1] && df$dd< ref[i]) {
      df[,"V1"] <- df[j,i]
    } 
  }
}

I get the following error:

Error in if (..)  : 
  missing value where TRUE/FALSE needed

Perhaps the if statement is not the right tool here. Could you help me?

【Comments】:

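  • In the first iteration of the outer loop you evaluate ref[1-1]. Subsetting with zero gives an empty vector. Combining that with a logical value using && yields NA.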
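The zero-index behaviour mentioned in the comment can be checked directly. This is a minimal sketch that reuses only the `ref` vector from the question; the names `empty`, `cmp` and `lower` are illustrative. What `&&` or `if` then does with a zero-length condition depends on the R version, so the sketch only inspects lengths.

```r
ref <- c(0, 15, 50, 90, 110)

# Indexing with 0 selects nothing, so the result is a zero-length vector
empty <- ref[0]
length(empty)

# A comparison against a zero-length vector is itself zero-length,
# which is not a usable TRUE/FALSE condition for if()
cmp <- 10 >= ref[0]
length(cmp)

# Starting the loop at i = 2 keeps ref[i - 1] inside the vector
lower <- ref[2 - 1]
lower
```

Any loop that reads `ref[i - 1]` therefore needs `i` to start at 2, as the loop in the question already does.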

Tags: r for-loop if-statement select element


【Solution 1】:

Another option in base R:

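lters <- c(A="A", B="B", C="C")
# position of the first column of each letter group
firstcol <- lapply(lters, function(x) match(x, colnames(DF)))
# index of the rif interval that each dd value falls into
idx <- findInterval(DF$dd, rif)
# for each group, pick the cell at (row, firstcol + idx) via matrix indexing
for (l in lters)
    DF[, paste0("V_", l)] <- as.integer(DF[cbind(seq_len(nrow(DF)), idx + firstcol[[l]])])
DF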

Output:

  Rif  dd  A A.1 A.2 A.3 A.4  B B.1 B.2 B.3 B.4  C C.1 C.2 C.3 C.4 V_A V_B V_C
1  a1  10  5   8  10   2   6  9   6   5   7   9  1   5   6   4   5   8   6   5
2  b1  20 12   7   1   5   9 10   5   3   8   7  3   6   1   9   8   1   3   1
3  c1 100 11   6   8   1  14  1  11   9   3   6 10   8  13   8   4  14   6   4
4  d1  70  4   3   7   8  11 19   2   6   7   1 20  18   7  10   7   8   7  10
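The two building blocks of this answer, `findInterval()` and indexing with a two-column (row, column) matrix, may be easier to follow in isolation. The sketch below is not part of the answer; it copies the `dd` values and the five A columns from the data into a plain numeric matrix (the names `A` and `picked` are illustrative).

```r
rif <- c(0, 15, 50, 90, 110)
dd  <- c(10, 20, 100, 70)

# Interval index k such that rif[k] <= dd < rif[k + 1]
idx <- findInterval(dd, rif)

# The five A columns from the example data, one row per observation
A <- cbind(c(5, 12, 11, 4), c(8, 7, 6, 3), c(10, 1, 8, 7),
           c(2, 5, 1, 8), c(6, 9, 14, 11))

# A two-column matrix of (row, column) pairs selects one cell per row;
# idx + 1 mirrors the answer's idx + firstcol offset
picked <- A[cbind(seq_along(dd), idx + 1)]
picked
```

Because `A` is a numeric matrix here, no `as.integer()` conversion is needed; the answer needs it because matrix-indexing a data frame with a character column returns character values.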

Data:

DF <- structure(list(Rif = c("a1", "b1", "c1", "d1"), dd = c(10L, 20L, 
100L, 70L), A = c(5L, 12L, 11L, 4L), A = c(8L, 7L, 6L, 3L), A = c(10L, 
1L, 8L, 7L), A = c(2L, 5L, 1L, 8L), A = c(6L, 9L, 14L, 11L), 
    B = c(9L, 10L, 1L, 19L), B = c(6L, 5L, 11L, 2L), B = c(5L, 
    3L, 9L, 6L), B = c(7L, 8L, 3L, 7L), B = c(9L, 7L, 6L, 1L), 
    C = c(1L, 3L, 10L, 20L), C = c(5L, 6L, 8L, 18L), C = c(6L, 
    1L, 13L, 7L), C = c(4L, 9L, 8L, 10L), C = c(5L, 8L, 4L, 7L
    )), class = "data.frame", row.names = c(NA, -4L))
rif <- c(0, 15, 50, 90, 110)

Another approach is to reorganize the data by separating the lookup values into another table, and then perform an update join with data.table:

library(data.table)
setDT(DF)
out <- DF[, .(rn=.I, Rif, dd)]

#reorganizing data
lc <- grepl("A|B|C", names(DF))
lutbl <- data.table(COL=names(DF)[lc], transpose(DF[, ..lc]))
lutbl <- melt(lutbl, measure.vars=patterns("V"), variable.name="rn")[,
    c("rn", "rif") := .(as.integer(gsub("V", "", rn)), rep(rif, sum(lc)*nrow(DF)/length(rif)))]

#lookup and update
for (l in lters)
    out[, paste0("NEW", l) := lutbl[COL==l][out, on=c("rn", "rif"="dd"), roll=-Inf, value]]

Output:

   rn Rif  dd NEWA NEWB NEWC
1:  1  a1  10    8    6    5
2:  2  b1  20    1    3    1
3:  3  c1 100   14    6    4
4:  4  d1  70    8    7   10
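The key step in the join above is `roll=-Inf`, which carries the next lookup row backward when there is no exact match. The following is a reduced sketch of that idea on a single lookup table, assuming the data.table package is installed; the tables `lookup` and `query` are hypothetical and use the A values of row a1 as lookup values.

```r
library(data.table)

# One lookup value per rif breakpoint (the A values of row a1)
lookup <- data.table(rif = c(0, 15, 50, 90, 110),
                     value = c(5, 8, 10, 2, 6))
query <- data.table(dd = c(10, 20, 100, 70))

# For each dd, roll = -Inf picks the row with the next rif at or above dd
res <- lookup[query, on = c(rif = "dd"), roll = -Inf, value]
res
```

In the answer the join additionally matches on `rn`, so that each row of `out` is looked up only against its own values.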

【Comments】:

    【Solution 2】:

    I think you just need to specify the rows and columns more precisely:

    df <- data.frame(
          c("a1","b1","c1","d1")
          , c(10,20,100,70), c(5,12,11,4), c(8,7,6,3), c(10,1,8,7), c(2,5,1,8), c(6,9,14,11)
          , c(9,10,1,19), c(6,5,11,2), c(5,3,9,6), c(7,8,3,7), c(9,7,6,1)
          , c(1,3,10,20), c(5,6,8,18), c(6,1,13,7), c(4,9,8,10), c(5,8,4,7)
    )
    
    colnames(df) <- c("Rif", "dd", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C")
    
    ref <- c(0, 15, 50, 90, 110)
    
    for (i in 2:length(ref)) {
      for (j in 1:nrow(df)) {      
        if (df$dd[j] >= ref[i-1] && df$dd[j] < ref[i]) {
          df$V1[j] <- df[j,i+2]
          df$V2[j] <- df[j,i+2+5]
          df$V3[j] <- df[j,i+2+10]
        } 
      }
    }
    

    This gives:

      Rif  dd  A A  A A  A  B  B B B B  C  C  C  C C V1 V2 V3
    1  a1  10  5 8 10 2  6  9  6 5 7 9  1  5  6  4 5  8  6  5
    2  b1  20 12 7  1 5  9 10  5 3 8 7  3  6  1  9 8  1  3  1
    3  c1 100 11 6  8 1 14  1 11 9 3 6 10  8 13  8 4 14  6  4
    4  d1  70  4 3  7 8 11 19  2 6 7 1 20 18  7 10 7  8  7 10
    

    【Comments】:

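    • Thanks @cgrafe. I tried your suggestion but got the following error: Error in $<-.data.frame(*tmp*, "V1", value = c(NA, NA, NA, : replacement has 4 rows, data has 14343. (14343 is the number of rows in my df.)
    • Here is some information about that error: stackoverflow.com/questions/29814912/…. Try initializing the variable (i.e. "df$V1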
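One way to read the suggestion to initialize the variable is to create the result column, filled with NA, before the loop runs. The sketch below is an assumption about what the commenter meant, not the commenter's code; the small data frame and its column names `A1` to `A5` are hypothetical. Its first row matches no interval, which is the kind of situation in which assigning into a not-yet-existing column can fail.

```r
ref <- c(0, 15, 50, 90, 110)

# Hypothetical data: row 1 (dd = 200) lies outside every interval
df <- data.frame(dd = c(200, 10, 70),
                 A1 = c(1, 5, 4), A2 = c(2, 8, 3), A3 = c(3, 10, 7),
                 A4 = c(4, 2, 8), A5 = c(5, 6, 11))

# Create the column up front so that df$V1[j] <- ... always has a
# full-length vector to write into
df$V1 <- NA_real_

for (i in 2:length(ref)) {
  for (j in seq_len(nrow(df))) {
    if (df$dd[j] >= ref[i - 1] && df$dd[j] < ref[i]) {
      # dd is column 1 here, so column i + 1 plays the role of i + 2 above
      df$V1[j] <- df[j, i + 1]
    }
  }
}
df
```

Rows that match no interval simply keep NA instead of causing a length mismatch.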