[Question title]: How to select different columns for each row in a data frame in R based on vectors
[Posted]: 2022-12-02 01:41:38
[Question description]:

I have a data frame with 4 columns and for each row, I want to extract 2 of the 4 columns (but for each row, it's going to be different columns).

repro = structure(list(c1 = c(0L, 0L, 1L, 1L, 0L, 1L), c2 = c(1L, 1L, 
0L, 0L, 1L, 1L), c1 = c(0L, 1L, 1L, 0L, 1L, 0L), c2 = c(0L, 1L, 
1L, 1L, 1L, 0L)), row.names = c(86L, 59L, 58L, 79L, 70L, 83L), 
class = "data.frame")

head(repro)
   c1 c2 c1 c2
86  0  1  0  0
59  0  1  1  1
58  1  0  1  1
79  1  0  0  1
70  0  1  1  1
83  1  1  0  0

Vectors of columns to select in the repro data frame

col.sel1 = c(2, 1, 2, 2, 2, 2)
col.sel2 = c(4, 3, 3, 4, 3, 3)

For loop to select the columns (it works, but on my original data it takes forever, as there are thousands of rows):

# Make offspring table 
offspring = NULL
for (i in 1:nrow(repro)) {
  offs = cbind(c3 = repro[i,col.sel1[i]], 
               c4 = repro[i,col.sel2[i]])
  offspring = rbind(offspring,offs)
}
head(offspring)

Giving

     c3 c4
[1,]  1  0
[2,]  0  1
[3,]  0  1
[4,]  0  1
[5,]  1  1
[6,]  1  0

Is there a faster way to select different columns for each row based on the 2 vectors col.sel1 and col.sel2?

I've tried:

repro[1:6, col.sel1]
lapply(col.sel1, function(x) repro[,x])

But neither gives the expected result.

[Question comments]:

    Tags: r select


    [Solution 1]:

    You can index data frames and matrices with [ using a two-column matrix, where each row of that matrix is a (row, column) pair:

    cbind(
      c3 = repro[cbind(seq_along(col.sel1), col.sel1)], 
      c4 = repro[cbind(seq_along(col.sel2), col.sel2)]
    )
    #      c3 c4
    # [1,]  1  0
    # [2,]  0  1
    # [3,]  0  1
    # [4,]  0  1
    # [5,]  1  1
    # [6,]  1  0
    

    Diving in, we see

    cbind(seq_along(col.sel1), col.sel1)
    #        col.sel1
    # [1,] 1        2
    # [2,] 2        1
    # [3,] 3        2
    # [4,] 4        2
    # [5,] 5        2
    # [6,] 6        2
    

    This means that the first value we want is at row 1, column 2; the next at row 2, column 1; and so on. The resulting values (for the first set) are:

    repro[cbind(seq_along(col.sel1), col.sel1)]
    # [1] 1 0 0 0 1 1
    

    We can then combine those with cbind (into a matrix ... easily converted to a frame by replacing cbind with data.frame).
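
    As a minimal sketch of the same idea, the block below rebuilds the question's data and converts the frame to a matrix once before indexing. The names m, rows, and offspring are illustrative and not part of the original answer. The one-time as.matrix() call is an assumption about what helps on larger data: matrix-indexing a data frame converts it to a matrix internally, so converting once up front may avoid repeating that work for every selection vector.

    ```r
    # Rebuild the example data exactly as given in the question.
    repro <- structure(
      list(c1 = c(0L, 0L, 1L, 1L, 0L, 1L), c2 = c(1L, 1L, 0L, 0L, 1L, 1L),
           c1 = c(0L, 1L, 1L, 0L, 1L, 0L), c2 = c(0L, 1L, 1L, 1L, 1L, 0L)),
      row.names = c(86L, 59L, 58L, 79L, 70L, 83L), class = "data.frame")
    col.sel1 <- c(2, 1, 2, 2, 2, 2)
    col.sel2 <- c(4, 3, 3, 4, 3, 3)

    # Convert once so that each lookup indexes a plain matrix.
    m <- as.matrix(repro)
    rows <- seq_along(col.sel1)

    # Each cbind(rows, cols) is a two-column matrix of (row, column) pairs;
    # indexing with it returns one value per pair.
    offspring <- data.frame(
      c3 = m[cbind(rows, col.sel1)],
      c4 = m[cbind(rows, col.sel2)]
    )
    offspring
    ```

    This assumes all columns share one type (integers here); with mixed column types, as.matrix() would coerce everything to a common type, so the result should be checked before relying on it.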

    If you have an arbitrary number of these selection vectors, you can put them in a named list and handle zero or more of them with:

    L <- list(c3=col.sel1, c4=col.sel2)
    data.frame(lapply(L, function(z) repro[cbind(seq_along(z), z)]))
    #   c3 c4
    # 1  1  0
    # 2  0  1
    # 3  0  1
    # 4  0  1
    # 5  1  1
    # 6  1  0
    

    Side note: you used 1:nrow(repro), but it is safer to use seq_along(col.sel1) instead, because it still works when the number of values to select differs from the number of rows. I recognize that in this use case you likely intend exactly one value per row, always, but it is still the safer alternative. (For example, repro[cbind(1:3, 1:4)] will not work correctly, because the two vectors have unequal lengths.)
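
    One way to make the length concern explicit is to check the inputs before indexing. The helper below is a hypothetical sketch (pick_cells and the toy frame are not from the original answer); it stops with an error if the row and column vectors do not line up, rather than letting cbind recycle the shorter one.

    ```r
    # Hypothetical helper: return one value per (row, column) pair,
    # failing early when the inputs are inconsistent.
    pick_cells <- function(df, cols, rows = seq_along(cols)) {
      stopifnot(
        length(rows) == length(cols),            # one column per row index
        all(rows >= 1), all(rows <= nrow(df)),   # row indices in range
        all(cols >= 1), all(cols <= ncol(df))    # column indices in range
      )
      df[cbind(rows, cols)]
    }

    # Toy data for illustration only.
    toy <- data.frame(a = 1:3, b = 4:6)

    # Row 1 column 2, row 2 column 1, row 3 column 2.
    picked <- pick_cells(toy, c(2, 1, 2))
    picked
    ```

    With the question's data, the equivalent calls would be pick_cells(repro, col.sel1) and pick_cells(repro, col.sel2).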

    [Comments]:
