【Question title】: Extract columns from data frames in a list into a separate list of data frames
【Posted】: 2021-06-06 11:17:19
【Question description】:

I have a list, cj1, that contains several data frames:

dput(head(cj1[1:2]))
list(structure(list(individual = c("a12TTT.pdf", "a15.pdf", "a17.pdf", 
"a18.pdf", "a21.pdf", "a2TTT.pdf", "a5.pdf", "B11.pdf", "B12.pdf", 
"B13.pdf", "B22.pdf", "B24.pdf", "B4.pdf", "B7.pdf", "B8.pdf", 
"cw10-1.pdf", "cw13-1.pdf", "cw15-1TTT.pdf", "cw17-1.pdf", "cw18.pdf", 
"cw3.pdf", "cw4.pdf", "cw7_1TTT.pdf"), id = 1:23, Ntot = c(13, 
9, 16, 15, 9, 13, 10, 10, 11, 10, 14, 10, 11, 12, 11, 10, 15, 
12, 14, 11, 9, 10, 11), N1 = c(5, 5, 10, 11, 7, 9, 5, 5, 6, 8, 
8, 8, 9, 8, 7, 1, 0, 6, 3, 4, 2, 4, 2), ND = c(0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), N0 = c(8, 
4, 6, 4, 2, 4, 5, 5, 5, 2, 6, 2, 2, 4, 4, 9, 15, 6, 11, 7, 7, 
6, 9), score = c(5.06923076923077, 4.96666666666667, 9.925, 10.86, 
6.83333333333333, 8.88461538461539, 5, 5, 5.97272727272727, 7.82, 
7.95714285714286, 7.82, 8.80909090909091, 7.9, 6.91818181818182, 
1.24, 0.3, 6, 3.17142857142857, 4.08181818181818, 2.16666666666667, 
4.06, 2.19090909090909), propscore = c(0.389940828402367, 0.551851851851852, 
0.6203125, 0.724, 0.759259259259259, 0.683431952662722, 0.5, 
0.5, 0.54297520661157, 0.782, 0.568367346938776, 0.782, 0.800826446280992, 
0.658333333333333, 0.628925619834711, 0.124, 0.02, 0.5, 0.226530612244898, 
0.371074380165289, 0.240740740740741, 0.406, 0.199173553719008
), theta = c(-0.571211122446447, 0.418736780198501, 0.464533662219296, 
0.760432013134893, 1.43961032059382, 0.935963883364303, 0.0742361005467161, 
0.416783201347136, 0.232586422933618, 1.65345248955369, 0.178947462869717, 
1.3980442736112, 1.5300599487058, 0.340087410746963, 0.616985944469495, 
-1.73246102772711, -4.06186172096556, -0.347700710331151, -1.21009964741398, 
0.239145600406579, -1.88836418690337, -0.276451472526056, -0.611455626388059
), se.theta = c(0.689550115014498, 0.689441554709003, 0.595659709892116, 
0.609506508256404, 0.917792293663691, 0.652011367164736, 0.720534163064516, 
0.695969555549033, 0.661019531367007, 0.87050969318314, 0.605775647419845, 
0.797443937820774, 0.768436114096332, 0.695748274310803, 0.709380679025605, 
1.00089414765463, 1.8701468050665, 0.68959824350285, 0.733014089189809, 
0.656392513303483, 0.952935324276941, 0.71608982789968, 0.771906532861938
), outfit = c(1.24922700170817, 1.46067763769417, 0.915183304626819, 
0.753992664091072, 0.37410361433915, 0.727316037037668, 0.616907868814702, 
1.01528298230254, 1.01594232662062, 0.616808170683195, 0.646097057961938, 
0.622993494551005, 0.807441271101246, 0.788526018181888, 1.2157399735092, 
0.341189086206191, 0.021052091633073, 0.543024513106335, 1.04183076617928, 
1.1772656963046, 0.736106160865241, 0.756316095787985, 0.58320701094964
), infit = c(1.4078580948461, 1.42854494963967, 1.09762978932861, 
0.893957122448352, 0.64936943769433, 0.899191443180872, 0.724956556509282, 
1.14975990693782, 1.08074439712469, 0.978248081241133, 0.755557633771936, 
0.823903684368671, 0.911855771375284, 0.954272320131035, 0.926253596526142, 
0.634052701587448, 0.0504659822408584, 0.712539957033173, 0.966034039620798, 
1.1901663169553, 0.81371119642719, 0.817417869881874, 0.737574872116582
)), row.names = c(NA, -23L), class = "data.frame"), structure(list(
    parlabel = c("Ties", "Home"), par = c("delta", "eta"), est = c(-43.5016417611571, 
    0.337872999554289), se = c(366043197.615422, 0.215169736220537
    )), row.names = c(NA, -2L), class = "data.frame"))

Here is what the first data frame looks like:

head(cj1[[1]],2)
  individual id Ntot N1 ND N0    score propscore      theta  se.theta   outfit
1 a12TTT.pdf  1   13  5  0  8 5.069231 0.3899408 -0.5712111 0.6895501 1.249227
2    a15.pdf  2    9  5  0  4 4.966667 0.5518519  0.4187368 0.6894416 1.460678
     infit
1 1.407858
2 1.428545

I want to create a separate list, results1, containing data frames with only columns 1 and 9, named individual and theta. I tried:

results1<-sapply(cj1, "[",c("individual","theta") )

Error in `[.data.frame`(X[[i]], ...) : undefined columns selected

library(dplyr)
> results1 <- lapply(cj1, function(x) x%>% select(individual,theta))

Error:

Can't subset columns that don't exist.
x Column `individual` doesn't exist.
Run `rlang::last_error()` to see where the error occurred.

I can extract these columns from a single data frame:

cj1[[1]][c(1,9)]

But I can't apply this to the whole list.
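
A quick check, sketched here on a made-up two-element stand-in for cj1 (the `toy` list and its values are hypothetical), suggests why both attempts fail: only some elements contain both columns, so selecting them from every element raises an error.

```r
# Toy stand-in for cj1 (hypothetical values): the second element lacks the columns
toy <- list(
  data.frame(individual = c("a.pdf", "b.pdf"), id = 1:2, theta = c(-0.5, 0.4)),
  data.frame(parlabel = c("Ties", "Home"), est = c(-43.5, 0.34))
)
cols <- c("individual", "theta")

# TRUE where an element has every requested column
has_cols <- vapply(toy, function(x) all(cols %in% names(x)), logical(1))
has_cols

# Selecting from an element without the columns is what raises the error;
# tryCatch() captures the message instead of stopping
res <- tryCatch(toy[[2]][cols], error = function(e) conditionMessage(e))
res
```

Running the same `vapply()` check on cj1 itself would show which elements can safely be subset.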

【Question comments】:

    Tags: r


    【Solution 1】:

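    You can use the following solution. In a purrr-style formula, .x refers to each individual element of your list. Here .x is each of your data frames, from which we want to select only the two columns c("individual","theta"). However, since only one of your data frames contains those column names, I used keep() to retain only the elements that contain the desired columns. Keep in mind that a purrr-style formula needs a ~ in front of the expression that uses .x. map() is the purrr equivalent of lapply() in base R, and with this syntax you can apply any function to each individual element (here, each data frame).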

    library(purrr)
    library(dplyr)  # select() comes from dplyr
    
    cj1 %>%
      map_if(~ all(c("individual","theta") %in% names(.x)), 
             ~ .x %>% select(individual, theta)) %>%
      keep(~ all(c("individual","theta") %in% names(.x)))
    
    [[1]]
          individual      theta
    1     a12TTT.pdf -0.5712111
    2        a15.pdf  0.4187368
    3        a17.pdf  0.4645337
    4        a18.pdf  0.7604320
    5        a21.pdf  1.4396103
    6      a2TTT.pdf  0.9359639
    7         a5.pdf  0.0742361
    8        B11.pdf  0.4167832
    9        B12.pdf  0.2325864
    10       B13.pdf  1.6534525
    11       B22.pdf  0.1789475
    12       B24.pdf  1.3980443
    13        B4.pdf  1.5300599
    14        B7.pdf  0.3400874
    15        B8.pdf  0.6169859
    16    cw10-1.pdf -1.7324610
    17    cw13-1.pdf -4.0618617
    18 cw15-1TTT.pdf -0.3477007
    19    cw17-1.pdf -1.2100996
    20      cw18.pdf  0.2391456
    21       cw3.pdf -1.8883642
    22       cw4.pdf -0.2764515
    23  cw7_1TTT.pdf -0.6114556
    

    Or, more concisely, we can save a line of code by filtering first:

    cj1 %>%
      keep(~ all(c("individual","theta") %in% names(.x))) %>%
      map(~ .x %>% select(individual, theta))
    
    [[1]]
          individual      theta
    1     a12TTT.pdf -0.5712111
    2        a15.pdf  0.4187368
    3        a17.pdf  0.4645337
    4        a18.pdf  0.7604320
    5        a21.pdf  1.4396103
    6      a2TTT.pdf  0.9359639
    7         a5.pdf  0.0742361
    8        B11.pdf  0.4167832
    9        B12.pdf  0.2325864
    10       B13.pdf  1.6534525
    11       B22.pdf  0.1789475
    12       B24.pdf  1.3980443
    13        B4.pdf  1.5300599
    14        B7.pdf  0.3400874
    15        B8.pdf  0.6169859
    16    cw10-1.pdf -1.7324610
    17    cw13-1.pdf -4.0618617
    18 cw15-1TTT.pdf -0.3477007
    19    cw17-1.pdf -1.2100996
    20      cw18.pdf  0.2391456
    21       cw3.pdf -1.8883642
    22       cw4.pdf -0.2764515
    23  cw7_1TTT.pdf -0.6114556
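
To make the formula shorthand concrete, here is a small sketch on a made-up list (`toy` and its values are hypothetical, and `.x[cols]` is used in place of select() so that only purrr is needed). Each `~ ... .x` formula is written out as an ordinary function, and the same filtering and selection is then done in base R.

```r
library(purrr)

# Hypothetical stand-in for cj1
toy <- list(
  data.frame(individual = c("a.pdf", "b.pdf"), id = 1:2, theta = c(-0.5, 0.4)),
  data.frame(parlabel = c("Ties", "Home"), est = c(-43.5, 0.34))
)
cols <- c("individual", "theta")

# purrr-style formulas: .x stands for the current list element
res_formula <- toy %>%
  keep(~ all(cols %in% names(.x))) %>%
  map(~ .x[cols])

# The same calls with the formulas written as ordinary functions
res_function <- toy %>%
  keep(function(df) all(cols %in% names(df))) %>%
  map(function(df) df[cols])

# Base R counterpart: Filter() for keep(), lapply() for map()
res_base <- lapply(Filter(function(df) all(cols %in% names(df)), toy),
                   function(df) df[cols])

length(res_formula)   # number of data frames kept
names(res_base[[1]])  # columns that remain
```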
    

    Here is another solution, this time in base R, with slightly different syntax. Note that \(x) is shorthand for function(x), available as of R 4.1.0.

    cj1 |>
      lapply(\(x) { 
        if(all(c("individual","theta") %in% names(x))) {
          `[`(x, c("individual","theta"))
        }
      }
    ) -> cj2
    
    # Drop the NULL entries with a logical mask; the result stays a list of data frames
    cj2 <- cj2[!sapply(cj2, is.null)]
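
When dropping NULL entries by index, a logical mask is the safer choice over negative indexing with which(). The sketch below uses made-up lists to illustrate: if nothing is NULL, which() returns an empty vector and the negative subset comes back empty, whereas the mask keeps everything.

```r
# Hypothetical lists: one with a NULL entry, one without
with_null    <- list(data.frame(a = 1), NULL)
without_null <- list(data.frame(a = 1), data.frame(a = 2))

is_null <- function(lst) vapply(lst, is.null, logical(1))

# Negative indexing via which(): fine here, one NULL is found and dropped
ok <- with_null[-which(is_null(with_null))]

# Nothing is NULL: which() gives integer(0), and x[-integer(0)] selects nothing
lost <- without_null[-which(is_null(without_null))]

# A logical mask keeps every non-NULL entry in both cases
kept <- without_null[!is_null(without_null)]

c(length(ok), length(lost), length(kept))
```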
    

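    【Comments】:

    • Anoushiravan, do I replace every .x with cj1?
    • No, you replace df with cj1 and use the second solution. I have also updated my answer.
    • Thanks, I used the first solution (I hadn't seen your comment yet).
    • The second one is a bit more concise. I also added some extra details; let me know if anything needs explaining.
    【Solution 2】: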

    In base R, you can try this solution with lapply -

    cols <- c("individual","theta")
    lapply(cj1, function(x) if(all(cols %in% names(x))) x[cols])
    
    #[[1]]
    #      individual   theta
    #1     a12TTT.pdf -0.5712
    #2        a15.pdf  0.4187
    #3        a17.pdf  0.4645
    #4        a18.pdf  0.7604
    #5        a21.pdf  1.4396
    #6      a2TTT.pdf  0.9360
    #7         a5.pdf  0.0742
    #8        B11.pdf  0.4168
    #9        B12.pdf  0.2326
    #10       B13.pdf  1.6535
    #11       B22.pdf  0.1789
    #12       B24.pdf  1.3980
    #13        B4.pdf  1.5301
    #14        B7.pdf  0.3401
    #15        B8.pdf  0.6170
    #16    cw10-1.pdf -1.7325
    #17    cw13-1.pdf -4.0619
    #18 cw15-1TTT.pdf -0.3477
    #19    cw17-1.pdf -1.2101
    #20      cw18.pdf  0.2391
    #21       cw3.pdf -1.8884
    #22       cw4.pdf -0.2765
    #23  cw7_1TTT.pdf -0.6115
    
    #[[2]]
    #NULL
    

    If you want to drop the NULL elements, you can add Filter -

    Filter(length, lapply(cj1, function(x) if(all(cols %in% names(x))) x[cols]))
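
Why Filter(length, ...) drops the NULL entries, sketched on made-up data: length(NULL) is 0, which Filter() treats as FALSE, while the length of a data frame is its number of columns. Filter(Negate(is.null), ...) is an alternative that tests for NULL explicitly and, unlike the length test, would also retain a zero-column data frame.

```r
# Hypothetical result of the lapply() step: one data frame, one NULL
res <- list(data.frame(individual = "a.pdf", theta = -0.5), NULL)

# length() is 0 for NULL and the column count for a data frame
lens <- vapply(res, length, integer(1))
lens

# Filter() coerces those lengths to logical, so zero-length entries are dropped
by_length <- Filter(length, res)

# Explicit NULL test; this keeps a zero-column data frame that length would drop
by_null <- Filter(Negate(is.null), list(data.frame(), NULL))

c(length(by_length), length(by_null))
```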
    

    【Comments】:
