【Question title】: Split dataset based on column with a loop
【Posted】: 2021-03-28 23:43:20
【Question description】:

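I have been trying to write a loop that splits a dataset into multiple datasets based on a column value. However, the dataset is in a format I have not worked with before (a list that contains both lists and data.tables). The dataset can be reproduced as follows: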

table1 <- data.table::data.table(Scenario = 
                            c(rep(
                              c("A", "B", "C", "D"), 
                              4)),
                          A = c(
                            rep("x", 4), rep("b", 4), rep("s", 4),
                            rep("u", 4)),
                          Correlation = c(1, 0.125, 0.1, 0, 
                                          0.125, 1, 0.2, 0, 
                                          0.1, 0.2,   1, 0, 
                                          0,     0,   0, 1),
                          Matrix = "IM",
                          stringsAsFactors = FALSE,
                          check.names = FALSE)
table2 <- data.table::data.table(Scenario = 
                         c(rep(
                           c("A", "B", "C", "D"), 
                           4)),
                       A = c(
                         rep("x", 4), rep("b", 4), rep("s", 4),
                         rep("u", 4)),
                       Correlation = c(1, 0.125, 0.1, 0, 
                                       0.125, 1, 0.2, 0, 
                                       0.1, 0.2,   1, 0, 
                                       0,     0,   0, 1),
                       Matrix = "IM",
                       stringsAsFactors = FALSE,
                       check.names = FALSE)

table3 <- data.table::data.table(Scenario = 
                         c(rep(
                           c("A", "B", "C", "D"), 
                           4)),
                       A = c(
                         rep("x", 4), rep("b", 4), rep("s", 4),
                         rep("u", 4)),
                       Correlation = c(1, 0.125, 0.1, 0, 
                                       0.125, 1, 0.2, 0, 
                                       0.1, 0.2,   1, 0, 
                                       0,     0,   0, 1),
                       Matrix = "IM",
                       stringsAsFactors = FALSE,
                       check.names = FALSE)

list1 <- list("a" = "2019", "b" = "2020", "c" = "2021")
list2 <- list("a" = "test", "b" = "test", "c" = "test")

input_data <- list("table1" = table1, "table2" = table2, "table3" = table3, 
"list1"=list1, "list2" = list2)
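
For readers unfamiliar with this kind of mixed list, one quick way to see its shape is to check the class of each element and whether it has a Scenario column. The following is only a minimal sketch: `demo_data` is a hypothetical, shortened stand-in for `input_data`, and it assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical, shortened stand-in for input_data: two tables and one plain list
demo_data <- list(
  table1 = data.table(Scenario = rep(c("A", "B"), 2), Correlation = c(1, 0.125, 0.125, 1)),
  table2 = data.table(Scenario = rep(c("A", "B"), 2), Correlation = c(1, 0.2, 0.2, 1)),
  list1  = list(a = "2019", b = "2020")
)

# First class of each element: "data.table" for the tables, "list" for plain lists
element_class <- vapply(demo_data, function(x) class(x)[1], character(1))

# TRUE only for elements that are data frames (a data.table inherits from
# data.frame) and that carry a Scenario column
has_scenario <- vapply(
  demo_data,
  function(x) inherits(x, "data.frame") && "Scenario" %in% names(x),
  logical(1)
)

print(element_class)
print(has_scenario)
```

Only the elements flagged in `has_scenario` can be split by Scenario; the plain lists have to be carried along unchanged.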

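I need a loop that splits this dataset according to every unique value in the Scenario column. The first dataset (for the Scenario value "A") can be reproduced as follows: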

table1 <- data.table::data.table(Scenario = 
                               c(rep(
                                 c("A"), 
                                 4)),
                             A = c(
                               rep("x", 1), rep("b", 1), rep("s", 1),
                               rep("u", 1)),
                             Correlation = c(1, 0.125, 0.1, 0 ),
                             Matrix = "IM",
                             stringsAsFactors = FALSE,
                             check.names = FALSE)
table2 <- data.table::data.table(Scenario = 
                               c(rep(
                                 c( "A"), 
                                 4)),
                             A = c(
                               rep("x", 1), rep("b", 1), rep("s", 1),
                               rep("u", 1)),
                             Correlation = c(1, 0.125, 0.1, 0),
                             Matrix = "IM",
                             stringsAsFactors = FALSE,
                             check.names = FALSE)

table3 <- data.table::data.table(Scenario = 
                               c(rep(
                                 c("A"), 
                                 4)),
                             A = c(
                               rep("x", 1), rep("b", 1), rep("s", 1),
                               rep("u", 1)),
                             Correlation = c(1, 0.125, 0.1, 0),
                             Matrix = "IM",
                             stringsAsFactors = FALSE,
                             check.names = FALSE)

list1 <- list("a" = "2019", "b" = "2020", "c" = "2021")
list2 <- list("a" = "test", "b" = "test", "c" = "test")

input_data <- list("table1" = table1, "table2" = table2, "table3" = table3, 
               "list1"=list1, "list2" = list2)

Please let me know if more information is needed.

【Question discussion】:

    Tags: r loops data.table


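    【Solution 1】:

    You can write a function that wraps lapply and uses inherits to check the type of each object in the list. If the object inherits from data.frame and contains a column named Scenario, you can simply subset it. Items that are not data frames or data tables, or that have no column called Scenario, are left unchanged: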

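    get_scenario <- function(S, data = input_data) {
      lapply(data, function(x) {
        # Leave anything that is not a data frame / data.table untouched
        if (!inherits(x, "data.frame"))
          return(x)
        # Leave tables without a Scenario column untouched
        if (!"Scenario" %in% names(x))
          return(x)

        # Keep only the rows for the requested scenario
        x[x$Scenario == S, ]
      })
    }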
    

    This gives:

    get_scenario("A")
    #> $table1
    #>    Scenario A Correlation Matrix
    #> 1:        A x       1.000     IM
    #> 2:        A b       0.125     IM
    #> 3:        A s       0.100     IM
    #> 4:        A u       0.000     IM
    #> 
    #> $table2
    #>    Scenario A Correlation Matrix
    #> 1:        A x       1.000     IM
    #> 2:        A b       0.125     IM
    #> 3:        A s       0.100     IM
    #> 4:        A u       0.000     IM
    #> 
    #> $table3
    #>    Scenario A Correlation Matrix
    #> 1:        A x       1.000     IM
    #> 2:        A b       0.125     IM
    #> 3:        A s       0.100     IM
    #> 4:        A u       0.000     IM
    #> 
    #> $list1
    #> $list1$a
    #> [1] "2019"
    #> 
    #> $list1$b
    #> [1] "2020"
    #> 
    #> $list1$c
    #> [1] "2021"
    #> 
    #> 
    #> $list2
    #> $list2$a
    #> [1] "test"
    #> 
    #> $list2$b
    #> [1] "test"
    #> 
    #> $list2$c
    #> [1] "test"
    

    If you want all of the subsets collected in one outer list, you can do:

    lapply(c("A", "B", "C"), get_scenario)
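
    Note that the call above lists the scenario values by hand and leaves out "D". As a sketch of one way to cover every unique value without typing them out, the values can be collected from the tables themselves and used to name the result. This assumes the data.table package is installed; `demo_data`, `split_by_scenario`, `is_table`, `scenario_values` and `all_scenarios` are hypothetical names introduced for illustration, and the helper takes the list as an argument instead of reading `input_data` from the global environment.

    ```r
    library(data.table)

    # Hypothetical, shortened stand-in for input_data
    demo_data <- list(
      table1 = data.table(Scenario = rep(c("A", "B", "C", "D"), 2), Correlation = 1:8),
      table2 = data.table(Scenario = rep(c("A", "B", "C", "D"), 2), Correlation = 8:1),
      list1  = list(a = "2019", b = "2020", c = "2021")
    )

    # Same idea as get_scenario(), but the list is passed in explicitly
    split_by_scenario <- function(data, S) {
      lapply(data, function(x) {
        if (inherits(x, "data.frame") && "Scenario" %in% names(x)) {
          x[x$Scenario == S, ]
        } else {
          x  # plain lists and tables without a Scenario column pass through
        }
      })
    }

    # Collect every distinct Scenario value found in any table
    is_table <- vapply(demo_data, inherits, logical(1), what = "data.frame")
    scenario_values <- unique(
      unlist(lapply(demo_data[is_table], function(x) x$Scenario), use.names = FALSE)
    )

    # One sub-list per scenario, named by its value
    all_scenarios <- lapply(
      setNames(scenario_values, scenario_values),
      function(S) split_by_scenario(demo_data, S)
    )

    print(names(all_scenarios))
    print(all_scenarios$A$table1)
    ```

    Naming the outer list by scenario value means a single scenario can be pulled out with `all_scenarios$A` or `all_scenarios[["A"]]` rather than by position.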
    

    【Discussion】:
