【Question title】: Convert columns to list
【Posted】: 2020-07-17 03:28:47
【Question】:

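I have data in two columns (Region and GroupedCateg); see the data frame below. I want to convert it to a nested list. I tried dplyr's group_by() and do() and then converting the result to a list, but it did not work.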

df <- read.table(header = TRUE,
                 text = '
Region  GroupedCateg
Beja    Alentejo
Evora   Alentejo
Portalegre  Alentejo
Faro    Algarve
Aveiro  Central
"Castelo Branco"    Central
Coimbra Central
Leiria  Central
Santarem    Central
Acores  Islands
Madeira Islands
Lisboa  Lisbon
Setubal Lisbon
Braga   North
Braganca    North
"Viana do Castelo"  North
"Vila Real" North
Porto   Porto
')

Desired output as a list: each GroupedCateg value goes in name, and the Regions that belong to it go in the nested categories list.

list(
  list(
    name =  "Alentejo",
    categories = list("Beja", "Evora", "Portalegre")
  ),
  list(
    name =  "Algarve",
    categories = list("Faro")
  ),
  list(
    name =  "Central",
    categories = list("Aveiro", "Castelo Branco", "Coimbra", "Leiria", "Santarem" )
  ),
  
  list(
    name =  "North",
    categories = list("Braga", "Braganca", "Viana do Castelo", "Vila Real")
  ),
  list(
    name =  "Lisbon",
    categories = list("Lisboa", "Setubal")
  ),
  list(
    name =  "Islands",
    categories = list("Acores", "Madeira")
  ),
  
  list(
    name =  "Porto",
    categories = list("Porto")
  )
  
)

【Comments】:

    Tags: r dplyr


    【Solution 1】:

    You can use pmap() from purrr:

    library(dplyr)
    library(purrr)
    
    x <- df %>%
      group_by(GroupedCateg) %>% 
      summarise(Region = list(Region)) %>%
      pmap(~ list(name = .x, categories = as.list(.y)))
    

    The equivalent base R version:

    y <- apply(aggregate(Region ~ GroupedCateg, df, c),
         1, function(y) list(name = y[[1]], categories = as.list(y[[2]])))
    

    all.equal(x, y)
    # [1] TRUE
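
The formula passed to pmap() is compact but a little opaque: `.x` refers to the first column of the summarised data (GroupedCateg) and `.y` to the second (the list column Region). The sketch below spells the same step out with an explicit function. The `grouped` list is a hypothetical hand-built stand-in for the summarised tibble, not the full data from the question.

```r
library(purrr)

# Hypothetical stand-in for the summarised tibble:
# one group name and one vector of Regions per group
grouped <- list(
  GroupedCateg = c("Alentejo", "Algarve"),
  Region       = list(c("Beja", "Evora"), "Faro")
)

# Formula shorthand, as in the answer: .x is the first column, .y the second
a <- pmap(grouped, ~ list(name = .x, categories = as.list(.y)))

# The same step with an explicit function; pmap() matches arguments by name
b <- pmap(grouped, function(GroupedCateg, Region) {
  list(name = GroupedCateg, categories = as.list(Region))
})

identical(a, b)
```

The explicit form is longer, but it keeps working if the column order ever changes, because the arguments are matched by column name rather than by position.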
    

    Output:

    [[1]]
    [[1]]$name
    [1] "Alentejo"
    
    [[1]]$categories
    [[1]]$categories[[1]]
    [1] "Beja"
    
    [[1]]$categories[[2]]
    [1] "Evora"
    
    [[1]]$categories[[3]]
    [1] "Portalegre"
    
    
    
    [[2]]
    [[2]]$name
    [1] "Algarve"
    
    [[2]]$categories
    [[2]]$categories[[1]]
    [1] "Faro"
    
    
    
    [[3]]
    [[3]]$name
    [1] "Central"
    
    [[3]]$categories
    [[3]]$categories[[1]]
    [1] "Aveiro"
    
    [[3]]$categories[[2]]
    [1] "Castelo Branco"
    
    [[3]]$categories[[3]]
    [1] "Coimbra"
    
    [[3]]$categories[[4]]
    [1] "Leiria"
    
    [[3]]$categories[[5]]
    [1] "Santarem"
    
    
    
    [[4]]
    [[4]]$name
    [1] "Islands"
    
    [[4]]$categories
    [[4]]$categories[[1]]
    [1] "Acores"
    
    [[4]]$categories[[2]]
    [1] "Madeira"
    
    
    
    [[5]]
    [[5]]$name
    [1] "Lisbon"
    
    [[5]]$categories
    [[5]]$categories[[1]]
    [1] "Lisboa"
    
    [[5]]$categories[[2]]
    [1] "Setubal"
    
    
    
    [[6]]
    [[6]]$name
    [1] "North"
    
    [[6]]$categories
    [[6]]$categories[[1]]
    [1] "Braga"
    
    [[6]]$categories[[2]]
    [1] "Braganca"
    
    [[6]]$categories[[3]]
    [1] "Viana do Castelo"
    
    [[6]]$categories[[4]]
    [1] "Vila Real"
    
    
    
    [[7]]
    [[7]]$name
    [1] "Porto"
    
    [[7]]$categories
    [[7]]$categories[[1]]
    [1] "Porto"
    

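    【Comments】:

    • Nice work. I felt like I was fighting with lists there; glad you could put me out of my misery ;)
    • Thanks Darren, this is very informative. Is there a reason you added unique in aggregate?
    • @DanielO I just thought there might be duplicate Regions within a GroupedCateg. Now I think unique() may be redundant.
    【Solution 2】:

    Use base split on the column, then lapply to reformat as needed: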
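
If duplicate Region values within a group are a real possibility, as the comment above considers, one option is to drop repeated rows before grouping. This is a minimal base R sketch on a small made-up data frame (`df_dup` is hypothetical and deliberately contains a repeated row); the question's own data has no duplicates.

```r
# Made-up data with one repeated Region/GroupedCateg pair
df_dup <- data.frame(
  Region       = c("Beja", "Beja", "Evora", "Faro"),
  GroupedCateg = c("Alentejo", "Alentejo", "Alentejo", "Algarve"),
  stringsAsFactors = FALSE
)

# Keep only the first occurrence of each Region/GroupedCateg pair
dedup <- unique(df_dup[, c("GroupedCateg", "Region")])

# One character vector of Regions per group
groups <- split(dedup$Region, dedup$GroupedCateg)

# Reshape into the list(name = ..., categories = list(...)) structure
res_dedup <- lapply(names(groups), function(g) {
  list(name = g, categories = as.list(groups[[g]]))
})

str(res_dedup)
```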

    x <- split(df$Region, df$GroupedCateg)
    
    res <- lapply(names(x), function(i){
      list(name = i,
           categories = as.list(x[[ i ]]))
    })
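
Note that split() returns the groups in factor-level order, which for a character column means sorted order, whereas the desired output in the question lists North before Lisbon and Islands. If the group order matters downstream, one possible approach is to split on a factor with explicit levels. The sketch below uses a small made-up subset, and the `wanted` vector is an assumed ordering chosen for illustration.

```r
# Small made-up subset of the question's data
df_small <- data.frame(
  Region       = c("Faro", "Acores", "Madeira", "Braga"),
  GroupedCateg = c("Algarve", "Islands", "Islands", "North"),
  stringsAsFactors = FALSE
)

# Assumed group order for illustration; adjust to whatever the consumer expects
wanted <- c("Algarve", "North", "Islands")

# split() follows the factor's level order instead of sorting the names
x_ord <- split(df_small$Region, factor(df_small$GroupedCateg, levels = wanted))

res_ord <- lapply(names(x_ord), function(i) {
  list(name = i, categories = as.list(x_ord[[i]]))
})

# Group names in the resulting order
vapply(res_ord, function(g) g$name, character(1))
```

Every group present in the data should appear in `wanted`: values missing from the levels become NA and are dropped by split(), while levels with no matching rows produce empty groups.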
    

    【Comments】:

      【Solution 3】:

      In base R:

      apply(aggregate(Region~GroupedCateg,df,c),1, function(x) list(name=x[1], category=as.list(x[2]$Region))) 
          
      [[1]]
      [[1]]$name
      [[1]]$name$GroupedCateg
      [1] "Alentejo"
      
      
      [[1]]$category
      [[1]]$category[[1]]
      [1] "Beja"
      
      [[1]]$category[[2]]
      [1] "Evora"
      
      [[1]]$category[[3]]
      [1] "Portalegre"
      
      
      
      [[2]]
      [[2]]$name
      [[2]]$name$GroupedCateg
      [1] "Algarve"
      
      
      [[2]]$category
      [[2]]$category[[1]]
      [1] "Faro"
      
      
      
      [[3]]
      [[3]]$name
      [[3]]$name$GroupedCateg
      [1] "Central"
      
      
      [[3]]$category
      [[3]]$category[[1]]
      [1] "Aveiro"
      
      [[3]]$category[[2]]
      [1] "Castelo Branco"
      
      [[3]]$category[[3]]
      [1] "Coimbra"
      
      [[3]]$category[[4]]
      [1] "Leiria"
      
      [[3]]$category[[5]]
      [1] "Santarem"
      
      
      
      [[4]]
      [[4]]$name
      [[4]]$name$GroupedCateg
      [1] "Islands"
      
      
      [[4]]$category
      [[4]]$category[[1]]
      [1] "Acores"
      
      [[4]]$category[[2]]
      [1] "Madeira"
      
      
      
      [[5]]
      [[5]]$name
      [[5]]$name$GroupedCateg
      [1] "Lisbon"
      
      
      [[5]]$category
      [[5]]$category[[1]]
      [1] "Lisboa"
      
      [[5]]$category[[2]]
      [1] "Setubal"
      
      
      
      [[6]]
      [[6]]$name
      [[6]]$name$GroupedCateg
      [1] "North"
      
      
      [[6]]$category
      [[6]]$category[[1]]
      [1] "Braga"
      
      [[6]]$category[[2]]
      [1] "Braganca"
      
      [[6]]$category[[3]]
      [1] "Viana do Castelo"
      
      [[6]]$category[[4]]
      [1] "Vila Real"
      
      
      
      [[7]]
      [[7]]$name
      [[7]]$name$GroupedCateg
      [1] "Porto"
      
      
      [[7]]$category
      [[7]]$category[[1]]
      [1] "Porto"
      

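      【Comments】:

      • Thanks. But I need exactly the same output, because I need to pass it to one of the highcharts functions. If it is not in the same structure, it will not work.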
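
The mismatch the comment points at comes from two details in Solution 3: single-bracket indexing (`x[1]`) keeps the enclosing list, so name becomes a nested list with a GroupedCateg element, and the key is spelled category rather than categories. The sketch below illustrates the difference on a hypothetical `row` object that stands in for one row as apply() passes it to the function.

```r
# Hypothetical stand-in for a single row handed to the function by apply()
row <- list(GroupedCateg = "Algarve", Region = c("Faro"))

# As in Solution 3: `[` returns a one-element list, and the key is "category"
as_posted <- list(name = row[1], category = as.list(row[2]$Region))

# With `[[` the bare value is extracted, and the key matches the question
adjusted <- list(name = row[[1]], categories = as.list(row[[2]]))

str(as_posted)
str(adjusted)

# The structure requested in the question, for this one group
target <- list(name = "Algarve", categories = list("Faro"))
identical(adjusted, target)
```

With those two changes the function body is the same as the base R version shown in Solution 1.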