Title: Creating a new variable using dplyr in R using lists
Posted: 2020-06-24 05:12:44
Question:

Suppose we have three lists (character vectors):

list_A <- c("PA","MA","MD")
list_B <- c("NJ","NY","OK")
list_C <- c("AZ","MT","LA")

I have a data frame, df1, like this:

ID        presenter          state   
1         Donatello   c("AZ","NY")
2          Leonardo             NJ
3            Rafael   c("LA","MT")
4     Michaelangelo   c("PA","LA")
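To make the example reproducible, the table above can be built as a tibble whose state column is a list column (one character vector per row). This is a sketch assuming the sample values shown; the name df1 is the one used later in the question.

```r
library(tibble)

list_A <- c("PA", "MA", "MD")
list_B <- c("NJ", "NY", "OK")
list_C <- c("AZ", "MT", "LA")

# `state` is a list column: each cell holds a character vector of any length
df1 <- tibble(
  ID = 1:4,
  presenter = c("Donatello", "Leonardo", "Rafael", "Michaelangelo"),
  state = list(c("AZ", "NY"), "NJ", c("LA", "MT"), c("PA", "LA"))
)

df1
```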

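Using the tidyverse, I want to create three new variables, A, B and C, that count for each row how many of the states in state belong to the corresponding list: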

ID        presenter          state     A   B   C   
1         Donatello   c("AZ","NY")     0   1   1
2          Leonardo             NJ     0   1   0
3            Rafael   c("LA","MT")     0   0   2
4     Michaelangelo   c("PA","LA")     1   0   1

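Not directly related, but out of curiosity: is it possible to unlist() state while keeping the other information, creating one row per state, so that df1 becomes...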

ID        presenter          state   
1         Donatello             AZ
1         Donatello             NY
2          Leonardo             NJ
3            Rafael             LA
3            Rafael             MT
4     Michaelangelo             PA
4     Michaelangelo             LA


    Tags: r if-statement tidyverse dplyr


    Answer 1:

    You can use a nested sapply():

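    # Outer sapply: loop over the lookup vectors x; inner sapply: for each row's
    # state vector y, count how many of its states are in x
    list_data <- list(list_A, list_B, list_C)
    cbind(df1, data.frame(sapply(list_data, function(x)
               sapply(df1$state, function(y) sum(y %in% x)))))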
    
    
    #     ID presenter     state        X1    X2    X3
    #  <int> <chr>         <list>    <int> <int> <int>
    #1     1 Donatello     <chr [2]>     0     1     1
    #2     2 Leonardo      <chr [1]>     0     1     0
    #3     3 Rafael        <chr [2]>     0     0     2
    #4     4 Michaelangelo <chr [2]>     1     0     1
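A small variation, assuming df1 is an ordinary data frame with a state list column: giving the elements of list_data names makes sapply() label the new columns A, B and C instead of X1, X2 and X3. The inner sapply() is swapped for vapply() only to pin each per-row result to a single integer.

```r
list_A <- c("PA", "MA", "MD")
list_B <- c("NJ", "NY", "OK")
list_C <- c("AZ", "MT", "LA")

df1 <- data.frame(ID = 1:4,
                  presenter = c("Donatello", "Leonardo", "Rafael", "Michaelangelo"))
# Base R list column: assign a list with one element per row
df1$state <- list(c("AZ", "NY"), "NJ", c("LA", "MT"), c("PA", "LA"))

# Naming the list makes sapply() name the result columns A, B, C
list_data <- list(A = list_A, B = list_B, C = list_C)

# 4 x 3 integer matrix: one row per presenter, one column per lookup vector
counts <- sapply(list_data, function(x)
  vapply(df1$state, function(y) sum(y %in% x), integer(1)))

res <- cbind(df1, counts)
res
```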
    


      Answer 2:

      For your first question, how about this:

      library(dplyr)
      library(tidyr)
      library(tibble)
      
      list_A <- c("PA","MA","MD")
      list_B <- c("NJ","NY","OK")
      list_C <- c("AZ","MT","LA")
      
      
      data <- tibble(
        ID = c(1, 2, 3, 4),
        presenter = c("Donatello", "Leonardo", "Rafael", "Michaelangelo"),
        state = list(c("AZ", "NJ"), c("NJ"), c("LA", "MT"), c("PA", "LA"))
      )
      
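      # With rowwise(), `state` inside mutate() is the current row's character vector
      data <- data %>%
        rowwise() %>%
        mutate(A = sum(list_A %in% state),   # number of list_A states present in this row
               B = sum(list_B %in% state),
               C = sum(list_C %in% state))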
      

      This gives the following output:

      > data
      Source: local data frame [4 x 6]
      Groups: <by row>
      
      # A tibble: 4 x 6
           ID presenter     state         A     B     C
        <dbl> <chr>         <list>    <int> <int> <int>
      1     1 Donatello     <chr [2]>     0     1     1
      2     2 Leonardo      <chr [1]>     0     1     0
      3     3 Rafael        <chr [2]>     0     0     2
      4     4 Michaelangelo <chr [2]>     1     0     1
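rowwise() is probably not strictly needed. A sketch that loops over the list column with vapply() inside an ordinary mutate(); the helper count_in() is hypothetical, defined here only for illustration. It tests s %in% lookup (each state of the row against the list, the direction used in Answer 1), so a state repeated within a row is counted every time it appears.

```r
library(dplyr)
library(tibble)

list_A <- c("PA", "MA", "MD")
list_B <- c("NJ", "NY", "OK")
list_C <- c("AZ", "MT", "LA")

data <- tibble(
  ID = c(1, 2, 3, 4),
  presenter = c("Donatello", "Leonardo", "Rafael", "Michaelangelo"),
  state = list(c("AZ", "NJ"), c("NJ"), c("LA", "MT"), c("PA", "LA"))
)

# Hypothetical helper: for each element of the list column, count how many
# of its states are in `lookup`; returns one integer per row
count_in <- function(states, lookup) {
  vapply(states, function(s) sum(s %in% lookup), integer(1))
}

# Without rowwise(), `state` is the whole list column
data2 <- data %>%
  mutate(A = count_in(state, list_A),
         B = count_in(state, list_B),
         C = count_in(state, list_C))

data2
```

Because no rowwise grouping is added, the result needs no ungroup() afterwards.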
      

      Update: as for your second question, the unnest() function from tidyr does exactly that.

      > data %>%
      +   unnest(state)
      # A tibble: 7 x 3
           ID presenter     state
        <dbl> <chr>         <chr>
      1     1 Donatello     AZ   
      2     1 Donatello     NJ   
      3     2 Leonardo      NJ   
      4     3 Rafael        LA   
      5     3 Rafael        MT   
      6     4 Michaelangelo PA   
      7     4 Michaelangelo LA 
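unnest() can also answer the first question in long format. A sketch, assuming the data tibble as originally defined (before the A/B/C columns were added): unnest the states, attach a group label from a lookup table built from the three vectors (the name lookup is illustrative), count per row and group, and spread the counts back into columns. Repeated states are counted each time. Caveat: presenters with no state in any list are dropped by inner_join(), so they would need to be added back (for example with a left join) if that matters.

```r
library(dplyr)
library(tidyr)
library(tibble)

list_A <- c("PA", "MA", "MD")
list_B <- c("NJ", "NY", "OK")
list_C <- c("AZ", "MT", "LA")

data <- tibble(
  ID = c(1, 2, 3, 4),
  presenter = c("Donatello", "Leonardo", "Rafael", "Michaelangelo"),
  state = list(c("AZ", "NJ"), c("NJ"), c("LA", "MT"), c("PA", "LA"))
)

# Lookup table: one row per (state, group) pair
lookup <- bind_rows(
  tibble(state = list_A, group = "A"),
  tibble(state = list_B, group = "B"),
  tibble(state = list_C, group = "C")
)

counts <- data %>%
  unnest(state) %>%                      # one row per presenter/state pair
  inner_join(lookup, by = "state") %>%   # attach the group label of each state
  count(ID, presenter, group) %>%        # occurrences per row and group
  pivot_wider(names_from = group, values_from = n,
              values_fill = list(n = 0L)) %>%  # groups absent from a row get 0
  select(ID, presenter, A, B, C)         # fix column order

counts
```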
      

      UPDATE 2

      To count repeated occurrences within the state column, you need an additional loop. This should do it (but I suggest you test it):

      # For each state in list_A, count how often it occurs in this row's state vector, then add the counts up
      data <- data %>%
        rowwise() %>%
        mutate(A = sum(unlist(lapply(list_A, function(x) sum(x == state)))),
               B = sum(unlist(lapply(list_B, function(x) sum(x == state)))),
               C = sum(unlist(lapply(list_C, function(x) sum(x == state)))))
      

      For this data:

      data <- tibble(
        ID = c(1, 2, 3, 4),
        presenter = c("Donatello", "Leonardo", "Rafael", "Michaelangelo"),
        state = list(c("AZ", "NJ"), c("NJ"), c("LA", "MT", "MT", "MT"), c("PA", "PA", "LA")) 
      )
      

      we expect extra counts in column C for the third row (3x "MT") and in column A for the fourth row (2x "PA"):

      > data
      Source: local data frame [4 x 6]
      Groups: <by row>
      
      # A tibble: 4 x 6
           ID presenter     state         A     B     C
        <dbl> <chr>         <list>    <int> <int> <int>
      1     1 Donatello     <chr [2]>     0     1     1
      2     2 Leonardo      <chr [1]>     0     1     0
      3     3 Rafael        <chr [4]>     0     0     4
      4     4 Michaelangelo <chr [3]>     2     0     1
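The inner lapply() can likely be avoided: flipping the %in% test so that each state in the row is checked against the list counts every occurrence, duplicates included. A sketch on the same data, with ungroup() added (an assumption about what you want) to drop the rowwise grouping afterwards:

```r
library(dplyr)
library(tibble)

list_A <- c("PA", "MA", "MD")
list_B <- c("NJ", "NY", "OK")
list_C <- c("AZ", "MT", "LA")

data <- tibble(
  ID = c(1, 2, 3, 4),
  presenter = c("Donatello", "Leonardo", "Rafael", "Michaelangelo"),
  state = list(c("AZ", "NJ"), c("NJ"), c("LA", "MT", "MT", "MT"), c("PA", "PA", "LA"))
)

# `state %in% list_A` is TRUE for every element of the row's vector that is in
# list_A (repeats included), so summing it counts each occurrence
data <- data %>%
  rowwise() %>%
  mutate(A = sum(state %in% list_A),
         B = sum(state %in% list_B),
         C = sum(state %in% list_C)) %>%
  ungroup()

data
```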
      

      Comments:

      • There is a case I didn't mention: if a cell contains the same state more than once, each occurrence must be counted, so c("PA", "PA") would give A = 2.