【Question title】: How to convert list of characters to columns
【Posted】: 2019-10-12 09:01:43
【Question】:

I have a data frame in which column C holds character strings:

df <- data.frame(A = c(13, 15, 17), B = c("yes", "no", "yes"), C = c("Mon, Thu, Sun", "Thu, Tue, Fri", "Sat, Mon, Wen"))

    A   B   C
1   13  yes Mon, Thu, Sun
2   15  no  Thu, Tue, Fri
3   17  yes Sat, Mon, Wen

How can I convert column C of the data.frame into the following?

    A   B   Sun Mon Tue Wen Thu Fri Sat
1   13  yes 1   1   0   0   1   0   0
2   15  no  0   0   1   0   1   1   0
3   17  yes 0   1   0   1   0   0   1

【Comments】:

    Tags: r data-manipulation


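    【Solution 1】:

    Use separate_rows to convert the data to long format, add a value column and convert C to a factor with the weekday levels in the desired order, then spread it back to wide format.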

    library(dplyr)
    library(tidyr)
    
    days.abb <- c("Sun", "Mon", "Tue", "Wen", "Thu", "Fri", "Sat")
    df %>%
      separate_rows(C) %>%
      mutate(value = 1, C = factor(C, days.abb)) %>%
      spread(C, value, fill = 0)
    

    giving:

       A   B Sun Mon Tue Wen Thu Fri Sat
    1 13 yes   1   1   0   0   1   0   0
    2 15  no   0   0   1   0   1   1   0
    3 17 yes   0   1   0   1   0   0   1
    

    【Comments】:

    • Thanks, I will try your solution too. I may need the days in the correct order.
    【Solution 2】:

    A dplyr and tidyr option could be:

    df %>%
     mutate(C = strsplit(as.character(C), ", ", fixed = TRUE)) %>%
     unnest(C) %>%
     mutate(C_val = 1) %>%
     pivot_wider(names_from = C, values_from = C_val, values_fill = list(C_val = 0))
    
          A B       Mon   Thu   Sun   Tue   Fri   Sat   Wen
      <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    1    13 yes       1     1     1     0     0     0     0
    2    15 no        0     1     0     1     1     0     0
    3    17 yes       1     0     0     0     0     1     1
    

    Or:

    df %>%
     separate_rows(C) %>%
     mutate(C_val = 1) %>%
     pivot_wider(names_from = C, values_from = C_val, values_fill = list(C_val = 0))
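
The pivot_wider variants above create the day columns in order of first appearance (Mon, Thu, Sun, ...). If the weekday order from the question is needed, one possible sketch is to reorder the columns afterwards with select(). This assumes dplyr >= 1.0 and tidyr >= 1.0; the `days.abb` vector and the `res` name are taken from or added alongside the other answers, not from this one.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  A = c(13, 15, 17),
  B = c("yes", "no", "yes"),
  C = c("Mon, Thu, Sun", "Thu, Tue, Fri", "Sat, Mon, Wen")
)

# Desired column order; "Wen" is kept exactly as spelled in the question's data
days.abb <- c("Sun", "Mon", "Tue", "Wen", "Thu", "Fri", "Sat")

res <- df %>%
  separate_rows(C) %>%                     # one row per day
  mutate(C_val = 1) %>%                    # indicator value
  pivot_wider(names_from = C, values_from = C_val,
              values_fill = list(C_val = 0)) %>%
  select(A, B, all_of(days.abb))           # put the day columns in weekday order

res
```

Note that all_of() raises an error if one of the days never occurs in C; any_of() would silently skip missing days instead.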
    

    【Comments】:

      【Solution 3】:

      A base R approach:

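      1. In each entry of column C, replace ",\\s" (a comma followed by a whitespace character) with "|"
      2. Use grepl with the resulting regular expression to check which days in days.abb are present
      3. Combine the resulting binary vectors by row and cbind them to the existing columns A and B
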
      ## data
      df <- data.frame(
          A = c(13, 15, 17), 
          B = c("yes", "no", "yes"), 
          C = c("Mon, Thu, Sun", "Thu, Tue, Fri", "Sat, Mon, Wen")
      )
      
      ## abbreviated weekdays
      days.abb <- c("Sun", "Mon", "Tue", "Wen", "Thu", "Fri", "Sat")
      
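      ## build a 0/1 indicator row over days.abb for each string in column C
      ## (as.character handles a factor column; USE.NAMES = FALSE keeps the default row names)
      df1 <- cbind(df[, -3], t(sapply(as.character(df[, 3]), function(x) 1 * grepl(gsub(",\\s", "|", x), days.abb), USE.NAMES = FALSE)))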
      
      ## update column names
      setNames(df1, c("A", "B", days.abb)) 
      #>    A   B Sun Mon Tue Wen Thu Fri Sat
      #> 1 13 yes   1   1   0   0   1   0   0
      #> 2 15  no   0   0   1   0   1   1   0
      #> 3 17 yes   0   1   0   1   0   0   1
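
To make the three steps easier to follow, here is a small sketch that runs steps 1 and 2 on a single entry of column C. The variable names `x`, `pattern`, `hits` and `indicator` are illustrative only and do not appear in the answer's code.

```r
days.abb <- c("Sun", "Mon", "Tue", "Wen", "Thu", "Fri", "Sat")

x <- "Mon, Thu, Sun"                 # first entry of column C

pattern <- gsub(",\\s", "|", x)      # step 1: turn the entry into an alternation
hits <- grepl(pattern, days.abb)     # step 2: TRUE for each day matched by the pattern
indicator <- 1 * hits                # convert logical to 0/1

pattern
#> [1] "Mon|Thu|Sun"
indicator
#> [1] 1 1 0 0 1 0 0
```

Because grepl does substring matching, this relies on no abbreviation being contained in another, which holds for the seven values used here.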
      

      【Comments】:
