【Question title】: Splitting one cell into multiple columns in R
【Posted】: 2021-09-24 11:24:52
【Question】:

This is what one of my columns looks like:

Infos
NAME: ANGELA SURNAME:SMITH AGE:22 CITY: LA
NAME: ANDREW SURNAME: D'ONOFRIO AGE:47 CITY: NYC

I would like to create four columns:

NAME SURNAME AGE CITY
ANGELA SMITH 22 LA
ANDREW D'ONOFRIO 47 NYC

I read that we can use separate from the tidyverse, and this is what I tried.

library(tidyr)
library(tidyverse)

df <- infos %>% separate(Infos, c("NAME", "SURNAME","AGE","CITY"))

But this is the output:

NAME SURNAME AGE CITY
NAME ANGELA SURNAME SMITH
NAME ANDREW SURNAME D'ONOFRIO

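So I would like to understand how to tell R what it has to separate on. Maybe this exact topic has already been covered here (I did not find it), so please feel free to redirect me if necessary!

【Comments】:

    Tags: r string tidyverse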


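    【Solution 1】:

    1) extract. Use extract with the pattern shown. The test data does not have any spaces within the field contents, but this should work even if it did.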

    library(dplyr)
    library(tidyr)
    
    pat <- "NAME: *(.*) SURNAME: *(.*) AGE: *(.*) CITY: *(.*)"
    dat %>% 
      extract(Infos, c("NAME", "SURNAME", "AGE", "CITY"), pat, convert = TRUE)
    ##     NAME   SURNAME AGE CITY
    ## 1 ANGELA     SMITH  22   LA
    ## 2 ANDREW D'ONOFRIO  47  NYC
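
The claim about spaces inside field contents can be checked with a small sketch. The data frame `dat2` and its second row are made up for illustration and are not part of the original question; the pattern is the same as above.

```r
library(tidyr)

# Hypothetical test data: the second row has spaces inside NAME, SURNAME and CITY.
dat2 <- data.frame(Infos = c(
  "NAME: ANGELA SURNAME:SMITH AGE:22 CITY: LA",
  "NAME: MARY ANN SURNAME: VAN DYKE AGE:30 CITY: NEW YORK"
))

pat <- "NAME: *(.*) SURNAME: *(.*) AGE: *(.*) CITY: *(.*)"

# Each capture group becomes one column; convert = TRUE turns AGE into a number.
res <- tidyr::extract(dat2, Infos, c("NAME", "SURNAME", "AGE", "CITY"), pat,
                      convert = TRUE)
print(res)
```

Because each group is delimited by the literal label that follows it (" SURNAME:", " AGE:", " CITY:"), spaces inside a value do not end the capture.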
    

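    2) Base R. Alternatively, using only base R, we get this general solution, which will continue to work even if the number of columns or their names change. It should also work if the field contents contain spaces. It works by converting Infos to dcf format and then reading it with read.dcf. Note that the native pipe |> requires R 4.1.0 or later.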

    dat |>
      with(gsub("(\\w+:)", "\n\\1", Infos)) |>
      textConnection() |>
      read.dcf() |>
      as.data.frame() |>
      type.convert(as.is = TRUE)
    ##     NAME   SURNAME AGE CITY
    ## 1 ANGELA     SMITH  22   LA
    ## 2 ANDREW D'ONOFRIO  47  NYC
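
To make the dcf conversion easier to follow, here is a sketch of the same idea written step by step without the native pipe. The intermediate names `dcf_text`, `con` and `res` are introduced only for illustration.

```r
dat <- data.frame(Infos = c(
  "NAME: ANGELA SURNAME:SMITH AGE:22 CITY: LA",
  "NAME: ANDREW SURNAME: D'ONOFRIO AGE:47 CITY: NYC"
))

# Step 1: start a new line before every "KEY:" so each field sits on its own
# line, which is the layout read.dcf expects.
dcf_text <- gsub("(\\w+:)", "\n\\1", dat$Infos)
cat(dcf_text, sep = "\n")

# Step 2: read the text as dcf records (records are separated by blank lines),
# then convert the character matrix to a data frame with typed columns.
con <- textConnection(dcf_text)
res <- type.convert(as.data.frame(read.dcf(con)), as.is = TRUE)
close(con)
print(res)
```

The column names come from the labels found in the text, which is why this approach does not need the field names to be listed in advance.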
    

    Note

    The data in reproducible form:

    dat <-
    structure(list(Infos = c("NAME: ANGELA SURNAME:SMITH AGE:22 CITY: LA", 
    "NAME: ANDREW SURNAME: D'ONOFRIO AGE:47 CITY: NYC")), class = "data.frame", row.names = c(NA, 
    -2L))
    

    【Comments】:

      【Solution 2】:

      You can insert dummy columns and then drop them.

      tibble(dat=c("NAME: ANGELA SURNAME:SMITH AGE:22 CITY: LA", 
                   "NAME: ANDREW SURNAME: DONOFRIO AGE:47 CITY: NYC")) %>% 
          separate(dat, c("DEL1", "NAME", "DEL2", "SURNAME", "DEL3", "AGE", "DEL4", "CITY")) %>% 
          select(-DEL1, -DEL2, -DEL3, -DEL4)
      
       NAME   SURNAME  AGE   CITY 
       ANGELA SMITH    22    LA   
       ANDREW DONOFRIO 47    NYC  
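
Note that the example above writes the surname as DONOFRIO, without the apostrophe. tidyr documents the default `sep` of `separate` as the regular expression `"[^[:alnum:]]+"`, so any run of non-alphanumeric characters, including an apostrophe, is treated as a separator. The following base R sketch uses `strsplit` only to illustrate how the two separators divide the original string; the alternative separator `"[: ]+"` is a suggestion, not part of the original answer.

```r
x <- "NAME: ANDREW SURNAME: D'ONOFRIO AGE:47 CITY: NYC"

# Same regular expression as the documented default `sep` of tidyr::separate:
# the apostrophe counts as a separator, so the surname is cut in two.
default_pieces <- strsplit(x, "[^[:alnum:]]+")[[1]]
print(default_pieces)

# Splitting only on runs of colons and spaces keeps D'ONOFRIO in one piece.
custom_pieces <- strsplit(x, "[: ]+")[[1]]
print(custom_pieces)
```

With the default separator there are nine pieces for eight column names, so the values would no longer line up with the intended columns. Passing `sep = "[: ]+"` to `separate` should avoid this, assuming no field value itself contains a space or a colon.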
      

      【Comments】:

        【Solution 3】:

        Here is another solution using str_squish, str_replace_all and separate:

        library(dplyr)
        library(tidyr)    # provides separate()
        library(stringr)
        df %>% 
          mutate(Infos = str_squish(str_replace_all(Infos, ":", " "))) %>% 
          separate(Infos, c("helper1", "Name", "helper2", "Surname", "helper3", "Age", "helper4","City"), sep = " ") %>%
          select(-starts_with("helper"))
        

        Output:

          Name   Surname   Age   City 
          <chr>  <chr>     <chr> <chr>
        1 ANGELA SMITH     22    LA   
        2 ANDREW D'ONOFRIO 47    NYC  
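
As a small sketch of what the `mutate` step does before `separate` runs, the two stringr calls can be applied to a single string. The variable names `x` and `cleaned` are only for illustration.

```r
library(stringr)

x <- "NAME: ANGELA SURNAME:SMITH AGE:22 CITY: LA"

# Replace every colon with a space, then collapse repeated whitespace into
# single spaces, so labels and values are separated by exactly one space.
cleaned <- str_squish(str_replace_all(x, ":", " "))
print(cleaned)
```

After this step every label and value is a space-separated token, which is why `sep = " "` with alternating helper and value columns works. A value that itself contains a space would break this alternation.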
        

        【Comments】:

          【Solution 4】:

          Another strategy:

          df <- structure(list(Infos = c("NAME: ANGELA SURNAME:SMITH AGE:22 CITY: LA", 
                                         "NAME: ANDREW SURNAME: D'ONOFRIO AGE:47 CITY: NYC")),
                          class = "data.frame", row.names = c(NA, -2L))
          library(tidyverse)
          
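          df %>%
            # normalise "KEY: value" to "KEY:value" so no space follows a colon
            mutate(Infos = gsub('\\:\\s*', ':', Infos)) %>%
            # one row per KEY:value pair (assumes values contain no whitespace)
            separate_rows(Infos, sep = '\\s') %>%
            # split each pair into a name column N and a value column V
            separate(Infos, into = c('N', 'V'), sep = ':') %>%
            # back to wide format; list columns collect the values per key
            pivot_wider(names_from = N, values_from = V, values_fn = list) %>%
            # expand the list columns into one row per original record
            unnest(everything())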
          #> # A tibble: 2 x 4
          #>   NAME   SURNAME   AGE   CITY 
          #>   <chr>  <chr>     <chr> <chr>
          #> 1 ANGELA SMITH     22    LA   
          #> 2 ANDREW D'ONOFRIO 47    NYC
          

          Created on 2021-07-15 by the reprex package (v2.0.0)

          【Comments】:

            【Solution 5】:

            A base R option using strcapture -

            strcapture('NAME:\\s*(.*)\\s*SURNAME:\\s*(.*)\\s*AGE:\\s*(.*)\\s*CITY:\\s*(.*)', 
                       infos$Infos, proto = list(NAME = character(), 
                       SURNAME = character(), AGE = numeric(), CITY = character()))
            
            #    NAME    SURNAME  AGE CITY
            #1 ANGELA      SMITH   22   LA
            #2 ANDREW  D'ONOFRIO   47  NYC
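
One thing to watch with the pattern above: each `(.*)` is greedy and the following `\\s*` may match nothing, so the captured text columns can end up with a trailing space. A possible variant, offered as a sketch rather than as part of the original answer, uses lazy groups with `perl = TRUE` and a data frame as `proto`. The data frame `infos` is recreated here from the question's sample rows.

```r
infos <- data.frame(Infos = c(
  "NAME: ANGELA SURNAME:SMITH AGE:22 CITY: LA",
  "NAME: ANDREW SURNAME: D'ONOFRIO AGE:47 CITY: NYC"
))

# Lazy groups (.*?) stop as early as possible, so the whitespace before the
# next label is consumed by \\s* instead of ending up in the capture.
res <- strcapture(
  "NAME:\\s*(.*?)\\s*SURNAME:\\s*(.*?)\\s*AGE:\\s*(.*?)\\s*CITY:\\s*(.*)",
  infos$Infos,
  proto = data.frame(NAME = character(), SURNAME = character(),
                     AGE = numeric(), CITY = character(),
                     stringsAsFactors = FALSE),
  perl = TRUE
)
print(res)
```

An alternative is to keep the original pattern and apply `trimws` to the character columns afterwards.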
            

            【Comments】:
