【Question title】: Merge 2 variables vertically R tidyverse
【Posted】: 2021-10-10 12:53:12
【Question description】:

I ran a survey in 2 languages, and I would like to merge the questions from both languages into a single variable.

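The answers from the forms are all in the same data.frame. Date is my primary key. Unfortunately, I am still new to R and have not been able to find an elegant way to combine these.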

Example of the current state (blank cells are NA)

Date  Place_English  Plane_English  Place_French  Plane_French
One   NA             NA             azea          Three
Two   ertert         ertt           NA            NA

Desired result

Date Place Plane
One azea Three
Two ertert ertt

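【Question comments】:

  • Take a look at coalesce
  • Thanks for the suggestion! :) I don't think it applies in my case, because there are some missing values in the question set for 1 of the languages.
  • Are your blanks empty strings '' or NA?
  • The blanks are NA.
  • That is exactly what coalesce does well: it skips over leading NAs until it finds a non-NA value and returns that value. See the example below. The one tricky point is that there are some warnings with factors.

Tags: r tidy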


【Solution 1】:

Just following up on my comment, assuming the empty values are NA:

library(tidyverse)

Create the data:

df <- data.frame(place_english = c(NA, "ertert"), 
                 plane_english = c(NA, "ertt"), 
                 place_french = c("azea", NA), 
                 plane_french=c("Three", NA),
                 stringsAsFactors = FALSE)

Use coalesce to replace each NA with the first non-NA value:

df %>% mutate(Plane = coalesce(plane_english, plane_french),
              Place = coalesce(place_english, place_french))
Source: local data frame [2 x 6]
Groups: <by row>

# A tibble: 2 x 6
  place_english plane_english place_french plane_french Plane Place 
  <chr>         <chr>         <chr>        <chr>        <chr> <chr> 
1 NA            NA            azea         Three        Three azea  
2 ertert        ertt          NA           NA           ertt  ertert

You can also achieve the same thing one column at a time, for example with

df$Place <- coalesce(df$place_english, df$place_french)
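On the factor caveat mentioned in the comments: here is a minimal sketch, assuming the survey columns were read in as factors. The `df_fct` data is made up for illustration and is not from the question. Converting factor columns to character before calling coalesce should sidestep the issue.

```r
library(dplyr)

# Hypothetical data: the same values as above, but stored as factors
df_fct <- data.frame(place_english = factor(c(NA, "ertert")),
                     place_french  = factor(c("azea", NA)))

df_chr <- df_fct %>%
  # convert every factor column to plain character first
  mutate(across(where(is.factor), as.character)) %>%
  # then take the first non-NA value row by row
  mutate(Place = coalesce(place_english, place_french))

df_chr$Place
#> [1] "azea"   "ertert"
```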

【Discussion】:

    【Solution 2】:

    This should do the trick

    df %>%
      as_tibble() %>% 
      mutate_if(is.character, list(~na_if(.,""))) %>% #only needed if the missing fields are stored as blanks and not already NA
      transmute(
        Date,
        Place = coalesce(Place_English, Place_French),
        Plane = coalesce(Plane_English, Plane_French)
      )
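To illustrate why the na_if step matters when blanks are stored as empty strings, here is a small sketch on two made-up vectors (`english` and `french` are illustrative names, not columns from the question). coalesce only skips NA, so an empty string counts as a value unless it is converted first.

```r
library(dplyr)

# Hypothetical answers where missing entries are "" rather than NA
english <- c("", "ertert")
french  <- c("azea", "")

# Without na_if, the empty string in `english` wins for the first row
raw <- coalesce(english, french)
raw
#> [1] ""       "ertert"

# With na_if, "" becomes NA and coalesce falls through to `french`
cleaned <- coalesce(na_if(english, ""), na_if(french, ""))
cleaned
#> [1] "azea"   "ertert"
```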
    

    【Discussion】:

      【Solution 3】:

      Two approaches, both using dplyr

      Case 1: if there are NA/missing values

      df <- read.table(header = T, text = "Date   Place_English   Plane_English   Place_French    Plane_French
      One NA NA   azea    Three
      Two ertert  ertt    NA NA   ")
      
      library(dplyr)
      
      df %>%
        mutate(across(ends_with('_English'), ~ coalesce(., get(gsub('_English', '_French', cur_column()))),
                         .names = "{gsub('_English', '', .col)}"), .keep = 'unused')
      #>   Date  Place Plane
      #> 1  One   azea Three
      #> 2  Two ertert  ertt
      

      Case 2: if there are empty strings instead

      df <- read.table(header = T, text = "Date   Place_English   Plane_English   Place_French    Plane_French
      One '' ''   azea    Three
      Two ertert  ertt    ''  ''  ")
      library(tidyverse)
      
      df %>%
        mutate(across(ends_with('_English'), ~ paste0(., get(gsub('_English', '_French', cur_column()))),
                         .names = "{gsub('_English', '', .col)}"), .keep = 'unused')
      #>   Date  Place Plane
      #> 1  One   azea Three
      #> 2  Two ertert  ertt
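A short sketch of why the two cases need different functions, using made-up vectors rather than the question's data. paste0 treats NA as the literal text "NA", so the case 2 approach is only safe when the blanks really are empty strings.

```r
# Hypothetical answers with NA for the missing entries
english_na <- c(NA, "ertert")
french_na  <- c("azea", NA)

# paste0 converts NA to the string "NA" before concatenating
pasted_na <- paste0(english_na, french_na)
pasted_na
#> [1] "NAazea"   "ertertNA"

# With empty strings the concatenation gives the intended result
english_blank <- c("", "ertert")
french_blank  <- c("azea", "")
pasted_blank <- paste0(english_blank, french_blank)
pasted_blank
#> [1] "azea"   "ertert"
```

Note that if a respondent answered in both languages, paste0 would glue both answers together, whereas coalesce would keep only the first.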
      

      【Discussion】:

        【Solution 4】:

        If there are more than 2 columns and you don't want to type them all out, you can use the same approach as @coffeinjunky, but with across

        df <- data.frame(place_english = c(NA, "ertert"), 
                         plane_english = c(NA, "ertt"), 
                         place_french = c("azea", NA), 
                         plane_french=c("Three", NA),
                         stringsAsFactors = FALSE)
        
        library(dplyr, warn.conflicts = FALSE)
        
        df %>% 
          transmute(place = do.call(coalesce, across(starts_with('place'))), 
                    plane = do.call(coalesce, across(starts_with('plane'))))
        #>    place plane
        #> 1   azea Three
        #> 2 ertert  ertt
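The do.call step may look opaque, so here is a sketch of the mechanism on plain vectors. The third language vector is a made-up addition for illustration. do.call passes each element of a list as a separate argument, so it is equivalent to writing every vector out by hand. Inside transmute, across() supplies the selected columns as that list.

```r
library(dplyr)

# Hypothetical answers in three languages
english <- c(NA, "ertert", NA)
french  <- c("azea", NA, NA)
german  <- c(NA, NA, "qsd")

# Writing the arguments out by hand...
by_hand <- coalesce(english, french, german)

# ...gives the same result as handing do.call a list of the vectors
via_do_call <- do.call(coalesce, list(english, french, german))

via_do_call
#> [1] "azea"   "ertert" "qsd"
```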
        

        Created on 2021-08-05 by the reprex package (v2.0.1)

        【Discussion】:

          【Solution 5】:

          If you don't want to lose any data, use paste

          library(dplyr)
          df%>% mutate(Place = paste(Place_English, Place_French),
                       Plane = paste(Plane_English, Plane_French),
                       across(Place_English:Plane_French, ~NULL)) ## last line to remove unnecessary columns 
          

          Use coalesce if you want to get rid of the NAs

          df%>% mutate(Place = coalesce(Place_English, Place_French),
                       Plane = coalesce(Plane_English, Plane_French),
                       across(Place_English:Plane_French, ~NULL)) ## last line to remove unnecessary columns 
          

          If you want to combine more than 2 columns, use unite from tidyr. Set na.rm according to your preference

          library(tidyr)
          df %>% 
            unite("Place", colnames(df)[grepl(pattern = "Place", colnames(df))] , remove = T, sep = " ", na.rm = TRUE) %>%  ## all cols including "Place" in name
            unite("Plane", colnames(df)[grepl(pattern = "Plane", colnames(df))] , remove = T, sep = " ", na.rm = TRUE) ## all cols including "Plane" in name
          
          library(tidyr)
          cols_to_paste <- colnames(df[,]) ## to choose only specified cols, e.g. df[,15:25] or df[,c(15,18,20,25)]
          
          df %>% 
            unite('Place', cols_to_paste[grepl(pattern = 'Place', cols_to_paste)] , remove = T, sep = " ", na.rm = TRUE) %>% ## all cols including "Place" in name
            unite('Plane', cols_to_paste[grepl(pattern = 'Plane', cols_to_paste)] , remove = T, sep = " ", na.rm = TRUE) ## all cols including "Plane" in name
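As a possible variation on the unite calls above, the columns can also be picked with a tidyselect helper instead of grepl on the column names. This is a sketch on a cut-down, made-up data frame (`df_u` is an illustrative name), not a replacement for the code above.

```r
library(tidyr)

# Hypothetical cut-down version of the survey data
df_u <- data.frame(Date          = c("One", "Two"),
                   Place_English = c(NA, "ertert"),
                   Place_French  = c("azea", NA),
                   stringsAsFactors = FALSE)

# starts_with() selects every column whose name begins with "Place_";
# na.rm = TRUE drops NA values before the remaining ones are joined
united <- unite(df_u, "Place", starts_with("Place_"),
                sep = " ", na.rm = TRUE)

united$Place
#> [1] "azea"   "ertert"
```

With na.rm = FALSE the NA values would be pasted in as text, which is the same behaviour as the plain paste version.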
          

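          【Discussion】:

          • If I don't want to lose data, is there a way to do this without having to name all the columns myself?
          • Which columns do you mean?
          • Sorry for being unclear. In this case my survey has many more columns than just the 2 in this example. For example, pasting columns 25:35 under columns 15:25
          • Edited. Check whether this is what you are looking for. Otherwise, let's discuss it in chat

          【Solution 6】:

          Here is a base R approach using split.default, which works dynamically for any number of groups.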

          tmp <- df[-1]
          
          result <- cbind(df[1], sapply(split.default(tmp, sub('_.*', '', names(tmp))),
                          function(x) do.call(pmax, c(x, na.rm = TRUE))))
          
          result
          
          #  Date  Place Plane
          #1  One   azea Three
          #2  Two ertert  ertt
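To unpack the one-liner above, here is a sketch of its two steps on made-up data with the same column names. split.default groups the columns by the part of the name before the underscore, and pmax with na.rm = TRUE then picks the non-NA value in each row.

```r
# Hypothetical data with the same layout as the question (Date column dropped)
tmp_demo <- data.frame(Place_English = c(NA, "ertert"),
                       Plane_English = c(NA, "ertt"),
                       Place_French  = c("azea", NA),
                       Plane_French  = c("Three", NA),
                       stringsAsFactors = FALSE)

# Step 1: split the columns into groups keyed by the name prefix
groups <- split.default(tmp_demo, sub("_.*", "", names(tmp_demo)))
names(groups)
#> [1] "Place" "Plane"

# Step 2: within one group, take the row-wise maximum, ignoring NA
place <- do.call(pmax, c(groups$Place, na.rm = TRUE))
place
#> [1] "azea"   "ertert"
```

One design point to keep in mind: when a row has answers in both languages, pmax returns the one that sorts last, while coalesce returns the first non-NA one.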
          

          【Discussion】:

          • Strange that split.default doesn't have any documentation! Such a great function.