Merge 2 variables vertically in R tidyverse: answers

【Question title】: Merge 2 variables vertically R tidyverse
【Posted】: 2021-10-10 12:53:12
【Question description】:

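I ran a survey in 2 languages, and I would like to merge the questions from both languages into a single variable.

The answers from both forms are in the same data.frame. Date is my primary key. Unfortunately, I am still new to R and have not been able to find an elegant way to combine them.

Example of the current state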

Date	Place_English	Plane_English	Place_French	Plane_French
One			azea	Three
Two	ertert	ertt

Desired result

Date	Place	Plane
One	azea	Three
Two	ertert	ertt

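【Comments on the question】:

Have a look at coalesce.
Thanks for the suggestion! :) I don't think it applies to my case, because there are some missing values in the question set for one of the languages.
Are your blanks empty strings '' or NA?
The blanks are NA.
That is exactly what coalesce does well: it skips leading NAs until it finds a non-NA value and returns that value. See the example below. The one tricky point is that there are some caveats with factors.

Tags: r tidy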

【Solution 1】:

Just following up on my comment, assuming the empty values are NA:

library(tidyverse)

Create the data:

df <- data.frame(place_english = c(NA, "ertert"), 
                 plane_english = c(NA, "ertt"), 
                 place_french = c("azea", NA), 
                 plane_french = c("Three", NA),
                 stringsAsFactors = FALSE)

Use coalesce to replace each NA with the first non-NA value:

df %>% mutate(Plane = coalesce(plane_english, plane_french),
              Place = coalesce(place_english, place_french))
#>   place_english plane_english place_french plane_french Plane  Place
#> 1          <NA>          <NA>         azea        Three Three   azea
#> 2        ertert          ertt         <NA>         <NA>  ertt ertert

You can also achieve the same thing one column at a time, e.g.:

df$Place <- coalesce(df$place_english, df$place_french)
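
To make the behavior of coalesce concrete, here is a minimal sketch on plain vectors. The vector names `english` and `french` and their values are made up for illustration. For each position, coalesce returns the first non-NA value across its arguments, and the result stays NA only where every argument is NA.

```r
library(dplyr)

# Hypothetical vectors, not part of the question's data
english <- c(NA, "ertert", NA)
french  <- c("azea", "ignored", NA)

# Position 1: english is NA, so the french value is used.
# Position 2: english is non-NA, so the french value is ignored.
# Position 3: both are NA, so the result stays NA.
merged <- coalesce(english, french)
merged
```

So missing values in one language are not a problem, as long as the other language holds the answer for that row.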

【Discussion】:

【Solution 2】:

This should do the trick

df %>%
  as_tibble() %>% 
  mutate_if(is.character, list(~na_if(.,""))) %>% #only needed if the missing fields are stored as blanks and not already NA
  transmute(
    Date,
    Place = coalesce(Place_English, Place_French),
    Plane = coalesce(Plane_English, Plane_French)
  )

【Discussion】:

【Solution 3】:

Two approaches, both using dplyr

Case 1: if there are NA/missing values

df <- read.table(header = T, text = "Date   Place_English   Plane_English   Place_French    Plane_French
One NA NA   azea    Three
Two ertert  ertt    NA NA   ")

library(dplyr)

df %>%
  mutate(across(ends_with('_English'), ~ coalesce(., get(gsub('_English', '_French', cur_column()))),
                   .names = "{gsub('_English', '', .col)}"), .keep = 'unused')
#>   Date  Place Plane
#> 1  One   azea Three
#> 2  Two ertert  ertt

Case 2: if there are empty strings instead

df <- read.table(header = T, text = "Date   Place_English   Plane_English   Place_French    Plane_French
One '' ''   azea    Three
Two ertert  ertt    ''  ''  ")
library(tidyverse)

df %>%
  mutate(across(ends_with('_English'), ~ paste0(., get(gsub('_English', '_French', cur_column()))),
                   .names = "{gsub('_English', '', .col)}"), .keep = 'unused')
#>   Date  Place Plane
#> 1  One   azea Three
#> 2  Two ertert  ertt
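
One thing to keep in mind with the paste0 variant: it only gives the desired result when at most one of the two columns is filled in per row. The sketch below uses hypothetical data (`df2`, with a third row answered in both languages) to compare it with converting empty strings to NA via na_if and then calling coalesce.

```r
library(dplyr)

# Hypothetical data: row 3 has an answer in both languages
df2 <- data.frame(Place_English = c("", "ertert", "Paris"),
                  Place_French  = c("azea", "", "Paris"),
                  stringsAsFactors = FALSE)

res <- df2 %>%
  mutate(
    # paste0 glues both values together, even when both are filled in
    pasted = paste0(Place_English, Place_French),
    # na_if turns "" into NA, then coalesce keeps the first non-NA value
    coalesced = coalesce(na_if(Place_English, ""),
                         na_if(Place_French, ""))
  )

res$pasted     # "azea" "ertert" "ParisParis"
res$coalesced  # "azea" "ertert" "Paris"
```

If every respondent answered in exactly one language, both variants give the same result.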

【Discussion】:

【Solution 4】:

If there are more than 2 columns and you don't want to type them all out, you can use the same approach as @coffeinjunky, but with across

df <- data.frame(place_english = c(NA, "ertert"), 
                 plane_english = c(NA, "ertt"), 
                 place_french = c("azea", NA), 
                 plane_french=c("Three", NA),
                 stringsAsFactors = F)

library(dplyr, warn.conflicts = FALSE)

df %>% 
  transmute(place = do.call(coalesce, across(starts_with('place'))), 
            plane = do.call(coalesce, across(starts_with('plane'))))
#>    place plane
#> 1   azea Three
#> 2 ertert  ertt

Created on 2021-08-05 by the reprex package (v2.0.1)

【Discussion】:

【Solution 5】:

If you don't want to lose any data, use paste

library(dplyr)
df%>% mutate(Place = paste(Place_English, Place_French),
             Plane = paste(Plane_English, Plane_French),
             across(Place_English:Plane_French, ~NULL)) ## last line to remove unnecessary columns

Or coalesce, if you want to get rid of the NAs

df%>% mutate(Place = coalesce(Place_English, Place_French),
             Plane = coalesce(Plane_English, Plane_French),
             across(Place_English:Plane_French, ~NULL)) ## last line to remove unnecessary columns

If you want to combine more than 2 columns, use unite from tidyr. Set na.rm to your liking.

library(tidyr)
df %>% 
  unite("Place", colnames(df)[grepl(pattern = "Place", colnames(df))] , remove = T, sep = " ", na.rm = TRUE) %>%  ## all cols including "Place" in name
  unite("Plane", colnames(df)[grepl(pattern = "Plane", colnames(df))] , remove = T, sep = " ", na.rm = TRUE) ## all cols including "Plane" in name

library(tidyr)
cols_to_paste <- colnames(df[,]) ## to choose only specified cols, e.g. df[,15:25] or df[,c(15,18,20,25)]

df %>% 
  unite('Place', cols_to_paste[grepl(pattern = 'Place', cols_to_paste)] , remove = T, sep = " ", na.rm = TRUE) %>% ## all cols including "Place" in name
  unite('Plane', cols_to_paste[grepl(pattern = 'Plane', cols_to_paste)] , remove = T, sep = " ", na.rm = TRUE) ## all cols including "Plane" in name
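
A caveat about the paste variant above: paste converts NA to the literal string "NA", so the merged column contains text such as "NA azea". The sketch below, on a small hypothetical data frame `df3`, contrasts this with unite and na.rm = TRUE, which drops the missing values before joining.

```r
library(tidyr)

# Hypothetical data mirroring the question's layout (Place columns only)
df3 <- data.frame(Date = c("One", "Two"),
                  Place_English = c(NA, "ertert"),
                  Place_French  = c("azea", NA),
                  stringsAsFactors = FALSE)

# paste keeps NA as the text "NA"
pasted <- paste(df3$Place_English, df3$Place_French)
pasted  # "NA azea" "ertert NA"

# unite with na.rm = TRUE removes missing values before joining
united <- unite(df3, "Place", Place_English, Place_French,
                sep = " ", na.rm = TRUE)
united$Place  # "azea" "ertert"
```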

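【Discussion】:

If I don't want to lose data, is there a way to do this without having to name all the columns myself?
Which columns do you mean?
Sorry for being unclear. In this case my survey has many more columns than the 2 in this example. For instance, pasting columns 25:35 under columns 15:25.
Edited. Check whether this is what you are looking for. Otherwise, let's discuss it in chat.

【Solution 6】:

Here is a base R approach using split.default, which works dynamically for any number of groups.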

tmp <- df[-1]

result <- cbind(df[1], sapply(split.default(tmp, sub('_.*', '', names(tmp))),
                function(x) do.call(pmax, c(x, na.rm = TRUE))))

result

#  Date  Place Plane
#1  One   azea Three
#2  Two ertert  ertt
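
To show what happens in the intermediate steps, here is a sketch that rebuilds the question's data (assuming the blanks are NA) and inspects each piece. split.default splits the data frame by columns rather than by rows, using the part of each column name before the underscore as the grouping key.

```r
# Rebuild the example data, assuming the blanks are NA
df <- data.frame(Date = c("One", "Two"),
                 Place_English = c(NA, "ertert"),
                 Plane_English = c(NA, "ertt"),
                 Place_French  = c("azea", NA),
                 Plane_French  = c("Three", NA),
                 stringsAsFactors = FALSE)

tmp <- df[-1]                          # drop the key column
groups <- sub('_.*', '', names(tmp))   # "Place" "Plane" "Place" "Plane"

# One data frame per group, each holding that group's columns
parts <- split.default(tmp, groups)
names(parts)         # "Place" "Plane"
names(parts$Place)   # "Place_English" "Place_French"

# pmax with na.rm = TRUE keeps the non-NA value in each row
place <- do.call(pmax, c(parts$Place, na.rm = TRUE))
place                # "azea" "ertert"
```

Note that pmax returns the larger value when both columns are filled in for a row, whereas coalesce returns the first one. The two agree here because each row has only one non-NA value per group.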

【Discussion】:

Strangely, split.default doesn't have any documentation! Such a great function.