【Question title】: R: merge columns from same data.frame based on NA positions
【Posted】: 2016-11-03 15:12:35
【Question description】:

I have a data frame like this:

df <- data.frame(theme1=c("hello",NA,NA,NA), theme2=c(NA,"world",NA,NA), theme3=c(NA,NA,"good_morning",NA), theme4=c(NA,NA,NA,"good_evening"))

  theme1 theme2       theme3       theme4
1  hello     NA           NA           NA
2     NA  world           NA           NA
3     NA     NA good_morning           NA
4     NA     NA           NA good_evening

Now I want to end up with a single column that preserves the row order:

**Theme_merged**
hello
world
good_morning
good_evening

My attempt:

merge_themes <- data.frame(cbind(mycol = na.omit(unlist(data2_tst[18:23]))), stringsAsFactors = F)

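The code above works, but it does not preserve the row order, so when I try to put the vector back into the original data frame it no longer matches.

Real data: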

dput(head(data2_tst[18:23], n = 50))
structure(list(Theme1 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, "%Bedrukken%", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, "%Bedrukken%", NA, NA, NA, NA, NA, NA, NA, NA), Theme2 = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "%Nieuwste|Nieuwe|201[6:7]%", 
"%Nieuwste|Nieuwe|201[6:7]%", "%Nieuwste|Nieuwe|201[6:7]%", NA, 
NA, NA, NA, NA, "%Nieuwste|Nieuwe|201[6:7]%", "%Nieuwste|Nieuwe|201[6:7]%", 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "%Nieuwste|Nieuwe|201[6:7]%", 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "%Nieuwste|Nieuwe|201[6:7]%", 
"%Nieuwste|Nieuwe|201[6:7]%"), Theme3 = c("%Nodig%", NA, "%Nodig%", 
"%Nodig%", "%Nodig%", NA, NA, "%Nodig%", NA, "%Nodig%", NA, NA, 
NA, NA, "%Nodig%", "%Nodig%", "%Nodig%", NA, NA, NA, NA, NA, 
NA, "%Nodig%", "%Nodig%", NA, NA, "%Nodig%", NA, "%Nodig%", "%Nodig%", 
"%Nodig%", NA, "%Nodig%", "%Nodig%", "%Nodig%", NA, NA, NA, "%Nodig%", 
"%Nodig%", NA, "%Nodig%", NA, "%Nodig%", "%Nodig%", NA, "%Nodig%", 
NA, NA), Theme4 = c(NA, "%Kopen%", NA, NA, NA, "%Kopen%", "%Kopen%", 
NA, "%Kopen%", NA, NA, NA, NA, NA, NA, NA, NA, "%Kopen%", "%Kopen%", 
NA, NA, "%Kopen%", "%Kopen%", NA, NA, "%Kopen%", "%Kopen%", NA, 
"%Kopen%", NA, NA, NA, NA, NA, NA, NA, "%Kopen%", "%Kopen%", 
"%Kopen%", NA, NA, NA, NA, "%Kopen%", NA, NA, "%Kopen%", NA, 
NA, NA), Theme5 = c(NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_), Theme6 = c(NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_)), .Names = c("Theme1", 
"Theme2", "Theme3", "Theme4", "Theme5", "Theme6"), row.names = 3:52, class = "data.frame")

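【Comments】:

  • @nrussell You are absolutely right, sorry, that was a typo!
  • (1) Please drop the cbind. (2) In which cases does na.omit(unlist(df, use.names = FALSE)) not preserve the order? I cannot reproduce the problem. It always works as expected for me.
  • Maybe it will help if I use the real data. I will add it. All of these change the row order to alphabetical order on my machine.

Tags: r merge cbind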


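【Solution 1】:

dplyr version 0.5.0 introduced the coalesce() function:

This version of dplyr gains a number of vector functions inspired by SQL. Two functions make it a little easier to eliminate or generate missing values:

Given a set of vectors, coalesce() finds the first non-missing value at each position.

To apply it to your example data frame you can use: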

library(dplyr)

# Convert every column to character first (the example columns are factors)
df <- mutate_all(df, .funs = as.character)
df$merged <- with(df, coalesce(theme1, theme2, theme3, theme4))

I found it necessary to convert from factor to character to avoid an "invalid factor level" error.

Your real data needs no conversion:

df$merged <- with(df, coalesce(Theme1, Theme2, Theme3, Theme4, Theme5, Theme6))
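If dplyr is not available, the same "first non-missing value per position" idea can be sketched in base R. This is only an illustration: `coalesce_cols` is a hypothetical helper name, not part of any package, and the five-row data frame is made up to include a row that is entirely NA.

```r
# Base-R sketch of coalescing across columns.
# `coalesce_cols` is a hypothetical helper, not a dplyr function.
coalesce_cols <- function(cols) {
  # Walk the columns left to right, filling positions that are still NA.
  Reduce(function(x, y) ifelse(is.na(x), y, x), cols)
}

df <- data.frame(
  theme1 = c("hello", NA, NA, NA, NA),
  theme2 = c(NA, "world", NA, NA, NA),
  theme3 = c(NA, NA, "good_morning", NA, NA),
  theme4 = c(NA, NA, NA, "good_evening", NA),
  stringsAsFactors = FALSE
)

# One value per row, so the result lines up with the original rows.
df$merged <- coalesce_cols(df[c("theme1", "theme2", "theme3", "theme4")])
print(df$merged)
```

Because the result always has one element per row, it can be assigned straight back to the data frame, and a row with no values simply stays NA.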

【Comments】:

    【Solution 2】:

    In SQL, this would be the COALESCE function:

    apply(df, 1, function(r) c(na.omit(r), NA)[1])
    # [1] "hello"        "world"        "good_morning" "good_evening"
    

    df <- data.frame(
        theme1=c("hello",NA,NA,NA), 
        theme2=c(NA,"world",NA,NA), 
        theme3=c(NA,NA,"good_morning",NA), 
        theme4=c(NA,NA,NA,"good_evening"),
        stringsAsFactors = FALSE
    )
    

    On your example data, na.omit(unlist(df, use.names = FALSE)) works fine, but it fails as soon as there is a row containing only NA values:

    df2 <- data.frame(
        theme1=c("hello",NA,NA,NA,NA), 
        theme2=c(NA,"world",NA,NA,NA), 
        theme3=c(NA,NA,"good_morning",NA,NA), 
        theme4=c(NA,NA,NA,"good_evening",NA),
        theme5=c(NA_character_,NA_character_,NA_character_,
                 NA_character_,NA_character_),
        stringsAsFactors = FALSE
    )
    
    df2$X <- na.omit(unlist(df2, use.names = FALSE))
    # Error in `$<-.data.frame`(`*tmp*`, "X", value = c("hello", "world", "good_morning",  : 
    #   replacement has 4 rows, data has 5
    
    df2$X <- apply(df2, 1, function(r) c(na.omit(r), NA)[1])
    df2
    #   theme1 theme2       theme3       theme4 theme5            X
    # 1  hello   <NA>         <NA>         <NA>   <NA>        hello
    # 2   <NA>  world         <NA>         <NA>   <NA>        world
    # 3   <NA>   <NA> good_morning         <NA>   <NA> good_morning
    # 4   <NA>   <NA>         <NA> good_evening   <NA> good_evening
    # 5   <NA>   <NA>         <NA>         <NA>   <NA>         <NA>
    

    Another option could be df2$X <- df2[cbind(1:nrow(df2), max.col(!is.na(df2)))]
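The max.col() approach can be unpacked step by step as below. This is a sketch rather than the answerer's exact code: it passes ties.method = "first" (the default is "random") so the chosen column is deterministic, and the intermediate names `present`, `first_col`, `idx` and `merged` are introduced here only for illustration.

```r
df2 <- data.frame(
  theme1 = c("hello", NA, NA, NA, NA),
  theme2 = c(NA, "world", NA, NA, NA),
  theme3 = c(NA, NA, "good_morning", NA, NA),
  theme4 = c(NA, NA, NA, "good_evening", NA),
  stringsAsFactors = FALSE
)

# Logical matrix: TRUE wherever a value is present.
present <- !is.na(df2)

# For each row, the column index of the first TRUE.
# For an all-NA row every entry ties, so column 1 is picked.
first_col <- max.col(present, ties.method = "first")

# Two-column (row, column) index matrix used for matrix indexing.
idx <- cbind(seq_len(nrow(df2)), first_col)
merged <- as.matrix(df2)[idx]
print(merged)
```

For a row with no values, the picked cell is itself NA, so the result still has one element per row.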

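    【Comments】:

    • @DavidArenburg I want to get rid of the theme columns and create just one merged column.
    • @DavidArenburg I took "[...] when I want to put the vector back into the original data frame it no longer matches." to mean that the result should have the same length as the number of rows in the input data.
    • Your answer works great, thank you very much! @DavidArenburg
    【Solution 3】:

    Here is a tidyverse solution (using dplyr and tidyr, or just tidyverse):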

    library(tidyverse)
    
    df <- df %>% 
        gather("theme", "theme_merged", 1:4) %>%
        filter(!is.na(theme_merged)) %>% 
        select(theme_merged)
    
    df
    #   theme_merged
    # 1        hello
    # 2        world
    # 3 good_morning
    # 4 good_evening
    

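    【Comments】:

    • Thanks for your answer, but with my real data the order is not preserved.
    • Is it sorted according to some other variable? Or alphabetically? Or is it a factor with ordered levels?
    • The order should be preserved, because this is only a small part of my large DF. I need to add the merged column back to the DF.
    【Solution 4】:

    This should work for your data: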

    new_df = c(as.matrix(df))
    

    This line first converts df to a matrix and then binds all of its columns into a single vector with c().

    new_df <- new_df[!is.na(new_df)]
    

    Now we keep only the non-NA entries. If you like, you can convert the result back to a data frame:

    new_df <- data.frame(new_df);names(new_df) <- "Themes"
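One thing to keep in mind with this approach: c() on a matrix reads it column by column, so the surviving values come out in column order rather than row order. The sketch below uses a small made-up data frame (not the asker's data) to show the difference, and uses t() to read row by row instead. It assumes at most one non-NA value per row, and rows that are entirely NA are still dropped, so the length can still differ from nrow(df).

```r
# Made-up example where row order and column order differ.
df <- data.frame(
  theme1 = c(NA, "x2", NA),
  theme2 = c("y1", NA, NA),
  theme3 = c(NA, NA, "z3"),
  stringsAsFactors = FALSE
)

m <- as.matrix(df)

by_column <- c(m)      # column-major: all of theme1, then theme2, ...
by_row    <- c(t(m))   # row-major: all of row 1, then row 2, ...

col_order <- by_column[!is.na(by_column)]
row_order <- by_row[!is.na(by_row)]

print(col_order)   # values in column order
print(row_order)   # values in row order
```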
    

    【Comments】:
