【Question title】: Combining Multiple Columns with Tidyr's Unite by Referencing Similar Column Names
【Posted】: 2017-07-28 11:15:01
【Question】:
library(tidyr)
library(dplyr)
library(tidyverse)

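Below is the code for a simple data frame. I have some messy exported data in which the categories of each factor are spread across separate columns.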

Client<-c("Client1","Client2","Client3","Client4","Client5")
Sex_M<-c("Male","NA","Male","NA","Male")
Sex_F<-c(" ","Female"," ","Female"," ")
Satisfaction_Satisfied<-c("Satisfied"," "," ","Satisfied","Satisfied")
Satisfaction_VerySatisfied<-c(" ","VerySatisfied","VerySatisfied"," "," ")
CommunicationType_Email<-c("Email"," "," ","Email","Email")
CommunicationType_Phone<-c(" ","Phone ","Phone "," "," ")
DF<-tibble(Client,Sex_M,Sex_F,Satisfaction_Satisfied,Satisfaction_VerySatisfied,CommunicationType_Email,CommunicationType_Phone)

I want to use tidyr's "unite" to recombine the categories into single columns.

DF<-DF%>%unite(Sat,Satisfaction_Satisfied,Satisfaction_VerySatisfied,sep=" ")%>%
unite(Sex,Sex_M,Sex_F,sep=" ")

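However, I have to write several "unite" lines, which I feel violates the rule of three, so there must be a way to make this easier, especially since my real data contains dozens of columns that need to be united. Is there a way to call "unite" once but somehow refer to matching column names, so that all columns with similar names (for example, "Sex_M" and "Sex_F" both contain "Sex", and "CommunicationType_Email" and "CommunicationType_Phone" both contain "CommunicationType") are combined as in the code above?

I was also thinking of a function that would let me pass in the column names, but that is too difficult for me since it involves complicated standard evaluation.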

【Question comments】:

  • DF %>% unite(Sat, contains("Sat"))?
  • DF %>% unite(Sat, matches("^Sat"))

Tags: r tidyr tidyverse


【Solution 1】:

Something like this? It should help if you have a lot of columns.

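result<-with(new.env(),{
  # define the columns in a fresh environment so that ls() lists only them
  Client<-c("Client1","Client2","Client3","Client4","Client5")
  Sex_M<-c("Male","NA","Male","NA","Male")
  Sex_F<-c(" ","Female"," ","Female"," ")
  Satisfaction_Satisfied<-c("Satisfied"," "," ","Satisfied","Satisfied")
  Satisfaction_VerySatisfied<-c(" ","VerySatisfied","VerySatisfied"," "," ")
  CommunicationType_Email<-c("Email"," "," ","Email","Email")
  CommunicationType_Phone<-c(" ","Phone ","Phone "," "," ")
  x<-ls()  # names of all the columns defined above, in alphabetical order
  # the category is the part of each name before the underscore, e.g. "Sex"
  categories<-unique(sub("(.*)_(.*)", "\\1", x))
  df<-setNames(data.frame( lapply(x, function(y) get(y))), x)
  for(nm in categories){
    # unite every column whose name contains the category into one column;
    # unite_() is the older standard-evaluation form of unite()
    df<-unite_(df, nm, x[contains(vars = x, match = nm)])
  }
  return(df)
})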

   Client CommunicationType   Satisfaction       Sex
1 Client1            Email_     Satisfied_     _Male
2 Client2            _Phone _VerySatisfied Female_NA
3 Client3            _Phone _VerySatisfied     _Male
4 Client4            Email_     Satisfied_ Female_NA
5 Client5            Email_     Satisfied_     _Male
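
The same loop-over-prefixes idea can be sketched with the current `unite()` instead of `unite_()`. This is only a sketch under some assumptions that are not in the original answer: blank cells and the literal string "NA" are both treated as missing, every column except `Client` is named `<prefix>_<level>`, and dplyr >= 1.0.0 and tidyr >= 1.0.0 are available (for `across()` and the `na.rm` argument). The names `cleaned`, `prefixes` and `united` are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Same toy data as in the question.
DF <- tibble(
  Client = c("Client1", "Client2", "Client3", "Client4", "Client5"),
  Sex_M = c("Male", "NA", "Male", "NA", "Male"),
  Sex_F = c(" ", "Female", " ", "Female", " "),
  Satisfaction_Satisfied = c("Satisfied", " ", " ", "Satisfied", "Satisfied"),
  Satisfaction_VerySatisfied = c(" ", "VerySatisfied", "VerySatisfied", " ", " "),
  CommunicationType_Email = c("Email", " ", " ", "Email", "Email"),
  CommunicationType_Phone = c(" ", "Phone ", "Phone ", " ", " ")
)

# Assumption: blank strings and the literal text "NA" both mean "missing".
# Trim whitespace first so that " " and "Phone " are handled as well.
cleaned <- DF %>%
  mutate(across(-Client, ~ na_if(na_if(trimws(.x), ""), "NA")))

# A prefix is everything before the first underscore, e.g. "Sex".
prefixes <- unique(sub("_.*$", "", setdiff(names(cleaned), "Client")))

united <- cleaned
for (prefix in prefixes) {
  # Unite all columns starting with "<prefix>_" into one column named <prefix>;
  # na.rm = TRUE drops the missing pieces, so no stray separators are left over.
  united <- unite(united, !!prefix, starts_with(paste0(prefix, "_")), na.rm = TRUE)
}

united
```

Because the missing values are dropped before pasting, each united column should hold just the one label that was present for that client, rather than values such as `Email_` or `Female_NA`.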

【Comments】:

    【Solution 2】:

    We can use unite

    library(tidyverse)
    DF %>% 
        unite(Sat, matches("^Sat"))
    

    For multiple cases, perhaps

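    gather(DF, Var, Val, -Client, na.rm = TRUE) %>%
            separate(Var, into = c("Var1", "Var2")) %>%
            # trim first: blank cells are " " and "Phone " has a trailing space
            mutate(Val = trimws(Val)) %>%
            group_by(Client, Var1) %>% 
            # drop missing, empty and literal "NA" values before pasting
            summarise(Val = paste(Val[!(is.na(Val)|Val %in% c("", "NA"))], collapse="_")) %>%
            spread(Var1, Val)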
    #  Client CommunicationType  Satisfaction    Sex
    #*   <chr>             <chr>         <chr>  <chr>
    #1 Client1             Email     Satisfied   Male
    #2 Client2             Phone VerySatisfied Female
    #3 Client3             Phone VerySatisfied   Male
    #4 Client4             Email     Satisfied Female
    #5 Client5             Email     Satisfied   Male
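
For reference, the same reshape can be sketched with `pivot_longer()` and `pivot_wider()`, which newer versions of tidyr (1.0.0 and later) offer alongside `gather()` and `spread()`. This is a sketch rather than part of the original answer; it assumes each column name other than `Client` contains exactly one underscore, and that blank cells and the literal string "NA" should be treated as missing. The name `result` is made up for illustration.

```r
library(dplyr)
library(tidyr)

# Same toy data as in the question.
DF <- tibble(
  Client = c("Client1", "Client2", "Client3", "Client4", "Client5"),
  Sex_M = c("Male", "NA", "Male", "NA", "Male"),
  Sex_F = c(" ", "Female", " ", "Female", " "),
  Satisfaction_Satisfied = c("Satisfied", " ", " ", "Satisfied", "Satisfied"),
  Satisfaction_VerySatisfied = c(" ", "VerySatisfied", "VerySatisfied", " ", " "),
  CommunicationType_Email = c("Email", " ", " ", "Email", "Email"),
  CommunicationType_Phone = c(" ", "Phone ", "Phone ", " ", " ")
)

result <- DF %>%
  # Split each column name at the underscore: "Sex_M" -> Var1 = "Sex", Var2 = "M".
  pivot_longer(-Client, names_to = c("Var1", "Var2"), names_sep = "_",
               values_to = "Val") %>%
  mutate(Val = trimws(Val)) %>%
  # Assumption: empty strings and the literal text "NA" mean "missing".
  filter(!is.na(Val), !Val %in% c("", "NA")) %>%
  group_by(Client, Var1) %>%
  # Paste whatever is left for each client and category.
  summarise(Val = paste(Val, collapse = "_"), .groups = "drop") %>%
  # One column per category again.
  pivot_wider(names_from = Var1, values_from = Val)

result
```

If a client had no usable value at all for some category, that cell would come out as `NA` after `pivot_wider()` instead of an empty string.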
    

    【Comments】:
