【Question title】: How to merge three data frames using two variables based on conditions in R
【Posted】: 2021-02-04 11:39:57
【Question】:

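R users, I would like to merge data from three different data frames (studentsPublic, studentsPrivate, studentsState) into a single data frame called Final_Desired_df. I want to match on either the students' email address or their social security number (ssn). The example below illustrates what I need, followed by a description of Final_Desired_df. Thanks in advance for your help.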

studentsPublic = randomNames::randomNames(10)
emailPublic = c('a@usa.com', NA, 'b@usa.com', 'c@usa.com', 'd@usa.com',NA, NA, 'e@usa.com', 'f@usa.com', 'g@usa.com')
examPublic = rnorm(10, mean=15, sd=5)
d1_PublicSchool = data.frame(studentsPublic, emailPublic, examPublic)

studentsPrivate = randomNames::randomNames(10)
emailPivate = c('t@usa.com', NA, NA, NA, 'd@usa.com',NA, NA, 'e@usa.com', 'f@usa.com', NA)
ssnPrivate = c(NA, 12, 34, NA,45, 67, NA, 32, 23, NA )
exanPrivate = rnorm(10, mean=15, sd=5)
d2_PrivateSchool = data.frame(studentsPrivate, emailPivate, ssnPrivate, exanPrivate)

# all columns must have the same length (10 rows), otherwise data.frame() fails
studentsState = randomNames::randomNames(10)
emailState = c('a@usa.com', NA, 'b@usa.com', 'c@usa.com', 'd@usa.com',NA, NA, 'e@usa.com', 'f@usa.com', 'g@usa.com')
ssnState = c(NA, 12, 34, NA,45, 67, NA, 32, 23, NA)
sexState = rep(c('male', 'female'), 5)
d3_StateSchools = data.frame(studentsState, emailState, ssnState, sexState)

Final_Desired_df = should include all students from d1_PublicSchool whose email address is in d3_StateSchools, plus all students from d2_PrivateSchool whose emailPivate is in d3_StateSchools or whose ssnPrivate is in d3_StateSchools.

Thanks in advance.

【Question comments】:

    Tags: r statistics data-analysis


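    【Solution 1】:

    How about this? I had to rename the columns so the pieces could be combined into the final data frame, and I added a last step that removes duplicate rows.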

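    # all students from d1_PublicSchool whose email addresses are in the d3_StateSchools
    # (NA emails are excluded explicitly, because NA %in% c(NA, ...) is TRUE)
    students_from_d1_in_d3_email<-d1_PublicSchool[!is.na(d1_PublicSchool$emailPublic) & d1_PublicSchool$emailPublic %in% d3_StateSchools$emailState,]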
    
    # add the missing ssn column as NAs and give the columns matching names;
    # data.frame() keeps exam numeric, whereas cbind() on vectors yields a character matrix
    students_from_d1_in_d3_email<-data.frame(name=students_from_d1_in_d3_email$studentsPublic,email=students_from_d1_in_d3_email$emailPublic,ssn=rep(NA_real_,nrow(students_from_d1_in_d3_email)),exam=students_from_d1_in_d3_email$examPublic)
    
    # all students from d2_PrivateSchool whose emailPivate are in the d3_StateSchools (NA emails excluded)
    students_from_d2_in_d3_email<-d2_PrivateSchool[!is.na(d2_PrivateSchool$emailPivate) & d2_PrivateSchool$emailPivate %in% d3_StateSchools$emailState,]
    
    # adjust column names to match
    colnames(students_from_d2_in_d3_email)<-c("name","email","ssn","exam")
    
    # all students from d2_PrivateSchool whose ssnPrivate are in the d3_StateSchools (NA ssn excluded)
    students_from_d2_in_d3_SSN<-d2_PrivateSchool[!is.na(d2_PrivateSchool$ssnPrivate) & d2_PrivateSchool$ssnPrivate %in% d3_StateSchools$ssnState,]
    
    # adjust column names to match
    colnames(students_from_d2_in_d3_SSN)<-c("name","email","ssn","exam")
    
    # Final dataframe
    Final_Desired_df<-rbind(students_from_d1_in_d3_email,students_from_d2_in_d3_email,students_from_d2_in_d3_SSN)
    
    
    # Remove duplicate students in final dataframe
    Final_Desired_df<-unique(Final_Desired_df)
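The same selection can also be written with one filter per source data frame. This is only a minimal sketch under stated assumptions: `in_non_na()` is a hypothetical helper (not part of base R) that treats a missing key as never matching, and the small `state`, `public` and `private` data frames are made-up stand-ins for the question's random data.

```r
# Hypothetical helper: TRUE where x is non-missing and present in table.
# Needed because NA %in% c(NA, 1) evaluates to TRUE in R.
in_non_na <- function(x, table) !is.na(x) & x %in% table

# Small illustrative data (made up; not the question's random data)
state <- data.frame(email = c("a@usa.com", NA, "d@usa.com"),
                    ssn   = c(NA, 12, 45))
public <- data.frame(name  = c("P1", "P2", "P3"),
                     email = c("a@usa.com", NA, "z@usa.com"),
                     exam  = c(10, 11, 12))
private <- data.frame(name  = c("Q1", "Q2", "Q3"),
                      email = c("t@usa.com", NA, "d@usa.com"),
                      ssn   = c(NA, 12, 45),
                      exam  = c(13, 14, 15))

# Public school: email match only; ssn is added as NA so the columns line up
pub_keep <- public[in_non_na(public$email, state$email), ]
pub_keep <- data.frame(name  = pub_keep$name,
                       email = pub_keep$email,
                       ssn   = rep(NA_real_, nrow(pub_keep)),
                       exam  = pub_keep$exam)

# Private school: email match OR ssn match in a single logical condition
priv_keep <- private[in_non_na(private$email, state$email) |
                       in_non_na(private$ssn, state$ssn),
                     c("name", "email", "ssn", "exam")]

# Stack the two pieces
final <- rbind(pub_keep, priv_keep)
```

Because the OR condition selects each private-school row at most once, this variant does not need the `unique()` step to undo double matches.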
    

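    【Comments】:

    • Thank you very much @dmuenzel. If you don't mind, another question for you or other helpers: is there a way to merge two data frames using either one of two columns? I would like to merge df1 with df2 if df1$StudentID matches df2$StudentID or df1$SSN matches df2$SSN. Thanks in advance.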
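One possible way to approach the follow-up in base R is to look up each df1 row in df2 by StudentID first and fall back to SSN when that fails. The sketch below is an assumption-laden illustration, not an answer from the thread: `match_non_na()` is a hypothetical helper, the `score` and `grade` columns and all values are made up, and each df1 row is paired with at most one df2 row (the first match).

```r
# Made-up example data; StudentID and SSN are the column names from the comment,
# score and grade are hypothetical payload columns.
df1 <- data.frame(StudentID = c(1, 2, NA, 4),
                  SSN       = c(NA, 22, 33, 99),
                  score     = c(90, 80, 70, 60))
df2 <- data.frame(StudentID = c(1, NA, 5),
                  SSN       = c(11, 33, 22),
                  grade     = c("A", "B", "C"))

# Hypothetical helper: position of x in table, with missing keys never matching
match_non_na <- function(x, table) {
  m <- match(x, table)
  m[is.na(x)] <- NA_integer_
  m
}

by_id  <- match_non_na(df1$StudentID, df2$StudentID)  # df2 row found via StudentID
by_ssn <- match_non_na(df1$SSN, df2$SSN)              # df2 row found via SSN

# Prefer the StudentID match; fall back to the SSN match
idx <- ifelse(is.na(by_id), by_ssn, by_id)

# Keep df1 rows that matched on either key and attach the df2 column
keep   <- !is.na(idx)
merged <- cbind(df1[keep, ], grade = df2$grade[idx[keep]])
```

Rows of df1 that match on neither key are dropped here, which mirrors an inner join; keeping them with `NA` in `grade` would give the left-join behaviour instead.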