【Question title】:Best way to merge three data frames
【Posted】:2019-07-17 12:15:39
【Question】:

I have a problem joining three data frames. My first data frame looks like this:

 id <- c('123','456','789','433','234')
 article1 <- c('111', '222', '333','345','443')
 article2 <- c('111', '333', '223','987','230')
 article3 <- c('234', '552', '897','543','098')
 article4 <- c('231', '322', '341','313','099')
 article5 <- c('242', '222', '222','987','443')

df1 <- data.frame(id, article1,article2,article3,article4,article5)

df1

   id article1 article2 article3 article4 article5
1 123      111      111      234      231      242
2 456      222      333      552      322      222
3 789      333      223      897      341      222
4 433      345      987      543      313      987
5 234      443      230      098      099      443

Now I have a second df that holds more information about the id column. This df can have several rows per id. For example:

id <- c('123','123','789','433','789')
firstname <-c('Paul','Peter', 'Andi', 'Tim', 'Claire')
lastname <-c('P','D', 'A', 'T', 'C')
features <-c('AAB', 'AAC','BBD', 'CCD', 'CDC')

df2 <- data.frame(id, firstname, lastname, features)

df2

   id firstname lastname features
1 123      Paul        P      AAB
2 123     Peter        D      AAC
3 789      Andi        A      BBD
4 433       Tim        T      CCD
5 789    Claire        C      CDC

The third data frame looks like this and provides information about the articles:

articlenumber <- c('111', '222', '333','443','345','223','234','552')
info <- c('ABC', 'CEF', 'DEF', 'FFF', 'FFD','CCF','LLK','LKO')

df3 <- data.frame(articlenumber, info)

df3

  articlenumber info
1           111  ABC
2           222  CEF
3           333  DEF
4           443  FFF
5           345  FFD
6           223  CCF
7           234  LLK
8           552  LKO

The final result should look something like this:

   id article1 info article2 info article3 info article4 info article5 info firstname lastname features
1 123 111      ABC  111      ABC  234      LLK  333      DEF  222      CEF Paul P AAB
2 123 111      ABC  111      ABC  234      LLK  333      DEF  222      CEF Peter D AAC    
3 456 222      CEF  333      DEF  552      LKO  111      ABC  222      CEF Andi A BBD
4 789 333      DEF  223      CCF  552      LKO  333      DEF  222      CEF Claire C CDK

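Sorry that my table is not formatted properly. I hope it is clear what I want. If an id has more than one person, the row should also appear once per person. I have already tried merge and join, but did not get the result.

Edit:

Using Reduce I can merge df1 and df2: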

Reduce(function(x,y) merge(x,y,by="id",all=TRUE) ,list(df1,df2))
   id article1 article2 article3 article4 article5 firstname lastname features
1 123      111      111      234      231      242      Paul        P      AAB
2 123      111      111      234      231      242     Peter        D      AAC
3 234      443      230      098      099      443      <NA>     <NA>     <NA>
4 433      345      987      543      313      987       Tim        T      CCD
5 456      222      333      552      322      222      <NA>     <NA>     <NA>
6 789      333      223      897      341      222      Andi        A      BBD
7 789      333      223      897      341      222    Claire        C      CDC

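So how can I get the article information from df3 into this df?

【Question comments】:

    Tags: r dataframe join


    【Solution 1】:

    You can use left_join from the dplyr package, once for df2 and once per article column for df3. Note that I first define the data.frames with stringsAsFactors = FALSE. Otherwise the columns become factors (the default before R 4.0.0), and joining them like this does not work cleanly.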
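The note about stringsAsFactors can be checked with a small sketch. It is not part of the original answer, and the names `ids`, `as_factor`, `as_char` and `other` are made up for illustration.

```r
# Hypothetical names, used only for this illustration.
ids <- c("123", "456", "789")
as_factor <- data.frame(id = ids, stringsAsFactors = TRUE)
as_char   <- data.frame(id = ids, stringsAsFactors = FALSE)

class(as_factor$id)  # "factor"
class(as_char$id)    # "character"

# A factor stores integer codes plus a level set, so two factor columns
# built from different values do not share the same levels.
other <- data.frame(id = c("789", "999"), stringsAsFactors = TRUE)
identical(levels(as_factor$id), levels(other$id))  # FALSE
```

With character columns the join keys are compared as plain strings, which is what the answer's code relies on.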

    library(dplyr)

    # article1..article5, firstname, lastname, features, articlenumber and info
    # are the vectors defined in the question; id is passed explicitly because
    # the question overwrites it between df1 and df2.
    df1 <- data.frame(id = c('123','456','789','433','234'), article1, article2, article3, article4, article5, stringsAsFactors = FALSE)
    df2 <- data.frame(id = c('123','123','789','433','789'), firstname, lastname, features, stringsAsFactors = FALSE)
    df3 <- data.frame(articlenumber, info, stringsAsFactors = FALSE)

    # Join the person data once, then look up df3 once per article column,
    # renaming info each time so the five lookups do not collide.
    df1 %>% left_join(df2, by = "id") %>%
      left_join(df3 %>% rename(info1 = info), by = c("article1" = "articlenumber")) %>%
      left_join(df3 %>% rename(info2 = info), by = c("article2" = "articlenumber")) %>%
      left_join(df3 %>% rename(info3 = info), by = c("article3" = "articlenumber")) %>%
      left_join(df3 %>% rename(info4 = info), by = c("article4" = "articlenumber")) %>%
      left_join(df3 %>% rename(info5 = info), by = c("article5" = "articlenumber")) %>%
      select(id, article1, info1, article2, info2, article3, info3, article4, info4,
             article5, info5, everything())
    
       id article1 info1 article2 info2 article3 info3 article4 info4 article5 info5 firstname lastname features
    1 123      111   ABC      111   ABC      234   LLK      231  <NA>      242  <NA>      Paul        P      AAB
    2 123      111   ABC      111   ABC      234   LLK      231  <NA>      242  <NA>     Peter        D      AAC
    3 456      222   CEF      333   DEF      552   LKO      322  <NA>      222   CEF      <NA>     <NA>     <NA>
    4 789      333   DEF      223   CCF      897  <NA>      341  <NA>      222   CEF      Andi        A      BBD
    5 789      333   DEF      223   CCF      897  <NA>      341  <NA>      222   CEF    Claire        C      CDC
    6 433      345   FFD      987  <NA>      543  <NA>      313  <NA>      987  <NA>       Tim        T      CCD
    7 234      443   FFF      230  <NA>      098  <NA>      099  <NA>      443   FFF      <NA>     <NA>     <NA>
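The five lookups against df3 can also be written as a loop. The following is a minimal base R sketch, not the answer's method: it swaps the repeated left_join calls for merge() plus match(), and the names `res`, `art` and `pairs` are chosen here for illustration.

```r
# Rebuild the question's data with character columns.
df1 <- data.frame(
  id       = c("123", "456", "789", "433", "234"),
  article1 = c("111", "222", "333", "345", "443"),
  article2 = c("111", "333", "223", "987", "230"),
  article3 = c("234", "552", "897", "543", "098"),
  article4 = c("231", "322", "341", "313", "099"),
  article5 = c("242", "222", "222", "987", "443"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  id        = c("123", "123", "789", "433", "789"),
  firstname = c("Paul", "Peter", "Andi", "Tim", "Claire"),
  lastname  = c("P", "D", "A", "T", "C"),
  features  = c("AAB", "AAC", "BBD", "CCD", "CDC"),
  stringsAsFactors = FALSE
)
df3 <- data.frame(
  articlenumber = c("111", "222", "333", "443", "345", "223", "234", "552"),
  info          = c("ABC", "CEF", "DEF", "FFF", "FFD", "CCF", "LLK", "LKO"),
  stringsAsFactors = FALSE
)

# One row per (id, person); all.x = TRUE keeps ids that have no person.
res <- merge(df1, df2, by = "id", all.x = TRUE)

# For each article column, look up the matching info in df3 by position.
# Articles missing from df3 get NA.
for (i in 1:5) {
  art <- paste0("article", i)
  res[[paste0("info", i)]] <- df3$info[match(res[[art]], df3$articlenumber)]
}

# Interleave article/info columns, then append the person columns.
pairs <- as.vector(rbind(paste0("article", 1:5), paste0("info", 1:5)))
res <- res[, c("id", pairs, "firstname", "lastname", "features")]
res
```

This should give the same cells as the left_join version, but merge() sorts the rows by id, so the row order differs. The loop also adapts to a different number of article columns by changing `1:5`.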
    

    【Comments】:
