How to use dplyr left_join with multiple conditions? Answers

【Question title】: How to use dplyr left_join with multiple conditions?
【Posted】: 2022-09-23 16:59:05
【Question】:

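I'm trying to join two data frames, nCode and index, as shown in the image below. The code at the bottom adds the concat column from index to nCode by matching on the eleCnt column, but I want concat to be added (joined) only when a condition is met: Group <> 0, or grpID matches between the two data frames. Is there a clean, simple way to do this in dplyr or base R? I'm avoiding data.table for now because I'm new to R and would like to keep things simple. I've been experimenting with dplyr's filter() function to add this condition, but no luck yet.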

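This kind of question is addressed in other posts, such as "dplyr left_join by less than, greater than condition", and I like Jon Spring's solution there, which uses the development version of left_join() and lets you write, for example, left_join(x, y, join_by(a >= b, c < d)). However, I'm wary of using a development version for fear of bugs and so on.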
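
For reference, join_by() was later released in dplyr 1.1.0, so the call quoted above no longer requires a development build. Below is a minimal sketch of that syntax, assuming dplyr >= 1.1.0. The data frames x and y and their columns (id, key, a, b, c, d) are made up purely for illustration and are not part of this question's data.

```r
library(dplyr)  # join_by() needs dplyr >= 1.1.0

# Hypothetical inputs, invented for this sketch only
x <- data.frame(id = 1:3, a = c(5, 1, 9), c = c(0, 2, 4))
y <- data.frame(key = c("p", "q"), b = c(4, 8), d = c(1, 5))

# Keep every row of x; attach rows of y where a >= b AND c < d
res <- left_join(x, y, join_by(a >= b, c < d))

res$key
# id 1 matches "p", id 2 matches nothing (NA), id 3 matches "q"
```

The conditions passed to join_by() are combined with AND, so an either/or rule like the one in this question would still need an extra step after the join.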

Code:

library(dplyr)
library(tidyr)  # fill() is provided by tidyr, not dplyr

myDF5 <- 
  data.frame(
    Name = c("B","R","R","R","B","X","X"),
    Group = c(0,0,1,1,0,2,2)
    ) 

nCode <- myDF5 %>%
  mutate(origOrder = row_number()) %>%
  group_by(Name) %>%
  mutate(eleCnt = row_number()) %>%
  ungroup() %>%
  mutate(seqBase = ifelse(Group == 0 | Group != lag(Group), eleCnt,0)) %>%
  mutate(seqBase = na_if(seqBase, 0)) %>%
  group_by(Name) %>%
  fill(seqBase) %>%
  mutate(seqBase = match(seqBase, unique(seqBase))) %>%
  ungroup()

grpRnk <- nCode %>% select(Name,Group,eleCnt) %>% 
  filter(Group > 0) %>% 
  group_by(Name,Group) %>% 
  slice(which.min(Group)) %>% 
  ungroup() %>%
  arrange(eleCnt) %>%
  mutate(grpRnk = dense_rank(eleCnt)) %>% 
  select(-eleCnt) 

nCode <- left_join(nCode, grpRnk, by = c("Name", "Group")) %>%
  mutate(subGrp = ifelse(Group > 0, 
            sapply(1:n(), function(x) sum(Name[1:x]==Name[x]& 
            Group[1:x] == Group[x])), 0)) %>%
  mutate(grpID = sapply(1:n(), function(x) sum(eleCnt[(Group[1:n()] == Group[x]) & 
            (Name[1:n()] == Name[x]) & 
            (Group[1:n()]!= 0)])))

i = 1

index <- 
  filter(nCode, grpRnk == i) %>%
  distinct(eleCnt, .keep_all = TRUE) %>%
  mutate(grpID = sapply(1:n(), function(x) sum(eleCnt))) %>%
  mutate(concat = seqBase + subGrp/10) %>%
  select(eleCnt,grpID,concat)

index %>%
  select(eleCnt,concat) %>%
  left_join(nCode, ., by = "eleCnt")

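Please do not upload code, error messages, results, or data as images, for these reasons and these.
What about rows 6 and 7? Shouldn't they also fail to match and be NA, since Group != 0 and Group != grpID?
lks_swrx, rows 6 and 7 should match: even though their Group <> 0, their grpID of 3 matches the index grpID of 3.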
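
Since the question also asks about base R, here is a minimal sketch using match(). It assumes the rule, as read from this comment thread and the expected output, is: keep concat when Group is 0, or when the row's grpID equals the grpID of the matching index row. The data frames are cut down to the relevant columns, and the index values are reconstructed by hand from the question's code for i = 1, so treat them as an assumption. The names pos and keep are hypothetical helpers.

```r
# Reduced copies of the question's data (relevant columns only; assumed values)
nCode <- data.frame(
  Name   = c("B", "R", "R", "R", "B", "X", "X"),
  Group  = c(0, 0, 1, 1, 0, 2, 2),
  eleCnt = c(1L, 1L, 2L, 3L, 2L, 1L, 2L),
  grpID  = c(0L, 0L, 5L, 5L, 0L, 3L, 3L)
)
index <- data.frame(eleCnt = 1:2, grpID = c(3L, 3L), concat = c(1.1, 1.2))

# Position of each nCode row's eleCnt in index (NA when there is no match)
pos <- match(nCode$eleCnt, index$eleCnt)

# Assumed rule: Group is 0, or grpID agrees with the matched index row
keep <- nCode$Group == 0 | nCode$grpID == index$grpID[pos]

# Rows where the rule is FALSE or undecidable (NA) get NA
nCode$concat <- ifelse(!is.na(keep) & keep, index$concat[pos], NA)

nCode$concat
# 1.1 1.1 NA NA 1.2 1.1 1.2
```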

Tags: r dplyr match left-join

【Solution 1】:

One way to do this is my typical fallback of overwriting: replace the index %>% select(...) at the bottom of the OP with the following:

index %>%
   select(eleCnt,concat) %>%
   left_join(nCode, ., by = "eleCnt") %>% 
   mutate(concat = ifelse(Group != 0 & grpID != max(index$grpID),NA,concat))

This gives the correct answer:

  Name  Group origOrder eleCnt seqBase grpRnk subGrp grpID concat
  <chr> <dbl>     <int>  <int>   <int>  <int>  <dbl> <int>  <dbl>
1 B         0         1      1       1     NA      0     0    1.1
2 R         0         2      1       1     NA      0     0    1.1
3 R         1         3      2       2      2      1     5   NA  
4 R         1         4      3       2      2      2     5   NA  
5 B         0         5      2       2     NA      0     0    1.2
6 X         2         6      1       1      1      1     3    1.1
7 X         2         7      2       1      1      2     3    1.2

But there must be a cleaner way to do this than overwriting the concat column.
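
One possibly cleaner variant, offered as a sketch rather than a definitive answer: carry grpID from index through the join under a temporary name and apply the condition in a single if_else(), instead of comparing against max(index$grpID). The data frames below are reduced to the relevant columns, with index values reconstructed by hand from the question's code for i = 1 (an assumption). The column name idxGrpID is hypothetical.

```r
library(dplyr)

# Reduced copies of the question's data (relevant columns only; assumed values)
nCode <- data.frame(
  Name   = c("B", "R", "R", "R", "B", "X", "X"),
  Group  = c(0, 0, 1, 1, 0, 2, 2),
  eleCnt = c(1L, 1L, 2L, 3L, 2L, 1L, 2L),
  grpID  = c(0L, 0L, 5L, 5L, 0L, 3L, 3L)
)
index <- data.frame(eleCnt = 1:2, grpID = c(3L, 3L), concat = c(1.1, 1.2))

res <- nCode %>%
  # Bring index's grpID along under a temporary name (idxGrpID)
  left_join(select(index, eleCnt, idxGrpID = grpID, concat), by = "eleCnt") %>%
  # Keep concat only when Group is 0 or the two grpID values agree;
  # an NA condition (no match in index) also yields NA
  mutate(concat = if_else(Group == 0 | grpID == idxGrpID, concat, NA_real_)) %>%
  select(-idxGrpID)

res$concat
# 1.1 1.1 NA NA 1.2 1.1 1.2
```

This still adjusts concat after the join, but the rule sits in one expression and compares each row against its own matched index row.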

【Discussion】: