Merging in R based on conditions: answers

【Question title】: Merging in R based on conditions
【Posted】: 2018-05-01 13:31:40
【Question】:

Given two example data frames:

df1 <- structure(list(name = c("Katie", "Eve", "James", "Alexander", 
"Mary", "Barrie", "Harry", "Sam"), postcode = c("CB12FR", "CB12FR", 
"NE34TR", "DH34RL", "PE46YH", "IL57DS", "IP43WR", "IL45TR")), .Names = c("name", 
"postcode"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-8L), spec = structure(list(cols = structure(list(name = structure(list(), class = c("collector_character", 
"collector")), postcode = structure(list(), class = c("collector_character", 
"collector"))), .Names = c("name", "postcode")), default = structure(list(), class = c("collector_guess", 
"collector"))), .Names = c("cols", "default"), class = "col_spec"))

df2 <- structure(list(name = c("Katie", "James", "Alexander", "Lucie", 
"Mary", "Barrie", "Claire", "Harry", "Clare", "Hannah", "Rob", 
"Eve", "Sarah"), postcode = c("CB12FR", "NE34TR", "DH34RL", "DL56TH", 
"PE46YH", "IL57DS", "RE35TP", "IP43WQ", "BH35OP", "CB12FR", "DL56TH", 
"CB12FR", "IL45TR"), rating = c(1L, 1L, 1L, 2L, 3L, 1L, 4L, 2L, 
2L, 3L, 1L, 4L, 2L)), .Names = c("name", "postcode", "rating"
), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-13L), spec = structure(list(cols = structure(list(name = structure(list(), class = c("collector_character", 
"collector")), postcode = structure(list(), class = c("collector_character", 
"collector")), rating = structure(list(), class = c("collector_integer", 
"collector"))), .Names = c("name", "postcode", "rating")), default = structure(list(), class = c("collector_guess", 
"collector"))), .Names = c("cols", "default"), class = "col_spec"))

I want to merge these two data frames so that the ratings from df2 are added to df1. I would normally use:

ratings.df

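However, I only want to merge when:
1. the postcode in df2 is unique (that is, a postcode that appears more than once in df2, under the same name or under different names, is not merged); and
2. the first three letters of the name are the same in both data frames.

(I am happy to leave blanks for postcodes without a rating; I can then fill those in manually.)

Is this possible?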

【Comments】:

Tags: r

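【Solution 1】:

Why not use the sqldf package? It lets you merge data.frames in R with SQL, using a JOIN statement.

As for the conditional merge, that can be done with a CASE statement in SQL.

So, for your first condition, you could use CASE where COUNT(postcode) = '1' and GROUP BY name, so that you JOIN for each name that has exactly one postcode assigned.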
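The idea above can be sketched roughly as follows. This is only one possible reading, not the answerer's own code: it filters on the aggregate count with HAVING instead of CASE, and it counts rows per postcode (following the question's first condition) rather than per name. The data frames are rebuilt as plain data.frames so the block runs on its own, and the aliases a and b are arbitrary. sqldf uses SQLite by default, whose substr() function is used for the three-letter comparison.

```r
library(sqldf)

# Example data from the question, rebuilt as plain data.frames
df1 <- data.frame(
  name = c("Katie", "Eve", "James", "Alexander", "Mary", "Barrie", "Harry", "Sam"),
  postcode = c("CB12FR", "CB12FR", "NE34TR", "DH34RL", "PE46YH", "IL57DS", "IP43WR", "IL45TR"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  name = c("Katie", "James", "Alexander", "Lucie", "Mary", "Barrie", "Claire",
           "Harry", "Clare", "Hannah", "Rob", "Eve", "Sarah"),
  postcode = c("CB12FR", "NE34TR", "DH34RL", "DL56TH", "PE46YH", "IL57DS", "RE35TP",
               "IP43WQ", "BH35OP", "CB12FR", "DL56TH", "CB12FR", "IL45TR"),
  rating = c(1L, 1L, 1L, 2L, 3L, 1L, 4L, 2L, 2L, 3L, 1L, 4L, 2L),
  stringsAsFactors = FALSE
)

# Left join df1 to the df2 rows whose postcode occurs exactly once in df2,
# matching on postcode and on the first three letters of the name.
# Rows of df1 without a match keep a NULL (NA in R) rating.
result <- sqldf("
  SELECT a.name, a.postcode, b.rating
  FROM df1 AS a
  LEFT JOIN (
    SELECT name, postcode, rating
    FROM df2
    WHERE postcode IN (
      SELECT postcode FROM df2 GROUP BY postcode HAVING COUNT(*) = 1
    )
  ) AS b
    ON a.postcode = b.postcode
   AND substr(a.name, 1, 3) = substr(b.name, 1, 3)
")

result
```

Because it is a LEFT JOIN, every row of df1 is kept, so postcodes without a usable rating are left blank for manual follow-up.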

Another option is to use gather from tidyr.

【Comments】:

【Solution 2】:

With a dplyr solution, we can first remove the duplicates in df2$postcode and then join the resulting data frame to df1:

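library(dplyr)

# distinct() keeps the first row for each postcode in df2
df3 <- df2 %>%
  distinct(postcode, .keep_all = TRUE)

df1 %>%
  left_join(df3, by = "postcode") %>%
  # keep only rows where the first three letters of both names match
  filter(substr(name.x, 1, 3) == substr(name.y, 1, 3)) %>%
  rename(name = name.x) %>%
  select(-name.y)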

This yields

# A tibble: 5 x 3
  name      postcode rating
  <chr>     <chr>     <int>
1 Katie     CB12FR        1
2 James     NE34TR        1
3 Alexander DH34RL        1
4 Mary      PE46YH        3
5 Barrie    IL57DS        1

Is this what you are trying to achieve?
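Note that distinct() keeps the first row for each postcode rather than discarding postcodes that occur more than once, and filter() drops the df1 rows that have no match. If the stricter reading of the question is intended (skip every postcode that is duplicated in df2, and keep unmatched df1 rows with a blank rating), a variant might look like the sketch below. The helper column prefix and the names df2_unique and result are introduced here for illustration only, and the data frames are rebuilt as plain data.frames so the block is self-contained.

```r
library(dplyr)

# Example data from the question, rebuilt as plain data.frames
df1 <- data.frame(
  name = c("Katie", "Eve", "James", "Alexander", "Mary", "Barrie", "Harry", "Sam"),
  postcode = c("CB12FR", "CB12FR", "NE34TR", "DH34RL", "PE46YH", "IL57DS", "IP43WR", "IL45TR"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  name = c("Katie", "James", "Alexander", "Lucie", "Mary", "Barrie", "Claire",
           "Harry", "Clare", "Hannah", "Rob", "Eve", "Sarah"),
  postcode = c("CB12FR", "NE34TR", "DH34RL", "DL56TH", "PE46YH", "IL57DS", "RE35TP",
               "IP43WQ", "BH35OP", "CB12FR", "DL56TH", "CB12FR", "IL45TR"),
  rating = c(1L, 1L, 1L, 2L, 3L, 1L, 4L, 2L, 2L, 3L, 1L, 4L, 2L),
  stringsAsFactors = FALSE
)

# Keep only postcodes that occur exactly once in df2,
# and add the first three letters of the name as a join key
df2_unique <- df2 %>%
  group_by(postcode) %>%
  filter(n() == 1) %>%
  ungroup() %>%
  mutate(prefix = substr(name, 1, 3)) %>%
  select(postcode, prefix, rating)

# Join on postcode and name prefix; unmatched df1 rows keep rating = NA
result <- df1 %>%
  mutate(prefix = substr(name, 1, 3)) %>%
  left_join(df2_unique, by = c("postcode", "prefix")) %>%
  select(-prefix)

result
```

With the example data, all eight rows of df1 are returned, and only James, Alexander, Mary and Barrie receive a rating. Katie and Eve are left blank because CB12FR appears three times in df2, Harry because his postcode differs between the two data frames, and Sam because "Sam" and "Sar" (from Sarah) do not match.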

【Comments】: