How to determine the region based on postal code (answers)

【Question Title】: How to determine the region based on postal code
【Posted】: 2019-11-10 06:16:12
【Question Description】:

I have 2 data frames: one contains records with postal codes, and the other contains regions, each listing a set of postal codes.

I want to add a "Regions" column to Dataframe 1 based on the postal code. How can I do this? (Note: a single region in Dataframe 2 can contain multiple postal codes.)

Thanks for your help.

【Comments】:

Split df2$postcodes into multiple columns (data.table::tstrsplit() would be my choice). Then melt df2. Finally, left join df2 to df1.
Please post code instead of images.
@Humpelstielzchen Still, posting images of data is generally discouraged.
Use strsplit on df2, then left_join df1 and df2.
@Humpelstielzchen In that case, the OP should provide a minimal reproducible example ;)
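The tstrsplit() / melt / left join recipe from the first comment could look roughly like this in data.table. This is only a sketch: the OP's real data was posted as images, so the toy df1 and df2 below are assumptions that mirror the data used in Solution 1.

```r
library(data.table)

# Toy data (assumed; mirrors Solution 1 because the OP's data was only shown as images)
df1 <- data.table(pcodes = c(1001, 1002, 1003))
df2 <- data.table(regions = c(1, 2),
                  pcodes = c("1001, 1002, 1003", "1004, 1005"))

# 1. tstrsplit() splits on a comma plus optional spaces and transposes the pieces
#    into columns V1, V2, V3; shorter rows are padded with NA (the default fill)
wide <- as.data.table(tstrsplit(df2$pcodes, split = ",\\s*"))
wide[, regions := df2$regions]

# 2. melt() to long format: one row per (region, postcode); na.rm drops the NA padding
long <- melt(wide, id.vars = "regions", value.name = "pcodes", na.rm = TRUE)
long <- long[, .(regions, pcodes = as.numeric(pcodes))]  # match df1's numeric type

# 3. Left join: all.x = TRUE keeps every row of df1 and attaches the matching region
df_both <- merge(df1, long, by = "pcodes", all.x = TRUE)
df_both
```

Postcodes in df1 that appear in no region keep a row with `regions` set to NA, which is the usual left-join behaviour.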

Tags: r dataframe data-manipulation

【Solution 1】:

This can be solved with dplyr and tidyr. I'm sure there are other solutions as well.

# create the data
df1 <- data.frame(pcodes = c(1001, 1002, 1003))
df2 <- data.frame(regions = c(1, 2), 
                  pcodes = c("1001, 1002, 1003", "1004, 1005"),
                  stringsAsFactors = FALSE)

library(dplyr)
library(tidyr)

# separate postcodes column and reshape long
# (from https://stackoverflow.com/a/33288868/2633645)
df2 <- df2 %>% 
  mutate(to = strsplit(pcodes, split = ",\\s*")) %>%  # split on comma plus any spaces
  unnest(cols = to) %>%                                # one row per postcode
  mutate(to = as.numeric(to)) %>%                      # match df1's numeric pcodes
  select(-pcodes) %>% 
  rename(pcodes = to) # rename `to` to `pcodes` for join purpose

# join the data sets by the common variable pcodes
df_both <- left_join(df1, df2, by = "pcodes")
df_both

  pcodes regions
1   1001       1
2   1002       1
3   1003       1

【Comments】:

Hi mate, your solution solved my problem perfectly. Thanks a lot!
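For readers without dplyr or tidyr, the "strsplit on df2, then left join" suggestion from the comments can also be sketched in base R. Again, the toy data is an assumption copied from Solution 1.

```r
# Toy data (assumed; same as Solution 1)
df1 <- data.frame(pcodes = c(1001, 1002, 1003))
df2 <- data.frame(regions = c(1, 2),
                  pcodes = c("1001, 1002, 1003", "1004, 1005"),
                  stringsAsFactors = FALSE)

# strsplit() returns a list holding one character vector of postcodes per region
codes <- strsplit(df2$pcodes, split = ",\\s*")

# Repeat each region once per postcode it contains to build a long lookup table
lookup <- data.frame(regions = rep(df2$regions, lengths(codes)),
                     pcodes  = as.numeric(unlist(codes)))

# all.x = TRUE makes merge() a left join: every row of df1 is kept
df_both <- merge(df1, lookup, by = "pcodes", all.x = TRUE)
df_both
```

Unlike left_join(), merge() sorts the result by the key column, so the original row order of df1 is not guaranteed to be preserved.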