【Question title】: Return values not found for each ID - R
【Posted】: 2021-05-15 04:26:26
【Question】:

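For each vendor, I want to identify the values that are missing from the Vendors data frame. In other words, for each vendor I want to find the countries that are not listed for that vendor in Vendors.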

I have a data frame (Vendors) that looks like this:

Vendor_ID  Vendor       Country_ID  Country
1          Burger King  2           USA
1          Burger King  3           France
1          Burger King  5           Brazil
1          Burger King  7           Turkey
2          McDonald's   5           Brazil
2          McDonald's   3           France

Vendors <- data.frame(
  Vendor_ID  = c("1", "1", "1", "1", "2", "2"),
  Vendor     = c("Burger King", "Burger King", "Burger King", "Burger King",
                 "McDonald's", "McDonald's"),
  Country_ID = c("2", "3", "5", "7", "5", "3"),
  Country    = c("USA", "France", "Brazil", "Turkey", "Brazil", "France")
)

I also have another data frame (Countries) that looks like this:

Country_ID  Country
2           USA
3           France
5           Brazil
7           Turkey

Countries <- data.frame(Country_ID = c("2", "3", "5", "7"),
                        Country    = c("USA", "France", "Brazil", "Turkey"))

Desired output:

Vendor_ID  Vendor      Country_ID  Country
2          McDonald's  2           USA
2          McDonald's  7           Turkey

Can someone show me how to do this in R? I tried subset and anti_join, but the results were not correct.
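A plausible reason a plain anti-join returns nothing here: every country in Countries appears for at least one vendor, so comparing against the whole Vendors table leaves no rows. The comparison has to be made per vendor. The base R sketch below is an illustration added for clarity, not the asker's code; it rebuilds the question's data and adds stringsAsFactors = FALSE (an assumption, so it behaves the same on R versions before 4.0).

```r
# Data from the question; stringsAsFactors = FALSE (added here) keeps the
# columns character on R versions older than 4.0 as well
Vendors <- data.frame(
  Vendor_ID  = c("1", "1", "1", "1", "2", "2"),
  Vendor     = c("Burger King", "Burger King", "Burger King", "Burger King",
                 "McDonald's", "McDonald's"),
  Country_ID = c("2", "3", "5", "7", "5", "3"),
  Country    = c("USA", "France", "Brazil", "Turkey", "Brazil", "France"),
  stringsAsFactors = FALSE
)
Countries <- data.frame(Country_ID = c("2", "3", "5", "7"),
                        Country    = c("USA", "France", "Brazil", "Turkey"),
                        stringsAsFactors = FALSE)

# Anti-join of Countries against the whole Vendors table (base R):
# every country appears for at least one vendor, so no rows are left
naive <- Countries[!(Countries$Country_ID %in% Vendors$Country_ID), ]
nrow(naive)  # 0

# Per-vendor comparison: all countries minus the ones listed for that vendor
missing_by_vendor <- lapply(split(Vendors$Country, Vendors$Vendor),
                            function(have) setdiff(Countries$Country, have))
missing_by_vendor
# Burger King: character(0); McDonald's: "USA" "Turkey"
```

This only returns country names; the answers below also carry the IDs and vendor names into a single data frame.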

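【Question comments】:

  • Hi. If you add a minimal reproducible example, you make it easier for others to find and test answers to your question. That way you help others help you!
  • I just edited the question. Thanks for letting me know.

Tags: r dplyr subset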


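【Solution 1】:

A solution that uses expand.grid to create every possible vendor/country combination (assuming Countries has only one row per country), then uses dplyr to join against Vendors and find the missing countries.

Edit: the last two lines (the left_joins) are only needed to "translate" the ID columns back into text: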

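library(dplyr)

# All vendor/country combinations; stringsAsFactors = FALSE keeps the ID
# columns character so they join cleanly with Vendors and Countries
expand.grid(Vendor_ID = unique(Vendors$Vendor_ID),
            Country_ID = Countries$Country_ID,
            stringsAsFactors = FALSE) %>%
  left_join(Vendors, by = c("Vendor_ID", "Country_ID")) %>%
  # combinations with no match in Vendors are the missing countries
  filter(is.na(Vendor)) %>%
  select(Vendor_ID, Country_ID) %>%
  # translate the IDs back into country and vendor names
  left_join(Countries, by = "Country_ID") %>%
  left_join(unique(Vendors[, c("Vendor_ID", "Vendor")]), by = "Vendor_ID")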

Returns:

  Vendor_ID Country_ID Country     Vendor
1         2          2     USA McDonald's
2         2          7  Turkey McDonald's
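A variant of the same idea, sketched here as an addition rather than part of the original answer: a left_join followed by filter(is.na(Vendor)) keeps exactly the combinations with no match in Vendors, which is what dplyr::anti_join does in one step. The sketch rebuilds the question's data (with stringsAsFactors = FALSE added as an assumption) and assumes dplyr is installed.

```r
library(dplyr)

# Question data; stringsAsFactors = FALSE is added so IDs stay character
Vendors <- data.frame(
  Vendor_ID  = c("1", "1", "1", "1", "2", "2"),
  Vendor     = c("Burger King", "Burger King", "Burger King", "Burger King",
                 "McDonald's", "McDonald's"),
  Country_ID = c("2", "3", "5", "7", "5", "3"),
  Country    = c("USA", "France", "Brazil", "Turkey", "Brazil", "France"),
  stringsAsFactors = FALSE
)
Countries <- data.frame(Country_ID = c("2", "3", "5", "7"),
                        Country    = c("USA", "France", "Brazil", "Turkey"),
                        stringsAsFactors = FALSE)

# Every vendor/country pair that could exist
all_pairs <- expand.grid(Vendor_ID  = unique(Vendors$Vendor_ID),
                         Country_ID = Countries$Country_ID,
                         stringsAsFactors = FALSE)

missing_pairs <- all_pairs %>%
  # keep only the pairs that have no matching row in Vendors
  anti_join(Vendors, by = c("Vendor_ID", "Country_ID")) %>%
  # add the country and vendor names back
  left_join(Countries, by = "Country_ID") %>%
  left_join(distinct(Vendors, Vendor_ID, Vendor), by = "Vendor_ID")

missing_pairs
```

On the question's data this gives the same two McDonald's rows (USA and Turkey) as the output shown above.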

【Comments】:

【Solution 2】:

In base R, we can first split the data by vendor:

VenList <- split(df, df$Vendor)

Then we can check which countries are missing for each vendor and return them.

res <- lapply(VenList, function(x) {

  # Countries in df2 that never appear for this vendor
  missing_countries <- df2[!(df2$Country %in% x$Country), ]

  # Nothing is missing for this vendor
  if (nrow(missing_countries) == 0) return(NULL)

  # Repeat the vendor's ID and name once per missing country
  data.frame(Vendor_ID = x$Vendor_ID[1],
             Vendor    = x$Vendor[1],
             missing_countries,
             row.names = NULL)
})

# Which yields

res

# $BurgerKing
# NULL
#
# $`McDonald's`
#   Vendor_ID     Vendor Country_ID Country
# 1         2 McDonald's          2     USA
# 2         2 McDonald's          7  Turkey

# If you want it as one data frame, you can then flatten it:

do.call(rbind, res)

#              Vendor_ID     Vendor Country_ID Country
# McDonald's.1         2 McDonald's          2     USA
# McDonald's.2         2 McDonald's          7  Turkey

Data

# quote = "" so the apostrophe in McDonald's is read as part of the name
df <- read.table(text = "1  BurgerKing  2  USA
1  BurgerKing  3  France
1  BurgerKing  5  Brazil
1  BurgerKing  7  Turkey
2  McDonald's  5  Brazil
2  McDonald's  3  France",
                 col.names = c("Vendor_ID", "Vendor", "Country_ID", "Country"),
                 quote = "")

df2 <- read.table(text = "2  USA
3  France
5  Brazil
7  Turkey",
                  col.names = c("Country_ID", "Country"))
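For comparison, the cross-join idea from Solution 1 can also be written in base R without dplyr. This is a sketch added here, not part of either answer: merge(x, y, by = NULL) returns the Cartesian product of the two data frames, and a pasted vendor/country key (the helper pair_key is hypothetical, introduced only for this example) stands in for the anti-join. It uses the question's Vendors and Countries rather than df/df2.

```r
Vendors <- data.frame(
  Vendor_ID  = c("1", "1", "1", "1", "2", "2"),
  Vendor     = c("Burger King", "Burger King", "Burger King", "Burger King",
                 "McDonald's", "McDonald's"),
  Country_ID = c("2", "3", "5", "7", "5", "3"),
  Country    = c("USA", "France", "Brazil", "Turkey", "Brazil", "France"),
  stringsAsFactors = FALSE
)
Countries <- data.frame(Country_ID = c("2", "3", "5", "7"),
                        Country    = c("USA", "France", "Brazil", "Turkey"),
                        stringsAsFactors = FALSE)

# Cartesian product: each distinct vendor paired with every country
vendor_list <- unique(Vendors[, c("Vendor_ID", "Vendor")])
all_pairs   <- merge(vendor_list, Countries, by = NULL)

# Hypothetical helper: one key per vendor/country pair; the "_" separator
# keeps e.g. ("1", "12") and ("11", "2") from producing the same key
pair_key <- function(d) paste(d$Vendor_ID, d$Country_ID, sep = "_")

# Keep the combinations that do not occur in Vendors
missing_pairs <- all_pairs[!(pair_key(all_pairs) %in% pair_key(Vendors)), ]
rownames(missing_pairs) <- NULL
missing_pairs
```

The result has the columns Vendor_ID, Vendor, Country_ID and Country in the same order as the desired output.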
    

【Comments】:
