【Question title】: R: Creating a new column using another dataframe
【Posted】: 2019-08-12 16:31:10
【Question】:

I have two data frames:

1) data1: data1 <- data.frame(Group = c(1, 2, 3), Region = c("Southeast Med, Southeast Low, Southwest Low, Northeast Med", "Northeast High, East Med, Midwest Med High", "Midwest Low, California and HI, West High"), stringsAsFactors = FALSE)

2) data2: data2 <- data.frame(Region = c('California and HI', 'California and HI', 'Northeast High', 'California and HI', 'West High', 'Midwest Med High', 'California and HI', 'California and HI', 'California and HI', 'Southwest Low', 'Midwest Med High', 'California and HI', 'East Med', 'Southeast Low', 'Southeast Med', 'Midwest Med High', 'Southeast Med', 'West High', 'Northeast High', 'California and HI', 'West High', 'California and HI', 'California and HI', 'West High', 'California and HI', 'West High', 'California and HI', 'California and HI'))

I want to create a new column in data2, say data2$Group, that uses data1 to look up which group each region belongs to and fills in that group number. How can I do this? Also, if data1 were a list rather than a data frame, what would be a possible approach?

【Discussion】:

    Tags: r dataframe mapping match


    【Solution 1】:

    Using the dataset you posted, you can do this:

    library(tidyverse)
    
    # update data1
    data1_upd = data1 %>% separate_rows(Region, sep = ", ")
    
    # join datasets
    data2_upd = data2 %>% left_join(data1_upd, by="Region")
    

    The new dataset, data2_upd, will look like this:

    #               Region Group
    # 1  California and HI     3
    # 2  California and HI     3
    # 3     Northeast High     2
    # 4  California and HI     3
    # 5          West High     3
    # 6   Midwest Med High     2
    # 7  California and HI     3
    # 8  California and HI     3
    # 9  California and HI     3
    # 10     Southwest Low     1
    # 11  Midwest Med High     2
    # 12 California and HI     3
    # 13          East Med     2
    # 14     Southeast Low     1
    # 15     Southeast Med     1
    # 16  Midwest Med High     2
    # 17     Southeast Med     1
    # 18         West High     3
    # 19    Northeast High     2
    # 20 California and HI     3
    # 21         West High     3
    # 22 California and HI     3
    # 23 California and HI     3
    # 24         West High     3
    # 25 California and HI     3
    # 26         West High     3
    # 27 California and HI     3
    # 28 California and HI     3
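
The same lookup can also be sketched in base R, without tidyverse. This is a minimal sketch rather than part of the original answer: `data2` is shortened to four rows for brevity, and the name `lookup` is an illustrative choice. `strsplit()` plays the role of `separate_rows()`, and `match()` plays the role of `left_join()`.

```r
# data1 as posted in the question
data1 <- data.frame(
  Group = c(1, 2, 3),
  Region = c("Southeast Med, Southeast Low, Southwest Low, Northeast Med",
             "Northeast High, East Med, Midwest Med High",
             "Midwest Low, California and HI, West High"),
  stringsAsFactors = FALSE
)

# Shortened stand-in for data2 (assumption: four rows are enough to illustrate)
data2 <- data.frame(
  Region = c("California and HI", "Northeast High", "Southwest Low", "East Med"),
  stringsAsFactors = FALSE
)

# Split each comma-separated string into its individual regions
regions <- strsplit(data1$Region, ", ", fixed = TRUE)

# One row per region, repeating each group number once per region in that group
lookup <- data.frame(
  Region = unlist(regions),
  Group = rep(data1$Group, lengths(regions)),
  stringsAsFactors = FALSE
)

# match() gives the row of `lookup` for each data2 region (NA if not found)
data2$Group <- lookup$Group[match(data2$Region, lookup$Region)]
```

Like the join, `match()` compares strings exactly, so a region that is missing from `lookup` ends up with `NA` in `data2$Group`.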
    

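    Note that this approach joins the two datasets by exact string matching. It is therefore case sensitive, and any whitespace before or after a region name will "break" the join. If your data is not as clean as in the example, you may need some preprocessing first (e.g. converting regions to lowercase and removing leading/trailing whitespace).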
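
One way to do that preprocessing is to join on a normalized key instead of the raw region text. The sketch below is an assumption-laden illustration: the messy `data2` values, the helper `clean_key()`, and the column name `key` are all made up for the example, and `data1` is reduced to two groups.

```r
library(dplyr)
library(tidyr)

# Reduced version of data1 (assumption: two groups are enough to illustrate)
data1 <- data.frame(
  Group = c(1, 2),
  Region = c("Southeast Med, Southwest Low", "Northeast High, East Med"),
  stringsAsFactors = FALSE
)

# Hypothetical messy input: stray whitespace and inconsistent case
data2 <- data.frame(
  Region = c(" southeast med", "EAST MED ", "Northeast High"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: strip leading/trailing whitespace, then lowercase
clean_key <- function(x) tolower(trimws(x))

# Split on the bare comma; clean_key() removes the space left after it
data1_upd <- data1 %>%
  separate_rows(Region, sep = ",") %>%
  mutate(key = clean_key(Region)) %>%
  select(Group, key)

# Join on the normalized key, then drop it so Region keeps its original text
data2_upd <- data2 %>%
  mutate(key = clean_key(Region)) %>%
  left_join(data1_upd, by = "key") %>%
  select(-key)
```

Keeping the original `Region` column and joining on a separate key means the cleaning only affects the match, not the data you report.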

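For the case where data1 is a list rather than a data frame, one possible approach is sketched below. It assumes a named list whose names are the group numbers and whose elements are character vectors of regions; the name `region_list`, that layout, and the shortened `data2` are assumptions, since the question does not show the list. Base R's `stack()` flattens such a list into a two-column lookup table.

```r
# Hypothetical list form of data1: names are groups, elements are regions
region_list <- list(
  "1" = c("Southeast Med", "Southeast Low", "Southwest Low", "Northeast Med"),
  "2" = c("Northeast High", "East Med", "Midwest Med High"),
  "3" = c("Midwest Low", "California and HI", "West High")
)

# Shortened stand-in for data2, with one region that is in no group
data2 <- data.frame(
  Region = c("West High", "East Med", "Southeast Low", "Unknown"),
  stringsAsFactors = FALSE
)

# stack() returns a data frame with `values` (the regions)
# and `ind` (a factor holding the list names, i.e. the groups)
lookup <- stack(region_list)

# Find each data2 region in the lookup; unmatched regions give NA
idx <- match(data2$Region, lookup$values)

# Convert the factor of list names back to numeric group numbers
data2$Group <- as.numeric(as.character(lookup$ind[idx]))
```

If the list names are labels rather than numbers, drop the `as.numeric()` call and keep `as.character(lookup$ind[idx])`.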