【Question title】: Compare two data frames to populate the Days range in R
【Posted】: 2021-11-11 02:40:18
【Question description】:

I have two data frames, DF1 and DF2. I need to compare Days in DF1 against the Low_Range and Hi_Range columns in DF2 and return the matching Days_Range in a result data frame.

Items=c("Vegetables","Fruits","Grocery","Dairy Product")
Days=c(16,5,41,25)
DF1=data.frame(Items,Days)

Low_Range=c(0,8,15,22,31,61)
Hi_Range=c(7,14,21,30,60,90)
Days_Range=c("within 7 days","8 to 14 days","15 to 21 days","22 to 30 days","31 to 60 days","61 to 90 days")
DF2=data.frame(Low_Range,Hi_Range,Days_Range)

Days_Slot=c("15 to 21 days","within 7 days","31 to 60 days","22 to 30 days")
DF_Result=data.frame(Items,Days,Days_Slot)

DF_Result is the result data frame I want, with Days_Slot added to DF1 as a new column. Can anyone help with this?

【Question discussion】:

    Tags: r dataframe data.table match


    【Solution 1】:

    This can be solved by updating in a non-equi join:

    library(data.table)
    setDT(DF1)[setDT(DF2), on = .(Days >= Low_Range, Days <= Hi_Range), 
               Days_Slot := Days_Range][]
    
               Items Days     Days_Slot
    1:    Vegetables   16 15 to 21 days
    2:        Fruits    5 within 7 days
    3:       Grocery   41 31 to 60 days
    4: Dairy Product   25 22 to 30 days
    

    Note that DF1 is updated by reference, i.e., the new column Days_Slot is appended to DF1 without copying the object.
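
    The "by reference" behavior can be checked on a toy table. This is a minimal sketch and not part of the original answer: the table `dt` and its columns are made up for illustration, and `address()` is the data.table helper that reports an object's memory address.

    ```r
    library(data.table)

    dt <- data.table(x = 1:3)      # hypothetical toy table, not from the question
    before <- address(dt)          # memory address before the update

    dt[, y := x * 2L]              # := adds column y in place

    # Compare the address after the update with the one recorded before
    same_object <- identical(before, address(dt))
    print(same_object)
    ```

    If `same_object` is TRUE, the table was modified in place rather than copied.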


    Because the intervals are contiguous, the matching Days_Range can also be determined with a rolling join:

    library(data.table)
    setDT(DF1)
    setDT(DF2)
    DF1[, Days_Slot := DF2[DF1, on = .(Low_Range = Days), roll = TRUE]$Days_Range][]
    
               Items Days     Days_Slot
    1:    Vegetables   16 15 to 21 days
    2:        Fruits    5 within 7 days
    3:       Grocery   41 31 to 60 days
    4: Dairy Product   25 22 to 30 days
    

    Again, a new column Days_Slot is appended to DF1 by reference.

    By the way, a backward rolling join gives the same result:

    DF1[, Days_Slot := DF2[DF1, on = .(Hi_Range = Days), roll = -Inf]$Days_Range][]
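
    The same "last lower bound not exceeding Days" idea behind the forward rolling join can also be sketched in base R with `findInterval()`. This is an alternative technique, not the answer's data.table method. It assumes Low_Range is sorted in ascending order, and it uses fresh copies of the question's data under the illustrative names `df1` and `df2`.

    ```r
    # Fresh copies of the question's data as plain data frames
    # (the names df1 and df2 are illustrative)
    df1 <- data.frame(
      Items = c("Vegetables", "Fruits", "Grocery", "Dairy Product"),
      Days  = c(16, 5, 41, 25)
    )
    df2 <- data.frame(
      Low_Range  = c(0, 8, 15, 22, 31, 61),
      Hi_Range   = c(7, 14, 21, 30, 60, 90),
      Days_Range = c("within 7 days", "8 to 14 days", "15 to 21 days",
                     "22 to 30 days", "31 to 60 days", "61 to 90 days")
    )

    # For each Days value, get the index of the last Low_Range that is <= Days
    idx <- findInterval(df1$Days, df2$Low_Range)

    # Index 0 means the value lies below the first lower bound: no slot
    idx[idx == 0] <- NA

    # Values above the upper bound of the slot they landed in get no slot either
    idx[!is.na(idx) & df1$Days > df2$Hi_Range[idx]] <- NA

    # Look up the label and attach it as a new column
    df1$Days_Slot <- df2$Days_Range[idx]
    print(df1)
    ```

    Unlike the `:=` updates above, this assignment follows base R's usual copy-on-modify semantics.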
    


      【Solution 2】:

      You can use fuzzy_left_join() from the fuzzyjoin package, matching Days against both range bounds:

      fuzzyjoin::fuzzy_left_join(DF1, DF2,
                                 by = c('Days' = 'Low_Range', 'Days' = 'Hi_Range'), 
                                 match_fun = c(`>=`, `<=`))
      
      #          Items Days Low_Range Hi_Range    Days_Range
      #1    Vegetables   16        15       21 15 to 21 days
      #2        Fruits    5         0        7 within 7 days
      #3       Grocery   41        31       60 31 to 60 days
      #4 Dairy Product   25        22       30 22 to 30 days
      

      If your dataset is large, you can also try data.table:

      library(data.table)
      setDT(DF1)
      setDT(DF2)
      
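      # Non-equi join: for each row of DF1, find the DF2 row whose range contains Days.
      # Note: in the result, the Low_Range and Hi_Range columns hold the matched
      # Days values from DF1 rather than the original bounds.
      DF2[DF1, on = .(Low_Range <= Days, Hi_Range >= Days)]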
      
