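【Question title】: R bank statement grouping
【Posted】: 2017-09-30 15:45:48
【Question】:

I am analysing my bank statement by grouping purchases by retailer name, so that the resulting data frame can then be analysed with dplyr functions. My approach below uses a custom function and it works, but I would like to know whether there is a more efficient way. For example, is there a package that can join data frames using complex matching logic between data frame columns?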

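library(dplyr)    # %>%, rowwise(), mutate(), left_join()
library(stringr)  # str_to_title()

# Return the first retailer whose name occurs in the purchase text
# (case-insensitive), or "Donno" when none matches.
# Relies on the global vector RetailerNames defined below.
FindRetailer <- function(Purchase) {
  P <- toupper(Purchase)
  for (z in seq_along(RetailerNames)) {
    Retailer <- toupper(RetailerNames[z])
    # fixed = TRUE treats the name as literal text, not as a regex
    if (grepl(Retailer, P, fixed = TRUE)) {
      return(str_to_title(Retailer))
    }
  }
  "Donno"
}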

Statement <- data.frame(
  Purchase = c("abc Aldi xyz","a Kmart bcd","a STARBUCKS ghju","abcd MacD efg"),
  Amount = c(235,23,789,45))

RetailerNames<- c("Aldi","Kmart","Starbucks","MacD")

# what I need
Result <- data.frame(
  Purchase = c("abc Aldi xyz","a Kmart bcd","a STARBUCKS ghju","abcd MacD efg"),
  Amount = c(235,23,789,45),
  Retailer = c("Aldi","Kmart","Starbucks","Macd"))

# this works using a custom function
NewStatement <- Statement %>%
  rowwise() %>%
  mutate(Retailer = FindRetailer(Purchase)) %>%
  ungroup()

# is this possible: join dataframes using complex string matching?
# this doesn't work yet
TestMethod<-Statement %>% 
  left_join(RetailerNames,by="Statement.Purchase %in% RetailerNames")

【Comments】:

    Tags: r inner-join banking


    【Solution 1】:


    library(tidyverse)

    Statement <- data.frame(
      Purchase = c("abc Aldi xyz","a Kmart bcd","a STARBUCKS ghju","abcd MacD efg"),
      Amount = c(235,23,789,45))
    
    RetailerNames <- c("Aldi","Kmart","Starbucks","MacD")
    
    # Collapse the retailer names into one alternation pattern ("Aldi|Kmart|...")
    # and extract the first case-insensitive match from each purchase.
    Statement %>% 
      mutate(
        Retailer = Purchase %>% 
          str_extract(RetailerNames %>% str_c(collapse = "|") %>% regex(ignore_case = TRUE))
        )
    #>           Purchase Amount  Retailer
    #> 1     abc Aldi xyz    235      Aldi
    #> 2      a Kmart bcd     23     Kmart
    #> 3 a STARBUCKS ghju    789 STARBUCKS
    #> 4    abcd MacD efg     45      MacD
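
The extracted text keeps the casing used on the statement line (for example STARBUCKS), so grouping on it directly could split one retailer into several groups. Below is a minimal sketch of one way to map each hit back to the spelling in RetailerNames and to label unmatched purchases. It assumes no retailer name contains regex metacharacters. The "no retailer here" row and the names Hit and Labelled are made up for illustration.

```r
library(dplyr)
library(stringr)

# Hypothetical statement; the last row deliberately matches no retailer
Statement <- data.frame(
  Purchase = c("abc Aldi xyz", "a Kmart bcd", "a STARBUCKS ghju", "no retailer here"),
  Amount = c(235, 23, 789, 45),
  stringsAsFactors = FALSE
)
RetailerNames <- c("Aldi", "Kmart", "Starbucks", "MacD")

# One case-insensitive alternation pattern built from all names
pattern <- regex(str_c(RetailerNames, collapse = "|"), ignore_case = TRUE)

Labelled <- Statement %>%
  mutate(
    # Matched text as it appears in the purchase, or NA when nothing matches
    Hit = str_extract(Purchase, pattern),
    # Compare upper-cased text to find the hit's position in RetailerNames
    Retailer = RetailerNames[match(toupper(Hit), toupper(RetailerNames))],
    # Same fallback label as the question's FindRetailer()
    Retailer = coalesce(Retailer, "Donno")
  ) %>%
  select(-Hit)
```

Unlike the question's function, which returns str_to_title() of the name ("Macd"), this keeps the spelling exactly as listed in RetailerNames ("MacD").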
    

    If you want to go the left_join route, try

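    library(fuzzyjoin)
    
    RetailerNames <- tibble(Retailer = c("Aldi","Kmart","Starbucks","MacD"))
    
    # Purchase is matched against each Retailer value as a regular expression;
    # ignore_case = TRUE lets "STARBUCKS" match "Starbucks"
    Statement %>%
      regex_left_join(RetailerNames, by = c(Purchase = "Retailer"), ignore_case = TRUE)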
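
A regex join returns one row per matching pair, so a purchase that matches two retailer patterns would appear twice and inflate any totals. The sketch below is one possible way to guard against that before summarising with dplyr. It assumes the fuzzyjoin package is installed, and the names Joined and Totals are illustrative only.

```r
library(dplyr)
library(tibble)
library(fuzzyjoin)

Statement <- tibble(
  Purchase = c("abc Aldi xyz", "a Kmart bcd", "a STARBUCKS ghju", "abcd MacD efg"),
  Amount = c(235, 23, 789, 45)
)
RetailerNames <- tibble(Retailer = c("Aldi", "Kmart", "Starbucks", "MacD"))

# Each Purchase is tested against every Retailer value used as a regex
Joined <- Statement %>%
  regex_left_join(RetailerNames, by = c(Purchase = "Retailer"), ignore_case = TRUE)

# Stop early if any purchase matched more than one retailer
stopifnot(nrow(Joined) == nrow(Statement))

# Spend and number of purchases per retailer
Totals <- Joined %>%
  group_by(Retailer) %>%
  summarise(Total = sum(Amount), Purchases = n())
```

Purchases with no matching retailer stay in the result with Retailer set to NA, because this is a left join.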
    

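    【Comments】:

    • Thanks, I thought there would be a simple solution. I'll take a look at fuzzyjoin too.
    • I edited the solution, because my original only worked by a lucky coincidence. My current solution collapses the vector of retailer names into a regex string.
    • Thanks for the correction and for the fuzzy-logic approach.
    • Glad to help. It just goes to show how tricky it can sometimes be to create a good reproducible example. Your example is great, but it happened to be the only case my original solution could handle...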