Question title: gsubfn function not giving desired output when ignore.case = TRUE
Posted: 2018-03-15 00:28:22
Question:

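I am trying to replace multiple patterns in a character vector with their corresponding replacement strings. After doing some research, I found the gsubfn package, which I thought would be able to do what I want. However, when I run the code below I don't get my expected output (see the end of the question for the results I get compared with the results I expected).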

library(gsubfn)

# Our test data that we want to search through (while ignoring case)

test.data<- c("1700 Happy Pl","155 Sad BLVD","82 Lolly ln", "4132 Avent aVe")

#     A data frame containing the patterns we want to search for
#     (again ignoring case) and the replacement strings we want to
#     swap in for any matches we come across.


frame <- data.frame(pattern = c(" Pl", " blvd", " LN", " ave"),
                    replace = c(" Place", " Boulevard", " Lane", " Avenue"),
                    stringsAsFactors = FALSE)

# NOTE: I added a space in front of each pattern (and each replacement) to
#       make sure we only grab matches that are their own word (for instance,
#       if an address were "45 Splash Way" we would not want to replace the
#       "pl" inside "Splash" with "Place").

#     The following paste() line is meant to stop the substitution from
#     grabbing instances like the " Ave" found directly after "4132" (the start
#     of "Avent") in "4132 Avent aVe", which we don't want converted to " Avenue".

pat <- paste(paste(frame$pattern,collapse = "($|[^a-zA-Z])|"),"($|[^a-zA-Z])", sep = "")

#     Here is the gsubfn call I am making
gsubfn(x = test.data, pattern = pat, replacement = setNames(as.list(frame$replace), frame$pattern), ignore.case = TRUE)
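As a quick check on what the combined pattern looks like and what it actually matches, the sketch below rebuilds `pat` exactly as above and extracts the matched text with base R's `regexpr()` and `regmatches()`. Using base R with `perl = TRUE` instead of gsubfn is an assumption made only so the check has no extra dependency; gsubfn itself may use a different regex engine.

```r
# Same test data and lookup table as in the question
test.data <- c("1700 Happy Pl", "155 Sad BLVD", "82 Lolly ln", "4132 Avent aVe")
frame <- data.frame(pattern = c(" Pl", " blvd", " LN", " ave"),
                    replace = c(" Place", " Boulevard", " Lane", " Avenue"),
                    stringsAsFactors = FALSE)

# Rebuild the combined pattern exactly as in the question
pat <- paste(paste(frame$pattern, collapse = "($|[^a-zA-Z])|"),
             "($|[^a-zA-Z])", sep = "")
print(pat)

# Whole-match text for each address, matched case-insensitively
m <- regmatches(test.data,
                regexpr(pat, test.data, ignore.case = TRUE, perl = TRUE))
print(m)
```

This prints the pattern ` Pl($|[^a-zA-Z])| blvd($|[^a-zA-Z])| LN($|[^a-zA-Z])| ave($|[^a-zA-Z])`, which contains four capturing groups, and the matches ` Pl`, ` BLVD`, ` ln` and ` aVe`. The matched text keeps the case found in the data, not the case used in `frame$pattern`, and the ` Ave` at the start of `Avent` is rejected because it is followed by the letter `n`.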

Output I am receiving:

[1] "1700 Happy" "155 Sad"    "82 Lolly"   "4132 Avent"

Expected output:

[1] "1700 Happy Place" "155 Sad Boulevard" "82 Lolly Lane" "4132 Avent Avenue"

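My working theory as to why this doesn't work is that the matches are not found among the list names because of case differences (e.g. " BLVD" in "155 Sad BLVD" is not == " blvd", even though it could be considered a match because of the ignore.case argument). Can someone confirm that this is the problem, or point out what else might be going wrong, and perhaps suggest a way to solve it that doesn't require me to expand my pattern vector to include every case permutation (if possible)?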
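The case-sensitivity part of this theory is easy to check: R matches list names exactly, case included. The base-R sketch below looks names up in the same named list that is passed to gsubfn, then shows one way to make the lookup case-insensitive by lower-casing both the keys and the matched text. The `tolower()` normalisation and the hard-coded `matched` vector are illustrative assumptions, not a gsubfn feature.

```r
frame <- data.frame(pattern = c(" Pl", " blvd", " LN", " ave"),
                    replace = c(" Place", " Boulevard", " Lane", " Avenue"),
                    stringsAsFactors = FALSE)

# The named list handed to gsubfn in the question
repl <- setNames(as.list(frame$replace), frame$pattern)

exact      <- repl[[" blvd"]]  # " Boulevard": the name matches exactly
wrong_case <- repl[[" BLVD"]]  # NULL: list names are case-sensitive

# Case-insensitive lookup: normalise keys and matched text to lower case
repl_lower <- setNames(frame$replace, tolower(frame$pattern))
matched <- c(" Pl", " BLVD", " ln", " aVe")  # suffixes as they appear in test.data
fixed <- unname(repl_lower[tolower(matched)])
print(fixed)
```

Case alone does not explain everything, though: ` Pl` in `1700 Happy Pl` matches its list name exactly, yet it was also dropped from the output. A likely reason, if I read the gsubfn documentation correctly, is that when the pattern contains parentheses and `backref` is not given, gsubfn passes only the back-references (here the text captured by `($|[^a-zA-Z])`, an empty string at the end of each address) to the replacement instead of the whole match, so the lookup key is not ` Pl` at all.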

Discussion:

    Tags: r replace gsub case-sensitive


    Answer 1:

    It seems that stringr gives you a simple solution:

    library(stringr)
    
    str_replace_all(test.data, 
                    regex(paste0('\\b', frame$pattern, '$'), ignore_case = TRUE),
                    frame$replace)
    #[1] "1700 Happy Place"  "155 Sad Boulevard" "82 Lolly Lane"     "4132 Avent Avenue"
    

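    Note that, because of the tricky "Avent aVe", I had to change the regex to look only for the word at the end of the string. But of course there are other ways to handle it.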
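One caveat about the call above: `str_replace_all()` is vectorised over `string`, `pattern` and `replacement`, so it pairs `test.data[i]` with `frame$pattern[i]`. It gives the right answer here only because each address happens to sit in the same position as its suffix in `frame`. The sketch below shows that reordering the input breaks it, and then applies every pattern to every address by passing a named vector (names are patterns, values are replacements), using the ICU inline flag `(?i)` for case-insensitivity. The named-vector variant is an alternative sketch, not the answerer's code.

```r
library(stringr)

test.data <- c("1700 Happy Pl", "155 Sad BLVD", "82 Lolly ln", "4132 Avent aVe")
frame <- data.frame(pattern = c(" Pl", " blvd", " LN", " ave"),
                    replace = c(" Place", " Boulevard", " Lane", " Avenue"),
                    stringsAsFactors = FALSE)

# Positional pairing: reversing the input breaks the one-to-one alignment,
# so no pattern meets its partner string and nothing is replaced
positional <- str_replace_all(rev(test.data),
                              regex(paste0("\\b", frame$pattern, "$"), ignore_case = TRUE),
                              frame$replace)
print(positional)

# Every pattern tried on every string: named vector c(pattern = replacement);
# "(?i)" switches on case-insensitive matching inside each ICU pattern
rules <- setNames(frame$replace, paste0("(?i)\\b", frame$pattern, "$"))
fixed <- str_replace_all(rev(test.data), rules)
print(fixed)
```

The named-vector rules are applied one after another, so in principle a replacement could create text that a later rule matches; with these four suffixes that does not happen.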
