【Question title】: Trim Off Varying Last Special Characters in R
【Posted】: 2017-02-09 18:38:06
【Question】:

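Below is a gsub approach for removing trailing forward slashes from strings built from a data frame. I am looking for a more general solution that works for a data.frame with a varying number of columns.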

helloToday <- data.frame(a = c("hello", "hello", "hello"), 
                 b = c("world","","world"),
                 c = c("","","today"))

helloToday
#      a     b     c
# 1 hello world      
# 2 hello            
# 3 hello world today  


# Returns the vector 
helloToday <- apply(helloToday, 1, function(x){ paste0("/", paste(x, collapse = "/")) })
# [1] "/hello/world/"      "/hello//"           "/hello/world/today"

# But I would like the trailing symbols to be trimmed off
# [1] "/hello/world"      "/hello"           "/hello/world/today"


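# Nesting two gsub calls strips at most two trailing slashes
gsub("\\/$", "", gsub("\\/$", "", helloToday))
# Input: "/hello/world/"      "/hello//"           "/hello/world/today"

# The same idea in two steps: first a trailing "//", then a trailing "/"
helloToday <- gsub("\\//$", "", helloToday)
helloToday <- gsub("\\/$", "", helloToday)
# Input: "/hello/world/"      "/hello//"           "/hello/world/today"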

Is there a solution that handles a varying number of columns, where the trailing part may be "/", "//", or even "//////////"?

【Comments】:

  • + is the regex symbol for "one or more", so "\\/+$" will match any number of / at the end of the string
  • Actually, escaping / is pointless. Use sub("/+$", "", helloToday)

Tags: r regex gsub stringr


【Solution 1】:

+ is the regex quantifier for "one or more", so "/+$" matches any run of one or more / at the end of a string:

gsub("/+$", "", helloToday)
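
As a minimal sketch of how this behaves, the snippet below rebuilds the strings from the question so it runs on its own. The name `paths` and the extra element with ten trailing slashes are assumptions added for illustration. Because the pattern is anchored with `$`, each string can match at most once, so `sub` and `gsub` should give the same result here.

```r
# Strings as produced by the paste0/paste step in the question,
# plus one with many trailing slashes (added for illustration).
paths <- c("/hello/world/", "/hello//", "/hello/world/today", "/hello//////////")

# "/+$" matches one or more "/" anchored at the end of the string.
trimmed <- sub("/+$", "", paths)
# trimmed holds: "/hello/world", "/hello", "/hello/world/today", "/hello"

# The escaped form from the question's attempts matches the same text.
same_as_escaped <- identical(trimmed, gsub("\\/+$", "", paths))
```

Strings without a trailing slash, such as "/hello/world/today", are returned unchanged, so the call is safe to apply to the whole vector.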


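    【Solution 2】:

    An alternative to cleaning up with a regex afterwards is to build the string differently in the first place (starting from the original helloToday data frame):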

    apply(helloToday, 1, function(x) do.call(file.path, as.list(x[!x %in% ''])))
    
    
    ## [1] "hello/world"       "hello"             "hello/world/today"
    

    If the leading slash is needed:

    apply(helloToday, 1, function(x) do.call(file.path, as.list(c('', x[!x %in% '']))))
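
The same idea can be wrapped in a small helper so it applies to a data frame with any number of columns. This is only a sketch: `join_nonempty` and its `leading` argument are hypothetical names, and it swaps `file.path` for `paste(collapse = "/")`, which also returns a plain string for a row whose cells are all empty.

```r
# Hypothetical helper (not from the answer): join the non-empty cells of
# each row with "/", optionally adding a leading slash.
join_nonempty <- function(df, leading = TRUE) {
  apply(df, 1, function(x) {
    parts <- unname(x[nzchar(x)])        # drop empty cells
    if (leading) parts <- c("", parts)   # an empty first part yields a leading "/"
    paste(parts, collapse = "/")
  })
}

# The data frame from the question.
helloToday <- data.frame(a = c("hello", "hello", "hello"),
                         b = c("world", "", "world"),
                         c = c("", "", "today"))

result <- join_nonempty(helloToday)
# result holds: "/hello/world", "/hello", "/hello/world/today"

# A wider, one-row data frame (made up for illustration).
wider <- data.frame(a = "x", b = "", c = "y", d = "", stringsAsFactors = FALSE)
wide_result <- join_nonempty(wider)
# wide_result holds: "/x/y"
```

Note that this drops every empty cell, not only trailing ones, so the `wider` example gives "/x/y" where the regex approach would keep the interior gap as "/x//y". In the question's data the empty cells only appear at the end of a row, so both approaches agree there.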
    
