Title: Regex; eliminate all punctuation except
Posted: 2012-11-14 03:10:33
Question:

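I have the following regex that splits on any space or punctuation mark. How can I exclude one or more punctuation marks from [:punct:]? Let's say I want to exclude the apostrophe and the comma. I know I could explicitly use [all punctuation marks in here] instead of [[:punct:]], but I'm hoping there is a way to exclude characters instead.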

X <- "I'm not that good at regex yet, but am getting better!"
strsplit(X, "[[:space:]]|(?=[[:punct:]])", perl=TRUE)

 [1] "I"       "'"       "m"       "not"     "that"    "good"    "at"      "regex"   "yet"    
[10] ","       ""        "but"     "am"      "getting" "better"  "!"

Tags: r regex strsplit


    Answer 1:

    It isn't clear to me what result you want, but you could use a negated class, like this answer:

    R> strsplit(X, "[[:space:]]|(?=[^,'[:^punct:]])", perl=TRUE)[[1]]
     [1] "I'm"     "not"     "that"    "good"    "at"      "regex"   "yet,"   
     [8] "but"     "am"      "getting" "better"  "!"    
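
To see why the double negation works, here is a small sketch. The probe characters are chosen for illustration and are not from the original answer. The class `[^,'[:^punct:]]` rejects the comma, the apostrophe, and every non-punctuation character, so what is left is punctuation other than those two marks.

```r
# Characters to probe: a few punctuation marks, a letter, a digit, and a space.
# (These probes are an assumption made for the example.)
chars <- c("'", ",", "!", ".", "?", "a", "1", " ")

# Negated class: excludes ",", "'", and everything that is NOT punctuation.
pattern <- "[^,'[:^punct:]]"

# TRUE where the single character is matched by the class.
matches <- grepl(pattern, chars, perl = TRUE)
names(matches) <- chars
print(matches)
```

Only `!`, `.`, and `?` should come back `TRUE`. The apostrophe and comma are excluded explicitly, and the letter, digit, and space are excluded by `[:^punct:]`.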
    


      Answer 2:

      You can restrict a PCRE subpattern directly with a (?![',]) negative lookahead, which fails the match if the next character to the right is ' or ,:

      [[:space:]]|(?=(?![',])[[:punct:]])
                     ^^^^^^^^ 
      

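      See the regex demo.

      Details

      • [[:space:]] - any whitespace character
      • | - or
      • (?=(?![',])[[:punct:]]) - a positive lookahead requiring that, immediately to the right of the current position, there is no ' or , and there is one punctuation character (in effect, any punctuation character other than ' and , is required).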
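
As an illustrative check (not part of the original answer), the consuming part of the lookahead can be compared with plain `[[:punct:]]` to see which positions in the sample string each one accepts. The variable names `all_punct` and `split_punct` are made up for this example.

```r
X <- "I'm not that good at regex yet, but am getting better!"

# Start positions of every punctuation character in X.
all_punct <- as.integer(gregexpr("[[:punct:]]", X, perl = TRUE)[[1]])

# Same search, but the negative lookahead rejects ' and , before
# [[:punct:]] gets a chance to match them.
split_punct <- as.integer(gregexpr("(?![',])[[:punct:]]", X, perl = TRUE)[[1]])

print(all_punct)    # positions 2, 31, 54 (the ', the comma, and the !)
print(split_punct)  # position 54 only (the !)
```

Wrapping the second pattern in `(?=...)` turns these positions into zero-width split points, so the `!` itself is not consumed by the split.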

      R online demo

      X <- "I'm not that good at regex yet, but am getting better!"
      strsplit(X, "[[:space:]]|(?=(?![',])[[:punct:]])", perl=TRUE)
      [[1]]
       [1] "I'm"     "not"     "that"    "good"    "at"      "regex"   "yet,"   
       [8] "but"     "am"      "getting" "better"  "!"
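
If the set of punctuation marks to keep changes often, the pattern can be assembled from a vector. The following is a minimal sketch under that assumption. `make_split_pattern` is a hypothetical helper, not something from the answer or from base R, and it expects each element of `keep` to be a single punctuation character.

```r
# Hypothetical helper: build the split pattern from a character vector of
# single punctuation characters that should NOT trigger a split.
make_split_pattern <- function(keep) {
  # In PCRE, a backslash before a non-alphanumeric character makes it literal,
  # so escaping each kept punctuation mark is safe inside the bracket class.
  escaped <- paste0("\\", keep, collapse = "")
  paste0("[[:space:]]|(?=(?![", escaped, "])[[:punct:]])")
}

pattern <- make_split_pattern(c("'", ","))
print(pattern)

X <- "I'm not that good at regex yet, but am getting better!"
result <- strsplit(X, pattern, perl = TRUE)
print(result[[1]])
```

For `keep = c("'", ",")` the generated pattern is equivalent to the one written by hand above, so the split should give the same tokens.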
      
