【Question title】: R regex to replace all punctuation except sentence markers, apostrophes and hyphens
【Posted】: 2015-08-06 17:44:00
【Question】:

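I am looking for a way to mark the beginning and end of sentences in R. To do that, I want to remove all punctuation except end-of-sentence markers (periods, exclamation marks, question marks and hyphens), and then replace those markers with the token ***. At the same time, I want to keep words that contain apostrophes intact. As a concrete example, given this string: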

txt <- "We have examined all the possibilities, however we have not reached a solid conclusion - however we keep and open mind! Have you considered any other approach? Haven't you?"

the desired result is:

txt <- "We have examined all the possibilities however we have not reached a solid conclusion *** however we keep and open mind*** Have you considered any other approach*** Haven't you***"

I have not been able to come up with a regex that does this. Any hints would be much appreciated.

【Comments on the question】:

    Tags: regex r


    【Solution 1】:

    You can use gsub.

    > txt <- "We have examined all the possibilities, however he have not reached a solid conclusion - however we keep and open mind! Have you considered any other approach? Haven't you?"
    > gsub("[-.?!]", "<S>", gsub("(?![-.?!'])[[:punct:]]", "", txt, perl=T))
    [1] "We have examined all the possibilities however he have not reached a solid conclusion <S> however we keep and open mind<S> Have you considered any other approach<S> Haven't you<S>"
    > gsub("[-.?!]", "***", gsub("(?![-.?!'])[[:punct:]]", "", txt, perl=T))
    [1] "We have examined all the possibilities however he have not reached a solid conclusion *** however we keep and open mind*** Have you considered any other approach*** Haven't you***"
    

    "I want to remove all punctuation except end-of-sentence markers (periods, exclamation marks, question marks and hyphens)."

    gsub("(?![-.?!'])[[:punct:]]", "", txt, perl=T)
    

    "I want to replace those markers with the token ***. At the same time, I want to keep words that contain apostrophes intact."

    gsub("[-.?!]", "***", gsub("(?![-.?!'])[[:punct:]]", "", txt, perl=T))
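The two gsub() calls above can be wrapped in a small function so the marker is configurable. This is only a sketch: the name mark_sentences and the marker argument are hypothetical and not part of the original answer. The second example shows a caveat that follows from the question's own rules: a hyphen inside a word is treated as a sentence marker too.

```r
# Hypothetical helper (not from the original answer) wrapping the two gsub() calls.
mark_sentences <- function(x, marker = "***") {
  # Step 1: drop every punctuation character that is neither a sentence
  # marker (- . ? !) nor an apostrophe. The lookahead needs perl = TRUE.
  no_extra <- gsub("(?![-.?!'])[[:punct:]]", "", x, perl = TRUE)
  # Step 2: replace each remaining sentence marker with the chosen marker.
  # Note: a backslash in `marker` would be interpreted by gsub().
  gsub("[-.?!]", marker, no_extra)
}

txt <- "We have examined all the possibilities, however he have not reached a solid conclusion - however we keep and open mind! Have you considered any other approach? Haven't you?"
result <- mark_sentences(txt)
print(result)

# Caveat: hyphens inside words are replaced as well.
hyphenated <- mark_sentences("Wait, what? It's well-known.")
print(hyphenated)
# [1] "Wait what*** It's well***known***"
```

If hyphenated words need to survive, the hyphen would have to be handled separately, for example by only treating a hyphen surrounded by spaces as a marker.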
    

    【Comments】:

    • gsub("[-.?!]", "***", gsub("(?![-.?!]|\\b'\\b)[[:punct:]]", "", txt, perl=T))
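The variant in the comment above differs from the answer only in how it treats apostrophes: `\\b'\\b` keeps an apostrophe only when it sits between two word characters, so single quotes used as quotation marks are removed. A small comparison, using a test string (txt2) made up for illustration:

```r
# txt2 is a made-up example, not from the original thread.
txt2 <- "He said 'stop', didn't he?"

# Answer's pattern: every apostrophe is kept.
keep_all <- gsub("(?![-.?!'])[[:punct:]]", "", txt2, perl = TRUE)
print(keep_all)
# [1] "He said 'stop' didn't he?"

# Comment's pattern: only apostrophes inside a word are kept.
keep_in_word <- gsub("(?![-.?!]|\\b'\\b)[[:punct:]]", "", txt2, perl = TRUE)
print(keep_in_word)
# [1] "He said stop didn't he?"
```

Which behaviour is preferable depends on whether the input contains quoted phrases.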
    【Solution 2】:

    You can do this with two regular expressions. First, use a character class to remove the characters you don't want:

    [,.]
      ^--- Whatever you want to remove, put it here
    

    with an empty string as the replacement.

    Then use a second regex like this:

    [?!-]
      ^--- Add characters you want to replace here
    

    with the replacement string:

    <S>
    

    Working demo
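In R, the two-step approach from this answer might look like the sketch below. One deliberate deviation, stated as an assumption: the answer's removal class `[,.]` would delete periods, but the question treats the period as a sentence marker, so here the period is moved into the replacement class. The variable names step1 and step2 are illustrative only.

```r
txt <- "We have examined all the possibilities, however he have not reached a solid conclusion - however we keep and open mind! Have you considered any other approach? Haven't you?"

# Step 1: delete the characters that should simply disappear (only the comma here;
# add more characters to the class as needed).
step1 <- gsub("[,]", "", txt)

# Step 2: replace the sentence markers. The hyphen goes last in the class
# so that it is read literally and not as a range.
step2 <- gsub("[.?!-]", "<S>", step1)
print(step2)
```

Unlike Solution 1, this version lists the characters to remove explicitly, so any punctuation not named in the first class (a semicolon, for instance) is left in the text.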

    【Comments】:
