【Question Title】: Perl & Sed string substitution in several expressions
【Posted】: 2023-03-07 23:34:01
【Question】:

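I want to do a string substitution with non-greedy matching that does the following:

  • Remove all leading and trailing dashes and apostrophes (when these characters appear in the middle of a word, they must be kept)

  • Turn multiple spaces into 1 space

Example: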

--ONE   Tw'o--   -333-   -'FO-UR'

must become

ONE Tw'o 333 FO-UR

I can't get the exact result. Can you help me correct the perl and sed syntax below?

$ echo "--ONE   Tw'o--   -333-   -'FO-UR'" \
  | perl -pe "s/[-']+(.+?)/\1/g"           \
  | perl -pe "s/(.+?)[-']+/\1/g"           \
  | perl -pe "s/\s+/ /g"

Result (perl): "ONE Two 333 FOUR"

$ echo "--ONE   Tw'o--   -333-   -'FO-UR'" \
  | sed -r -e "s/[-']+(.+?)/\1/g"          \
    -e "s/(.+)[-']+/\1/g"                  \
    -e "s/\s+/ /g"

Result (sed): "ONE Tw'o-- -333- -'FO-UR"

【Discussion】:

    Tags: regex perl sed


    【Answer 1】:

    Here is the perl version:

    echo "--ONE   Tw'o--   -333-   -'FO-UR'" | perl -lne "s|-'||g; s|'-||g; s|^'||; s|'$||; s|^-+||; s|-+$||; s|-+\s+| |g; s|\s+-+| |g; s|\s+| |g; s|\s+$||; print;"
    
    ONE Tw'o 333 FO-UR
    

    The sed version is essentially the same:

    echo "--ONE   Tw'o--   -333-   -'FO-UR'" | sed -r -e "s|-'||g; s|'-||g; s|^'||; s|'$||; s|^-+||; s|-+$||; s|-+\s+| |g; s|\s+-+| |g; s|\s+| |g; s|\s+$||;"
    
    ONE Tw'o 333 FO-UR
    

    Explanation of the regexes used:

    s|-'||g;     # Remove dash followed by quote everywhere
    s|'-||g;     # Remove quote followed by dash everywhere
    s|^'||;      # Remove leading quote
    s|'$||;      # Remove trailing quote
    s|^-+||;     # Remove leading dash characters
    s|-+$||;     # Remove trailing dash characters
    s|-+\s+| |g; # Replace dash characters followed by whitespace with 1 space everywhere
    s|\s+-+| |g; # Replace whitespace followed by dash characters with 1 space everywhere
    s|\s+| |g;   # Replace each run of whitespace with 1 space
    s|\s+$||;    # Remove trailing spaces
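You may be able to reach the same result with fewer rules by tying the dash/apostrophe runs to token boundaries instead of listing each dash/quote combination. The rules are: strip a run that follows the start of the line or whitespace, strip a run that comes before whitespace or the end of the line, then squeeze whitespace. This is a minimal sketch that assumes GNU sed with `-E` and POSIX character classes (the answer above already relies on GNU sed for `\s`). The function name `clean_tokens` is hypothetical:

```shell
# clean_tokens is a hypothetical helper name, not part of either answer.
clean_tokens() {
  # 1) drop a dash/apostrophe run at line start or after whitespace
  # 2) drop a dash/apostrophe run before whitespace or at line end
  # 3) squeeze each whitespace run to one space
  # A single - or ' between two letters (Tw'o, FO-UR) matches neither rule.
  sed -E \
    -e "s/(^|[[:space:]])[-']+/\1/g" \
    -e "s/[-']+([[:space:]]|$)/\1/g" \
    -e "s/[[:space:]]+/ /g"
}

printf '%s\n' "--ONE   Tw'o--   -333-   -'FO-UR'" | clean_tokens
```

Unlike the script above, this sketch only squeezes leading or trailing whitespace on the line. It does not trim it.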
    

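    【Comments】:

    • Thank you very much for your help, sir. I really appreciate you taking the trouble to provide solutions in both perl and sed. This code is amazing: it gives the correct result without using any backreference substitution.

    【Answer 2】: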

    This is easy in perl with lookarounds:

    s="--ONE   Tw'o--   -333-   -'FO-UR'"
    perl -pe 's/(?<!\w)-|-(?!\w)//g' <<< "$s"
    ONE   Tw'o   333   'FO-UR'
    
    (?<!\w)- # Lookbehind: match - if it is not preceded by a word character
    |        # regex alternation
    -(?!\w)  # Lookahead: match - if it is not followed by a word character
    

    【Comments】:
