[Question title]: bash regex with quotes?
[Posted]: 2022-08-12 01:01:50
[Question description]:

The following code

number=1
if [[ $number =~ [0-9] ]]
then
  echo matched
fi

works. But if I put quotes around the regex, it stops matching:

number=1
if [[ $number =~ "[0-9]" ]]
then
  echo matched
fi

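I also tried "\[0-9\]". What am I missing?

Funnily enough, the bash advanced scripting guide suggests that this should work.

Bash version 3.2.39.

[Comments]:

  • The ABS is infamous as a source of inaccurate (or, on better days, merely misleading) guidance; think of it as the W3Schools of shell scripting. Consider bash-hackers.org or the Wooledge wiki as alternatives that are maintained with an eye to accuracy.

Tags: regex bash quotes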


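[Solution 1]:

The behavior changed between 3.1 and 3.2. I guess the advanced guide needs an update.

This is a terse description of the new features added to bash-3.2 since the release of bash-3.1. As always, the manpage (doc/bash.1) is the place to look for complete descriptions.

  1. New Features in Bash

    snip

    f. Quoting the string argument to the [[ command's =~ operator now forces string matching, as with the other pattern-matching operators.

    Sadly, this breaks existing scripts that quote the regex, unless you had the foresight to store patterns in variables and use those instead of writing the regex inline. Example below.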

    $ bash --version
    GNU bash, version 3.2.39(1)-release (i486-pc-linux-gnu)
    Copyright (C) 2007 Free Software Foundation, Inc.
    $ number=2
    $ if [[ $number =~ "[0-9]" ]]; then echo match; fi
    $ if [[ $number =~ [0-9] ]]; then echo match; fi
    match
    $ re="[0-9]"
    $ if [[ $number =~ $re ]]; then echo MATCH; fi
    MATCH
    
    $ bash --version
    GNU bash, version 3.00.0(1)-release (i586-suse-linux)
    Copyright (C) 2004 Free Software Foundation, Inc.
    $ number=2
    $ if [[ $number =~ "[0-9]" ]]; then echo match; fi
    match
    $ if [[ "$number" =~ [0-9] ]]; then echo match; fi
    match
    

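[Comments]:

  • This is really interesting. Quoted regexes no longer work. Unquoted regexes containing spaces don't work. Variable-based regexes work even when they contain spaces. What a mess.
  • Interestingly, this works: if [[ $number =~ ["0-9"] ]]; then echo match; fi
  • It is disappointing that we need to rely on the echo or compat31 workarounds...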
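
The point about spaces in the first comment can be illustrated with a minimal sketch, assuming bash 3.2 or later. The names `re`, `input`, `unquoted_result`, and `quoted_result` are hypothetical and exist only for this illustration; the only difference between the two tests is whether the expansion on the right-hand side is quoted.

```shell
#!/usr/bin/env bash
# Hypothetical sample data; only the quoting of the right-hand side differs.
re='^[a-z]+ [0-9]+$'   # note the literal space inside the pattern
input='build 42'

# Unquoted expansion: the contents of $re are used as a regular expression.
if [[ $input =~ $re ]]; then
  unquoted_result='match'
else
  unquoted_result='no match'
fi

# Quoted expansion: bash 3.2+ compares against the literal text of $re.
if [[ $input =~ "$re" ]]; then
  quoted_result='match'
else
  quoted_result='no match'
fi

echo "unquoted: $unquoted_result"
echo "quoted:   $quoted_result"
```

Keeping the pattern in a single-quoted assignment means the space never has to be escaped inside `[[ ]]`, which is why the variable form is usually the least surprising one.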
[Solution 2]:

Bash 3.2 introduced a compatibility option, compat31, which reverts bash's regular expression quoting behavior to that of 3.1.

Without compat31:

$ shopt -u compat31
$ shopt compat31
compat31        off
$ set -x
$ if [[ "9" =~ "[0-9]" ]]; then echo match; else echo no match; fi
+ [[ 9 =~ [0-9] ]]
+ echo no match
no match

With compat31:

$ shopt -s compat31
+ shopt -s compat31
$ if [[ "9" =~ "[0-9]" ]]; then echo match; else echo no match; fi
+ [[ 9 =~ [0-9] ]]
+ echo match
match

Link to the patch: http://ftp.gnu.org/gnu/bash/bash-3.2-patches/bash32-039
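
If compat31 is used at all, it may be worth confining it to a subshell so the rest of the script keeps the default quoting rules. The following is only a sketch under that assumption: `check_quoted` is a hypothetical helper, and the guard around `shopt -s compat31` is there because the option may not exist in every bash build.

```shell
#!/usr/bin/env bash
# check_quoted is a hypothetical helper: it prints "match" when a quoted
# right-hand side is treated as a regex, and "no match" when it is literal.
check_quoted() {
  if [[ "9" =~ "[0-9]" ]]; then echo "match"; else echo "no match"; fi
}

# Default behavior (bash 3.2 and later): the quoted pattern is literal.
default_result=$(check_quoted)

# Enable compat31 inside a command substitution (a subshell), so the
# option cannot leak into the rest of the script.
compat_result=$(
  if shopt -s compat31 2>/dev/null; then
    check_quoted
  else
    echo "compat31 unavailable"
  fi
)

echo "default:  $default_result"
echo "compat31: $compat_result"
```

The compatibility level is consulted when the `[[ ]]` command runs, not when the function is defined, which is why the same helper can report different results in the two contexts.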

[Comments]:

    [Solution 3]:

    GNU bash, version 4.2.25(1)-release (x86_64-pc-linux-gnu)

    Some examples of string matching and regex matching

        $ if [[ 234 =~ "[0-9]" ]]; then echo matches;  fi # string match
        $ 
    
        $ if [[ 234 =~ [0-9] ]]; then echo matches;  fi # regex match
        matches
    
    
        $ var="[0-9]"
    
        $ if [[ 234 =~ $var ]]; then echo matches;  fi # regex match
        matches
    
    
        $ if [[ 234 =~ "$var" ]]; then echo matches;  fi # string match after substituting $var as [0-9]
    
        $ if [[ 'rss$var919' =~ "$var" ]]; then echo matches;  fi   # string match after substituting $var as [0-9]
    
        $ if [[ 'rss$var919' =~ $var ]]; then echo matches;  fi # regex match after substituting $var as [0-9]
        matches
    
    
        $ if [[ "rss$var919" =~ "$var" ]]; then echo matches;  fi # string match won't work
    
        $ if [[ "rss\$var919" =~ "$var" ]]; then echo matches;  fi # string match won't work
    
    
        $ if [[ "rss'$var'""919" =~ "$var" ]]; then echo matches;  fi # $var is substituted on LHS & RHS and then string match happens 
        matches
    
        $ if [[ 'rss$var919' =~ "\$var" ]]; then echo matches;  fi # string match against the literal text $var (escaped dollar sign)
        matches
    
    
    
        $ if [[ 'rss$var919' =~ "$var" ]]; then echo matches;  fi # string match failed
        $ 
    
        $ if [[ 'rss$var919' =~ '$var' ]]; then echo matches;  fi # string match
        matches
    
    
    
        $ echo $var
        [0-9]
    
        $ 
    
        $ if [[ abc123def =~ "[0-9]" ]]; then echo matches;  fi
    
        $ if [[ abc123def =~ [0-9] ]]; then echo matches;  fi
        matches
    
        $ if [[ 'rss$var919' =~ '$var' ]]; then echo matches;  fi # string match due to single quotes on RHS $var matches $var
        matches
    
    
        $ if [[ 'rss$var919' =~ $var ]]; then echo matches;  fi # Regex match 
        matches
        $ if [[ 'rss$var' =~ $var ]]; then echo matches;  fi # Above e.g. really is regex match and not string match
        $
    
    
        $ if [[ 'rss$var919[0-9]' =~ "$var" ]]; then echo matches;  fi # string match RHS substituted and then matched
        matches
    
        $ if [[ 'rss$var919' =~ "'$var'" ]]; then echo matches;  fi # trying to string match '$var' fails
    
    
        $ if [[ '$var' =~ "'$var'" ]]; then echo matches;  fi # string match still fails: LHS is the literal text $var, RHS is '[0-9]' with the single quotes

        $ if [[ \'$var\' =~ "'$var'" ]]; then echo matches;  fi # this string match works: the single quotes are now included on the LHS as well
        matches
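
The transcript above suggests a general rule: quoted parts of the right-hand side are matched literally, unquoted parts are treated as a regular expression. The two can be mixed in one pattern. Here is a minimal sketch of that idea; `prefix`, `digits`, and `mixed_match` are hypothetical names chosen for the illustration.

```shell
#!/usr/bin/env bash
# Hypothetical data: a literal prefix that contains regex metacharacters.
prefix='v1.0+'
digits='[0-9]+'

# mixed_match is a hypothetical helper. The quoted "$prefix" is matched
# literally (so "." and "+" lose their regex meaning), while the unquoted
# ${digits} and the anchors are interpreted as a regular expression.
mixed_match() {
  [[ $1 =~ ^"$prefix"-${digits}$ ]]
}

for candidate in 'v1.0+-42' 'v1x0-42' 'v1.0+-beta'; do
  if mixed_match "$candidate"; then
    echo "$candidate: matches"
  else
    echo "$candidate: no match"
  fi
done
```

Only the first candidate should match: the second differs from the literal prefix, and the third has no digits after the hyphen.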
    

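    [Comments]:

      [Solution 4]:

      As mentioned in the other answers, putting the regular expression in a variable is a general way to stay compatible across bash versions. You can also use the following workaround to get the same result while keeping the regular expression inside the conditional expression: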

      $ number=1
      $ if [[ $number =~ $(echo "[0-9]") ]]; then echo matched; fi
      matched
      $ 
      

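      [Comments]:

      • Command substitution incurs a small performance penalty, which can become significant in some cases (for example, when many checks run in a loop).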
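
To make the comment above concrete, here is a small sketch that runs the same check in a loop both ways. The array `items` and the counters are hypothetical. Both variants give the same count; the difference is that the first starts a subshell on every iteration, while the second assigns the pattern once.

```shell
#!/usr/bin/env bash
# Hypothetical input; count the entries that contain a digit.
items=(alpha b2 gamma d4 e5)

# Variant A: the command substitution runs in a subshell on every iteration.
count_subst=0
for item in "${items[@]}"; do
  if [[ $item =~ $(echo "[0-9]") ]]; then
    count_subst=$((count_subst + 1))
  fi
done

# Variant B: the pattern is assigned once; no subshell inside the loop.
re='[0-9]'
count_var=0
for item in "${items[@]}"; do
  if [[ $item =~ $re ]]; then
    count_var=$((count_var + 1))
  fi
done

echo "command substitution: $count_subst, variable: $count_var"
# prints "command substitution: 3, variable: 3"
```

Wrapping each loop in the `time` keyword with a much larger input is one way to measure the difference on a given machine; no timings are claimed here.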
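      [Solution 5]:

      Using a local variable performs slightly better than using command substitution.

      For larger scripts, or collections of scripts, it may make sense to use a utility function, both to keep stray local variables from cluttering the code and to reduce verbosity. This seems to work well: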

      # Bash's built-in regular expression matching requires the regular expression
      # to be unquoted (see https://stackoverflow.com/q/218156), which makes it harder
      # to use some special characters, e.g., the dollar sign.
      # This wrapper works around the issue by using a local variable, which means the
      # quotes are not passed on to the regex engine.
      regex_match() {
        local string regex
        string="${1?}"
        regex="${2?}"
        # shellcheck disable=SC2046 `regex` is deliberately unquoted, see above.
        [[ "${string}" =~ ${regex} ]]
      }
      

      Usage example:

      if regex_match "${number}" '[0-9]'; then
        echo matched
      fi
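
The comment in the wrapper mentions the dollar sign as a character that is awkward to use unquoted. A possible usage sketch for that case follows. The wrapper is repeated so the snippet runs on its own, the input `line` is hypothetical, and reading the capture group from `BASH_REMATCH` after the call relies on `BASH_REMATCH` being a global array that `[[ =~ ]]` sets even inside a function.

```shell
#!/usr/bin/env bash
# Same wrapper as above, repeated so this snippet is self-contained.
regex_match() {
  local string regex
  string="${1?}"
  regex="${2?}"
  [[ "${string}" =~ ${regex} ]]
}

# Hypothetical input. Single quotes keep the backslash and the dollar sign
# intact until they reach the regex engine.
line='total: $15 due'
amount=''
if regex_match "${line}" '\$([0-9]+)'; then
  # BASH_REMATCH[1] holds the first parenthesized group of the last match.
  amount="${BASH_REMATCH[1]}"
  echo "amount: ${amount}"
fi
```

Because the pattern arrives as a single-quoted argument, the caller never has to think about how `[[ ]]` would parse the backslash or the parentheses.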
      

      [Comments]:
