【Question title】: Weird behavior of sed's backreference
【Posted】: 2021-12-15 22:54:51
【Question description】:

We have the following line of text:

| ![](/img/2016/12/020.jakis-tam-text1.png#medium) | ![](/img/2016/12/021.jakis-tam-text2.png#medium) | ![](/img/2016/12/022.jakis-tam-text3.png#medium) |

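As you can see, the line consists of just three similar phrases, each of which can be matched and changed (individually) with the following sed expression: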

sed -n 's@| !\[.*\](\(\/img\/\)\([0-9]*\/[0-9]*\/[0-9]*\)\.\(.*\)\.\(.*\)) |@| ![\3](\1\2.\3.\4) |@p'

If we have only one phrase (instead of the given three), the result is as follows:

$ echo '| ![](/img/2016/12/020.jakis-tam-text1.png#medium) |' | sed -n 's@| !\[.*\](\(\/img\/\)\([0-9]*\/[0-9]*\/[0-9]*\)\.\(.*\)\.\(.*\)) |@| ![\3](\1\2.\3.\4) |@p'
| ![jakis-tam-text1](/img/2016/12/020.jakis-tam-text1.png#medium) |

But when we have two or more phrases, the result always refers to the last matched phrase.

Here is an example with two matches:

$ echo '| ![](/img/2016/12/020.jakis-tam-text1.png#medium) | ![](/img/2016/12/021.jakis-tam-text2.png#medium) |' | sed -n 's@| !\[.*\](\(\/img\/\)\([0-9]*\/[0-9]*\/[0-9]*\)\.\(.*\)\.\(.*\)) |@| ![\3](\1\2.\3.\4) |@p'
| ![jakis-tam-text2](/img/2016/12/021.jakis-tam-text2.png#medium) |

Here is an example with three matches:

$  echo '| ![](/img/2016/12/020.jakis-tam-text1.png#medium) | ![](/img/2016/12/021.jakis-tam-text2.png#medium) | ![](/img/2016/12/022.jakis-tam-text3.png#medium) |' | sed -n 's@| !\[.*\](\(\/img\/\)\([0-9]*\/[0-9]*\/[0-9]*\)\.\(.*\)\.\(.*\)) |@| ![\3](\1\2.\3.\4) |@p'
| ![jakis-tam-text3](/img/2016/12/022.jakis-tam-text3.png#medium) |

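Why does this happen?

Is there a way to force sed to print only the result of the first match?

Expected behavior? I thought the command below would print something like this (just the first match):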

$  echo '| ![](/img/2016/12/020.jakis-tam-text1.png#medium) | ![](/img/2016/12/021.jakis-tam-text2.png#medium) | ![](/img/2016/12/022.jakis-tam-text3.png#medium) |' | sed -n 's@| !\[.*\](\(\/img\/\)\([0-9]*\/[0-9]*\/[0-9]*\)\.\(.*\)\.\(.*\)) |@| ![\3](\1\2.\3.\4) |@p'
    | ![jakis-tam-text1](/img/2016/12/020.jakis-tam-text1.png#medium) |

Or this (all matches):

$  echo '| ![](/img/2016/12/020.jakis-tam-text1.png#medium) | ![](/img/2016/12/021.jakis-tam-text2.png#medium) | ![](/img/2016/12/022.jakis-tam-text3.png#medium) |' | sed -n 's@| !\[.*\](\(\/img\/\)\([0-9]*\/[0-9]*\/[0-9]*\)\.\(.*\)\.\(.*\)) |@| ![\3](\1\2.\3.\4) |@p'
    | ![jakis-tam-text1](/img/2016/12/020.jakis-tam-text1.png#medium) | ![jakis-tam-text2](/img/2016/12/021.jakis-tam-text2.png#medium) | ![jakis-tam-text3](/img/2016/12/022.jakis-tam-text3.png#medium) |

【Comments】:

    Tags: sed backreference


    【Solution 1】:

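    What happens is that the .* in | !\[.*\] is greedy: it takes the longest possible match, which runs from the first phrase all the way to the start of the last phrase. The whole span is replaced, and the backreferences hold the values captured from the last phrase. If you only want to match the first phrase, you have to be more specific. For example: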

    sed 's@| !\[\]\(([^.]*\.\([^.]*\)\.[^)]*)\) |.*@| ![\2]\1 |@'
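The greedy behavior described above is easier to see on a shorter line. The following is a minimal sketch; the sample string and the variable names `line`, `greedy`, and `specific` are made up for illustration and do not come from the question.

```shell
#!/bin/sh
# Hypothetical, shortened stand-in for the line in the question.
line='[a](1) [b](2) [c](3)'

# Greedy: .* runs from the first "[" to the LAST "]" on the line.
greedy=$(printf '%s\n' "$line" | sed 's/\[.*\]/<X>/')

# Negated bracket expression: [^]]* cannot cross a "]", so the match
# ends at the first closing bracket.
specific=$(printf '%s\n' "$line" | sed 's/\[[^]]*\]/<X>/')

printf '%s\n' "$greedy"     # <X>(3)
printf '%s\n' "$specific"   # <X>(1) [b](2) [c](3)
```

sed has no non-greedy quantifier in its basic or extended syntax, so narrowing what each piece of the pattern may match (as `[^.]*` and `[^)]*` do in the command above) is the usual way to keep a match from running past the first phrase.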
    

    【Comments】:

      【Solution 2】:

      I don't fully understand the question, but you could try this sed command:

      $ sed 's#\([^[]*.\)\([^\.]*.\([^\.]*\)[^)]*.\)#\1\3\2#' input_file
      

      This prints all three phrases but substitutes only in the first one:

      $ sed 's#\([^[]*.\)\([^\.]*.\([^\.]*\)[^)]*.\)#\1\3\2#' input_file
      | ![jakis-tam-text1](/img/2016/12/020.jakis-tam-text1.png#medium) | ![](/img/2016/12/021.jakis-tam-text2.png#medium) | ![](/img/2016/12/022.jakis-tam-text3.png#medium) |
      

      To target all three, add the g flag:

      $ sed 's#\([^[]*.\)\([^\.]*.\([^\.]*\)[^)]*.\)#\1\3\2#g' input_file
      | ![jakis-tam-text1](/img/2016/12/020.jakis-tam-text1.png#medium) | ![jakis-tam-text2](/img/2016/12/021.jakis-tam-text2.png#medium) | ![jakis-tam-text3](/img/2016/12/022.jakis-tam-text3.png#medium) |
      

      You can also target a single occurrence, for example only the second one, by using a number as the flag:

      $ sed 's#\([^[]*.\)\([^\.]*.\([^\.]*\)[^)]*.\)#\1\3\2#2' input_file
      | ![](/img/2016/12/020.jakis-tam-text1.png#medium) | ![jakis-tam-text2](/img/2016/12/021.jakis-tam-text2.png#medium) | ![](/img/2016/12/022.jakis-tam-text3.png#medium) |
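The same idea can be written with explicit anchors on the `![](...)` part, which may be easier to read. This is only a sketch under two assumptions: that your sed accepts `-E` for extended regular expressions (GNU and BSD sed do), and that the directory part of the path contains no dots. The variable names `line`, `expr`, `first`, `all`, and `second` are illustrative.

```shell
#!/bin/sh
# The sample line from the question.
line='| ![](/img/2016/12/020.jakis-tam-text1.png#medium) | ![](/img/2016/12/021.jakis-tam-text2.png#medium) | ![](/img/2016/12/022.jakis-tam-text3.png#medium) |'

# \1 = everything inside the parentheses,
# \2 = the text between the first and the second dot of that path.
# [^.]* and [^)]* keep each piece inside a single phrase.
expr='s@!\[\]\(([^.]*\.([^.]*)\.[^)]*)\)@![\2](\1)@'

first=$(printf '%s\n' "$line" | sed -E "$expr")        # first phrase only
all=$(printf '%s\n' "$line" | sed -E "${expr}g")       # every phrase
second=$(printf '%s\n' "$line" | sed -E "${expr}2")    # second phrase only

printf '%s\n' "$first" "$all" "$second"
```

Because the pattern no longer contains an unbounded `.*`, each match covers exactly one phrase, so the `g` and numeric flags select phrases the way the question expected.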
      

      【Comments】:
