【Question title】: PHP - preg_match/preg_replace problems
【Posted】: 2013-06-09 01:34:32
【Question】:

I'm a bit confused by preg_match and preg_replace. I have a very long content string (from a blog), and I want to find, separate and replace all [caption] tags. Possible tags include:

[caption]test[/caption]
[caption align="center" caption="test" width="123"]<img src="...">[/caption]
[caption caption="test" align="center" width="123"]<img src="...">[/caption]

and so on.

Here is the code I have (but I've found it doesn't work the way I want):

public function parse_captions($content) {
    if(preg_match("/\[caption(.*) align=\"(.*)\" width=\"(.*)\" caption=\"(.*)\"\](.*)\[\/caption\]/", $content, $c)) {
        $caption = $c[4];         
        $code = "<div>Test<p class='caption-text'>" . $caption . "</p></div>";
        // Here, I'd like to replace ONLY what was found above (since there can be
        // multiple instances)
        $content = preg_replace("/\[caption(.*) width=\"(.*)\" caption=\"(.*)\"\](.*)\[\/caption\]/", $code, $content);
    }
    return $content;
}

【Comments】:

    Tags: php regex preg-replace preg-match


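    【Solution 1】:

    The goal is to capture the caption text regardless of its position (inside the caption attribute or between the tags). You can try this: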

    $subject = <<<'LOD'
    [caption]test1[/caption]
    [caption align="center" caption="test2" width="123"][/caption]
    [caption caption="test3" align="center" width="123"][/caption]
    LOD;
    
    $pattern = <<<'LOD'
    ~
    \[caption                          # beginning of the tag
    (?>[^]c]++|c(?!aption\b))*         # followed by anything but c and ]
                                       # or c not followed by "aption"
    
    (?|                                # alternation group
        caption="([^"]++)"[^]]*+]      # the content is inside the opening tag
      |                                # OR
        ]([^[]++)                      # outside
    )                                  # end of alternation group
    
    \[/caption]                        # closing tag
    ~x
    LOD;
    
    $replacement = "<div>Test<p class='caption-text'>$1</p></div>";
    
    echo htmlspecialchars(preg_replace($pattern, $replacement, $subject));
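
The question also asks how to replace only what was actually found when there are several tags. One way to do that is `preg_replace_callback`, which builds the replacement per match. The following is a minimal sketch, assuming the compact pattern from this answer; the function name mirrors the question's method but is written here as a standalone function, and the HTML wrapper is the question's placeholder markup. The caption text is inserted unescaped, which is an assumption about the input being trusted.

```php
<?php
// Sketch: rewrite every [caption] tag in one pass.
function parse_captions(string $content): string
{
    $pattern = '~\[caption(?>[^]c]++|c(?!aption\b))*(?|caption="([^"]++)"[^]]*+]|]([^[]++))\[/caption]~';

    $result = preg_replace_callback(
        $pattern,
        function (array $m): string {
            // $m[1] holds the caption text, whichever branch matched.
            return "<div>Test<p class='caption-text'>" . $m[1] . "</p></div>";
        },
        $content
    );

    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $result ?? $content;
}

echo parse_captions(
    '[caption]one[/caption] and [caption align="center" caption="two" width="123"][/caption]'
);
```

Each tag is replaced with its own caption text, and text outside the tags is left alone.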
    

    The pattern (compact version):

    $pattern = '~\[caption(?>[^]c]++|c(?!aption\b))*(?|caption="([^"]++)"[^]]*+]|]([^[]++))\[/caption]~';
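
Note that the attribute branch expects `[/caption]` immediately after the `]` of the opening tag, so a tag such as the question's `[caption ... caption="test" ...]<img src="...">[/caption]` is not matched. The sketch below compares the compact pattern with one possible variant that allows `[^[]*+` between the tags. The variant, the constant names and the helper `caption_texts` are illustrative assumptions, not part of the original answer, and the variant discards the body between the tags.

```php
<?php
// The compact pattern from the answer.
const ORIGINAL_PATTERN = '~\[caption(?>[^]c]++|c(?!aption\b))*(?|caption="([^"]++)"[^]]*+]|]([^[]++))\[/caption]~';

// Hypothetical variant: "[^[]*+" after the "]" of the attribute branch lets
// markup between the tags (anything without "[") sit before [/caption].
const VARIANT_PATTERN = '~\[caption(?>[^]c]++|c(?!aption\b))*(?|caption="([^"]++)"[^]]*+][^[]*+|]([^[]++))\[/caption]~';

// Returns the list of caption texts (capture group 1) found in $content.
function caption_texts(string $pattern, string $content): array
{
    preg_match_all($pattern, $content, $m);
    return $m[1];
}

$tag = '[caption align="center" caption="test" width="123"]<img src="...">[/caption]';

var_dump(caption_texts(ORIGINAL_PATTERN, $tag)); // no match for this tag
var_dump(caption_texts(VARIANT_PATTERN, $tag));  // finds "test"
```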
    

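    Pattern explanation:

    After the start of the tag, there may be content before the ] or before the caption attribute. This content is described by: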

    (?>                # atomic group
        [^]c]++        # all characters that are not ] or c, 1 or more times
      |                # OR
        c(?!aption\b)  # c not followed by aption (to avoid the caption attribute)
    )*                 # zero or more times
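
To see where this sub-pattern stops, it can be run on its own with an extra capture group around it. This is only a sketch; the helper name `skipped_prefix` is made up for illustration.

```php
<?php
// Returns the text consumed between "[caption" and either "]" or the
// caption attribute, or null if the string has no "[caption".
function skipped_prefix(string $tag): ?string
{
    // Same atomic group as above, wrapped in a capture group for inspection.
    $skip = '~\[caption((?>[^]c]++|c(?!aption\b))*)~';
    return preg_match($skip, $tag, $m) ? $m[1] : null;
}

// The "c" of "center" is consumed, but the loop stops in front of "caption=".
var_dump(skipped_prefix('[caption align="center" caption="test" width="123"]'));
// Nothing to skip when "]" follows directly.
var_dump(skipped_prefix('[caption]test[/caption]'));
```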
    

    The branch reset group (?| allows the capture groups in each alternative to share the same number:

    (?|
           # case: the target is in the caption attribute #
        caption="      # (you can replace it by caption\s*+=\s*+")
        ([^"]++)       # all that is not a " one or more times (capture group)
        "
        [^]]*+         # all that is not a ] zero or more times
        ]              # closing square bracket of the opening tag
    
      |           # OR
    
           # case: the target is outside the opening tag #
        ]              # closing square bracket of the opening tag
        ([^[]++)       # all that is not a [ 1 or more times (capture group)
    )
    

    Both captures now have the same number, #1.
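
A small comparison may make the numbering clearer. The sketch below uses a reduced version of the alternation, without the rest of the tag pattern; both helper names are hypothetical.

```php
<?php
// With (?| ... ) both alternatives write into group 1.
function capture_with_reset(string $s): ?string
{
    return preg_match('~(?|caption="([^"]++)"|]([^[]++))~', $s, $m) ? $m[1] : null;
}

// With a plain non-capturing group (?: ... ) the alternatives are groups 1 and 2.
function captures_without_reset(string $s): array
{
    preg_match('~(?:caption="([^"]++)"|]([^[]++))~', $s, $m);
    return $m;
}

var_dump(capture_with_reset('caption="test2"')); // text from the attribute
var_dump(capture_with_reset(']test1['));         // text after the bracket, same group
print_r(captures_without_reset(']test1['));      // group 1 is empty, group 2 has the text
```

With the branch reset group, the replacement string can always refer to `$1`, whichever alternative matched.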

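    Note: if you are sure that no caption tag spans several lines, you can add the m modifier at the end of the pattern. Be aware that m only changes how ^ and $ match, and this pattern uses neither, so it does not alter the result.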

    Note 2: the quantifiers are possessive, and I use atomic groups where possible, so that the pattern fails fast and performs better.
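
The difference between a greedy and a possessive quantifier can be seen on a toy pattern. This is only an illustration, unrelated to captions; the helper `matches` is made up for the example.

```php
<?php
// Thin wrapper so the two cases read the same way.
function matches(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

// Greedy: a+ first takes all three a's, then gives one back so the final "a" can match.
var_dump(matches('~^a+a$~', 'aaa'));  // bool(true)

// Possessive: a++ keeps all three a's, so nothing is left for the final "a".
var_dump(matches('~^a++a$~', 'aaa')); // bool(false)
```

A possessive quantifier never gives characters back, which is what lets a failing match stop early instead of retrying shorter repetitions.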

    【Comments】:

      【Solution 2】:

      A hint (not an answer in itself)

      Your best course of action is:

      1. Match everything after caption.

        preg_match("#\[caption(.*?)\]#", $q, $match)
        
      2. Use the explode function to extract the values (if any) from $match[1].

        explode(' ', trim($match[1]))
        
      3. Check the values in the returned array and use them in your code accordingly.
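
The three steps above can be sketched as follows. The helper name `caption_attributes` and the returned array shape are assumptions made for illustration. Splitting on spaces also assumes that no attribute value contains a space, so a value such as caption="my photo" would be cut in two.

```php
<?php
// Hypothetical helper: returns the attributes of the first [caption ...] tag
// as a name => value array (empty if the tag has no attributes or is missing).
function caption_attributes(string $q): array
{
    // Step 1: grab everything between "[caption" and the first "]".
    if (!preg_match('#\[caption(.*?)\]#', $q, $match)) {
        return [];
    }

    // Step 2: split the attribute string on spaces.
    $raw = trim($match[1]);
    $parts = $raw === '' ? [] : explode(' ', $raw);

    // Step 3: turn each name="value" piece into an array entry.
    $attrs = [];
    foreach ($parts as $part) {
        if (strpos($part, '=') === false) {
            continue; // skip pieces that are not name=value pairs
        }
        [$name, $value] = explode('=', $part, 2);
        $attrs[$name] = trim($value, '"');
    }
    return $attrs;
}

print_r(caption_attributes(
    '[caption align="center" caption="test" width="123"]<img src="...">[/caption]'
));
```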

      【Comments】:
