【Question title】: PHP preg-match-all capture
【Posted】: 2013-01-13 18:37:48
【Question】:

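Using preg_match_all in PHP, I want to capture each of the following in its own group:

  1. The word for chapter, section, or page
  2. The number (and letter, if any) of that chapter, section, or page. If there is a space between them, the pattern should allow for it
  3. The words "and" and "or"

Keep in mind that I want to ignore all book titles, and that the number of items in the string can vary. The regex should work for all of the following examples:

  1. Ch1 and Sect2b
  2. Ch 4 x unwantedtitle and Sect 5y unwanted title and Sect6 z and Ch7 or Ch8

This is what I have come up with so far: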

    $str = 'Ch 1 a unwantedtitle and Sect 2b unwanted title and Pg3';
    preg_match_all ('/([a-z]+)(?=\d|\d\s)\s*(\d*)\s*(?<=\d|\d\s)([a-z]?).*?(and|or)?/i', $str, $matches);

    Array
    (
        [0] => Array
            (
                [0] => Pg3
            )

        [1] => Array
            (
                [0] => Pg
            )

        [2] => Array
            (
                [0] => 3
            )

        [3] => Array
            (
                [0] => 
            )

        [4] => Array
            (
                [0] => 
            )

    )

The expected result should be:

    Array
    (
        [0] => Array
            (
                [0] => Ch 1 a and 
                [1] => Sect 2b and 
                [2] => Pg3
            )

        [1] => Array
            (
                [0] => Ch
                [1] => Sect
                [2] => Pg
            )

        [2] => Array
            (
                [0] => 1
                [1] => 2
                [2] => 3
            )

        [3] => Array
            (
                [0] => a
                [1] => b
                [2] => 
            )

        [4] => Array
            (
                [0] => and
                [1] => and
                [2] => 
            )

    )

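【Comments】:

  • Not sure you really want to use *one* regex for this. Using several looks better.
  • @fge How could I use multiple regexes and still keep everything in order? If you have an example, it would be greatly appreciated. Thanks.
  • Not in PHP, which I barely know...

Tags: php regex preg-match-all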


【Solution 1】:

This is the closest I could get:

    $str = 'Ch 1 a unwantedtitle and Sect 2b unwanted title and Pg3';
    preg_match_all ('/((Ch|Sect|Pg)\s?(\d+)\s?(\w?))(.*?(and|or))?/i', $str, $matches);

    Array
    (
        [0] => Array
            (
                [0] => Ch 1 a unwantedtitle and
                [1] => Sect 2b unwanted title and
                [2] => Pg3
            )

        [1] => Array
            (
                [0] => Ch 1 a
                [1] => Sect 2b
                [2] => Pg3
            )

        [2] => Array
            (
                [0] => Ch
                [1] => Sect
                [2] => Pg
            )

        [3] => Array
            (
                [0] => 1
                [1] => 2
                [2] => 3
            )

        [4] => Array
            (
                [0] => a
                [1] => b
                [2] => 
            )

        [5] => Array
            (
                [0] =>  unwantedtitle and
                [1] =>  unwanted title and
                [2] => 
            )

        [6] => Array
            (
                [0] => and
                [1] => and
                [2] => 
            )

    )
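
If only the four groups from the question are wanted, the extra groups in the pattern above can be made non-capturing. The following is a minimal sketch rather than part of the original answer; it assumes the same input string and the same keyword list (Ch, Sect, Pg), and only changes which parentheses capture.

```php
<?php
$str = 'Ch 1 a unwantedtitle and Sect 2b unwanted title and Pg3';

// Same structure as the pattern above, but the wrapper around the item is
// dropped and the "title + conjunction" part uses a non-capturing group (?:...),
// so the groups are: 1 = word, 2 = number, 3 = letter, 4 = and/or.
$pattern = '/(Ch|Sect|Pg)\s?(\d+)\s?(\w?)(?:.*?(and|or))?/i';

preg_match_all($pattern, $str, $matches);

print_r($matches[1]); // words
print_r($matches[2]); // numbers
print_r($matches[3]); // letters
print_r($matches[4]); // conjunctions
```

$matches[0] still contains the unwanted titles, because the lazy .*? has to consume them to reach "and" or "or". Also note that \w? picks up the first letter of a conjunction when nothing sits between the number and it (for example "Sect 2 and"), a limitation this sketch shares with the pattern above.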

【Comments】:

【Solution 2】:

I would do it like this.

    $arr = array(
        'Ch1 and Sect2b',
        'Ch 1 a unwantedtitle and Sect 2b unwanted title and Pg3',
        'Ch 4 x unwantedtitle and Sect 5y unwanted title and' .
            ' Sect6 z and Ch7 or Ch8a',
        'Assume this is ch1a and ch 2 or ch seCt 5c.' .
            ' Then SECT or chA pg22a and pg 13 andor'
    );
    
    foreach ($arr as $a) {
        var_dump($a);
        preg_match_all(
        '~
            \b(?P<word>ch|sect|(pg))
            \s*(?P<number>\d+)
            (?(2)\b|
                \s*
                (?P<letter>(?!(?<=\s)(?:and|or)\b)[a-z]+)?
                \s*
                (?:(?<=\s)(?P<cond>and|or)\b)?
            )
        ~xi'
        ,$a,$m);
        foreach ($m as $k => $v) {
            if (is_numeric($k) && $k !== 0) unset($m[$k]);
            // this is for 'beautifying' the result array
            // note that $m[0] will still return whole matches
        }
        print_r($m);
    }
    

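I had to make pg its own capturing group because it needs an explicit conditional: a page marker may be followed by a number (with or without a space in between) but not by a letter, on the assumption that page indicators never look like "pg23a".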
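
To see the conditional in isolation, here is a reduced sketch of the same idea. It is a simplification of the pattern above (no conjunction handling, at most one letter), and the three test strings are made up for illustration.

```php
<?php
// Group 2 only participates in the match when the keyword is "pg".
// (?(2)yes|no) then picks a branch: for "pg" the number must end at a word
// boundary, for "ch"/"sect" an optional letter may follow instead.
$pattern = '~\b(?P<word>ch|sect|(pg))\s*(?P<number>\d+)(?(2)\b|(?P<letter>[a-z]?))~i';

$results = [];
foreach (['ch1a', 'pg 23', 'pg23a'] as $input) {
    // Store the whole match, or null when the input is rejected.
    $results[$input] = preg_match($pattern, $input, $m) ? $m[0] : null;
}

var_dump($results);
```

Here "pg23a" is rejected because no word boundary exists between a digit and a letter, and backtracking \d+ to a shorter number does not create one either.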

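That is why I chose to name each group and "beautify" the result with the inner foreach loop in the code. If you use numeric indexes instead of named ones, you need to skip $m[2], which only exists to drive the conditional.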
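
The same clean-up can probably be written without the loop. The sketch below is an alternative, not the answer's code: it uses array_filter with ARRAY_FILTER_USE_KEY (PHP 5.6 or later) on a shortened pattern, and the $named variable is a name chosen here for illustration.

```php
<?php
$str = 'Ch1 and Sect2b';

// Shortened pattern: only the word and the number, both as named groups.
preg_match_all('~\b(?P<word>ch|sect|pg)\s*(?P<number>\d+)~i', $str, $m);

// preg_match_all stores every named group twice: once under its name and
// once under its numeric index. Keep index 0 (whole matches) and the names.
$named = array_filter(
    $m,
    function ($key) {
        return $key === 0 || is_string($key);
    },
    ARRAY_FILTER_USE_KEY
);

print_r($named);
```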

As an example, here is the output for the last item in $arr.

    Array
    (
        [0] => Array
            (
                [0] => ch1a and
                [1] => ch 2 or
                [2] => seCt 5c
                [3] => pg 13
            )
    
        [word] => Array
            (
                [0] => ch
                [1] => ch
                [2] => seCt
                [3] => pg
            )
    
        [number] => Array
            (
                [0] => 1
                [1] => 2
                [2] => 5
                [3] => 13
            )
    
        [letter] => Array
            (
                [0] => a
                [1] => 
                [2] => c
                [3] => 
            )
    
        [cond] => Array
            (
                [0] => and
                [1] => or
                [2] => 
                [3] => 
            )
    
    )
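
When each item should be handled as one record instead of as parallel columns, preg_match_all can also group its result per match. This is a sketch under assumptions: it reuses the pattern above on the first string from $arr, PREG_SET_ORDER is a standard flag, and the $items layout is only an illustration.

```php
<?php
$str = 'Ch1 and Sect2b';

$pattern = '~
    \b(?P<word>ch|sect|(pg))
    \s*(?P<number>\d+)
    (?(2)\b|
        \s*
        (?P<letter>(?!(?<=\s)(?:and|or)\b)[a-z]+)?
        \s*
        (?:(?<=\s)(?P<cond>and|or)\b)?
    )
~xi';

// PREG_SET_ORDER returns one array per match instead of one array per group.
preg_match_all($pattern, $str, $sets, PREG_SET_ORDER);

$items = [];
foreach ($sets as $set) {
    $items[] = [
        'word'   => $set['word'],
        'number' => $set['number'],
        // Trailing groups that did not take part in a match may be absent
        // in set order, so fall back to an empty string (?? needs PHP 7+).
        'letter' => $set['letter'] ?? '',
        'cond'   => $set['cond'] ?? '',
    ];
}

print_r($items);
```

Each element of $items then describes one chapter, section, or page reference in the order it appears in the string.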
    

【Comments】:
