【Question title】: Finding multiple strings in text with regexp in Python
【Posted】: 2015-06-21 01:24:13
【Question】:

I have the following string:

background:url('http://images.bloomingdales.com/is/image/BLM/?&$b=BLM/swatches/&layer=0&size=322,23&src=is{$b$1/optimized/8757901_fpx.tif}&cropN=0,0,14,1&anchor=0,0&layer=1&size=23,23&src=is{$b$2/optimized/8757902_fpx.tif}&anchor=0,0&posN=0.071,0&layer=2&size=23,23&src=is{$b$4/optimized/8234544_fpx.tif}&anchor=0,0&posN=0.143,0&layer=3&size=23,23&src=is{$b$7/optimized/1111977_fpx.tif}&anchor=0,0&posN=0.214,0&layer=4&size=23,23&src=is{$b$0/optimized/8538460_fpx.tif}&anchor=0,0&posN=0.286,0&layer=5&size=23,23&src=is{$b$5/optimized/8234545_fpx.tif}&anchor=0,0&posN=0.357,0&layer=6&size=23,23&src=is{$b$3/optimized/1111973_fpx.tif}&anchor=0,0&posN=0.429,0&layer=7&size=23,23&src=is{$b$7/optimized/1252857_fpx.tif}&anchor=0,0&posN=0.5,0&layer=8&size=23,23&src=is{$b$8/optimized/1252858_fpx.tif}&anchor=0,0&posN=0.571,0&layer=9&size=23,23&src=is{$b$7/optimized/8234547_fpx.tif}&anchor=0,0&posN=0.643,0&layer=10&size=23,23&src=is{$b$0/optimized/8757900_fpx.tif}&anchor=0,0&posN=0.714,0&layer=11&size=23,23&src=is{$b$0/optimized/1111970_fpx.tif}&anchor=0,0&posN=0.786,0&layer=12&size=23,23&src=is{$b$1/optimized/1111971_fpx.tif}&anchor=0,0&posN=0.857,0&layer=13&size=23,23&src=is{$b$2/optimized/1111972_fpx.tif}&anchor=0,0&posN=0.929,0&layer=14&op_sharpen=1&fmt=jpeg&qlt=90,0&hei=23') 322px 0 transparent;

I need to extract all of these parts:

1/optimized/8757901_fpx.tif, 2/optimized/8757902_fpx.tif, and so on.

I am using this regular expression:

re.findall(re.compile(r'\d{1,2}/optimized/.+\.tif'), swatch)

It returns the wrong result, a single match that runs from the first path to the last:

['1/optimized/8757901_fpx.tif}&cropN=0,0,14,1&anchor=0,0&layer=1&size=23,23&src=is{$b$2/optimized/8757902_fpx.tif}&anchor=0,0&posN=0.071,0&layer=2&size=23,23&src=is{$b$4/optimized/8234544_fpx.tif}&anchor=0,0&posN=0.143,0&layer=3&size=23,23&src=is{$b$7/optimized/1111977_fpx.tif}&anchor=0,0&posN=0.214,0&layer=4&size=23,23&src=is{$b$0/optimized/8538460_fpx.tif}&anchor=0,0&posN=0.286,0&layer=5&size=23,23&src=is{$b$5/optimized/8234545_fpx.tif}&anchor=0,0&posN=0.357,0&layer=6&size=23,23&src=is{$b$3/optimized/1111973_fpx.tif}&anchor=0,0&posN=0.429,0&layer=7&size=23,23&src=is{$b$7/optimized/1252857_fpx.tif}&anchor=0,0&posN=0.5,0&layer=8&size=23,23&src=is{$b$8/optimized/1252858_fpx.tif}&anchor=0,0&posN=0.571,0&layer=9&size=23,23&src=is{$b$7/optimized/8234547_fpx.tif}&anchor=0,0&posN=0.643,0&layer=10&size=23,23&src=is{$b$0/optimized/8757900_fpx.tif}&anchor=0,0&posN=0.714,0&layer=11&size=23,23&src=is{$b$0/optimized/1111970_fpx.tif}&anchor=0,0&posN=0.786,0&layer=12&size=23,23&src=is{$b$1/optimized/1111971_fpx.tif}&anchor=0,0&posN=0.857,0&layer=13&size=23,23&src=is{$b$2/optimized/1111972_fpx.tif']

I tested this regex on regex101.com and it works fine there: https://regex101.com/r/tV9kU8/1#

【Comments】:

    Tags: python regex parsing


    【Solution 1】:
    re.findall(r'\d{1,2}/optimized/.+?\.tif', swatch)
    
                                    ^^
    

    Make your quantifier non-greedy by appending ? to it.

    【Comments】:

      【Solution 2】:

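      Instead of the greedy .+, use the quantifier in non-greedy mode: .+?. That way your regex never matches more characters than necessary between / and .tif, i.e. each match stops at the next occurrence of .tif.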
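The difference between the two quantifiers can be sketched on a shortened input. The `swatch` value below is a hypothetical two-layer sample in the same shape as the string from the question, not the original data:

```python
import re

# Hypothetical, shortened sample shaped like the string in the question.
swatch = (
    "src=is{$b$1/optimized/8757901_fpx.tif}&anchor=0,0"
    "&src=is{$b$2/optimized/8757902_fpx.tif}&hei=23"
)

# Greedy: .+ runs to the end of the string, then backtracks to the LAST ".tif".
greedy = re.findall(r'\d{1,2}/optimized/.+\.tif', swatch)

# Non-greedy: .+? expands one character at a time and stops at the FIRST ".tif".
lazy = re.findall(r'\d{1,2}/optimized/.+?\.tif', swatch)

print(len(greedy))  # one long match spanning both paths
print(lazy)         # one match per path
```

On this sample the greedy pattern yields a single match that swallows everything between the first path and the last `.tif`, while the non-greedy pattern returns the two paths separately.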

      【Comments】:

        【Solution 3】:

        You can use a non-greedy group in your regex (note that in your own pattern you also need to put a ? after the + to make it non-greedy):

        >>> re.findall(re.compile(r'{\$b\$(.*?)}'), s)
        ['1/optimized/8757901_fpx.tif', '2/optimized/8757902_fpx.tif', 
        '4/optimized/8234544_fpx.tif', '7/optimized/1111977_fpx.tif', 
        '0/optimized/8538460_fpx.tif', '5/optimized/8234545_fpx.tif', 
        '3/optimized/1111973_fpx.tif', '7/optimized/1252857_fpx.tif', 
        '8/optimized/1252858_fpx.tif', '7/optimized/8234547_fpx.tif', 
        '0/optimized/8757900_fpx.tif', '0/optimized/1111970_fpx.tif', 
        '1/optimized/1111971_fpx.tif', '2/optimized/1111972_fpx.tif']
        

        Since all of your image paths come right after \$b\$, you can use the following pattern:

        {\$b\$(.*?)}
        

        This matches whatever sits inside the braces {} after \$b\$ and returns it as the captured group.
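As a minimal sketch of the capture-group approach, the pattern can be compiled once and reused. The `swatch` sample and the names `pattern`, `paths`, and `paths_alt` are assumptions made for illustration. The second pattern uses a negated character class, which is an alternative technique and not part of the answer above:

```python
import re

# Hypothetical, shortened sample shaped like the string in the question.
swatch = (
    "src=is{$b$1/optimized/8757901_fpx.tif}&anchor=0,0"
    "&src=is{$b$2/optimized/8757902_fpx.tif}&hei=23"
)

# Braces and dollar signs are escaped so they are matched literally.
# findall returns only the captured group when the pattern has exactly one.
pattern = re.compile(r'\{\$b\$(.*?)\}')
paths = pattern.findall(swatch)

# Alternative (not from the answer): match "anything except }" instead of
# relying on a non-greedy quantifier.
paths_alt = re.findall(r'\{\$b\$([^}]*)\}', swatch)

print(paths)
print(paths_alt)
```

Both patterns return the same list on this sample. The capture-group version does not depend on the `.tif` extension, so it would also pick up other file types that appear inside `{$b$...}`.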

        【Comments】:
