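【Posted】:2019-05-06 01:23:39
【Question】:
I am trying to extract the data-src and data-srcset attributes from a number of image strings in PHP. Both attributes are optional, so an image may have neither, only data-src, only data-srcset, or both. My regular expression is: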
<img(.*?)data-src=['\"](.*?)['\"].*?|(data-srcset=['\"](.*?)['\"])?\/>
The string I am testing against is:
<li class="blocks-gallery-item">
<figure>
<img data-src="http://localhost:3000/wp-content/uploads/2018/11/detektivhut.gif" alt="" data-id="1037" data-link="http://localhost:3000/detektivhut/" class="wp-image-1037"/>
</figure>
</li>
<li class="blocks-gallery-item">
<figure>
<img data-src="http://localhost:3000/wp-content/uploads/2018/11/DSC04828.png" alt="" data-id="948" data-link="http://localhost:3000/dsc04828-2/" class="wp-image-948" data-srcset="//localhost:3000/wp-content/uploads/2018/11/DSC04828.png 1067w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-200x300.png 200w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-768x1152.png 768w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-683x1024.png 683w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-1000x1500.png 1000w" sizes="(max-width: 1067px) 100vw, 1067px" />
</figure>
</li>
<li class="blocks-gallery-item">
<figure>
<img data-src="http://localhost:3000/wp-content/uploads/2018/11/DSC04831.png" alt="" data-id="883" data-link="http://localhost:3000/2018/11/13/single-page-style-1/dsc04831-2/" class="wp-image-883" data-srcset="//localhost:3000/wp-content/uploads/2018/11/DSC04831.png 1067w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-200x300.png 200w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-768x1152.png 768w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-683x1024.png 683w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-1000x1500.png 1000w" sizes="(max-width: 1067px) 100vw, 1067px" />
</figure>
</li>
But it is too greedy. See here:
https://regex101.com/r/vDQE3C/1
Any help is much appreciated, including an explanation of the logic.
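One way to keep a regex-based approach from over-matching is to split the work in two: first isolate each img tag, then look for each optional attribute inside that tag alone. The following is a minimal sketch under stated assumptions, not a definitive solution. The function names `extractWithRegex` and `matchAttribute` are hypothetical, and the code assumes that attribute values are quoted and contain no `>` character.

```php
<?php
// Hypothetical helper: returns one entry per <img> tag, with null for a missing attribute.
function extractWithRegex(string $html): array
{
    $results = [];
    // Step 1: isolate each <img ...> tag. [^>]* cannot run past the end of the tag,
    // so a match never spills over into the next image.
    preg_match_all('/<img\b[^>]*>/i', $html, $tags);
    foreach ($tags[0] as $tag) {
        // Step 2: look up each optional attribute independently, in any order.
        $results[] = [
            'data-src'    => matchAttribute($tag, 'data-src'),
            'data-srcset' => matchAttribute($tag, 'data-srcset'),
        ];
    }
    return $results;
}

// Hypothetical helper: returns the quoted value of one attribute, or null if absent.
function matchAttribute(string $tag, string $name): ?string
{
    // Requiring "=" directly after the name keeps data-src from matching data-srcset.
    // \1 forces the closing quote to be the same character as the opening quote.
    $pattern = '/\s' . preg_quote($name, '/') . '=(["\'])(.*?)\1/is';
    return preg_match($pattern, $tag, $m) === 1 ? $m[2] : null;
}
```

Handling the two attributes separately avoids having to encode every combination (neither, one, or both, in either order) in a single pattern. It is still pattern matching on markup, so unusual input such as unquoted values can defeat it.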
【Comments】:
-
Try data-src(?:set)?=.[^'"]+ and see the live demo at regex101.com/r/qJMl5G/1
-
I suggest avoiding regular expressions for this; they are not well suited to parsing (X)HTML. For HTML parsing, I use PHP Simple HTML DOM Parser.
-
DOMDocument is part of PHP and is also much safer than regular expressions.
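A minimal sketch of what that could look like, assuming the input is an HTML fragment like the one in the question. The function name `extractLazyImageAttributes` is hypothetical; the DOM and libxml calls are standard PHP.

```php
<?php
// Hypothetical helper: collects data-src / data-srcset from every <img> in a fragment.
// A missing attribute is reported as null.
function extractLazyImageAttributes(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting them;
    // fragments are rarely complete, valid documents.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $results = [];
    // Only <img> elements are visited, so a data-src on any other element is ignored.
    foreach ($doc->getElementsByTagName('img') as $img) {
        $results[] = [
            'data-src'    => $img->hasAttribute('data-src') ? $img->getAttribute('data-src') : null,
            'data-srcset' => $img->hasAttribute('data-srcset') ? $img->getAttribute('data-srcset') : null,
        ];
    }
    return $results;
}
```

Because the parser builds a tree first, attribute order and quoting style stop mattering, and each optional attribute becomes a simple `hasAttribute` check.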
-
<img src="somePath" /> <span data-src="oops this shouldn't be there, but who knows...">Hello world</span><img src="someOtherPath" />
-
That is HTML parsing; you should not use regular expressions for it.