【Question title】: Extract data-src and data-srcset from img
【Posted】: 2019-05-06 01:23:39
【Question】:

I am trying to get the data-src and data-srcset attributes from many image strings in PHP. Both attributes are optional, so an image may have neither, only data-src, only data-srcset, or both. My regex is

<img(.*?)data-src=['\"](.*?)['\"].*?|(data-srcset=['\"](.*?)['\"])?\/>

The string I am testing against is:

<li class="blocks-gallery-item">
  <figure>
    <img data-src="http://localhost:3000/wp-content/uploads/2018/11/detektivhut.gif" alt="" data-id="1037" data-link="http://localhost:3000/detektivhut/" class="wp-image-1037"/>
  </figure>
</li>
<li class="blocks-gallery-item">
  <figure>
    <img data-src="http://localhost:3000/wp-content/uploads/2018/11/DSC04828.png" alt="" data-id="948" data-link="http://localhost:3000/dsc04828-2/" class="wp-image-948" data-srcset="//localhost:3000/wp-content/uploads/2018/11/DSC04828.png 1067w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-200x300.png 200w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-768x1152.png 768w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-683x1024.png 683w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-1000x1500.png 1000w" sizes="(max-width: 1067px) 100vw, 1067px" />
  </figure>
</li>
<li class="blocks-gallery-item">
  <figure>
    <img data-src="http://localhost:3000/wp-content/uploads/2018/11/DSC04831.png" alt="" data-id="883" data-link="http://localhost:3000/2018/11/13/single-page-style-1/dsc04831-2/" class="wp-image-883" data-srcset="//localhost:3000/wp-content/uploads/2018/11/DSC04831.png 1067w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-200x300.png 200w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-768x1152.png 768w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-683x1024.png 683w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-1000x1500.png 1000w" sizes="(max-width: 1067px) 100vw, 1067px" />
  </figure>
</li>

But it is too greedy. See here:

https://regex101.com/r/vDQE3C/1

Any help is much appreciated (explanations of the logic too).

【Comments】:

  • Try data-src(?:set)?=.[^'"]+. See the live demo here: regex101.com/r/qJMl5G/1
  • I suggest you avoid regex for this purpose (it is poorly suited to (X)HTML parsing). For HTML parsing I use PHP Simple HTML DOM Parser.
  • DOMDocument is part of PHP and is also much safer than regex.
  • <img src="somePath" /> <span data-src="oops this shouldn't be there, but who knows...">Hello world</span><img src="someOtherPath" />
  • That is HTML parsing, and you should not use regex for it.
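The concern raised in the comments can be illustrated with a small sketch. The sample string below is hypothetical (not from the question); it contrasts the regex suggested in the first comment, which matches the attribute on any element, with a DOMDocument/XPath lookup restricted to img elements.

```php
<?php
// Hypothetical sample: a non-image element that also carries data-src.
$sample = '<span data-src="not-an-image">x</span><img data-src="a.png" alt="" />';

// Regex from the first comment: matches the attribute wherever it appears.
preg_match_all('/data-src(?:set)?=.[^\'"]+/', $sample, $regexHits);

// DOM approach: only attributes that sit on <img> elements are selected.
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($sample);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$domHits = [];
foreach ($xpath->query('//img/@data-src') as $attr) {
    $domHits[] = $attr->nodeValue;
}

print_r($regexHits[0]); // two matches, including the one on <span>
print_r($domHits);      // one value, taken from the <img> only
```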

Tags: php regex


【Solution 1】:

You just need to account for whatever lies between the data-* attributes and the image's closing />. You need another (.*?)

<img(.*?)data-src=['\"](.*?)['\"].*?data-srcset=['\"](.*?)['\"](.*?)\/>

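If you only want to capture the data-* attributes, consider using non-capturing groups as shown below. That way the $1 and $2 variables contain only the data you want, rather than the whole image tag.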

<img(?:.*?)data-src=['\"](.*?)['\"].*?data-srcset=['\"](.*?)['\"](?:.*?)\/>
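Note that both patterns above require data-src and data-srcset to be present, in that order, so an image with only data-src (like the first one in the question) is not matched. If regex has to be used, one possible workaround is to isolate each img tag first and then look up each attribute on its own. This is only a sketch: the helper name extractDataAttributes is hypothetical, and it assumes quoted attribute values with no ">" inside them.

```php
<?php
/**
 * Hypothetical helper: returns one entry per <img> tag, with null for a
 * missing attribute. Assumes quoted attribute values and no ">" inside them.
 */
function extractDataAttributes(string $html): array
{
    $results = [];
    // Step 1: isolate each <img ...> tag.
    preg_match_all('/<img\b[^>]*>/i', $html, $tags);
    foreach ($tags[0] as $tag) {
        $entry = ['data-src' => null, 'data-srcset' => null];
        // Step 2: look for each attribute independently, in any order.
        foreach (array_keys($entry) as $name) {
            $pattern = '/\s' . preg_quote($name, '/') . '=(["\'])(.*?)\1/s';
            if (preg_match($pattern, $tag, $m)) {
                $entry[$name] = $m[2];
            }
        }
        $results[] = $entry;
    }
    return $results;
}

// Made-up input: one image with only data-src, one with both (reversed order).
$html = '<img data-src="a.gif" alt="" class="x"/>'
      . '<img class="y" data-srcset="b.png 200w, c.png 768w" data-src="b.png" />';
print_r(extractDataAttributes($html));
```

Matching the tag first keeps a lazy quantifier from running past the end of one image into the next, which is the "too greedy" behaviour described in the question.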

【Comments】:

    【Solution 2】:

    Do not use regex to parse HTML. It is better to use a DOM parser, like this:

    $html = <<< EOF
    <li class="blocks-gallery-item">
      <figure>
        <img data-src="http://localhost:3000/wp-content/uploads/2018/11/detektivhut.gif" alt="" data-id="1037" data-link="http://localhost:3000/detektivhut/" class="wp-image-1037"/>
      </figure>
    </li>
    <li class="blocks-gallery-item">
      <figure>
        <img data-src="http://localhost:3000/wp-content/uploads/2018/11/DSC04828.png" alt="" data-id="948" data-link="http://localhost:3000/dsc04828-2/" class="wp-image-948" data-srcset="//localhost:3000/wp-content/uploads/2018/11/DSC04828.png 1067w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-200x300.png 200w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-768x1152.png 768w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-683x1024.png 683w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-1000x1500.png 1000w" sizes="(max-width: 1067px) 100vw, 1067px" />
      </figure>
    </li>
    <li class="blocks-gallery-item">
      <figure>
        <img data-src="http://localhost:3000/wp-content/uploads/2018/11/DSC04831.png" alt="" data-id="883" data-link="http://localhost:3000/2018/11/13/single-page-style-1/dsc04831-2/" class="wp-image-883" data-srcset="//localhost:3000/wp-content/uploads/2018/11/DSC04831.png 1067w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-200x300.png 200w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-768x1152.png 768w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-683x1024.png 683w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-1000x1500.png 1000w" sizes="(max-width: 1067px) 100vw, 1067px" />
      </figure>
    </li>
    EOF;
    
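    // loadHTML() is an instance method; calling it statically fails on PHP 8.
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $images = $xpath->evaluate("//img");

    foreach ($images as $img) {
        // getNamedItem() returns null when the attribute is absent
        if (($el = $img->attributes->getNamedItem('data-src')) !== null) {
            echo 'data-src=' . $el->nodeValue . "\n";
        }
        if (($el = $img->attributes->getNamedItem('data-srcset')) !== null) {
            echo 'data-srcset=' . $el->nodeValue . "\n";
        }
    }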
    

    Output:

    data-src=http://localhost:3000/wp-content/uploads/2018/11/detektivhut.gif
    data-src=http://localhost:3000/wp-content/uploads/2018/11/DSC04828.png
    data-srcset=//localhost:3000/wp-content/uploads/2018/11/DSC04828.png 1067w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-200x300.png 200w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-768x1152.png 768w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-683x1024.png 683w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-1000x1500.png 1000w
    data-src=http://localhost:3000/wp-content/uploads/2018/11/DSC04831.png
    data-srcset=//localhost:3000/wp-content/uploads/2018/11/DSC04831.png 1067w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-200x300.png 200w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-768x1152.png 768w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-683x1024.png 683w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-1000x1500.png 1000w
    

    【Comments】:

    • Thanks. How can I change the attributes? Say, the value of data-src to the value of src
    • Just pass your attribute name to the getNamedItem function, i.e. $img->attributes->getNamedItem('src')
    • What I meant is: set the value of data-src to the value of src, e.g. <img src="test" /> becomes <img data-src="test">
    • You can check this answer: stackoverflow.com/questions/11387748/… Basically you need to call $img->setAttribute('src', $el->nodeValue); after the first echo
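The setAttribute suggestion from the last comment could look roughly like the sketch below. The input markup is made up for illustration. Because loadHTML() wraps a fragment in html and body elements, the sketch serializes only the children of body to get the fragment back.

```php
<?php
// Made-up fragment: an image that only has the lazy-load attribute.
$html = '<div><img data-src="test.png" alt="" /></div>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

foreach ($doc->getElementsByTagName('img') as $img) {
    if ($img->hasAttribute('data-src')) {
        // Copy the data-src value into the real src attribute.
        $img->setAttribute('src', $img->getAttribute('data-src'));
    }
}

// Serialize each child of <body> so the added <html>/<body> wrapper is left out.
$body = $doc->getElementsByTagName('body')->item(0);
$out = '';
foreach ($body->childNodes as $child) {
    $out .= $doc->saveHTML($child);
}
echo $out, "\n";
```

To go the other way (src to data-src), swap the two attribute names and, if the original attribute should disappear, call removeAttribute() on it afterwards.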