【Question title】: Get content between HTML tags with an identifier inside
【Posted】: 2016-09-17 10:17:20
【Question】:

I have these span tags:

<div>
<span style="background: url('/wp-content/themes/minimum-child/img/address.png') 0px 2px no-repeat; padding-left: 20px;">CONTENT 1</span>
<span style="background: url('/wp-content/themes/minimum-child/img/email.png') 0px 2px no-repeat; padding-left: 20px;"><a href="mailto:post@post.com">CONTENT 2</a></span>
<span style="background: url('/wp-content/themes/minimum-child/img/tel.png') 0px 2px no-repeat; padding-left: 20px;">CONTENT 3</span>
</div>

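I need to get the content between the span tags, but split into individual variables: $address, $email, $phone, $web, etc. The obvious thing to match on is the name of the background image, because the image names stay the same (address.png, email.png, etc.).

So far I believe preg_match_all is the function to use. I have tried it, but without success.

This is what I tried (to get the address into the $address variable):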

$url="'/wp-content/themes/minimum-child/img/address.png'";
$tag='span style="background: url('.$url.')';
$matches=array();
$pattern = "/<$tag ?.*>(.*)<\/span>/";
preg_match($pattern, $htmlcontent, $matches);
$address=$matches[1];

Unfortunately, it does not work. Do you know how to achieve this?

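【Question comments】:

  • Do you want to capture the SPAN content (i.e. CONTENT_1, CONTENT_2, etc.) or the style attribute ('address.png', etc.)?
  • Hi, I need to capture CONTENT_1 etc. The pattern should be, for example, address.png

Tags: php html text preg-match-all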


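【Solution 1】:

It is often said that parsing HTML with regular expressions is fraught with problems, so I would opt for the simpler approach of using DOMDocument to process the HTML fragment. You can then use a regular expression to refine the results further if needed.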

$html='
<div>
    <span style="background: url(\'/wp-content/themes/minimum-child/img/address.png\') 0px 2px no-repeat; padding-left: 20px;">CONTENT 1</span>
    <span style="background: url(\'/wp-content/themes/minimum-child/img/email.png\') 0px 2px no-repeat; padding-left: 20px;"><a href="mailto:post@post.com">CONTENT 2</a></span>
    <span style="background: url(\'/wp-content/themes/minimum-child/img/tel.png\') 0px 2px no-repeat; padding-left: 20px;">CONTENT 3</span>
</div>';


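/* parse the fragment; DOMDocument wraps it in html/body elements */
$dom=new DOMDocument;
$dom->loadHTML( $html );

/* collect every span and prepare the result buckets */
$col=$dom->getElementsByTagName('span');
$keep=array(
    'style'=>array(),
    'data' =>array(),
    'email'=>array()
);

foreach( $col as $node ){
    /* style attribute with the single quotes around the url removed */
    $keep['style'][]=str_replace( "'", "", $node->getAttribute('style') );
    /* text content of the span, including text of nested elements */
    $keep['data'][]=$node->nodeValue;
    if( $node->hasChildNodes() ){
        foreach( $node->childNodes as $child ){
            if( $child->nodeType==XML_ELEMENT_NODE && $child->hasAttribute('href') ) {
                /* only treat mailto: links as email addresses */
                $href=$child->getAttribute('href');
                if( strpos( $href, 'mailto:' )===0 ) $keep['email'][]=substr( $href, strlen('mailto:') );
            }
        }
    }
}
echo '<pre>',print_r($keep,true),'</pre>';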


/* output
   ------

    Array
    (
        [style] => Array
            (
                [0] => background: url(/wp-content/themes/minimum-child/img/address.png) 0px 2px no-repeat; padding-left: 20px;
                [1] => background: url(/wp-content/themes/minimum-child/img/email.png) 0px 2px no-repeat; padding-left: 20px;
                [2] => background: url(/wp-content/themes/minimum-child/img/tel.png) 0px 2px no-repeat; padding-left: 20px;
            )

        [data] => Array
            (
                [0] => CONTENT 1
                [1] => CONTENT 2
                [2] => CONTENT 3
            )

        [email] => Array
            (
                [0] => post@post.com
            )

    )
*/
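
The question asks for separate variables keyed by the image name, which the arrays above do not provide directly. One way to get there is sketched below: take the file name from each span's style attribute with a small regular expression and use it as the array key. This is only a sketch under some assumptions: extractFieldsByIcon is a hypothetical helper name (not part of PHP or of the original answer), the icons are assumed to be .png files with unique names, and the syntax assumes PHP 7 or later.

```php
<?php
// Hypothetical helper (not from the original answer): maps the background
// image name of each span, e.g. "address", to that span's text content.
function extractFieldsByIcon(string $html): array
{
    // Collect libxml warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $fields = array();
    foreach ($dom->getElementsByTagName('span') as $span) {
        $style = $span->getAttribute('style');
        // Capture the file name without extension from url('.../name.png').
        // Assumption: every icon is a .png file.
        if (preg_match('~/([\w-]+)\.png~', $style, $m)) {
            $fields[$m[1]] = trim($span->textContent);
        }
    }
    return $fields;
}

$html = '
<div>
    <span style="background: url(\'/wp-content/themes/minimum-child/img/address.png\') 0px 2px no-repeat; padding-left: 20px;">CONTENT 1</span>
    <span style="background: url(\'/wp-content/themes/minimum-child/img/email.png\') 0px 2px no-repeat; padding-left: 20px;"><a href="mailto:post@post.com">CONTENT 2</a></span>
    <span style="background: url(\'/wp-content/themes/minimum-child/img/tel.png\') 0px 2px no-repeat; padding-left: 20px;">CONTENT 3</span>
</div>';

$fields  = extractFieldsByIcon($html);

// Spans that are missing from the markup simply yield null.
$address = $fields['address'] ?? null;
$email   = $fields['email'] ?? null;
$phone   = $fields['tel'] ?? null;
```

Keying the result by image name means a new icon such as web.png would show up as $fields['web'] without further changes, provided it follows the same style pattern.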

【Comments】:
