PHP can't read media:content attribute: answers

【Question title】: PHP Can't read media:content attribute
【Posted】: 2021-01-14 00:40:17
【Question】:

I use the following PHP code to parse an RSS feed into HTML:

function get_rss_feed_as_html($feed_url, $max_item_cnt = 10, $show_date = true, $show_description = true, $max_words = 0, $cache_timeout = 7200, $cache_prefix = "/tmp/rss2html-")
{
    $result = "";
    $rss = new DOMDocument();
    $cache_file = $cache_prefix . md5($feed_url);

    if ($cache_timeout > 0 &&
        is_file($cache_file) &&
        (filemtime($cache_file) + $cache_timeout > time())) {
        $rss->load($cache_file);
    } else {
        $rss->load($feed_url);
        if ($cache_timeout > 0) {
            $rss->save($cache_file);
        }
    }

    $feed = array();
    foreach ($rss->getElementsByTagName('entry') as $node) {
        
        $item = array (
            'title' => $node->getElementsByTagName('title')->item(0)->nodeValue,
            'desc' => $node->getElementsByTagName('content')->item(0)->nodeValue,
            'content' => $node->getElementsByTagName('content')->item(0)->nodeValue,
            'link' => $node->getElementsByTagName('link')->item(0)->getAttribute('href'),
            'date' => $node->getElementsByTagName('updated')->item(0)->nodeValue,
            'media' => $node->getElementsByTagName('media:content')->item(0)->getAttribute('url'),
        );
        $content = $node->getElementsByTagName('encoded');
        if ($content->length > 0) {
            $item['content'] = $content->item(0)->nodeValue;
        }
        array_push($feed, $item);
    }

    if ($max_item_cnt > count($feed)) {
        $max_item_cnt = count($feed);
    }
    $result .= '<div class="bw-feedly-list">';
    for ($x=0;$x<$max_item_cnt;$x++) {
        $title = str_replace(' & ', ' &amp; ', $feed[$x]['title']);
        $link = $feed[$x]['link'];
        $result .= '<div class="bw-feedly-item-col">';
        $result .= '<a class="bw-feedly-item" href="'.$link.'" title="'.$title.'" target="_blank">';
        if ($show_date) {
            $date = date('F d, Y', strtotime($feed[$x]['date']));
            $result .= '<div class="bw-feedly-date">'.$date.'</div>';
        }
        
        $result .= '<strong class="bw-feedly-title">'.$title.'</strong>';
        
        if ($show_description) {
            $result .= '<div class="bw-feedly-row">';
            $result .= '<div class="bw-feedly-summary-col">';
            
            $description = $feed[$x]['content'];
            $content = $feed[$x]['content'];

            // no html tags
            $description = strip_tags(preg_replace('/(<(script|style)\b[^>]*>).*?(<\/\2>)/s', "$1$3", $description), '');
            // whether cut by number of words
            if ($max_words > 0) {
                $arr = explode(' ', $description);
                if ($max_words < count($arr)) {
                    $description = '';
                    $w_cnt = 0;
                    foreach($arr as $w) {
                        $description .= $w . ' ';
                        $w_cnt = $w_cnt + 1;
                        if ($w_cnt == $max_words) {
                            break;
                        }
                    }
                    $description .= " ...";
                }
            }
            
            $result .= '<div class="feed-description">' . $description . '</div>';
            
            $media = $feed[$x]['media'];
            
            // add img if it exists
            //if ($media !== '') {
                $result .= '<div class="bw-feedly-image-col">';
                $result .= '<div class="bw-feedly-image-wrap" style="background-image: url('. $media .');">';
                $result .= '<img class="bw-feedly-image" src="'. $media .'">';
                $result .= '</div></div>';
            //}
            
            $result .= '</div></div>';
        }
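        // close the <a> and the item column opened at the top of the loop
        $result .= '</a></div>';
    }
    // close the bw-feedly-list container
    $result .= '</div>';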
    return $result;
}

It works fine, except for retrieving the media (URL) attribute:

'media' => $node->getElementsByTagName('media:content')->item(0)->getAttribute('url'),

It fails with the following error: Fatal error: Uncaught Error: Call to a member function getAttribute() on null in

Whereas here I can access the attribute without any problem:

'link' => $node->getElementsByTagName('link')->item(0)->getAttribute('href')

Not every entry in the XML feed has a media element, but adding a null check doesn't change anything.

I also tried the following code. I think I'm close, but it still doesn't work: it prints "content is null" for every entry.

if ($node->getElementsByTagNameNS('http://search.yahoo.com/mrss/', 'content')->length > 0) {
    $image = $node->getElementsByTagNameNS('http://search.yahoo.com/mrss/', 'content')->item(0)->getAttribute('url');
} else {
    echo '<p>content is null</p>';
}
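The namespace-aware lookup only matches when the URI passed in is character-for-character identical to the one the feed declares for the media prefix (a missing or extra trailing slash is enough to break it). The posted excerpt doesn't show the feed's root element, so which URI it actually declares is unknown here. One way to check, sketched below on hypothetical sample XML, is DOMNode::lookupNamespaceURI(); with a real feed you would call it on $rss->documentElement after $rss->load($feed_url).

```php
<?php
// Hypothetical minimal feed: Atom default namespace plus a media: prefix.
$xml = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
  <entry>
    <content type="html">Lorem ipsum</content>
    <media:content medium="image" url="https://picsum.photos/200/300"/>
  </entry>
</feed>
XML;

$rss = new DOMDocument();
$rss->loadXML($xml);

// Ask the document which URI the "media" prefix is bound to (null if undeclared).
$mediaNs = $rss->documentElement->lookupNamespaceURI('media');
echo 'media prefix => ' . var_export($mediaNs, true) . PHP_EOL;

// Query with the URI the feed really declares instead of a hard-coded string.
if ($mediaNs !== null) {
    $list = $rss->getElementsByTagNameNS($mediaNs, 'content');
    echo 'media:content elements found: ' . $list->length . PHP_EOL;
}
```

If the printed URI differs from http://search.yahoo.com/mrss/, that difference alone would explain why every entry reports "content is null".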

An XPath expression didn't help either:

$xpath = new DOMXpath($rss);
$xpath->registerNamespace('m', 'http://search.yahoo.com/mrss/');

foreach ($xpath->evaluate('//entry') as $item) 
{
    $media = $xpath->evaluate('string(m:content/@url)', $item);
    echo '<p> MEDIA ITEM: '.$media.'</p>';
}
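The XPath attempt most likely finds no entries at all, so the loop body never runs: //entry only matches entry elements in no namespace, whereas Atom feeds normally declare xmlns="http://www.w3.org/2005/Atom" on the root element, which places entry in that namespace (an assumption, since the excerpt starts at entry). getElementsByTagName('entry') still finds them because it is not namespace-aware. XPath 1.0 has no notion of a default namespace, so the Atom namespace needs a prefix of its own. A sketch using hypothetical prefixes a and m and sample data:

```php
<?php
$xml = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
  <entry>
    <title type="html">Lorem Ipsum</title>
    <media:content medium="image" url="https://picsum.photos/200/300"/>
  </entry>
  <entry>
    <title type="html">No image</title>
  </entry>
</feed>
XML;

$rss = new DOMDocument();
$rss->loadXML($xml);

$xpath = new DOMXPath($rss);
// These prefixes are local to this DOMXPath object; they need not match the feed's.
$xpath->registerNamespace('a', 'http://www.w3.org/2005/Atom');
$xpath->registerNamespace('m', 'http://search.yahoo.com/mrss/');

$media = [];
foreach ($xpath->evaluate('//a:entry') as $entry) {
    // string() yields '' when the entry has no media:content, so no null check is needed.
    $media[] = $xpath->evaluate('string(m:content/@url)', $entry);
}
print_r($media);
```

Because string() never returns null, this form also sidesteps the "getAttribute() on null" error for entries without an image.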

Here is part of the XML:

    <entry>
     <id>tag:04ac51c7-b707-43cc-8a73-c482da986a27</id>
     <title type="html">Lorem Ipsum</title>
     <published>2020-09-28T19:36:26Z</published>
     <updated>2020-09-28T06:01:22Z</updated>
     <link rel="alternate" href="https://www.lipsum.com/" type="text/html"/>
     <content type="html">Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum. ...</content>
     <author>
     <name/>
     </author>
     <media:content medium="image" url="https://picsum.photos/200/300"/>
     <source>
     <id>tag:04ac51c7-b707-43cc-8a73-c482da986a27</id>
     <title type="html">Lorum ipsum</title>
     <link rel="alternate" type="text/html" href="https://www.lipsum.com/"/>
     <updated>2020-09-28T06:01:22Z</updated>
     </source>
    </entry>
    <entry>

What's the trick here?

【Question comments】:

Tags: php rss

【Answer 1】:

It should work with the getElementsByTagNameNS function.

You should also be able to use getElementsByTagName without the namespace prefix, so leave out "media":

$node->getElementsByTagName('content')->item(0)->getAttribute('url')

This will clash if you have elements named "content" in more than one namespace.
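Since this feed has both an Atom content element and a media:content element, the clash can be avoided by asking for the local name "content" within each namespace separately. A sketch assuming the standard Atom and Media RSS namespace URIs, with hypothetical sample data; ATOM_NS and MRSS_NS are illustrative constant names:

```php
<?php
const ATOM_NS = 'http://www.w3.org/2005/Atom';
const MRSS_NS = 'http://search.yahoo.com/mrss/';

$xml = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
  <entry>
    <content type="html">Lorem Ipsum is simply dummy text.</content>
    <media:content medium="image" url="https://picsum.photos/200/300"/>
  </entry>
</feed>
XML;

$rss = new DOMDocument();
$rss->loadXML($xml);
$entry = $rss->getElementsByTagNameNS(ATOM_NS, 'entry')->item(0);

// Same local name "content", told apart only by namespace URI.
$text = $entry->getElementsByTagNameNS(ATOM_NS, 'content')->item(0)->nodeValue;
$url  = $entry->getElementsByTagNameNS(MRSS_NS, 'content')->item(0)->getAttribute('url');

echo $text, PHP_EOL, $url, PHP_EOL;
```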

【Comments】:

I thought so too, but in that case I also get the "Call to a member function getAttribute() on null" error. :( (Thanks for your reply.)
By the way, I have another "content" element in the XML, so this won't work. I'll add part of the XML feed.

【Answer 2】:

I got it all working. Hopefully this helps someone else:

    $image = '';
    if($node->getElementsByTagNameNS('http://search.yahoo.com/mrss/', 'content')->length > 0){
        $image = $node->getElementsByTagNameNS('http://search.yahoo.com/mrss/', 'content')->item(0)->getAttribute('url');
    }
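The same check can be written so that the lookup runs only once. A sketch assuming PHP 8.0 or later for the nullsafe operator ?->; media_url() is a hypothetical helper name and the sample XML is illustrative:

```php
<?php
// Returns the url attribute of the first media:content element inside $entry,
// or '' when the entry has none. Requires PHP 8.0+ for the nullsafe operator.
function media_url(DOMElement $entry, string $ns = 'http://search.yahoo.com/mrss/'): string
{
    $media = $entry->getElementsByTagNameNS($ns, 'content')->item(0);
    return $media?->getAttribute('url') ?? '';
}

$xml = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
  <entry><media:content medium="image" url="https://picsum.photos/200/300"/></entry>
  <entry><title>No image</title></entry>
</feed>
XML;

$rss = new DOMDocument();
$rss->loadXML($xml);

$urls = [];
foreach ($rss->getElementsByTagNameNS('http://www.w3.org/2005/Atom', 'entry') as $node) {
    $urls[] = media_url($node);
}
var_dump($urls);
```

In the question's loop, 'media' => media_url($node) would take the place of the failing line, and the commented-out if ($media !== '') check in the HTML section could then be re-enabled so that entries without an image don't get an empty img tag.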

【Comments】: