【Question Title】: PowerShell - how do you parse out content:encoded from an RSS feed (XML)?
【Posted】: 2017-05-17 08:17:48
【Question】:

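I am trying to parse data from an RSS feed using PowerShell.

How do I get the contents of the title, guid, and content:encoded fields?

For some reason, my code below only returns "...".

Any help is much appreciated!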

[xml]$hsg = Invoke-WebRequest http://technet.microsoft.com/en-us/security/rss/comprehensive
#$hsg.rss.channel.item | select title #this prints the list of blog posts

$ContentNamespace = New-Object Xml.XmlNamespaceManager $hsg.NameTable 
$ContentNamespace.AddNamespace("content", "http://purl.org/rss/1.0/modules/content/")

#$hsg.rss.channel.item #this prints the list of posts

$hsg.rss.channel.item.selectSingleNode("content:encoded", $ContentNamespace)

The data looks like this:

<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rssdatehelper="urn:rssdatehelper" version="2.0">
<channel>
<title>Microsoft Security Content: Comprehensive Edition</title>
<link>http://technet.microsoft.com/security/bulletin</link>
<dc:date>Wed, 15 May 2013 08:00:00 GMT</dc:date>
<generator>umbraco</generator>
<description>Microsoft Security Content: Comprehensive Edition</description>
<language>en-US</language>
<item>
<title>
MS13-045 - Important : Vulnerability in Windows Essentials Could Allow Information Disclosure (2813707) - Version: 1.1
</title>
<link>
http://technet.microsoft.com/en-us/security/bulletin/ms13-045
</link>
<dc:date>2013-05-15T07:00:00.0000000Z</dc:date>
<guid>
http://technet.microsoft.com/en-us/security/bulletin/ms13-045
</guid>
<content:encoded>
<![CDATA[
Severity Rating: Important<br />
 Revision Note: V1.1 (May 15, 2013): Corrected link to the download location in the Detection and Deployment Tools and Guidance section. This is an informational change only.<br />
 Summary: This security update resolves a privately reported vulnerability in Windows Writer. The vulnerability could allow information disclosure if a user opens Writer using a specially crafted URL. An attacker who successfully exploited the vulnerability could override Windows Writer proxy settings and overwrite files accessible to the user on the target system. In a web-based attack scenario, a website could contain a specially crafted link that is used to exploit this vulnerability. An attacker would have to convince users to visit the website and open the specially crafted link.
]]>
</content:encoded>
</item>

Thanks!

【Comments】:

    Tags: xml powershell rss


    【Solution 1】:

    Try this:

    # Start with an empty array so that += appends one object per item
    $posts = @()
    $rss = [xml](Get-Content .\test.rss)
    $rss.SelectNodes('//item') | % {
        $posts += New-Object psobject -Property @{
            Title = $_.Title.Trim()
            Guid = $_.Guid.Trim()
            Content = $_.Encoded."#cdata-section".Trim()
        }
    }
    

    Sample of the parsed data (the array contains only one entry because your sample has only one item):

    $posts
    
    Title                             Guid                             Content                         
    -----                             ----                             -------                         
    MS13-045 - Important : Vulnera... http://technet.microsoft.com/... Severity Rating: Important<br...
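
The table view above cuts long values off with "...". As a minimal sketch, assuming a `$posts` array shaped like the one built above (the single entry here is hand-written sample data, not live feed output), `Format-List` or `Select-Object -ExpandProperty` shows the full text instead:

```powershell
# Hand-written stand-in for the $posts array produced by the answer's loop.
$posts = @(
    [pscustomobject]@{
        Title   = 'MS13-045 - Important : Vulnerability in Windows Essentials Could Allow Information Disclosure (2813707) - Version: 1.1'
        Guid    = 'http://technet.microsoft.com/en-us/security/bulletin/ms13-045'
        Content = 'Severity Rating: Important<br />'
    }
)

# Format-List prints one property per line, so long values are not cut off.
# Out-String with a large -Width keeps each value on a single line.
$full = $posts | Format-List | Out-String -Width 4096
$full

# Alternatively, emit just one property as plain strings.
$contentOnly = $posts | Select-Object -ExpandProperty Content
$contentOnly
```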
    

    By the way, your sample is missing the following at the end:

    </channel>
    </rss>
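
For comparison, the namespace-manager approach from the question can also be used if `SelectSingleNode` is called on each item separately and the node's `InnerText` is read. This is only a sketch: the inline `$sample` string is a trimmed, hypothetical stand-in for the real feed, and the variable names are illustrative.

```powershell
# Trimmed, hypothetical sample standing in for the real feed (one item only).
$sample = '<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" version="2.0">
<channel>
<item>
<title> MS13-045 </title>
<guid> http://technet.microsoft.com/en-us/security/bulletin/ms13-045 </guid>
<content:encoded><![CDATA[ Severity Rating: Important<br /> ]]></content:encoded>
</item>
</channel>
</rss>'
$feed = [xml]$sample

# Map the "content" prefix to its namespace URI so XPath can resolve it.
$ns = New-Object System.Xml.XmlNamespaceManager $feed.NameTable
$ns.AddNamespace('content', 'http://purl.org/rss/1.0/modules/content/')

# Query each item on its own and read the text of the matched node.
$results = foreach ($item in $feed.SelectNodes('//item')) {
    $encoded = $item.SelectSingleNode('content:encoded', $ns)
    [pscustomobject]@{
        Title   = $item.title.Trim()
        Guid    = $item.guid.Trim()
        Content = $encoded.InnerText.Trim()
    }
}
$results | Format-List
```

`InnerText` returns the CDATA text without needing the `'#cdata-section'` property name.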
    

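    【Comments】:

    • That did it! Thanks, @Graimer!
    【Solution 2】:

    You can avoid writing the output to an intermediate file and skip Get-Content altogether. Frode's answer, which provides the solution, packages the result nicely into a psobject.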

    cls
    $x=[xml](iwr 'https://technet.microsoft.com/en-us/security/rss/comprehensive').content
    foreach ($y in $x.rss.channel.selectnodes('//item')) {
    "`r`n`t$($y.title)"
    
    $y.pubdate
    $y.link
    $y.encoded.'#cdata-section'
    
    }
    

    You may find that your RSS/Atom feed comes back with a slightly different structure. I found this was necessary for a different feed:

    foreach ($y in $x.feed.entry)
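
One way to cope with both layouts is a small helper that checks which root element is present. The function name `Get-FeedEntry` and the two inline feeds below are hypothetical, made up for illustration; real feeds may need more cases than this sketch handles.

```powershell
# Hypothetical helper: return the entries of an RSS 2.0 or Atom document.
function Get-FeedEntry {
    param([xml]$Feed)
    if ($null -ne $Feed.rss) {
        $Feed.rss.channel.item      # RSS 2.0: rss/channel/item
    }
    elseif ($null -ne $Feed.feed) {
        $Feed.feed.entry            # Atom: feed/entry
    }
}

# Tiny made-up feeds, one per format.
$rss  = [xml]'<rss version="2.0"><channel><item><title>RSS post</title></item></channel></rss>'
$atom = [xml]'<feed xmlns="http://www.w3.org/2005/Atom"><entry><title>Atom post</title></entry></feed>'

$rssTitles  = Get-FeedEntry $rss  | ForEach-Object { $_.title }
$atomTitles = Get-FeedEntry $atom | ForEach-Object { $_.title }
$rssTitles
$atomTitles
```

Note that this relies on PowerShell's default behavior of returning `$null` for a missing property; under `Set-StrictMode` the property checks would need to be written differently.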
    

    IntelliSense in the IDE helped me navigate the structure.

    【Comments】:
