Why is Beautiful Soup ignoring CDATA?

【Question title】: Why is Beautiful Soup ignoring CDATA?
【Posted】: 2014-10-26 21:43:24
【Question】:

I am using Beautiful Soup with the Yahoo Weather API (Python 2.7):

import urllib2
from bs4 import BeautifulSoup  # assumes Beautiful Soup 4 (bs4)

url = 'http://weather.yahooapis.com/forecastrss?w=2344116'
page = urllib2.urlopen(url).read()
soup = BeautifulSoup(page)

After this, however, the parsed page contains no CDATA. Why is Beautiful Soup ignoring it, and how can I keep the CDATA from being dropped?

In the XML:

<img src="http://l.yimg.com/a/i/us/we/52/11.gif"/>

In the parsed page:

As you can see, the CDATA is missing there.

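【Comments】:

What makes you think it is ignoring the CDATA section? That section is included in the text.
I just ran your code, and the CDATA is there.
Please see my edited question.
Thanks for the clarification. I had overlooked the yweather part earlier. Why not use that? BeautifulSoup can parse it much more easily. For the CDATA you will need some regex magic.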

Tags: python beautifulsoup

【Solution 1】:

The CDATA section is not ignored; it is simply treated as text, which is how CDATA sections are supposed to be handled:

>>> print soup.select('description:nth-of-type(2)')[0].text

<img src="http://l.yimg.com/a/i/us/we/52/11.gif"/><br />
<b>Current Conditions:</b><br />
Light Rain Shower, 59 F<BR />
<BR /><b>Forecast:</b><BR />
Sun - Rain/Wind. High: 63 Low: 57<br />
Mon - Rain/Wind. High: 60 Low: 53<br />
Tue - PM Showers. High: 58 Low: 55<br />
Wed - Mostly Cloudy. High: 64 Low: 57<br />
Thu - Rain. High: 63 Low: 55<br />
<br />
<a href="http://us.rd.yahoo.com/dailynews/rss/weather/Istanbul__TR/*http://weather.yahoo.com/forecast/TUXX0014_f.html">Full Forecast at Yahoo! Weather</a><BR/><BR/>
(provided by <a href="http://www.weather.com" >The Weather Channel</a>)<br/>

You can parse that section as a separate document:

>>> description_soup = BeautifulSoup(soup.select('description:nth-of-type(2)')[0].text)
>>> description_soup.img
<img src="http://l.yimg.com/a/i/us/we/52/11.gif"/>
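The same two-step idea (read the CDATA content as text, then parse that text as its own HTML document) can be sketched without Beautiful Soup. The following is a minimal Python 3 illustration that uses only the standard library (`xml.etree.ElementTree` and `html.parser`) in place of Beautiful Soup. `SAMPLE_FEED` is a hand-written, abridged stand-in for the real feed, and `ImageSrcCollector` and `image_sources` are hypothetical helper names, not part of any library.

```python
# Python 3, standard library only (a stand-in for the Beautiful Soup calls above).
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Abridged, hand-written sample; only the image URL and the conditions
# line are taken from the thread. The real feed has many more elements.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <description>Sample channel description</description>
    <item>
      <description><![CDATA[
<img src="http://l.yimg.com/a/i/us/we/52/11.gif"/><br />
<b>Current Conditions:</b><br />
Light Rain Shower, 59 F<BR />
]]></description>
    </item>
  </channel>
</rss>"""


class ImageSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser also routes self-closing tags such as <img ... /> here.
        if tag == "img":
            self.sources.extend(value for name, value in attrs if name == "src")


def image_sources(feed_xml):
    """Return the image URLs embedded in the item's CDATA description."""
    root = ET.fromstring(feed_xml)
    # Step 1: the XML parser returns the CDATA content as ordinary text.
    html_fragment = root.findtext("channel/item/description")
    # Step 2: parse that text as a separate HTML document.
    collector = ImageSrcCollector()
    collector.feed(html_fragment)
    collector.close()
    return collector.sources


print(image_sources(SAMPLE_FEED))
```

With Beautiful Soup installed, step 2 is the `description_soup.img` lookup shown above; the standard-library parser is used here only to keep the sketch free of third-party dependencies.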

Note that since you are parsing an XML feed, consider using XML mode (requires lxml to be installed):

soup = BeautifulSoup(page, 'xml')

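Or, (much) better still, use feedparser to parse the RSS feed.

【Comments】:

【Solution 2】:

Why do you want the CDATA so badly? From what I can see, the same data appears in a more structured form a few lines further down: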

In [28]: soup.findAll('yweather:forecast')
Out[28]: 
[<yweather:forecast day="Sun" date="26 Oct 2014" low="57" high="63" text="Rain/Wind" code="12">
 </yweather:forecast>,
 <yweather:forecast day="Mon" date="27 Oct 2014" low="54" high="61" text="Rain/Wind" code="12">
 </yweather:forecast>,
 <yweather:forecast day="Tue" date="28 Oct 2014" low="56" high="59" text="Rain" code="12">
 </yweather:forecast>,
 <yweather:forecast day="Wed" date="29 Oct 2014" low="57" high="63" text="AM Showers" code="39">
 </yweather:forecast>,
 <yweather:forecast day="Thu" date="30 Oct 2014" low="55" high="62" text="Light Rain" code="11">
 <guid ispermalink="false">TUXX0014_2014_10_30_9_00_EEST</guid>
 </yweather:forecast>]
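As a rough illustration of why the structured elements are easier to work with, here is a minimal Python 3 sketch that reads the forecast attributes with the standard library's `xml.etree.ElementTree` instead of Beautiful Soup. `SAMPLE_FEED` is a hand-written excerpt built from the first two forecasts above, the namespace URI is the one declared in that sample and should be checked against the `xmlns:yweather` attribute of the real feed, and `forecasts` is a hypothetical helper name.

```python
# Python 3, standard library only (a stand-in for soup.findAll above).
import xml.etree.ElementTree as ET

# Prefix-to-URI mapping for the sample below; verify the URI against the
# xmlns:yweather declaration in the actual feed before relying on it.
NS = {"yweather": "http://xml.weather.yahoo.com/ns/rss/1.0"}

# Hand-written excerpt; attribute values are copied from the output above.
SAMPLE_FEED = """<rss version="2.0"
     xmlns:yweather="http://xml.weather.yahoo.com/ns/rss/1.0">
  <channel>
    <item>
      <yweather:forecast day="Sun" date="26 Oct 2014" low="57" high="63" text="Rain/Wind" code="12"/>
      <yweather:forecast day="Mon" date="27 Oct 2014" low="54" high="61" text="Rain/Wind" code="12"/>
    </item>
  </channel>
</rss>"""


def forecasts(feed_xml):
    """Return one dict per <yweather:forecast> element, with integer temperatures."""
    root = ET.fromstring(feed_xml)
    result = []
    for element in root.iterfind("channel/item/yweather:forecast", NS):
        entry = dict(element.attrib)
        entry["low"] = int(entry["low"])
        entry["high"] = int(entry["high"])
        result.append(entry)
    return result


for entry in forecasts(SAMPLE_FEED):
    print(entry["day"], entry["text"], entry["low"], entry["high"])
```

Each forecast arrives as named attributes, so no second parsing pass or regular expression is needed, unlike the HTML embedded in the CDATA section.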

【Comments】:

You are right, but my boss insists on it, so I have to use the image ;)