【Question title】: Not able to parse RSS feeds
【Posted】: 2013-04-16 08:15:37
【Question description】:

I'm trying to parse an RSS feed from a URL using feedparser in Python.

>>> import feedparser 
>>> d = feedparser.parse('http://www.shop.inonit.in/RSSFeedDetails.aspx?PID=801')
>>> d
{'feed': {'summary': u'<span><h1>Server Error in \'/mobile\' Application.<hr color="silver" size="1" width="100%" /></h1>\n\n            
<h2> <i>Attempted to divide by zero.</i> </h2></span>\n\n            <font face="Arial, Helvetica, Geneva, SunSans-Regular, sans-serif ">\n\n            <b> Description: </b>An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.\n\n            <br /><br />\n\n            <b> Exception Details: </b>System.DivideByZeroException: Attempted to divide by zero.<br /><br />\n\n            
<b>Source Error:</b> <br /><br />\n\n            <table bgcolor="#ffffcc" width="100%">\n               <tr>\n                  <td>\n                      <code>\n\nAn unhandled exception was generated during the execution of the current web request. Information regarding the origin and location of the exception can be identified using the exception stack trace below.</code>\n\n                  </td>\n               </tr>\n            </table>\n\n            <br />\n\n            <b>Stack Trace:</b> <br /><br />\n\n            <table bgcolor="#ffffcc" width="100%">\n               <tr>\n                  <td>\n                      <code><pre>\n\n[DivideByZeroException: Attempted to divide by zero.]\n   System.Decimal.FCallDivide(Decimal&amp; d1, Decimal&amp; d2) +0\n   System.Decimal.Divide(Decimal d1, Decimal d2) +17\n   Martjack.CMS.PageControlsModelComp.GetPluginDataEnt(PageControlEnt objPageControlEnt, MerchantENT MerchantEnt, PageControlModel&amp; objPageControlModel, ProductEnt_RE ProductEnt, String MobileVersion) +2324\n   
Martjack.CMS.PageControlsModelComp.GetPageControlOutputData(PageModel pagemodel, PageControlEnt objPageControlEnt, MerchantENT MerchantEnt, String seocid, String combiType, String MobileVersion, ProductEnt_RE ProductEnt, String siteurl) +694\n   Martjack.CMS.PageControlsModelComp.GetPageControlModels(PageModel Pagemodel, MerchantENT MerchantEnt, String seocid, String combiType, String MobileVersion, DNDPageControlViewCollection objDNDPageControlViewCollection, Boolean isdndrequest, Int64 pgcontrolid, String siteurl) +919\n   Martjack.CMS.PageModelComp.GetPageModel(MerchantENT MerchantEnt, Int32 predefinedPageId, Boolean isPredefined, ChannelType channel, String seocid, String Bid, String combiType, String MobileVersion, Boolean isDndRequest, 
DNDPageControlViewCollection ObjDNDPageControlViewCollection, Boolean ControlsInfo, Int64 pgcontrolid) +1717\n   MartJack.Facade.CMSFacade.GetPageModel(MerchantENT MerchantEnt, Int32 PageId, Boolean isPredefined, ChannelType channel, String seocid, String bid, String combitype, String mobileversion, Boolean isDndRequest, DNDPageControlViewCollection ObjDNDPageControlViewCollection, Boolean ControlsInfo, Int64 pgcontrolid) +119\n   MobileECommerce.MobileECommerce.ProductsController.GetPageModelByRequest(String seoid, String bid) +227\n   MobileECommerce.MobileECommerce.ProductsController.Index(String id, String seobrand, String category, String categoryparent) +54\n   lambda_method(Closure , ControllerBase , Object[] ) +272\n   
System.Web.Mvc.ActionMethodDispatcher.Execute(ControllerBase controller, Object[] parameters) +17\n   System.Web.Mvc.ReflectedActionDescriptor.Execute(ControllerContext controllerContext, IDictionary`2 parameters) +212\n   System.Web.Mvc.ControllerActionInvoker.InvokeActionMethod(ControllerContext controllerContext, ActionDescriptor actionDescriptor, IDictionary`2 parameters) +239\n   System.Web.Mvc.&lt;&gt;c__DisplayClass15.&lt;InvokeActionMethodWithFilters&gt;b__12() +56\n   System.Web.Mvc.ControllerActionInvoker.InvokeActionMethodFilter(IActionFilter filter, ActionExecutingContext preContext, Func`1 continuation) +282\n   System.Web.Mvc.&lt;&gt;c__DisplayClass17.&lt;InvokeActionMethodWithFilters&gt;b__14() +20\n   System.Web.Mvc.ControllerActionInvoker.InvokeActionMethodWithFilters(ControllerContext controllerContext, IList`1 filters, ActionDescriptor actionDescriptor, IDictionary`2 parameters) +201\n   System.Web.Mvc.ControllerActionInvoker.InvokeAction(ControllerContext controllerContext, String actionName) +351\n   System.Web.Mvc.Controller.ExecuteCore() +99\n   System.Web.Mvc.ControllerBase.Execute(RequestContext requestContext) +94\n   System.Web.Mvc.ControllerBase.System.Web.Mvc.IController.Execute(RequestContext requestContext) +10\n   
System.Web.Mvc.&lt;&gt;c__DisplayClassb.&lt;BeginProcessRequest&gt;b__5() +43\n   System.Web.Mvc.Async.&lt;&gt;c__DisplayClass1.&lt;MakeVoidDelegate&gt;b__0() +21\n   System.Web.Mvc.Async.&lt;&gt;c__DisplayClass8`1.&lt;BeginSynchronous&gt;b__7(IAsyncResult _) +12\n   System.Web.Mvc.Async.WrappedAsyncResult`1.End() +53\n   System.Web.Mvc.Async.AsyncResultWrapper.End(IAsyncResult asyncResult, Object tag) +28\n   System.Web.Mvc.Async.AsyncResultWrapper.End(IAsyncResult asyncResult, Object tag) +15\n   System.Web.Mvc.&lt;&gt;c__DisplayClasse.&lt;EndProcessRequest&gt;b__d() +34\n   System.Web.Mvc.SecurityUtil.&lt;GetCallInAppTrustThunk&gt;b__0(Action f) +7\n   System.Web.Mvc.SecurityUtil.ProcessInApplicationTrust(Action action) +23\n   System.Web.Mvc.MvcHandler.EndProcessRequest(IAsyncResult asyncResult) +68\n   
System.Web.Mvc.MvcHandler.System.Web.IHttpAsyncHandler.EndProcessRequest(IAsyncResult result) +9\n   System.Web.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute() +714\n   System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean&amp; completedSynchronously) +240\n</pre></code>\n\n                  </td>\n               </tr>\n            </table>\n\n            <br />\n\n            
<hr color="silver" size="1" width="100%" />\n\n            <b>Version Information:</b>\xa0Microsoft .NET Framework Version:4.0.30319; ASP.NET Version:4.0.30319.272\n\n            </font>'}, 'status': 302, 'version': u'', 'encoding': u'utf-8', 'bozo': 1, 'headers': {'content-length': '11348', 'x-powered-by': 'ASP.NET', 'set-cookie': 'SERVERID=HAS14; path=/', 'originserver': 'HAS14', 'server': 'Microsoft-IIS/7.5', 'connection': 'close', 'cache-control': 'private', 'date': 'Tue, 16 Apr 2013 08:03:59 GMT', 'content-type': 'text/html; charset=utf-8', 'x-aspnet-version': '4.0.30319'}, 'href': 
u'http://www.shop.inonit.in/mobile/Products//NA/NA/0', 'namespaces': {}, 'entries': [], 'bozo_exception': SAXParseException('not well-formed (invalid token)',)}

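I get nothing in the output, whereas if you go to the link (http://www.shop.inonit.in/RSSFeedDetails.aspx?PID=801), it shows a lot of content! Maybe it is redirecting me to some other page that doesn't exist. (I also tried crawling various pages of this site with scrapy and couldn't, because I kept getting redirected to non-existent URLs.)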
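The parse result above already records what happened: `status` is 302, `href` points at `http://www.shop.inonit.in/mobile/Products//NA/NA/0` instead of the requested URL, the `content-type` header is `text/html`, `bozo` is set, and `entries` is empty. A minimal sketch that turns those fields into readable reasons, assuming feedparser's dict-style result (the function name `diagnose` is hypothetical):

```python
REDIRECT_CODES = (301, 302, 303, 307, 308)


def diagnose(result, requested_url):
    """Return reasons why a feedparser.parse() result is not a usable feed.

    Only keys visible in the output above are read: status, href,
    headers, bozo, bozo_exception and entries.
    """
    problems = []
    # In the output above, status holds the redirect code and href the final URL
    status = result.get("status")
    if status in REDIRECT_CODES:
        problems.append("HTTP %s redirect" % status)
    final_url = result.get("href")
    if final_url and final_url != requested_url:
        problems.append("ended up at %s" % final_url)
    # An HTML error page is not a feed, whatever the parser makes of it
    content_type = result.get("headers", {}).get("content-type", "")
    if "html" in content_type and "xml" not in content_type:
        problems.append("server returned %s, not a feed" % content_type)
    if result.get("bozo"):
        problems.append("parser error: %r" % (result.get("bozo_exception"),))
    if not result.get("entries"):
        problems.append("no entries parsed")
    return problems
```

Typical use would be `d = feedparser.parse(url)` followed by `for reason in diagnose(d, url): print(reason)`.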

Any help with this would be great. Thanks!

【Question discussion】:

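  • What do you mean by "nothing in the output"? `>>> len(d['feed']['summary'])` returns 5601, and there's a nice "divide by zero" message in there.
  • Sorry, I meant nothing relevant, i.e. none of the elements (title, price, etc.). Evidently it can't read the feed, yet if you open the link in a browser you can see all the data.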
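One common difference between a browser and a script is the request headers. A hypothesis the thread does not confirm is that the server chooses where to redirect based on the User-Agent (the final URL in the output sits under `/mobile`). The sketch below uses only the standard library to request the feed URL without following redirects, so the status and `Location` header can be compared across agents; `BROWSER_UA`, `NoRedirect`, `build_request` and `probe` are illustrative names, and the UA string itself is an assumption.

```python
import urllib.error
import urllib.request

FEED_URL = "http://www.shop.inonit.in/RSSFeedDetails.aspx?PID=801"
# Hypothetical browser-like User-Agent; the exact string is an assumption.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0"


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop at the first 3xx response instead of following it."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx response


def build_request(url, agent):
    """Build a GET request that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": agent})


def probe(url, agent, timeout=10):
    """Return (status, Location header) for one request, without following redirects."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(build_request(url, agent), timeout=timeout) as resp:
            return resp.status, resp.headers.get("Location")
    except urllib.error.HTTPError as err:
        return err.code, err.headers.get("Location")
```

If `probe(FEED_URL, BROWSER_UA)` comes back 200 while a non-browser agent gets a 302 toward `/mobile/...`, passing `agent=BROWSER_UA` to `feedparser.parse` would be the next thing to try.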

Tags: rss feed scrapy feedparser


【Solution 1】:

Are you using a proxy? If so, do this:

import urllib2, feedparser
# Route HTTP requests through your proxy; replace "proxy:port" with its host and port
proxy = urllib2.ProxyHandler({"http": "proxy:port"})
d = feedparser.parse('http://www.shop.inonit.in/RSSFeedDetails.aspx?PID=801', handlers=[proxy])
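`urllib2` exists only in Python 2; in Python 3, `ProxyHandler` lives in `urllib.request`, and feedparser's `handlers` argument accepts it the same way. A minimal sketch, where `proxy.example.com:3128` and the helper names `proxy_handlers` and `parse_via_proxy` are placeholders:

```python
from urllib.request import ProxyHandler

FEED_URL = "http://www.shop.inonit.in/RSSFeedDetails.aspx?PID=801"


def proxy_handlers(proxy_hostport):
    """Handler list that routes http and https traffic through one proxy.

    proxy_hostport is a placeholder such as "proxy.example.com:3128".
    """
    return [ProxyHandler({"http": proxy_hostport, "https": proxy_hostport})]


def parse_via_proxy(url, proxy_hostport):
    import feedparser  # third-party; imported here so proxy_handlers works without it
    return feedparser.parse(url, handlers=proxy_handlers(proxy_hostport))
```

Usage would be `d = parse_via_proxy(FEED_URL, "proxy.example.com:3128")`, with your real proxy address substituted.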

【Discussion】:
