【Title】: Unable to get the contents of a webpage
【Posted】: 2021-04-02 09:00:42
【Question】:

I'm writing a script and need the contents of a web page such as this one:

https://pcb.inc.hp.com/webapp/#/nl-nl/contents/33128146?type=I&hierarchy=F&status=L&status=O

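I'm using scrapy, and it normally works fine, but I currently can't get the HTML of this page with Requests, scrapy, or any other module. Does anyone know what might be going wrong?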

【Comments】:

    Tags: get scrapy python-requests-html


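    【Answer 1】:

    Some websites use JavaScript to load their data dynamically.

    For those cases we use scrapy-splash, which loads the page for you in a headless browser.

    See the scrapy-splash documentation for details.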
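To make the idea concrete, here is a minimal sketch that talks to Splash's HTTP API directly through its `render.html` endpoint, using only the standard library, rather than going through the scrapy-splash middleware. It assumes a Splash server is already running at `http://localhost:8050`; the helper names `splash_render_url` and `fetch_rendered_html` are made up for this illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a Splash instance is listening locally on its default port.
SPLASH_URL = "http://localhost:8050"


def splash_render_url(page_url, wait=2.0, timeout=30.0, splash_url=SPLASH_URL):
    """Build a render.html URL asking Splash for page_url's HTML after JS has run."""
    # urlencode percent-encodes the target URL, so its '#' fragment and '?'
    # query travel inside the 'url' argument instead of being cut off.
    query = urlencode({"url": page_url, "wait": wait, "timeout": timeout})
    return f"{splash_url}/render.html?{query}"


def fetch_rendered_html(page_url, wait=2.0, timeout=30.0):
    """Return the rendered HTML as text (requires the Splash server to be up)."""
    render_url = splash_render_url(page_url, wait=wait, timeout=timeout)
    # Give the HTTP client a little more time than Splash's own render timeout.
    with urlopen(render_url, timeout=timeout + 5) as response:
        return response.read().decode("utf-8", errors="replace")
```

The `wait` value is a guess at how long the page needs to finish loading its data and may need tuning. Inside a Scrapy project, scrapy-splash's `SplashRequest` issues the same kind of render request for you once its middlewares are configured.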

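    【Comments】:

      【Answer 2】:

      The site uses AngularJS to generate its content dynamically on load, so the HTML returned by a plain HTTP request does not contain the data you see in the browser. I'd suggest using something like Selenium for Python to scrape it.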
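A rough sketch of the Selenium route follows. The helper `rendered_html` is hypothetical and is written against a driver object passed in, relying only on the WebDriver members `get`, `find_elements`, and `page_source`. The CSS selector used to decide that the page is "ready" is an assumption and has to be picked by inspecting the real page.

```python
import time


def rendered_html(driver, url, ready_css, timeout=20.0, poll=0.5):
    """Load url in a Selenium WebDriver and return the DOM once ready_css matches."""
    driver.get(url)
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the string value behind selenium's By.CSS_SELECTOR.
        if driver.find_elements("css selector", ready_css):
            return driver.page_source
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{ready_css!r} did not appear within {timeout}s")
        time.sleep(poll)


# Usage (needs the selenium package and a local Chrome; "a" is a placeholder selector):
#
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   try:
#       html = rendered_html(
#           driver,
#           "https://pcb.inc.hp.com/webapp/#/nl-nl/contents/33128146"
#           "?type=I&hierarchy=F&status=L&status=O",
#           "a",
#       )
#   finally:
#       driver.quit()
```

The polling loop is hand-rolled to keep the sketch free of imports beyond the standard library; Selenium's own `WebDriverWait` with `expected_conditions` is the usual built-in way to express the same wait.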

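      Alternatively, depending on your needs, you can open the Network tab in Chrome DevTools to see which requests the page makes, and scrape the data from those URLs directly.

      For example: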

      Request URL: https://pcb.inc.hp.com/api/catalogs/nl-nl/nodes/0/children?status[]=O&status[]=L&hierParadigm=F
      
      Response: {"baseProdname":"ROOT_NODE","oid":0,"level":0,"status":["O","L"],"cultureCode":"nl-nl","children":[{"baseProdname":"Solutions","oid":8176594,"level":1,"status":["L","O"],"cultureCode":"nl-nl"},{"baseProdname":"Scanners/Copiers/Faxes","oid":15179,"level":1,"status":["L","O"],"cultureCode":"nl-nl"},{"baseProdname":"Software","oid":8133386,"level":1,"status":["L","O"],"cultureCode":"nl-nl"},{"baseProdname":"Ink/Toner/Paper/Printer Supplies","oid":12771,"level":1,"status":["L","O"],"cultureCode":"nl-nl"},{"baseProdname":"Laptops and Hybrids","oid":321957,"level":1,"status":["L","O"],"cultureCode":"nl-nl"},{"baseProdname":"Printers and Multifunction","oid":18972,"level":1,"status":["L","O"],"cultureCode":"nl-nl"},{"baseProdname":"Point of Sale Systems","oid":7491307,"level":1,"status":["L","O"],"cultureCode":"nl-nl"},{"baseProdname":"Desktops & Workstations","oid":12454,"level":1,"status":["L","O"],"cultureCode":"nl-nl"},{"baseProdname":"Monitors","oid":382087,"level":1,"status":["L","O"],"cultureCode":"nl-nl"},{"baseProdname":"Services","oid":8362107,"level":1,"status":["L","O"],"cultureCode":"nl-nl"},{"baseProdname":"Accessories","oid":8386448,"level":1,"status":["L","O"],"cultureCode":"nl-nl"},{"baseProdname":"3D Materials and Consumables","oid":20063457,"level":1,"status":["L","O"],"cultureCode":"nl-nl"},{"baseProdname":"Handhelds and Calculators","oid":215348,"level":1,"status":["L","O"],"cultureCode":"nl-nl"},{"baseProdname":"Industries","oid":20008722,"level":1,"status":["L"],"cultureCode":"nl-nl"},{"baseProdname":"Tablets","oid":5169094,"level":1,"status":["O"],"cultureCode":"nl-nl"},{"baseProdname":"Projectors","oid":3338965,"level":1,"status":["O"],"cultureCode":"nl-nl"},{"baseProdname":"Digital Cameras and Photo Studios","oid":382085,"level":1,"status":["O"],"cultureCode":"nl-nl"}]}
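The request above can be reproduced without a browser. The sketch below uses only the standard library (with Requests, `requests.get(url).json()` would do the same job). The function names are invented for this example, and it is an assumption that the endpoint accepts other node IDs and culture codes in the same URL pattern, or that it answers clients other than the web app.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Taken from the request URL captured in the Network tab.
API_BASE = "https://pcb.inc.hp.com/api/catalogs"


def children_url(culture="nl-nl", node_id=0, statuses=("O", "L"), hierarchy="F"):
    """Rebuild the 'children' request URL for one catalog node."""
    params = [("status[]", s) for s in statuses] + [("hierParadigm", hierarchy)]
    # safe="[]" leaves the brackets unescaped so the URL matches the captured one.
    query = urlencode(params, safe="[]")
    return f"{API_BASE}/{culture}/nodes/{node_id}/children?{query}"


def child_categories(payload):
    """Return (oid, baseProdname) pairs for the children listed in a response."""
    return [(child["oid"], child["baseProdname"]) for child in payload.get("children", [])]


def fetch_children(node_id=0, timeout=10):
    """Download and decode the JSON for one node (needs network access)."""
    request = Request(children_url(node_id=node_id), headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Applied to the response shown above, `child_categories` would yield pairs such as `(382087, "Monitors")`, whose `oid` values could then be passed back to `fetch_children` to walk further down the tree, if the assumption about the URL pattern holds.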
      

      【Comments】:
