Scraping HTML tags off a website in Electron

【Question title】: Scraping HTML tags off a website in Electron
【Posted】: 2017-01-10 01:26:21
【Question】:

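Is there a way to scrape websites in Electron? My goal is to visit a website and scrape its HTML tags. I am on a Windows machine, so I start the app with (npm start index.html). My idea was to create a .js file and use require (Url, function(err, resp,html){ }) as in Node, but that does not work in Electron. The code fails to scrape the page and make the callback. I only want the HTML. How can I get this working? This is the code in my app.js file that performs the callback: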

function scrape(callback){

    var content = [];
    var request = require('request');
    var cheerio = require('cheerio');
    var url = "http://www.amazon.com";

    request(url, function(error, response, html){

        if (error){
            content.push('Error:', error);
        }
        if (response.statusCode !== 200) {
            content.push('Invalid Status Code Returned:', response.statusCode);
        }

        content.push(html);
        var $ = cheerio.load(html);

        $('td').each(function (i, element) {
            var a = $(this).prev();
            var trimmed_a = a.text();

            trimmed_a = trimmed_a.trim();
            var str = trimmed_a.replace(/\s\s+/g,"");
            var newStr = str.trim();

            content.push(newStr);
        });
    });
    callback(content);
}

module.exports = scrape;

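The callback works fine, but the scraping code is not executed. There is a lot I do not understand, so please feel free to offer constructive guidance. The goal is to be able to scrape any website with this.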

【Question discussion】:
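
One likely reason the scrape function in the question appears to do nothing: request is asynchronous, so callback(content) runs before the response has arrived and receives an empty array. In addition, when error is set, response is undefined, so reading response.statusCode throws. Below is a minimal sketch of one possible fix, assuming a Node-style callback(err, html) signature. The url parameter and the optional requestFn argument (which lets a stub stand in for the request module) are hypothetical additions, not part of the original code.

```javascript
// Sketch only: `url` and `requestFn` are hypothetical parameters added for
// illustration; the original scrape() took just a callback.
function scrape(url, callback, requestFn) {
    // Load the real `request` module lazily unless a stand-in was supplied.
    var doRequest = requestFn || require('request');

    doRequest(url, function (error, response, html) {
        // Stop on a network error; `response` is undefined in this case.
        if (error) {
            return callback(error);
        }
        if (response.statusCode !== 200) {
            return callback(new Error('Invalid status code returned: ' + response.statusCode));
        }
        // Call back only here, once the HTML has actually arrived.
        callback(null, html);
    });
}
```

Called as scrape('http://www.amazon.com', function (err, html) { ... }), the callback fires once, after the response is in, and cheerio.load(html) can then run inside it.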

Tags: javascript node.js electron

【Solution 1】:

For scraping websites with Electron, I suggest using NightmareJs.

npm install nightmare

    var Nightmare = require('nightmare');
    var nightmare = Nightmare({ show: true });

    nightmare
        .goto('https://duckduckgo.com')
        .type('#search_form_input_homepage', 'github nightmare')
        .click('#search_button_homepage')
        .wait('#zero_click_wrapper .c-info__title a')
        .evaluate(function() {
            return document.querySelector('#zero_click_wrapper .c-info__title a').href;
        })
        .end()
        .then(function(result) {
            console.log(result);
        })
        .catch(function(error) {
            console.error('Search failed:', error);
        });

【Discussion】: