【Question title】: Getting data from ref Node (JavaScript web scraping)
【Posted】: 2021-10-31 14:58:32
【Question】:

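I'm using Cheerio to scrape a few websites for a project. This code fetches data from a site and pushes it into several different arrays.

The problem I'm having is that I can't seem to isolate the price information. Here is my Node.js code: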

// Gets all available keyboards from mykeyboard.eu (first page only) //
app.get('/data', (req, res) => {
const MyKeyboardEU ='https://mykeyboard.eu/catalogue/category/mechanical-keyboards_3/?selected_facets=num_in_stock_exact%3A%5B1+TO+%2A%5D';
const MKEUResults = [];
const MKEUThumbs = [];
const MKEUPrice = [];

// Gets in-stock results from mykeyboard.eu //
Axios.get(MyKeyboardEU)
    .then((response) => {
        let $ = cheerio.load(response.data);
        let keyboards = $('.thumbnail')
        let price = $('.price_color').children;

        // Pushes keyboard names + thumbnail links to respective arrays //
        for (var i = 0; i < keyboards.length; i++) {
            MKEUResults.push(keyboards[i].attribs.alt);
            MKEUThumbs.push(keyboards[i].attribs.src);
            console.log(price[i]);
        }

        // Maps array into single object for consumption on frontend //
        let arr = MKEUResults.map((res, idx) => {
            return {'name': res, "img": MKEUThumbs[idx]}
        });

        res.send(arr);
    })
    .catch((err) => res.send(err));
});

Here is what console.log(price[i]) outputs:

 <ref *1> [
keeb-finder-server-1  |   Node {
keeb-finder-server-1  |     type: 'text',
keeb-finder-server-1  |     data: '€179.00',
keeb-finder-server-1  |     parent: Node {
keeb-finder-server-1  |       type: 'tag',
keeb-finder-server-1  |       name: 'p',
keeb-finder-server-1  |       namespace: 'http://www.w3.org/1999/xhtml',
keeb-finder-server-1  |       attribs: [Object: null prototype],
keeb-finder-server-1  |       'x-attribsNamespace': [Object: null prototype],
keeb-finder-server-1  |       'x-attribsPrefix': [Object: null prototype],
keeb-finder-server-1  |       children: [Circular *1],
keeb-finder-server-1  |       parent: [Node],
keeb-finder-server-1  |       prev: [Node],
keeb-finder-server-1  |       next: [Node]
keeb-finder-server-1  |     },
keeb-finder-server-1  |     prev: null,
keeb-finder-server-1  |     next: null
keeb-finder-server-1  |   }
keeb-finder-server-1  | ]

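For the record, it prints several of these messages, one for each of the different items on the site. All I want is the data field from each of them.

I'm sure I've missed something fairly simple while reading the docs, but I can't seem to get it to work.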

【Comments】:

    Tags: javascript node.js web-scraping cheerio


    【Solution 1】:

    You can get the prices with:

    var prices = $('.price_color').map(function() {return $(this).text().trim();}).toArray();
    console.log('prices', prices);
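
For context, the structure logged in the question appears to be the `children` array of a single `.price_color` element: one text node whose `data` field holds the price. `.text()` gathers those `data` values for you. Here is a minimal sketch of the same idea over plain objects, assuming nodes shaped like the logged output. `textFromNodes` and `sampleChildren` are hypothetical names used for illustration only and are not part of Cheerio:

```javascript
// Hypothetical helper: joins the `data` of text nodes, descending into child tags.
// It only roughly mirrors what .text() does for simple markup.
function textFromNodes(nodes) {
  return nodes
    .map((node) => {
      if (node.type === 'text') return node.data;
      return Array.isArray(node.children) ? textFromNodes(node.children) : '';
    })
    .join('');
}

// Stand-in for the array printed in the question, trimmed to the relevant fields.
const sampleChildren = [{ type: 'text', data: '€179.00' }];

console.log(textFromNodes(sampleChildren).trim()); // prints "€179.00"
```

In real scraping code `$(this).text()` is still the better choice, since it works on the Cheerio selection directly and you never have to touch the raw nodes.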
    

    Also, you can simplify the code that collects the thumbnails' alt and src with:

    var thumbnails = $('.thumbnail').map(function() {
      return {'alt':$(this).attr('alt'), 'src':$(this).attr('src')};
    }).toArray();
    console.log('thumbnails', thumbnails);
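
With the `prices` and `thumbnails` arrays in hand, the price can be added to the `name`/`img` objects that the question's route sends to the frontend. This is a minimal sketch, assuming the two arrays line up by index (one thumbnail and one price per product, in the same order). `mergeByIndex` and the sample values are hypothetical:

```javascript
// Hypothetical helper: pairs thumbnails[i] with prices[i].
// Assumes both arrays are in the same product order and have the same length.
function mergeByIndex(thumbnails, prices) {
  return thumbnails.map((thumb, idx) => ({
    name: thumb.alt,
    img: thumb.src,
    price: prices[idx],
  }));
}

// Made-up sample data standing in for the scraped arrays.
const sampleThumbnails = [
  { alt: 'Keyboard A', src: '/media/a.jpg' },
  { alt: 'Keyboard B', src: '/media/b.jpg' },
];
const samplePrices = ['€179.00', '€99.00'];

console.log(mergeByIndex(sampleThumbnails, samplePrices));
```

If a product is missing a price or a thumbnail, the indexes drift apart, so this pairing is only safe when every product has both.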
    

    Finally, if you want to get src, alt, and price in a single loop, you can do the following. It simplifies the code and makes it easier to follow.

    var products = $('.product_pod').map(function() {
      let image = $(this).find('.thumbnail');
      let price = $(this).find('.price_color').text().trim();
      return {'src':image.attr('src'), 'alt':image.attr('alt'), 'price':price};
    }).toArray();
    console.log('products', products);
    

    【Comments】:
