Promise doesn't wait for function's promise to be resolved (answers)

【Question title】: Promise doesn't wait for function's promise to be resolved
【Posted】: 2021-02-18 17:17:02
【Question】:

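I have been working on a scraper project.

I have implemented a lot so far, but I am stuck on this point.

First, let me explain the workflow: the scrapers are called in the scraping-service module, where I wait for the promises of the called functions to resolve. Data is fetched in a scraper and passed to the data_functions object, where it is merged, validated, and inserted into the database.

Now the code: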

scraping-service

const olxScraper = require('./scrapers/olx-scraper');
const santScraper = require('./scrapers/sant-scraper');
//Calling scraper from where we want to get data about apartments
const data_functions = require('./data-functions/dataF');

let count = 1;

Promise.all([
  olxScraper.olxScraper(count),
  santScraper.santScraper(count),
]).then(() => data_functions.validateData(data_functions.mergedApartments));

So here I wait for the promises of these two functions and then pass the merged data to the validateData method in data_functions.

Here is the scraper:

const axios = require('axios'); //npm package - promise based http client
const cheerio = require('cheerio'); //npm package - used for web-scraping in server-side implementations
const data_functions = require('../data-functions/dataF');

//olxScraper function, which takes as a parameter the count sent from the scraping-service file.
exports.olxScraper = async (count) => {
  const url = `https://www.olx.ba/pretraga?vrsta=samoprodaja&kategorija=23&sort_order=desc&kanton=9&sacijenom=sacijenom&stranica=${count}`;
  //url where data is located at.
  const olxScrapedData = [];
  try {
    await load_url(url, olxScrapedData); //passing the url and an empty array
  } catch (error) {
    console.log(error);
  }
};

//Function that does loading URL part of the scraper, and starting of process for fetching raw data.
const load_url = async (url, olxScrapedData) => {
  await axios.get(url).then((response) => {
    const $ = cheerio.load(response.data);
    fetch_raw_html($).each((index, element) => {
      process_single_article($, index, element, olxScrapedData);
    });

    process_fetching_squaremeters(olxScrapedData); // if I place
    // data_functions.mergeData(olxScrapedData); here, it will work
  });
};

//Part where raw html data is fetched but in div that we want.
const fetch_raw_html = ($) => {
  return $('div[id="rezultatipretrage"] > div')
    .not('div[class="listitem artikal obicniArtikal  i index"]')
    .not('div[class="obicniArtikal"]');
};

//Here is all logic for getting data that we want, from the raw html.
const process_single_article = ($, index, element, olxScrapedData) => {
  $('span[class="prekrizenacijena"]').remove();
  const getLink = $(element).find('div[class="naslov"] > a').attr('href');
  const getDescription = $(element).find('div[class="naslov"] > a > p').text();
  const getPrice = $(element)
    .find('div[class="datum"] > span')
    .text()
    .replace(/\.| ?KM$/g, '')
    .replace(' ', '');
  const getPicture = $(element).find('div[class="slika"] > img').attr('src');
  //making array of objects with data that is scraped.
  olxScrapedData[index] = {
    id: getLink.substring(27, 35),
    link: getLink,
    description: getDescription,
    price: parseFloat(getPrice),
    picture: getPicture,
  };
};

//Square meters are needed to be fetched for every single article.
//This function loads every link in the olxScrapedData array and updates each object with the square meters value for that apartment.
const process_fetching_squaremeters = (olxScrapedData) => {
  const fetchSquaremeters = Promise.all(
    olxScrapedData.map((item) => {
      return axios.get(item.link).then((response) => {
        const $ = cheerio.load(response.data);
        const getSquaremeters = $('div[class="df2  "]')
          .first()
          .text()
          .replace('m2', '')
          .replace(',', '.')
          .split('-')[0];
        item.squaremeters = Math.round(getSquaremeters);
        item.pricepersquaremeter = Math.round(
          parseFloat(item.price) / parseFloat(getSquaremeters)
        );
      });
    })
  );

  fetchSquaremeters.then(() => {
    data_functions.mergeData(olxScrapedData); //Sending final array to mergeData function.
    return olxScrapedData;
  });
};

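Now, if I use console.log(olxScrapedData) inside fetchSquaremeters.then, it prints the scraped apartments, but it does not seem to call data_functions.mergeData(olxScrapedData). If I add that call in load_url instead, the function fires and the data gets merged, but without the square meters, and I really need that data.

So my question is: how do I make this work? Do I need to call the function somewhere else?

All I want is to send the final olxScrapedData to the mergeData function so that the arrays from the different scrapers are merged into one.

Thanks!

Edit: this is what the other scraper looks like: https://jsfiddle.net/oh03mp8t/. Note that there are no promises in that scraper.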

【Question discussion】:

Tags: javascript node.js promise

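【Solution 1】:

Try adding: const process_fetching_squaremeters = async (olxScrapedData) ... and then await fetchSquaremeters.then(..).

James already explained what is happening before this answer. You have to wait for this promise to resolve for everything to execute properly. If you have no experience with async/await and promises, I suggest taking a course on them to really understand what is happening here.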

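【Discussion】:

Actually, in this case you can replace the await keyword with a return statement: it returns the promise, which will eventually be picked up by await load_url().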
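
A minimal sketch of how the suggestions above could fit together. Note that in the question's code, load_url calls process_fetching_squaremeters inside its .then callback without returning or awaiting it, so the promise most likely has to be passed up at that level as well. Everything here is illustrative: fakeGet is a hypothetical stand-in for axios.get, mergeData and merged are hypothetical stand-ins for data_functions, the example.com URLs are placeholders, and the squaremeters value is not real parsing.

```javascript
// Hypothetical stand-in for axios.get: resolves after a short delay.
const fakeGet = (url) =>
  new Promise((resolve) =>
    setTimeout(() => resolve({ data: `page for ${url}` }), 10)
  );

// Hypothetical stand-in for data_functions.mergeData and its merged array.
const merged = [];
const mergeData = (items) => merged.push(...items);

// Returns the promise, so callers can wait for the detail requests and the merge.
const processFetchingSquaremeters = (items) =>
  Promise.all(
    items.map((item) =>
      fakeGet(item.link).then((response) => {
        item.squaremeters = response.data.length; // placeholder, not real parsing
      })
    )
  ).then(() => {
    mergeData(items);
    return items;
  });

// Awaits the nested promise instead of starting it and moving on.
const loadUrl = async (url, items) => {
  await fakeGet(url);
  items.push({ link: `${url}/1` }, { link: `${url}/2` });
  await processFetchingSquaremeters(items);
};

const scraper = async (count) => {
  const items = [];
  await loadUrl(`https://example.com/page/${count}`, items);
  return items;
};

Promise.all([scraper(1)]).then(() => {
  // By the time this runs, mergeData has already been called.
  console.log(merged.length);
});
```

With every level returning or awaiting the promise below it, the Promise.all in scraping-service would only resolve after the merge has happened.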

【Solution 2】:

Are you missing a return/await statement in your promise/async code, especially where your last statement is also a promise?

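Otherwise, you may just be asking for that promise to be executed later, rather than returning the result and letting Promise.all() wait for it.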
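
As a rough illustration of that difference, the sketch below compares a function that starts a promise without returning it against one that returns it. The names delay, withoutReturn, withReturn, and demo are made up for this example and do not come from the question's code.

```javascript
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Bug: the inner promise is started but never returned or awaited,
// so the async function resolves immediately.
const withoutReturn = async (log) => {
  delay(10).then(() => log.push('inner work (no return)'));
};

// Fix: returning the inner promise makes the caller wait for it.
const withReturn = async (log) => {
  return delay(30).then(() => log.push('inner work (returned)'));
};

const demo = async () => {
  const log = [];
  await Promise.all([withoutReturn(log)]);
  log.push('after withoutReturn'); // logged before the inner work has run
  await Promise.all([withReturn(log)]);
  log.push('after withReturn'); // logged only after the inner work
  return log;
};

demo().then((log) => console.log(log));
```

In the first case Promise.all has nothing to wait for, which matches the symptom in the question: the outer chain continues before the nested work is done.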

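【Discussion】:

I am fairly new to async/promises, so I am learning these things right now. So I am missing an async somewhere in my code? Are you thinking of the scraper's code or somewhere else?
In some ways I think the old Promise syntax is actually easier to understand, especially if you are new to promises (the Bluebird library makes things even nicer). The async/await syntax does exactly the same thing under the hood, but it is syntactic sugar that makes the code read like synchronous code. I suggest writing some unit tests or sandbox code using both syntaxes, with console.log() statements, to learn the order of events in different edge cases.
JavaScript is a single-threaded language. An asynchronous promise queues its callback on the event loop, to run after the current function completes or yields. Calling an async function returns a Promise object. Promises are chainable. If you call an async function in the middle of another async function, get into the habit of explicitly returning your promise. That way you can call await at the top of the chain and it will wait for all the nested callbacks to complete.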
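
A small sandbox along the lines suggested in these comments might look like the sketch below. It writes the same operation in both syntaxes and records the order of events; delay, doubleWithThen, doubleWithAwait, and traceOrder are hypothetical names chosen for this example.

```javascript
const delay = (ms, value) =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));

// Promise-chain style: the inner promise is returned explicitly.
const doubleWithThen = (n) => delay(10, n).then((value) => value * 2);

// async/await style: same behavior, reads like synchronous code.
const doubleWithAwait = async (n) => {
  const value = await delay(10, n);
  return value * 2;
};

const traceOrder = async () => {
  const events = ['start'];
  const pending = doubleWithThen(2); // the call returns a Promise right away
  events.push(pending instanceof Promise ? 'got a Promise' : 'got a value');
  const logged = pending.then((value) => events.push(`then: ${value}`));
  events.push('synchronous code finished');
  await logged; // waits for the whole chain, including the nested delay
  events.push(`await: ${await doubleWithAwait(2)}`);
  return events;
};

traceOrder().then((events) => console.log(events));
```

The 'synchronous code finished' entry is recorded before 'then: 4', because the callback only runs once the current synchronous code has finished and the delay has elapsed.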