How to fetch all the documents of a collection and its sub-collections efficiently in Cloud Firestore: answers

[Question title]: How to fetch all the documents of a collection and its sub-collections efficiently in Cloud Firestore
[Posted]: 2020-04-16 01:46:36
[Question]:

TL;DR: Fetching a small number of documents takes a very long time.

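Scenario:
I have a document for each account (under the someTopLevelDB collection), and each account contains a projects sub-collection and a tasks sub-collection. Each document in the tasks sub-collection can in turn contain checklists in a checkLists sub-collection.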

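Notes:

A project can contain tasks, and a task can in turn contain checklists.
Tasks can be created independently, i.e., a task does not have to be part of a project.
Projects and tasks are both top-level sub-collections (directly under the account); the checkLists sub-collection is nested inside each task.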

Illustration:

someTopLevelDB
   |
   |____ accountId1
   |         |______projects
   |         |         |_______ projectId1
   |         |
   |         |______tasks
   |                  |________taskId1 (belongs to projectId1)
   |                  |           |
   |                  |           |________checkLists
   |                  |                         |
   |                  |                         |_____checkListId1
   |                  |
   |                  |________taskId2 (standalone)
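For reference, the tree above corresponds to the slash-separated Firestore paths below. This is a small sketch; the helper names are hypothetical, only the path segments come from the illustration. With the Admin SDK, db.collection() also accepts such a path, so db.collection(checkListsPath(a, t)) should address the same collection as the chained .collection().doc() calls in getCheckLists.

```javascript
// Hypothetical helpers that build the paths of the hierarchy illustrated above.
// Collection paths have an odd number of segments, document paths an even number.
const accountPath = (accountId) => `someTopLevelDB/${accountId}`;
const projectsPath = (accountId) => `${accountPath(accountId)}/projects`;
const tasksPath = (accountId) => `${accountPath(accountId)}/tasks`;
const checkListsPath = (accountId, taskId) =>
  `${tasksPath(accountId)}/${taskId}/checkLists`;

// Prints "someTopLevelDB/accountId1/tasks/taskId1/checkLists"
console.log(checkListsPath('accountId1', 'taskId1'));
```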

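Use case: When the user clicks Duplicate Project (in the UI), I have to create a copy of the entire project, i.e., all of its tasks, checklists, etc.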
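The duplication itself can be planned as a pure function before anything is written. A minimal sketch under loudly stated assumptions: each task document is assumed to store its project's id in a field named projectId (the question only says a task "belongs to" a project), checklists are passed in grouped by task id, and newId is a caller-supplied function that returns a fresh document id. The function only returns { path, data } pairs; it does not touch Firestore.

```javascript
// Hypothetical sketch: plans (but does not perform) the writes needed to
// duplicate one project. ASSUMPTIONS: tasks store their project's id in a
// `projectId` field; checkListsByTask maps old task id -> array of checklists;
// newId() returns a fresh, unique document id on every call.
function planProjectCopy({ accountId, projectId, newProjectId, projectData,
                           tasks, checkListsByTask, newId }) {
  const base = `someTopLevelDB/${accountId}`;
  // The copied project document itself.
  const writes = [{ path: `${base}/projects/${newProjectId}`, data: { ...projectData } }];

  for (const task of tasks) {
    // Skip standalone tasks and tasks that belong to other projects.
    if (task.projectId !== projectId) continue;

    const { id: oldTaskId, ...taskData } = task;
    const newTaskId = newId();
    writes.push({
      path: `${base}/tasks/${newTaskId}`,
      data: { ...taskData, projectId: newProjectId }, // point the copy at the new project
    });

    // Copy this task's checklists under the new task document.
    for (const checkList of checkListsByTask[oldTaskId] || []) {
      const { id: _oldCheckListId, ...checkListData } = checkList;
      writes.push({
        path: `${base}/tasks/${newTaskId}/checkLists/${newId()}`,
        data: checkListData,
      });
    }
  }
  return writes;
}
```

Each entry could then be written with the Admin SDK's batched writes, e.g. batch.set(db.doc(w.path), w.data) on a db.batch(), followed by batch.commit(); check the current Firestore limits on batched writes and split into several batches if needed. An auto-generated id can come from db.collection('someTopLevelDB').doc().id.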

Code: The process is slow, and when I profiled the code, this snippet took a long time to execute. The snippet fetches all tasks and their checklists.

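// firebase-admin has to be initialized once before Firestore can be used
const admin = require('firebase-admin');
admin.initializeApp();

const db = admin.firestore();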

function getTasks(accountId) {
    return db.collection('someTopLevelDB')
        .doc(accountId)
        .collection('tasks')
        .where('deleted', '==', false)
        .get();
}


function getCheckLists(accountId, taskId) {
    return db.collection('someTopLevelDB')
        .doc(accountId)
        .collection('tasks')
        .doc(taskId)
        .collection('checkLists')
        .where('deleted', '==', false)
        .get();
}


async function getTasksAndCheckLists(accountId) {
    try {
        let records = { tasks: [], checkLists: [] };

        // prepare tasks details
        const tasks = await getTasks(accountId);
        const tasksQueryDocumentSnapshot = tasks.docs;
        for (let taskDocumentSnapshot of tasksQueryDocumentSnapshot) {
            const taskId = taskDocumentSnapshot.id;
            const taskData = taskDocumentSnapshot.data();
            const taskDetails = {
                id: taskId,
                ...taskData
            };
            records.tasks.push(taskDetails);

            // prepare check list details
            const checkListQueryDocumentSnapshot = (await getCheckLists(accountId, taskId)).docs;
            for (let checkListDocumentSnapshot of checkListQueryDocumentSnapshot) {
                const checkListId = checkListDocumentSnapshot.id;
                const checkListData = checkListDocumentSnapshot.data();
                const checkListDetails = {
                    id: checkListId,
                    ...checkListData
                };
                records.checkLists.push(checkListDetails);
            }
        }
        console.log(`successfully fetched ${records.tasks.length} tasks and ${records.checkLists.length} checklists`);
        return records;
    } catch (error) {
        console.log('Error fetching docs ====>', error);
    }
}




// Call the function to fetch records
getTasksAndCheckLists('someAccountId')
    .then(result => {
        console.log(result);
        return true;
    })
    .catch(error => {
        console.error('Error fetching docs ===>', error);
        return false;
    });

Execution stats:
successfully fetched 627 tasks and 51 checklists in 220.532 seconds

I concluded that retrieving the checklists slows down the whole process, because retrieving the tasks is fairly fast.
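A rough back-of-the-envelope check supports this, assuming (the question does not measure it) that every query costs about the same. The code issues one tasks query plus one checklist query per task, strictly one after another, so the average time per query is roughly:

```latex
t_{\text{query}} \approx \frac{220.532\,\text{s}}{1 + 627} = \frac{220.532\,\text{s}}{628} \approx 0.35\,\text{s}
```

That is in the range of a single network round trip, which suggests the time goes to waiting on 628 sequential round trips rather than to the amount of data returned (only 51 checklist documents came back).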

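So my questions are:

Is there any way to optimize the document retrieval code above?
Is there any way to retrieve the sub-collection documents faster, for example by restructuring the data and using collectionGroup queries?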

Thanks.

[Question discussion]:

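Wouldn't it be better to reduce the nesting of collections? Read this
The documentation you shared is for the Realtime Database, not Cloud Firestore. The two are different: unlike RTDB, Firestore queries are shallow. Storing checklists as a sub-collection lets me lazy-load them on demand and gives me flexibility for pagination and sorting, instead of storing them as one big array inside the task. The reason I didn't store checklists as a top-level collection like projects and tasks is that projects and tasks are independent entities that can exist on their own, whereas checklists cannot.
@vipul I guess my data illustration may have looked like RTDB. Apologies for that.
Sorry, I mistook Firestore for RTDB. Yes, Firestore gives you the flexibility to store nested data.
Just wanted to add: there is the db.collectionGroup feature, which might help you. If you have time, you can check this out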
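Following the collectionGroup suggestion above, a minimal sketch of fetching all of an account's checklists with one query instead of one query per task. ASSUMPTION: each checklist document also stores a hypothetical accountId field; without it, a collection group query on 'checkLists' matches those sub-collections across all accounts. The db object is passed in (an initialized admin.firestore() instance in practice).

```javascript
// Fetches every non-deleted checklist of one account with a single
// collection group query. ASSUMES a hypothetical `accountId` field on
// every checklist document.
async function getCheckListsByTask(db, accountId) {
  const snapshot = await db.collectionGroup('checkLists')
    .where('accountId', '==', accountId)
    .where('deleted', '==', false)
    .get();
  return groupCheckListsByTask(snapshot.docs);
}

// Groups checklist snapshots by the id of the task they live under:
// doc.ref.parent is the checkLists collection, and its parent is the task doc.
function groupCheckListsByTask(docs) {
  const byTask = {};
  for (const doc of docs) {
    const taskId = doc.ref.parent.parent.id;
    if (!byTask[taskId]) byTask[taskId] = [];
    byTask[taskId].push({ id: doc.id, ...doc.data() });
  }
  return byTask;
}
```

Together with getTasks() this is two round trips in total, and each task's checklists are then byTask[taskId] || []. Collection group queries need indexes with collection group scope, so Firestore may ask you to create one before this query succeeds.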

Tags: javascript firebase google-cloud-firestore

[Solution 1]:

The problem is caused by using await inside the for loop:

checkListQueryDocumentSnapshot = (await getCheckLists(accountId, taskId)).docs;

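This makes your for loop pause while it fetches the checklists for that particular task, so every checklist request waits for the previous one to finish.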
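To see the effect in isolation, here is a small simulation with no Firestore involved. fakeQuery is a hypothetical stand-in for getCheckLists with an assumed fixed 50 ms latency; the two functions differ only in whether each request is awaited inside the loop or all requests are started first and awaited together.

```javascript
// Hypothetical stand-in for getCheckLists(): resolves after `ms` milliseconds.
const fakeQuery = (taskId, ms) =>
  new Promise((resolve) => setTimeout(() => resolve(taskId), ms));

// Pattern from the question: await inside the loop, one request at a time.
async function sequential(taskIds, ms) {
  const results = [];
  for (const id of taskIds) {
    results.push(await fakeQuery(id, ms)); // the loop pauses here for every task
  }
  return results;
}

// Pattern from this answer: start every request, then wait for all of them.
function concurrent(taskIds, ms) {
  return Promise.all(taskIds.map((id) => fakeQuery(id, ms)));
}

async function timeIt(fn, taskIds, ms) {
  const start = Date.now();
  await fn(taskIds, ms);
  return Date.now() - start;
}

(async () => {
  const ids = Array.from({ length: 10 }, (_, i) => `task${i}`);
  console.log('sequential ms:', await timeIt(sequential, ids, 50)); // roughly 10 x 50
  console.log('concurrent ms:', await timeIt(concurrent, ids, 50)); // roughly 50
})();
```

With N tasks the sequential version costs about N round trips, while the concurrent one costs about one (the slowest request).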

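The way to avoid this is to handle the checklists asynchronously using Promise chaining. As you iterate over the tasks, you create the request for that task's checklists, attach a listener (a .then() callback) for its result, send it, and immediately move on to the next task. Only after the loop do you wait for all of the requests to finish.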

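In your data structure, each checklist belongs to a specific task on the server, but in your code above the fetched checklists are no longer linked to their task. When you work asynchronously with the same structure, using a plain array with push() would put them out of order relative to your tasks (for example, the checklist fetch for task B may finish before the one for task A). To fix this, the code below nests the checklists under the taskDetails object, so they stay linked to their task.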

async function getTasksAndCheckLists(accountId) {
    try {
        let taskDetailsArray = [];

        // fetch task details
        const tasks = await getTasks(accountId);

        // init Promise holder
        const getCheckListsPromises = [];

        tasks.forEach((taskDocumentSnapshot) => {
            const taskId = taskDocumentSnapshot.id;
            const taskData = taskDocumentSnapshot.data();
            const taskDetails = {
                id: taskId,
                checkLists: [], // for storing this task's checklists
                ...taskData
            };
            taskDetailsArray.push(taskDetails);

            // asynchronously get check lists for this task
            let getCheckListPromise = getCheckLists(accountId, taskId)
                .then((checkListQuerySnapshot) => {
                    checkListQuerySnapshot.forEach((checkListDocumentSnapshot) => {
                        const checkListId = checkListDocumentSnapshot.id;
                        const checkListData = checkListDocumentSnapshot.data();
                        const checkListDetails = {
                            id: checkListId,
                            ...checkListData
                        };

                        taskDetails.checkLists.push(checkListDetails);
                    });
                });

            // add this task's checklist promise to the promise holder
            getCheckListsPromises.push(getCheckListPromise);
        });

        // wait for all check list fetches - this is an all-or-nothing operation
        await Promise.all(getCheckListsPromises);

        // calculate the checklist count for all tasks
        let checkListsCount = taskDetailsArray.reduce((acc, v) => acc+v.checkLists.length, 0);

        console.log(`successfully fetched ${taskDetailsArray.length} tasks and ${checkListsCount} checklists`);
        return taskDetailsArray;
    } catch (error) {
        console.log('Error fetching docs ====>', error);
    }
}

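With these changes, the function's running time should drop dramatically. Based on the timings you gave, I'd guess it will come down to around 2-3 seconds.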
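A variant of the same idea, sketched with Promise.all over map() instead of pushing into shared arrays: Promise.all resolves to its results in the same order as the input array, so each task stays paired with its own checklists no matter which request finishes first. The question's getTasks and getCheckLists are passed in as parameters here (an illustration choice, so the function can be tried with stand-ins).

```javascript
// Same technique as the answer (start all checklist queries, then await them
// together), expressed with Promise.all + map. getTasks/getCheckLists are the
// question's helpers, injected so they can be swapped for stand-ins.
async function getTasksWithCheckLists(accountId, getTasks, getCheckLists) {
  const tasksSnapshot = await getTasks(accountId);
  return Promise.all(tasksSnapshot.docs.map(async (taskDoc) => {
    const checkListsSnapshot = await getCheckLists(accountId, taskDoc.id);
    return {
      ...taskDoc.data(),
      id: taskDoc.id, // set after the spread so a stored "id" field can't override it
      checkLists: checkListsSnapshot.docs.map((doc) => ({ ...doc.data(), id: doc.id })),
    };
  }));
}
```

Unlike the answer's version, this one does not catch errors itself, so a failed query rejects the returned promise and reaches the caller's .catch().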

[Discussion]:

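Also, note how I replaced every use of let docs = querySnapshot.docs; for (let doc of docs) { ... } with querySnapshot.forEach((doc) => { ... }). It's a small change, but it saves looping over the results twice.
Thanks, I'll test it and get back to you.
Yes, this did bring the execution time down to 4 seconds. There was a small typo in your code when reducing the array to get the checklist count, though: it should be v.checkLists, not v.checklists. Apart from that, your solution works. Please correct the above so that I can mark this as the accepted answer.
Great catch Sam, and nicely explained!
@FrankvanPuffelen Indeed a great explanation!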