Optimizing Firestore queries for our content app

【Question title】: Optimizing Firestore queries for our content app
【Posted】: 2021-04-12 01:19:25
【Question】:

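We are building a content app with Firestore. The basic requirement is one master collection, call it "contents". The number of documents could run into the thousands.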

content1, content2, content3 ... content9999

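We want to serve content from this collection to our users so that they never see the same item twice and find new content every time they open the app. At the same time, we do not want every user to get the same sequence of content. Some randomization would be nice.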

user1: content9, content123, content17, content33, content902 .. and so on
user2: content854, content79, content190, content567 ... and so on

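I have been thinking about how to implement this without duplicating the master collection. Duplicating it per user would be very expensive, but it would do the job.

Also, how can we write cost-effective, performant queries, especially when we want to keep the sequence of content items random?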

【Comments on the question】:

Tags: database firebase google-cloud-firestore database-design denormalization

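【Solution 1】:

Here is my suggestion. Please treat it as pseudocode, since I have not run it.

If the content document IDs are not predictable

You have to store and maintain which user has seen which content, for example in a collection: /seen/uid_contentId

See here for a clever way to get a random document from a collection. You need to store the size of the collection, perhaps as a document in another collection. So you could do something like this: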

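// Assumes the namespaced (v8-style) web SDK, with `firestore` = firebase.firestore(),
// `contentCollectionSize` loaded from wherever the collection size is stored, and a
// numeric `index` field in [0, contentCollectionSize) on every content document.
let alreadySeen = null; // IDs this user has seen, loaded on first use

async function getContent(uid) {
  const seenRef = firestore.doc(`/userSeen/${uid}`);
  if (alreadySeen === null) { // do it only once
    const seenSnapshot = await seenRef.get();
    alreadySeen = seenSnapshot.exists ? (seenSnapshot.data().contents || []) : [];
  }

  for (let trials = 0; trials < 10; trials++) { // limit the cost
    const startAt = Math.floor(Math.random() * contentCollectionSize);
    const snapshot = await firestore.collection("contents")
      .orderBy("index").startAt(startAt).limit(1).get(); // cursors need an orderBy
    const document = snapshot.empty ? null : snapshot.docs[0]; // a random content

    if (document && !alreadySeen.includes(document.id)) {
      alreadySeen.push(document.id);
      // merge: true adds to the stored array instead of replacing the document
      await seenRef.set(
        { contents: firebase.firestore.FieldValue.arrayUnion(document.id) },
        { merge: true }
      ); // mark it as seen
      return document;
    }
  }

  return null; // every trial hit content that was already seen
}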

You may need several Firestore queries per call (capped at 10 to limit the cost), because you cannot compute the content document IDs on the client.
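To get a feel for when the 10-trial cap starts to matter, the chance that every trial lands on already-seen content can be estimated. This is a rough sketch that assumes each trial picks uniformly at random among all content documents, which only holds if the random lookup is unbiased. The function names are made up for illustration.

```javascript
// Probability that `trials` independent uniform picks all hit content the
// user has already seen, in which case the lookup gives up and returns null.
// Assumption: every pick is uniform over `totalCount` documents.
function probabilityAllTrialsSeen(seenCount, totalCount, trials = 10) {
  if (totalCount <= 0) throw new RangeError("totalCount must be positive");
  if (seenCount < 0 || seenCount > totalCount) {
    throw new RangeError("seenCount must be between 0 and totalCount");
  }
  return Math.pow(seenCount / totalCount, trials);
}

// Expected number of Firestore queries per call, capped at `trials`.
function expectedReads(seenCount, totalCount, trials = 10) {
  const p = seenCount / totalCount; // chance that a single pick is already seen
  if (p === 1) return trials; // every trial fails, so all of them are used
  // Trial k only runs if the previous k - 1 failed:
  // 1 + p + p^2 + ... + p^(trials - 1) = (1 - p^trials) / (1 - p)
  return (1 - Math.pow(p, trials)) / (1 - p);
}
```

Under that assumption, a user who has seen half of the collection fails about once in a thousand calls (0.5^10), while at 90% seen roughly a third of the calls return nothing (0.9^10 ≈ 0.35).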

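If the content document IDs follow a simple pattern: 1, 2, 3, ...

To save cost and improve performance, you should store everything each user has seen in a single document (the limit is 1 MiB per document; at 8 bytes per number that is room for roughly 130,000 integers). You then download this document once per user and check on the client whether a random content item has already been seen.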

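// Assumes the namespaced (v8-style) web SDK, with `firestore` = firebase.firestore(),
// `contentCollectionSize` loaded from wherever the collection size is stored, and
// content document IDs that are the integers 1..contentCollectionSize.
let alreadySeen = null; // IDs this user has seen, loaded on first use

async function getContent(uid) {
  const seenRef = firestore.doc(`/userSeen/${uid}`);
  if (alreadySeen === null) { // do it only once
    const seenSnapshot = await seenRef.get();
    alreadySeen = seenSnapshot.exists ? (seenSnapshot.data().contents || []) : [];
  }

  let idx = Math.floor(Math.random() * contentCollectionSize); // random integer start

  for (let trials = 0; trials < contentCollectionSize; trials++) {
    idx = (idx % contentCollectionSize) + 1; // next ID, wrapping from the last back to 1

    if (alreadySeen.includes(idx)) continue; // this shortcut reduces the number of Firestore queries

    const document = await firestore.doc(`/contents/${idx}`).get();

    if (document.exists) {
      alreadySeen.push(idx);
      // merge: true adds to the stored array instead of replacing the document
      await seenRef.set(
        { contents: firebase.firestore.FieldValue.arrayUnion(idx) },
        { merge: true }
      ); // mark it as seen
      return document;
    }
  }

  return null; // the user has seen everything
}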
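The client-side part of this loop can be separated from Firestore and tested on its own. The sketch below is one possible way to do that, not the answer's exact code: it keeps the seen IDs in a `Set`, so membership checks do not scan an array, and it assumes the IDs are the integers 1 to `size`. `nextUnseenId` and the injectable `random` parameter are hypothetical names.

```javascript
// Returns the first unseen ID found by scanning forward from a random
// starting point, wrapping around from `size` back to 1.
// Returns null when every ID in 1..size has already been seen.
function nextUnseenId(size, seen, random = Math.random) {
  const start = Math.floor(random() * size); // integer in 0..size-1
  for (let step = 0; step < size; step++) {
    const id = ((start + step) % size) + 1; // integer in 1..size
    if (!seen.has(id)) return id;
  }
  return null;
}
```

The returned ID would then be fetched with a single document read and added to the set, so a call costs at most one read as long as every ID in the range exists.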

As you can see, this is much cheaper if you use predictable document IDs for your content. But maybe someone will have a better idea.
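How many IDs fit in the per-user document can be estimated from Firestore's documented storage-size rules. The sketch below is an approximation under those rules as I understand them: 8 bytes per number, a string counted as its UTF-8 length plus 1, 16 extra bytes for the document name, 32 extra bytes per document, and a 1 MiB maximum. The function names are hypothetical and the result should be read as a ballpark.

```javascript
const MAX_DOCUMENT_BYTES = 1024 * 1024; // 1 MiB document size limit

// A string is counted as its UTF-8 byte length plus 1.
function stringSize(s) {
  return new TextEncoder().encode(s).length + 1;
}

// Rough upper bound on how many integer IDs fit in a document stored at
// /<collectionId>/<docId> whose only field is one array named `fieldName`.
function maxIntegerIds(collectionId, docId, fieldName) {
  const nameSize = stringSize(collectionId) + stringSize(docId) + 16;
  const overhead = nameSize + stringSize(fieldName) + 32;
  return Math.floor((MAX_DOCUMENT_BYTES - overhead) / 8); // 8 bytes per number
}
```

For a document at /userSeen/user1 with a `contents` field, this comes out to about 131,000 integer IDs. String IDs take more room per entry, so the bound is lower when the content IDs are not integers.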

【Comments】:

【Solution 2】:

I have another idea. You can generate "scalars" of content :D

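Create another collection: scalars.
Add a field of type array.
Write a function that goes through the content collection and generates sets of content items, either randomly or by taking other attributes into account, such as popularity, demographics, and user behavior.
Generate 1000 sets of content items in the scalars collection, for example once a month.
You can even measure how well each scalar brings users back, and promote the more engaging ones.
Once you have the scalars collection with its sets of content items, you can assign each user to a scalar and present the content items accordingly.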
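The steps above can be sketched without Firestore as plain functions. This is only an illustration under assumptions that are not in the answer: sets are built by a plain random shuffle (no popularity or demographic weighting), users are assigned by hashing their uid, and all names (`shuffle`, `generateContentSets`, `assignSetIndex`) are hypothetical.

```javascript
// Fisher-Yates shuffle; returns a new array and leaves the input untouched.
function shuffle(items, random = Math.random) {
  const result = items.slice();
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1)); // integer in 0..i
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}

// Builds `setCount` randomized orderings of the content IDs. Each ordering
// would be stored as the array field of one document in the sets collection.
function generateContentSets(contentIds, setCount, random = Math.random) {
  return Array.from({ length: setCount }, () => shuffle(contentIds, random));
}

// Maps a user to one of the sets with a simple, stable string hash,
// so the same uid always gets the same set.
function assignSetIndex(uid, setCount) {
  let hash = 0;
  for (const ch of uid) {
    hash = (hash * 31 + ch.codePointAt(0)) >>> 0; // keep an unsigned 32-bit value
  }
  return hash % setCount;
}
```

With a layout like this, serving a user means reading the one assigned set document and then the content items it lists. The user's position within the set would still need to be stored somewhere so that items are not repeated.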

【Comments】: