How to store array index (an integer) as keys in a B+tree? (Answers)

【Question title】: How to store array index (an integer) as keys in a B+tree?
【Posted】: 2021-01-07 06:28:55
【Question description】:

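I have looked at every example of a B+tree in JavaScript on GitHub, and tried simplifying this one down to semi-readable code. But I still don't understand the structure of the keys array in each internal node. What do the keys look like? How do you use them in the get/insert/remove algorithms? Specifically for this question, I want to treat the B+tree like an array from the outside, or a list of sorts. So I want the "key" to be an integer (the index of the item in the array). How do I do this? What is an example JSON demo showing what a simple B+tree would look like in this case?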

{
  type: 'tree',
  keys: [?],
  children: [
    {
      type: 'internal',
      keys: [?],
      children: [
        {
          type: 'leaf',
          value: { foo: '123' }
        },
        {
          type: 'leaf',
          value: { foo: '234' }
        }
      ]
    },
    {
      type: 'internal',
      keys: [?],
      children: [
        {
          type: 'leaf',
          value: { foo: '345' }
        },
        {
          type: 'leaf',
          value: { foo: '456' }
        }
      ]
    }
  ]
}

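What do the keys do? I know they are used for lookup somehow, but how?

Say there are 32 internal nodes at the base, each of those has 32 internal nodes, and each of those has a bunch of leaves. What are the keys in the internal nodes?

I want to implement a robust B+tree in JavaScript, and I am currently having a hard time understanding the basics of the B+tree.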

【Question comments】:

Tags: algorithm data-structures b-tree b-plus-tree

【Solution 1】:

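So I want the "key" to be an integer (the index of the item in the array). How do I do this?

You can't. You cannot use the absolute index of the item in the whole structure as a key. That would mean that when inserting or deleting at the front of the array, every node in the whole tree would need its indices updated.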

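Instead, you need to store the sizes of the subtrees, so that you can accumulate them into relative indices while traversing the tree. You've already done that in How to return the tree node by index when tree nodes have subtree size?. These sizes never change unless the node itself (or one of its descendants) changes, so you will only ever need to update O(log n) nodes.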
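A minimal sketch of the size-based traversal described above. The node shape (`childSizes`, `children`, `values`) and the function names `getByIndex` and `insertAt` are assumptions made for illustration, and node splitting on overflow is deliberately left out.

```javascript
// Hypothetical node shapes (not taken from any particular library):
//   internal: { type: 'internal', childSizes: number[], children: Node[] }
//   leaf:     { type: 'leaf', values: any[] }

function getByIndex(node, index) {
  while (node.type === 'internal') {
    let i = 0;
    // Skip every child that lies entirely before the wanted position,
    // turning the index into one that is relative to the chosen child.
    while (i < node.childSizes.length - 1 && index >= node.childSizes[i]) {
      index -= node.childSizes[i];
      i++;
    }
    node = node.children[i];
  }
  return node.values[index]; // undefined when the index is out of range
}

function insertAt(node, index, value) {
  while (node.type === 'internal') {
    let i = 0;
    // "index === size" still fits at the end of child i, hence ">" here.
    while (i < node.childSizes.length - 1 && index > node.childSizes[i]) {
      index -= node.childSizes[i];
      i++;
    }
    node.childSizes[i]++; // only nodes on the root-to-leaf path are touched
    node = node.children[i];
  }
  node.values.splice(index, 0, value); // no splitting in this sketch
}

const tree = {
  type: 'internal',
  childSizes: [2, 3],
  children: [
    { type: 'leaf', values: ['a', 'b'] },
    { type: 'leaf', values: ['c', 'd', 'e'] },
  ],
};

console.log(getByIndex(tree, 3)); // 'd'
insertAt(tree, 0, 'z');           // insert at the very front
console.log(getByIndex(tree, 3)); // 'c' (everything shifted by one)
console.log(tree.childSizes);     // [ 3, 3 ]
```

Inserting at the front changed a single counter per level, and the second leaf was never touched, which is the point of storing sizes rather than absolute indices.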

What is an example JSON demo showing what a simple B+tree would look like in this case?

{ type: 'internal',
  // size: 8,
  // childSizes: [2, 3, 3],
  keys: [2, 5],
  children: [
    { type: 'leaf',
      // size: 2
      // childSizes: [1, 1]
      keys: [1],
      values: [ {…}, {…} ]
    },
    { type: 'leaf',
      // size: 3,
      // childSizes: [1, 1, 1],
      keys: [1, 2],
      values: [ {…}, {…}, {…} ]
    },
    { type: 'internal',
      // size: 3
      // childSizes: [1, 2]
      keys: [1],
      children: [
        { type: 'leaf',
          // size: 1
          // childSizes: [1]
          keys: [],
          values: [ {…} ]
        },
        { type: 'leaf',
          // size: 2
          // childSizes: [1, 1]
          keys: [1],
          values: [ {…}, {…} ]
        },
      ]
    },
  ]
}
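To make the role of the keys in the example above concrete, here is a hedged sketch that rebuilds the same shape and looks items up by index. The strings `'v0'` to `'v7'` are hypothetical stand-ins for the elided values, and `leaf` and `get` are illustrative helper names, not part of any library.

```javascript
// Build a leaf whose keys are the boundaries between its single values,
// e.g. three values give keys [1, 2], matching the example above.
const leaf = (...values) => ({
  type: 'leaf',
  keys: values.slice(1).map((_, i) => i + 1),
  values,
});

const tree = {
  type: 'internal',
  keys: [2, 5], // child 0 holds indices 0..1, child 1 holds 2..4, child 2 holds 5..7
  children: [
    leaf('v0', 'v1'),
    leaf('v2', 'v3', 'v4'),
    { type: 'internal', keys: [1], children: [leaf('v5'), leaf('v6', 'v7')] },
  ],
};

function get(node, index) {
  while (node.type === 'internal') {
    // Choose the first child whose boundary key is greater than the index.
    let i = 0;
    while (i < node.keys.length && index >= node.keys[i]) i++;
    // Make the index relative to the start of that child.
    if (i > 0) index -= node.keys[i - 1];
    node = node.children[i];
  }
  // In a leaf every entry has size 1, so the relative index is the position.
  return node.values[index];
}

console.log(get(tree, 0)); // 'v0'
console.log(get(tree, 4)); // 'v4'
console.log(get(tree, 7)); // 'v7'
```

Each key is the number of items stored in all children to its left, so a key is only meaningful relative to the start of its own node.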

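It would be sufficient for every node to carry only its own size in a field, but then all children of a node would have to be loaded into memory just to accumulate their sizes and decide which child to descend into during a lookup/insert/delete operation, so that is usually not done. You can store the node sizes in their parent node (as childSizes). Or you can store the already accumulated sizes in the keys array of the B+ tree, so that you don't need to compute the sums during search (but you then have to update the entire array even if only one entry changes; it's a trade-off). Unlike a classical B+ tree, which stores only the k-1 "boundary" keys between its k children, it might be a good idea to store the full sum (= the size of the node) in the last array index.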
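The trade-off of storing accumulated sizes can be sketched as follows. The `sums` layout (one entry per child, with the node's total size in the last slot) follows the suggestion above; the function names `findChild` and `childGrew` are assumptions for illustration.

```javascript
// Hypothetical layout: sums[i] = number of items in children[0..i],
// so sums[sums.length - 1] is the size of the whole node.

function findChild(sums, index) {
  // Binary search for the first i with sums[i] > index.
  let lo = 0;
  let hi = sums.length - 1;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sums[mid] > index) hi = mid;
    else lo = mid + 1;
  }
  const offset = lo > 0 ? sums[lo - 1] : 0;
  return { child: lo, relativeIndex: index - offset };
}

function childGrew(sums, child, by = 1) {
  // The cost of this layout: every entry from the changed child onward shifts.
  for (let i = child; i < sums.length; i++) sums[i] += by;
}

const sums = [2, 5, 8]; // child sizes 2, 3, 3

console.log(findChild(sums, 4)); // { child: 1, relativeIndex: 2 }
console.log(findChild(sums, 5)); // { child: 2, relativeIndex: 0 }

childGrew(sums, 0);              // one item inserted into child 0
console.log(sums);               // [ 3, 6, 9 ]
```

Because the last slot holds the total, a caller can reject an out-of-range index with `index >= sums[sums.length - 1]` before descending.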

【Comments】: