Looking for algorithm to find n unset bits in a bitset: Answers

【Question Title】: Looking for algorithm to find n unset bits in a bitset
【Posted】: 2014-11-25 15:55:52
【Question】:

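Background: I am writing a physical memory allocator that hands out 4KiB blocks. It uses a bitset to mark which 4KiB blocks of memory are in use. I have no standard C library available.

Question: I am looking for an algorithm that finds n consecutive unset bits in the smallest gap that can hold them, so that the largest gaps of unset bits are left intact.

Example: suppose a bitset contains the following bits: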

 0010 0000 0111 0001 1100 0011

If I want to set 4 bits, the algorithm should return bit 18 (counting from 0 at the left).

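【Comments】:

By the way, is your goal to solve the coalescing problem when memory is freed?
Please add an example (preferably an illustrated one).
To be precise, if you just want to find the best-fitting run of contiguous bits, the simplest way is to scan all the bits; performance depends on implementation details. But if you also have to deal with coalescing, you may need a different strategy from the one you are about to adopt.
It looks like you are trying to deal with memory fragmentation; does memory ever get freed? The reason HuStmpHrrr and I are interested in this aspect is that without deallocation you can get a fairly simple implementation, but if you have to deallocate memory it gets trickier.
Some of these bit hacks might help.

Tags: c algorithm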

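【Solution 1】:

I think you can do this in 2 passes:

Pass #1:
Scan the array, noting the length of each run of consecutive zero bits and where that run starts.

From your example, the scan would yield: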

2 bits, starting at 0
6 bits, starting at 3
3 bits, starting at 12
4 bits, starting at 18

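Pass #2:
Scan the data from Pass #1, looking for the target value (4), or failing that the smallest value larger than the target.

Writing these two passes in C seems trivial, and this should be a general solution that works in all cases.

Once you have this working, I also see some optimizations, so that in some cases you may not need to run Pass #2 at all.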
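
The two passes could be sketched in C roughly as follows. This is only an illustration under assumptions that are not in the question: the bitset is an array of 32-bit words, bit i is stored at position i % 32 of word i / 32, and the names `struct run`, `collect_runs`, `best_fit` and `MAX_RUNS` are made up for the example. Only freestanding headers are used, since no C library is available.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_RUNS 64   /* hypothetical cap on how many runs Pass #1 records */

struct run { size_t start, len; };

/* Bit i lives in word i / 32 at position i % 32; 1 = block in use. */
static int bit_is_set(const uint32_t *bits, size_t i)
{
    return (bits[i / 32] >> (i % 32)) & 1u;
}

/* Pass #1: record every run of zero bits. Returns the number of runs
 * stored; runs beyond max_runs are silently dropped in this sketch. */
static size_t collect_runs(const uint32_t *bits, size_t nbits,
                           struct run *runs, size_t max_runs)
{
    size_t count = 0, start = 0, i;
    for (i = 0; i <= nbits; ++i) {
        /* i == nbits acts as a sentinel that closes a trailing run */
        if (i == nbits || bit_is_set(bits, i)) {
            if (i > start && count < max_runs) {
                runs[count].start = start;
                runs[count].len = i - start;
                ++count;
            }
            start = i + 1;
        }
    }
    return count;
}

/* Pass #2: pick the shortest run with len >= n. Returns its start, or
 * (size_t)-1 if no recorded run is long enough. */
static size_t best_fit(const struct run *runs, size_t count, size_t n)
{
    size_t best = (size_t)-1, best_len = (size_t)-1, k;
    for (k = 0; k < count; ++k) {
        if (runs[k].len >= n && runs[k].len < best_len) {
            best = runs[k].start;
            best_len = runs[k].len;
        }
    }
    return best;
}
```

With the 24 bits from the question loaded into one word, `collect_runs` should produce the four runs listed above and `best_fit(runs, count, 4)` should pick the run starting at 18.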

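【Discussion】:

As you said, you don't need the second pass: if in the first pass you find a block with exactly n bits, you can stop there and use it immediately; if you don't, you can keep track of the position of the smallest sufficiently large block seen so far and pick it once the first pass finishes.
I found this answer the most helpful for writing my own implementation. I'll post my code in the question when I get home.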
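
For reference, the single-pass variant described in the first comment might look like the sketch below. It assumes the same hypothetical layout as before (32-bit words, bit i at position i % 32 of word i / 32, 1 = in use) and n >= 1; `find_best_fit` is a name invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the start of the tightest run of at least n zero bits, or
 * (size_t)-1 if there is none. Assumes n >= 1. */
size_t find_best_fit(const uint32_t *bits, size_t nbits, size_t n)
{
    size_t best = (size_t)-1, best_len = (size_t)-1;
    size_t start = 0, i;
    for (i = 0; i <= nbits; ++i) {
        /* i == nbits acts as a sentinel that closes a trailing run */
        if (i == nbits || ((bits[i / 32] >> (i % 32)) & 1u)) {
            size_t len = i - start;
            if (len == n)
                return start;          /* exact fit: stop scanning */
            if (len >= n && len < best_len) {
                best = start;          /* tightest fit seen so far */
                best_len = len;
            }
            start = i + 1;
        }
    }
    return best;
}
```

The early return is the only difference from a plain best-fit scan: an exact fit cannot be beaten, so the rest of the bitset does not need to be examined.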

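【Solution 2】:

As I mentioned in the comments, things are completely different once you try to deal with coalescing. But to address the question you are asking now: it is quite simple with a Red Black Tree, which I'm used to calling an RBTree.

There are thousands of RBTree implementations, so you can pick one that suits your language. I'll just show my way to allocate memory in Python-like pseudocode. (As I said, working out what to do when freeing memory is a different problem.)

RBTree:

key: the number of consecutive 0s.

value: the position of the first of those consecutive 0s.

So in your case, the tree for your example would be initialized as: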

rbt=new RBTree()
rbt.insert(2, 0)
rbt.insert(6, 3)
rbt.insert(3, 12)
rbt.insert(4, 18)

Forgive me if I miscounted.

When you want to allocate a chunk of memory:

func alloc(num_of_chunks):
    # find the smallest key-value pair that satisfies chunk.key >= num_of_chunks
    chunk=rbt.find_ceil(num_of_chunks)
    if chunk is Nil: raise NotFound
    ret=chunk.value
    # the run is no longer free as a whole, so take it out of the tree
    rbt.remove(chunk)
    # the run found may be bigger than required; put the unused tail back
    if chunk.key>num_of_chunks:
        rbt.insert(chunk.key-num_of_chunks, chunk.value+num_of_chunks)
    return ret

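This keeps the tree up to date.

Advantages of using an RBTree:

It is fast. The linear search I proposed in the comments is O(n), but with an RBTree this drops to O(lg n), which means it scales better.
It is easy to maintain. There are thousands of well-implemented libraries to suit your different needs.

Update

This answer appears to involve dynamic memory allocation, which would introduce a chicken-and-egg problem, but in fact it does not.

If you know a chunk has not been allocated, it must be unused. So the RBTree's data can be stored in the unused chunks themselves, which means the metadata is actually distributed across the memory space. In C, a node for this kind of problem could therefore be: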

struct node {
    int length; // key: number of free chunks in this run
    struct node *left, *right;
};

placed in the first bytes of the chunk.

So all you have to do is remember the root.

struct node *root;
// Your code should operate on rootp, since a rotation in the RBTree may change the root.
// All related interfaces should take a struct node ** argument.
struct node **rootp = &root;

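"What about the value? You didn't define a field to store the chunk's address."

Yes, I did. Since the data itself is stored in the chunk, the address of the struct node is the address of the chunk.

So this way you can avoid dynamic memory allocation. Also, it seems I haven't answered how to find a suitable sequence of bits... but I think you can manage your memory allocation better this way.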
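
To make the "node lives inside the free chunk" idea concrete, here is a rough sketch in C. It is deliberately not a red-black tree: it is a plain unbalanced binary search tree, with the colour field and rotations left out to keep it short, so it only illustrates where the metadata is stored and how a ceiling lookup works. The function names `insert_free_run` and `find_ceil` are invented for the example, and it assumes every free run starts at an address suitably aligned for `struct node` (true for 4KiB-aligned chunks).

```c
#include <stddef.h>

struct node {
    int length;                  /* key: number of free chunks in this run */
    struct node *left, *right;
};

/* Write a node header into the first bytes of a free run and link it into
 * the tree. Plain BST insert; red-black rebalancing is omitted here. */
static void insert_free_run(struct node **rootp, void *run_start, int length)
{
    struct node *n = run_start;  /* the node lives inside the free memory */
    n->length = length;
    n->left = n->right = NULL;
    while (*rootp)
        rootp = (length < (*rootp)->length) ? &(*rootp)->left
                                            : &(*rootp)->right;
    *rootp = n;
}

/* Smallest run with length >= want, or NULL if there is none. The returned
 * pointer is also the address of the free run itself. */
static struct node *find_ceil(struct node *root, int want)
{
    struct node *best = NULL;
    while (root) {
        if (root->length >= want) {
            best = root;         /* candidate; look for a tighter one */
            root = root->left;
        } else {
            root = root->right;
        }
    }
    return best;
}
```

No separate allocation is needed for the tree: inserting a run only writes into memory that is known to be free, and the pointer returned by `find_ceil` doubles as the address to hand out.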

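【Discussion】:

I like this solution. Sadly it doesn't fit: building an RBTree requires allocating memory, which is the very thing I'm trying to implement.
@NFSGamer I've updated my answer to explain how to avoid allocating memory.
I'll keep this answer for future reference, since I still want to implement the bitset. But next time I write a memory allocator, I'll definitely try this approach first!

【Solution 3】: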

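#include <limits.h>    /* INT_MAX, CHAR_BIT; usable without the C library */

#define NUM_BITS ((int)(sizeof(unsigned) * CHAR_BIT))

/* Returns the index of the first bit of the smallest run of at least req
 * zero bits in alloca_mem (bit 0 = least significant), or -1 if none. */
int find_free(unsigned alloca_mem, int req)
{
    int min_len = INT_MAX;
    int start = 0;              /* start of the current run of zero bits */
    int min_len_index = -1;
    int i;
    for (i = 0; i <= NUM_BITS; ++i) {
        /* i == NUM_BITS acts as a sentinel that closes a trailing run */
        if (i == NUM_BITS || (alloca_mem & (1u << i))) {
            int len = i - start;
            if (len >= req && len < min_len) {
                min_len_index = start;
                min_len = len;
            }
            start = i + 1;
        }
    }
    return min_len_index;
}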

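【Discussion】:

alloca_mem | (1 << i) is always true. I think you meant alloca_mem & (1 << i).

【Solution 4】:

Call the size of a machine word w (typically w = 32 or 64), and the total number of 4KB memory pages m. If there are g gaps between already-allocated blocks, and some free block of length n fits entirely within a machine word (and you are happy to choose a smallest such block over a block that straddles the boundary between two words), then it can be found in O(m/w + g) time by exploiting the way that integer addition "ripples":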

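Given a bitmap word representing w 4KB pages, invert all its bits. Now any block of k contiguous free 4KB pages is a block of k contiguous 1 bits.
If you add an integer whose only 1 bit is at the LSB of that block, the carry ripples all the way to the left, setting all k bits in the block to 0 and the next bit to 1.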
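
The ripple trick on its own can be tried out with a small helper like the one below. It is an illustrative sketch, not part of the answer's code: `lowest_free_run` is a made-up name, and it uses the common `x & -x` idiom to isolate the lowest 1 bit of an unsigned value.

```c
/* free_mask: 1 bits mark free pages (the inverted bitmap word).
 * Returns a mask covering the lowest run of free pages, or 0 if none. */
unsigned lowest_free_run(unsigned free_mask)
{
    unsigned lsb = free_mask & -free_mask;  /* lowest 1 bit: start of run */
    unsigned rippled = free_mask + lsb;     /* carry clears the run and
                                               sets the bit just above it */
    unsigned edge = rippled & -rippled;     /* bit just above the run
                                               (0 if the carry overflowed) */
    return edge - lsb;                      /* 1 bits exactly over the run */
}
```

For example, `lowest_free_run(0x6E)` (binary 0110 1110) gives `0x0E`: the lowest run of 1 bits occupies bits 1 to 3. If the run reaches the top bit, the addition overflows to 0 and the unsigned subtraction still yields the right mask.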

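Using just this, and no expensive or "weird" instructions such as multiplication, division or bit-scan, it isn't hard to find a gap containing n unset bits in O(m/w + g) time. But to find a smallest such block, it seems you need either bit-scan or division. Fortunately, we can limit the number of times those instructions are needed to at most w-1 operations in total across the whole bitmap, since that is the maximum number of times the current best gap can "shrink" to something smaller. (Instead of using division, you could find the index of the MSB in O(log w) time with simpler instructions, using code similar to the final loop of the code below; that might be faster, but the point is moot since so few divisions are needed.) We still need O(g) multiplications, which may not be ideal, but these are fast on modern CPUs.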

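#include <limits.h>    // CHAR_BIT and UINT_MAX; usable without the C library

// Enum constants are used so that bits[] has a constant size in C.
enum { m = /* HOW MANY BLOCKS */ };
enum { w = sizeof (unsigned) * CHAR_BIT };
unsigned bits[m / w];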

unsigned findSmallestBlock(unsigned n) {
    if (n > w) return UINT_MAX;    // Can only find within-word blocks.
    unsigned bestI = UINT_MAX;    // Index with bits[]
    unsigned bestLsb = 0;  // Has a 1-bit in the LSB of the gap; 0 = "no solution"
    unsigned bestShifted = ~0;    // The gap's bits, "flush right".
    for (int i = 0; i < sizeof bits / sizeof bits[0]; ++i) {
        // Find the shortest block in bits[i], if one exists
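        // First, handle an edge case: a completely free word.  The tightness
        // test below cannot record it, because bestShifted starts out as ~0.
        if (bestLsb == 0 && bits[i] == 0) {
            // Tentatively take the whole word; a tighter gap may replace it.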
            bestI = i;
            bestLsb = 1;
            bestShifted = ~0;
            if (n == w) break;    // Perfectly snug
        }
        unsigned y = ~bits[i];
        unsigned probe = y & -y;    // The LSB of the gap we will test
        while (probe << (n - 1)) {    // Left-shifting it too far => 0 => stop.
            y += probe;    // Ripple!
            unsigned edge = y & -y;    // Extract new LSB.  Overflow to 0 OK.
            unsigned gap = edge - probe;    // Every bit in the gap is 1.
            // Is the gap big enough?
            if (probe << (n - 1) <= gap) {
                // The gap is big enough.  Is it the tightest fit so far?
                // Doing this with bit-scan operations is easy; without them we can
                // use integer multiplication and division, but we want to keep the
                // divisions to a minimum.  Needs an edge case to be handled above.
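                // The product is widened so that it cannot wrap around when the
                // best gap, shifted up to probe, would run past the top of the
                // word (assumes unsigned long long is wider than unsigned).
                if (gap < (unsigned long long)bestShifted * probe) {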
                    // Found a new, shorter gap.
                    // The division shifts the entire block right so that it starts
                    // at bit 0.  It's expensive, but happens at most w-1 times.
                    bestI = i;
                    bestLsb = probe;
                    bestShifted = gap / probe;
                    // Is it perfectly snug?  If so, we can stop now.
                    if (probe << (n - 1) > gap >> 1) goto done;    // Yes, "goto".
                }
            }
            y -= edge;            // Clear the *new* LSB; ignore intervening bits
            probe = edge << 1;    // Again, skip over all intervening bits
        }
    }

done:
    if (bestLsb == 0) return UINT_MAX;    // Didn't find anything

    // Find the index of the 1-bit in bestLsb in O(log w) time.
    unsigned pos = bestI * w;
    unsigned z = w >> 1;    // The number of bits we will try to shift right
    while (bestLsb > 1) {
        unsigned x = bestLsb >> z;
        if (x) {
            bestLsb = x;
            pos += z;
        }
        z >>= 1;
    }

    return pos;
}

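A word about time complexity: O(m/w + g) is the same as the O(m) time of the other proposed solutions when g is about the same size as m, which can happen: e.g. if every second 4KB page is allocated, then g = m/2. But if g is small, whether because few pages are allocated, or because nearly all pages are allocated, or because pages just happen to be allocated and freed in a pattern that leaves few gaps, then the time complexity is much closer to O(m/w).

One more thing: without changing the time complexity, this code can be adapted to find blocks that straddle word boundaries, by tracking the last gap in each word and doing a one-off test at the start of each word to see whether extending the previous word's last gap would work. Finding blocks larger than w is also possible (as an aside: any gap that satisfies a request for n > kw pages must contain at least k-1 whole-word 0 values in bits[], since up to w-1 of its bits can sit in a partial word at each end; using this could greatly reduce the constant factor when searching for large gaps). Both extensions require more bookkeeping.

【Discussion】: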