Calculating the numbers whose binary representation has exactly required number 1's: answers

【Question title】: Calculating the numbers whose binary representation has exactly required number 1's
【Posted】: 2018-02-17 04:36:12
【Question description】:

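OK, so the problem is to find a positive integer n such that among the numbers n+1 to 2n (both inclusive) there are exactly m whose binary representation has exactly k 1's. Constraints: m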

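I can't think of an efficient way to solve this other than going through every integer in the required interval and counting its binary 1's, but that would take too long. Is there any other way to solve this problem?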

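【Question discussion】:

Please show your effort (code).
Maybe this helps: stackoverflow.com/questions/109023/…
@MemAllox: This isn't the sort of problem where you make a first attempt with rough code and then refine it. A problem like this calls for analysis and thought about the algorithm. It's entirely appropriate to ask this kind of question without code.
Did you write out some of these numbers for small values, on paper or in an editor, to study the pattern? For example, for m=10 and k = 5. Or better, m=7 and k=3 :)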

Tags: algorithm binary dynamic-programming

【Solution 1】:

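You're right to suspect that there's a more efficient way.

Let's start with a slightly simpler subproblem. Absent some really clever insight, we're going to need to be able to count the integers in [n+1, 2n] that have exactly k bits set in their binary representation. To keep things short, let's call such integers "weight-k" integers (for the motivation behind this terminology, look up Hamming weight). We can immediately simplify our counting problem: if we can count all weight-k integers in [0, 2n] and all weight-k integers in [0, n], we can subtract one count from the other to get the number of weight-k integers in [n+1, 2n].

So an obvious subproblem is to count how many weight-k integers there are in the interval [0, n], for given nonnegative integers k and n.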
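As a small illustration of that subtraction step, here is a brute-force sketch (only sensible for tiny n). The helper names `prefix_count` and `range_count` are made up for this example and are not part of the answer's code.

```python
def weight(n):
    """Number of 1 bits in the binary representation of n (n >= 0)."""
    return bin(n).count("1")


def prefix_count(n, k):
    """Brute-force count of weight-k integers in [0, n]."""
    return sum(weight(j) == k for j in range(n + 1))


def range_count(n, k):
    """Count of weight-k integers in [n+1, 2n], via two prefix counts."""
    return prefix_count(2 * n, k) - prefix_count(n, k)


# Direct enumeration of [n+1, 2n] agrees with the subtraction.
n, k = 5, 2
direct = [j for j in range(n + 1, 2 * n + 1) if weight(j) == k]
print(direct, range_count(n, k))  # [6, 9, 10] 3
```

The rest of the problem then reduces to making the prefix count fast.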

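A standard technique for a problem of this kind is to look for a way to break it down into smaller subproblems of the same kind; this is one aspect of what's often called dynamic programming. In this case, there's an easy way of doing so: consider the even numbers in [0, n] and the odd numbers in [0, n] separately. Every even number m in [0, n] has exactly the same weight as m/2 (because by dividing by two, all we do is remove a single zero bit). Similarly, every odd number m has weight exactly one more than the weight of (m-1)/2. With some thought about the appropriate base cases, this leads to the following recursive algorithm (implemented here in Python, but it should translate easily to any other mainstream language).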

def count_weights(n, k):
    """
    Return number of weight-k integers in [0, n] (for n >= 0, k >= 0)
    """
    if k == 0:
        return 1  # 0 is the only weight-0 value
    elif n == 0:
        return 0  # only considering 0, which doesn't have positive weight
    else:
        from_even = count_weights(n//2, k)
        from_odd = count_weights((n-1)//2, k-1)
        return from_even + from_odd

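There's plenty of scope for mistakes here, so let's test our fancy recursive algorithm against something less efficient but more direct (and, I hope, more obviously correct):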

def weight(n):
    """
    Number of 1 bits in the binary representation of n (for n >= 0).
    """
    return bin(n).count('1')

def count_weights_slow(n, k):
    """
    Return number of weight-k integers in [0, n] (for n >= 0, k >= 0)
    """
    return sum(weight(m) == k for m in range(n+1))

Comparing the results of the two algorithms looks convincing:

>>> count_weights(100, 5)
11
>>> count_weights_slow(100, 5)
11
>>> all(count_weights(n, k) == count_weights_slow(n, k)
...     for n in range(1000) for k in range(10))
True

However, our supposedly fast count_weights function doesn't scale well to numbers of the size you need:

>>> count_weights(2**64, 5)  # takes a few seconds on my machine
7624512
>>> count_weights(2**64, 6)  # minutes ...
74974368
>>> count_weights(2**64, 10)  # gave up waiting ...

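But here's where the second key idea of dynamic programming comes in: memoize! That is, keep a record of the results of previous calls, in case we need them again. It turns out that the chain of recursive calls tends to repeat a lot of calls, so memoizing pays off. In Python, this is trivially easy to do via the functools.lru_cache decorator. Here's our new version of count_weights. All that has changed is the import and the decorator line at the top:

from functools import lru_cache

@lru_cache(maxsize=None)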
def count_weights(n, k):
    """
    Return number of weight-k integers in [0, n] (for n >= 0, k >= 0)
    """
    if k == 0:
        return 1  # 0 is the only weight-0 value
    elif n == 0:
        return 0  # only considering 0, which doesn't have positive weight
    else:
        from_even = count_weights(n//2, k)
        from_odd = count_weights((n-1)//2, k-1)
        return from_even + from_odd

Now testing on those larger examples again, we get results much more quickly, without any noticeable delay.

>>> count_weights(2**64, 10)
151473214816
>>> count_weights(2**64, 32)
1832624140942590534
>>> count_weights(5853459801720308837, 27)
356506415596813420
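To get a feel for why memoization helps so much, one can look at how many distinct (n, k) pairs actually get evaluated. The following is a minimal sketch, assuming Python 3.8+ (for `math.comb`); it restates the memoized function compactly and inspects the cache through `cache_info()`, which `functools.lru_cache` attaches to the decorated function.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_weights(n, k):
    """Number of weight-k integers in [0, n] (same recursion as above)."""
    if k == 0:
        return 1
    elif n == 0:
        return 0
    return count_weights(n // 2, k) + count_weights((n - 1) // 2, k - 1)


result = count_weights(2**64, 10)
info = count_weights.cache_info()

# 2**64 itself has weight 1, so the weight-10 integers in [0, 2**64] are
# exactly the 64-bit patterns with ten 1 bits.
assert result == comb(64, 10)
print(result)
print("distinct (n, k) pairs evaluated:", info.misses)
```

Each level of the recursion only ever sees two neighbouring values of n, so the number of distinct pairs grows roughly with (number of bits of n) times k rather than exponentially.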

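So now that we have an efficient way to count, we have an inverse problem to solve: given k and m, find an n such that count_weights(2*n, k) - count_weights(n, k) == m. This one turns out to be especially easy, since the quantity count_weights(2*n, k) - count_weights(n, k) is monotonically increasing in n (for fixed k), and more specifically increases by either 0 or 1 every time n increases by 1. I'll leave the proofs of those facts to you, but here's a demonstration of the step-by-step growth, shown for the underlying count count_weights(n, 3):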

>>> for n in range(10, 30): print(n, count_weights(n, 3))
... 
10 1
11 2
12 2
13 3
14 4
15 4
16 4
17 4
18 4
19 5
20 5
21 6
22 7
23 7
24 7
25 8
26 9
27 9
28 10
29 10
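The 0-or-1 step can also be checked directly on the difference itself. One way to see it: going from n to n+1, the interval [n+1, 2n] loses n+1 and gains 2n+1 and 2n+2; since 2n+2 has the same weight as n+1, the net change is just whether 2n+1 has weight k. Below is a brute-force sketch of that check; `range_count` is a name invented for this example.

```python
def weight(n):
    """Number of 1 bits in the binary representation of n (n >= 0)."""
    return bin(n).count("1")


def range_count(n, k):
    """Brute-force count of weight-k integers in [n+1, 2n]."""
    return sum(weight(j) == k for j in range(n + 1, 2 * n + 1))


k = 3
diffs = [range_count(n, k) for n in range(10, 30)]
# Differences between consecutive values of n.
steps = [b - a for a, b in zip(diffs, diffs[1:])]
print(diffs)
print(set(steps) <= {0, 1})  # True
```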

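This means that we're guaranteed to be able to find a solution. There may be multiple solutions, so we'll aim to find the smallest one (though it would be equally easy to find the largest one). Bisection search gives us a crude but effective way to do this. Here's the code: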

def solve(m, k):
    """
    Find the smallest n >= 0 such that [n+1, 2n] contains exactly
    m weight-k integers.

    Assumes that m >= 1 (for m = 0, the answer is trivially n = 0).
    """
    def big_enough(n):
        """
        Target function for our bisection search solver.
        """
        diff = count_weights(2*n, k) - count_weights(n, k)
        return diff >= m

    low = 0
    assert not big_enough(low)

    # Initial phase: expand interval to identify an upper bound.
    high = 1
    while not big_enough(high):
        high *= 2

    # Bisection phase.
    # Loop invariant: big_enough(high) is True and big_enough(low) is False
    while high - low > 1:
        mid = (high + low) // 2
        if big_enough(mid):
            high = mid
        else:
            low = mid
    return high

Testing the solution:

>>> n = solve(5853459801720308837, 27)
>>> n
407324170440003813446

Let's double-check that n:

>>> count_weights(2*n, 27) - count_weights(n, 27)
5853459801720308837

Looks good. And if our search was correct, this should be the smallest n that works:

>>> count_weights(2*(n-1), 27) - count_weights(n-1, 27)
5853459801720308836

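There's room for improvement in the code above, and there are other ways to approach the problem, but I hope this gives you a starting point.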
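For small inputs, the result of the bisection search can be cross-checked against a plain linear search. This is a minimal sketch under that assumption; `solve_slow` and its `limit` cutoff are hypothetical names for this example and far too slow for the input sizes in the question.

```python
def weight(n):
    """Number of 1 bits in the binary representation of n (n >= 0)."""
    return bin(n).count("1")


def solve_slow(m, k, limit=2000):
    """
    Smallest n >= 0 such that [n+1, 2n] holds exactly m weight-k integers,
    found by trying n = 0, 1, 2, ... in turn. Returns None if no n below
    `limit` works. `limit` is an arbitrary cutoff for this sketch.
    """
    for n in range(limit):
        count = sum(weight(j) == k for j in range(n + 1, 2 * n + 1))
        if count == m:
            return n
    return None


print(solve_slow(3, 2))  # 5: the interval [6, 10] contains 6, 9 and 10
```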

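The OP commented that they need to do this in C, where memoization isn't immediately available without using an external library. Here's a variant of count_weights that doesn't need memoization. It's achieved by (a) tweaking the recursion in count_weights so that the same n is used in both recursive calls, and then (b) returning, for a given n, the values of count_weights(n, k) for all k for which the answer is nonzero. In effect, we're just moving the memoization into an explicit list.

Note: as written, the code below needs Python 3.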

def count_all_weights(n):
    """
    Return frequencies of weights of all integers in [0, n],
    as a list. The kth entry in the list gives the count
    of weight-k integers in [0, n].

    Example
    -------
    >>> count_all_weights(16)
    [1, 5, 6, 4, 1]

    """
    if n == 0:
        return [1]
    else:
        wm = count_all_weights((n-1)//2)
        weights = [wm[0], *(wm[i]+wm[i+1] for i in range(len(wm)-1)), wm[-1]]
        if n % 2 == 0:
            weights[bin(n).count('1')] += 1
        return weights

An example call:

>>> count_all_weights(7590)
[1, 13, 78, 286, 714, 1278, 1679, 1624, 1139, 559, 182, 35, 3]

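This function should be good enough even for large n: count_all_weights(10**18) takes less than half a millisecond on my machine.

Now the bisection search will work as before, replacing the call to count_weights(n, k) with count_all_weights(n)[k] (and similarly for count_weights(2*n, k)).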
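A sketch of how that substitution might look is given below. It repeats count_all_weights so the block runs on its own; `count_at` and `solve_with_lists` are names invented here. One detail to watch for is that the list returned for a small n can be shorter than k+1 entries, so this sketch treats a missing entry as 0 rather than indexing directly.

```python
def count_all_weights(n):
    """List whose kth entry is the number of weight-k integers in [0, n]."""
    if n == 0:
        return [1]
    wm = count_all_weights((n - 1) // 2)
    weights = [wm[0], *(wm[i] + wm[i + 1] for i in range(len(wm) - 1)), wm[-1]]
    if n % 2 == 0:
        weights[bin(n).count("1")] += 1
    return weights


def count_at(n, k):
    """count_all_weights(n)[k], treating weights beyond the list as 0."""
    counts = count_all_weights(n)
    return counts[k] if k < len(counts) else 0


def solve_with_lists(m, k):
    """Smallest n with exactly m weight-k integers in [n+1, 2n] (m >= 1, k >= 2)."""
    def big_enough(n):
        return count_at(2 * n, k) - count_at(n, k) >= m

    low, high = 0, 1
    while not big_enough(high):
        high *= 2
    # Invariant: big_enough(high) is True and big_enough(low) is False.
    while high - low > 1:
        mid = (high + low) // 2
        if big_enough(mid):
            high = mid
        else:
            low = mid
    return high


print(solve_with_lists(3, 2))  # 5
```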

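Finally, another possibility is to break up the interval [0, n] into a succession of smaller and smaller subintervals, where each subinterval has length a power of two. For example, we'd break the interval [0, 101] into [0, 63], [64, 95], [96, 99] and [100, 101]. The advantage of this is that we can easily compute how many weight-k integers there are in any one of these subintervals by counting combinations. For example, in [0, 63] we have all possible 6-bit combinations, so if we're after weight-3 integers, we know there must be exactly 6-choose-3 (i.e., 20) of them. And in [64, 95], we know each integer starts with a 1-bit, and after excluding that 1-bit we have all possible 5-bit combinations, so again we know how many integers of any given weight there are in this interval.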
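As a rough sketch of that decomposition, assuming Python 3.8+ for `math.comb`: the function below walks the 1 bits of n+1 from the top, and each 1 bit at position `pos` contributes one block of 2**pos consecutive integers that share the bits above `pos`. The name `count_weights_comb` is made up for this example.

```python
from math import comb


def count_weights_comb(n, k):
    """
    Number of weight-k integers in [0, n], by splitting [0, n] into
    power-of-two blocks and counting bit combinations in each block.
    """
    upper = n + 1  # count over the half-open range [0, upper)
    total = 0
    ones_so_far = 0  # 1 bits fixed by the common prefix of the current block
    for pos in reversed(range(upper.bit_length())):
        if (upper >> pos) & 1:
            # Block: fixed prefix with `ones_so_far` ones, then `pos` free bits.
            need = k - ones_so_far
            if 0 <= need <= pos:
                total += comb(pos, need)
            ones_so_far += 1
    return total


print(count_weights_comb(63, 3))   # 20, i.e. 6-choose-3
print(count_weights_comb(100, 5))  # 11, matching count_weights(100, 5)
```

For n = 101 the blocks visited are exactly [0, 63], [64, 95], [96, 99] and [100, 101].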

Applying this idea, here's a complete, fast, all-in-one function that solves your original problem. It has no recursion and no memoization.

def solve(m, k):
    """
    Given nonnegative integers m and k, find the smallest
    nonnegative integer n such that the closed interval
    [n+1, 2*n] contains exactly m weight-k integers.

    Note that for k small there may be no solution:
    if k == 0 then we have no solution unless m == 0,
    and if k == 1 we have no solution unless m is 0 or 1.
    """
    # Deal with edge cases.
    if k < 2 and k < m:
        raise ValueError("No solution")
    elif k == 0 or m == 0:
        return 0
    k -= 1

    # Find upper bound on n, and generate a subset of
    # Pascal's triangle as we go.
    rows = []
    high, row = 1, [1] + [0] * k
    while row[k] < m:
        rows.append((high, row))
        high, row = high * 2, [1, *(row[i]+row[i+1] for i in range(k))]

    # Bisect to find first n that works.
    low = mlow = weight = 0
    while rows:
        high, row = rows.pop()
        mmid = mlow + row[k - weight]
        if mmid < m:
            low, mlow, weight = low + high, mmid, weight + 1
    return low + 1
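One possible reading of the `k -= 1` step, offered as an interpretation rather than something the answer states: the number of weight-k integers in [n+1, 2n] equals the number of weight-(k-1) integers in [0, n-1], because each step from n to n+1 adds one exactly when 2n+1 has weight k, that is, when n has weight k-1. Under that reading, the function is looking for one more than the m-th smallest integer of weight k-1. The brute-force sketch below checks the identity for small values; the helper names are invented for this example.

```python
def weight(n):
    """Number of 1 bits in the binary representation of n (n >= 0)."""
    return bin(n).count("1")


def in_upper_half(n, k):
    """Brute-force count of weight-k integers in [n+1, 2n]."""
    return sum(weight(j) == k for j in range(n + 1, 2 * n + 1))


def below(n, k):
    """Brute-force count of weight-(k-1) integers in [0, n-1]."""
    return sum(weight(j) == k - 1 for j in range(n))


print(all(in_upper_half(n, k) == below(n, k)
          for n in range(200) for k in range(1, 8)))  # True
```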

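【Discussion】:

Thanks for the help, but I'm supposed to do this in C, so I was wondering whether there's any memoization library for that?
All you need is a hash table, and there are surely plenty of hash table implementations in C. I can't recommend any particular one. Alternatively, there are ways to get rid of the memoization: I've added one such way to my answer. Another way to eliminate the need for memoization is to break the problem down differently: if you split the range [0, n] into pieces of decreasing size, each of length a power of two, some simple combinatorics will give you the count for each power-of-two piece.
@Reddy90: I've added a complete solution at the end of my answer that doesn't need any memoization. It should be easy to translate to C.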