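Unfortunately, this can't be done efficiently (better than O(n)) in any of the standard library set containers.
This is odd, since it is very easy to add a random pick function to hash sets as well as tree-based sets. In a hash set that isn't too sparse, you can probe random slots until you hit an occupied one. In a binary tree, you can descend at random into the left or right subtree, weighted by subtree size, which takes at most O(log n) steps as long as the tree is reasonably balanced. I've implemented a demo of the latter below: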
import random

class Node:
    def __init__(self, object):
        self.object = object
        self.value = hash(object)  # ordering key used by the tree
        self.size = 1              # number of nodes in this subtree, itself included
        self.a = self.b = None     # left and right children

class RandomSet:
    def __init__(self):
        self.top = None

    def add(self, object):
        """ Add any hashable object to the set.
        Notice: In this simple implementation you shouldn't add two
        identical items. """
        new = Node(object)
        if self.top is None:
            self.top = new
        else:
            self._recursiveAdd(self.top, new)

    def _recursiveAdd(self, top, new):
        # Every node on the path down gains one descendant.
        top.size += 1
        if new.value < top.value:
            if top.a is None:
                top.a = new
            else:
                self._recursiveAdd(top.a, new)
        else:
            if top.b is None:
                top.b = new
            else:
                self._recursiveAdd(top.b, new)

    def pickRandom(self):
        """ Pick a random item in O(log n) time if the tree is balanced.
        Makes one call to random per level visited. """
        if self.top is None:
            raise KeyError('pick from an empty set')
        return self._recursivePickRandom(self.top)

    def _recursivePickRandom(self, top):
        # r == 0 selects this node, 1..size(a) the left subtree,
        # and anything larger the right subtree.
        r = random.randrange(top.size)
        if r == 0:
            return top.object
        elif top.a and r <= top.a.size:
            return self._recursivePickRandom(top.a)
        return self._recursivePickRandom(top.b)

if __name__ == '__main__':
    s = RandomSet()
    for i in [5, 3, 7, 1, 4, 6, 9, 2, 8, 0]:
        s.add(i)
    # Count how often each of the ten items gets picked.
    dists = [0] * 10
    for i in range(10000):
        dists[s.pickRandom()] += 1
    print(dists)
I got [995, 975, 971, 995, 1057, 1004, 966, 1052, 984, 1001] as output, so the distribution seems good.
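The hash set idea mentioned earlier (probe random slots until one is occupied) isn't part of the demo, so here is a minimal sketch of it. Everything in it is an assumption made for illustration: the class name `ProbingSet`, the method name `pick_random`, linear probing, and the grow-at-half-full policy are all made up here, and this is not how Python's built-in `set` works internally. Deletion is left out to keep it short.

```python
import random


class ProbingSet:
    """Toy open-addressing hash set with a random pick (hypothetical sketch)."""

    _EMPTY = object()  # sentinel marking an unused slot

    def __init__(self, capacity=8):
        self._slots = [self._EMPTY] * capacity
        self._count = 0

    def add(self, item):
        # Grow before the table becomes more than half full, so random
        # probing in pick_random() keeps hitting occupied slots quickly.
        if 2 * (self._count + 1) > len(self._slots):
            self._resize(2 * len(self._slots))
        self._insert(item)

    def _insert(self, item):
        i = hash(item) % len(self._slots)
        # Linear probing: walk forward until the item or a free slot is found.
        while self._slots[i] is not self._EMPTY:
            if self._slots[i] == item:
                return  # already present, nothing to do
            i = (i + 1) % len(self._slots)
        self._slots[i] = item
        self._count += 1

    def _resize(self, capacity):
        # Rehash every stored item into a larger table.
        old = [x for x in self._slots if x is not self._EMPTY]
        self._slots = [self._EMPTY] * capacity
        self._count = 0
        for x in old:
            self._insert(x)

    def pick_random(self):
        if self._count == 0:
            raise KeyError("pick from an empty set")
        # Rejection sampling: each slot is equally likely, so each stored
        # item is equally likely; empty slots are simply retried.
        while True:
            candidate = random.choice(self._slots)
            if candidate is not self._EMPTY:
                return candidate
```

Once the table has grown at least once it stays between roughly one quarter and one half full, so a pick should need only a handful of tries on average. Usage mirrors the tree demo: call `add` for each item, then `pick_random()` as often as needed.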
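I've run into the same problem myself, but I haven't yet decided whether the performance gain from this more efficient pick is worth the overhead of using a set written in pure Python. I could of course refine it and translate it to C, but that's too much work for me today :)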