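Another stable way to bin data in your special case (regularly spaced bins) is to compute each key directly, which eliminates the key search at every step.
Stable here means that the numbers in each list keep the same order they had in the input data: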
    def inRegularIntervals(data, interval):
        """Sorts elements of data into bins of regular sizes.
        The size of each bin is given by 'interval'."""
        # init dict so keys are ordered - collections.defaultdict(list)
        # would be faster - but this works for lists of a couple of
        # thousand numbers if you have a quarter up to one second ...
        # if first-seen key order (and no empty bins) is ok, shorten this to d = {}
        d = {k: [] for k in range(0, max(data), interval)}
        for n in data:
            key = n // interval   # get the bin index
            key *= interval       # scale back to the bin's lower bound
            # add number; setdefault creates bins not pre-initialized above
            d.setdefault(key, []).append(n)
        return d
Usage with random data:

    from random import choices

    data = choices(range(100), k=50)
    data.append(135)  # add a bigger value to see the gapped keys
    binned = inRegularIntervals(data, 25)
    print(binned)
Output (newlines and spaces added):

    {  0: [19, 9, 1, 0, 15, 22, 4, 9, 12, 7, 12, 9, 16, 2, 7],
      25: [25, 31, 37, 45, 30, 48, 44, 44, 31, 39, 27, 36],
      50: [50, 50, 58, 60, 70, 69, 53, 53, 67, 59, 52, 64],
      75: [86, 93, 78, 93, 99, 98, 95, 75, 88, 82, 79],
     100: [],
     125: [135]}
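To see why 135 lands under key 125 while key 100 stays empty, the key computation can be looked at in isolation. This is a minimal sketch; `bin_key` is a hypothetical helper name that does not appear in the answer's code and only repeats the two arithmetic steps from the loop.

```python
def bin_key(n, interval):
    """Lower bound of the bin that n falls into (hypothetical helper)."""
    # floor-divide to get the bin index, then scale back up
    return n // interval * interval

for n in (0, 24, 25, 99, 135):
    print(n, "->", bin_key(n, 25))
# prints:
# 0 -> 0
# 24 -> 0
# 25 -> 25
# 99 -> 75
# 135 -> 125
```

No lookup among existing keys is needed: the key follows from the value alone, so each number costs one division and one multiplication.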
To sort the binned lists in place, use

    for k in binned:
        binned[k].sort()

to get:

    {  0: [0, 1, 2, 4, 7, 7, 9, 9, 9, 12, 12, 15, 16, 19, 22],
      25: [25, 27, 30, 31, 31, 36, 37, 39, 44, 44, 45, 48],
      50: [50, 50, 52, 53, 53, 58, 59, 60, 64, 67, 69, 70],
      75: [75, 78, 79, 82, 86, 88, 93, 93, 95, 98, 99],
     100: [],
     125: [135]}
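The comment in the function mentions `collections.defaultdict(list)` as a faster option. A rough sketch of that variant might look like the following; the name `in_regular_intervals_dd` is made up for illustration, and the behavior differs from the function above in one respect: bins that receive no numbers (such as 100 in the example) get no key at all.

```python
from collections import defaultdict

def in_regular_intervals_dd(data, interval):
    """Bin data by computed key; only non-empty bins get a key.
    Hypothetical variant, not the function from the answer."""
    bins = defaultdict(list)
    for n in data:
        # missing keys are created on first access, so no setdefault is needed
        bins[n // interval * interval].append(n)
    # convert to a plain dict with keys in ascending order
    return {k: bins[k] for k in sorted(bins)}

print(in_regular_intervals_dd([30, 5, 135, 26, 0], 25))
# prints: {0: [5, 0], 25: [30, 26], 125: [135]}
```

The input order is still preserved within each bin, so this variant is stable in the same sense. Whether it is noticeably faster depends on the data size, so it is worth timing on your own input.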