How to shuffle a list so that every sublist of a specific length has unique items?

【Question title】: How to shuffle a list, so that every sublist of a specific length has unique items?
【Posted】: 2019-08-15 21:16:59
【Question】:

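Given: a randomly shuffled list of unique utterance IDs, and a list of the speakers each utterance belongs to (in the same order).

Problem: how can uttIDList be reordered so that every sublist of 32 elements (starting at the first element, with a stride of 32) contains only utterances from different speakers? It is also important that rerunning the algorithm on a differently shuffled list yields different sublists.

For example: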

uttIDList = [1, 0, 9, 7, 100, 2, 3, 8301, ...] (length divisible by 32)

spkIDList = [0, 0, 3, 2, 1, 4, 20, 4, ...] 

sublist0 = uttIDList[0:32]

sublist1 = uttIDList[32:64]

...

sublistN = uttIDList[N-32:N]
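
The requirement above can be written as a small check, which is handy for validating any candidate reordering. This is only a sketch: the function name `blocks_have_unique_speakers` is made up for illustration, and it assumes the speaker list has already been reordered in step with uttIDList.

```python
def blocks_have_unique_speakers(spk_ids, block_size=32):
    """Return True if no speaker repeats inside any consecutive block.

    spk_ids is assumed to be the speaker list aligned with the reordered
    utterance list; blocks start at index 0 and advance by block_size.
    """
    for start in range(0, len(spk_ids), block_size):
        block = spk_ids[start:start + block_size]
        # A set drops duplicates, so a shorter set means a repeated speaker.
        if len(set(block)) != len(block):
            return False
    return True
```

For instance, with a block size of 3, `[0, 1, 2, 0, 1, 2]` passes while `[0, 0, 1, 2, 3, 4]` does not.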

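【Comments】:

Will each id in spkIDList be repeated the same number of times?
Why does the original shuffled list matter? Is there a requirement to preserve the original shuffled order as much as possible? What have you tried? This seems like a simple problem: choose 32 utterances; remove the duplicate speakers and replace them with new choices. Repeat until the speaker list is unique. Adjust for under-selected speakers in the last few sublists.
@brandon No, sadly not. My guess is that there may be no solution that works in every case; part of uttIDList could be cut off to make it work.
Right: if any speaker gives us more utterances than we have sublists, there is no solution. Please clarify the full problem and show your current attack.
When you reach a resolution, please remember to up-vote useful answers and accept the best one (even if you have to write it yourself). This allows Stack Overflow to archive your question.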

Tags: python python-3.x algorithm sorting shuffle

【Solution 1】:

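Ignore the original ordering entirely. Build a reference structure (e.g. a dictionary) of speakers and their utterances. Then turn the problem a quarter-turn: start from scratch and assign each speaker's utterances to the sublists.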
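
One possible way to build that dictionary from the two parallel lists in the question is sketched below. The function name `group_utterances_by_speaker` is hypothetical; the only assumption is that the two lists have the same length and order.

```python
from collections import defaultdict


def group_utterances_by_speaker(utt_ids, spk_ids):
    """Map each speaker id to the list of utterance ids that belong to it."""
    utts_by_spk = defaultdict(list)
    # The two lists are parallel: position i of each describes one utterance.
    for utt, spk in zip(utt_ids, spk_ids):
        utts_by_spk[spk].append(utt)
    return dict(utts_by_spk)
```

Applied to the first eight entries of the example in the question, speaker 0 would map to utterances `[1, 0]` and speaker 4 to `[2, 8301]`.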

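Initialize k empty lists, where k = N/32.
Check your utterance dictionary: if any speaker has more than k utterances, drop the excess.
Sort the speaker list in descending order of utterance count; this helps avoid end-game problems.
Repeat the following for each speaker:
- List all sublists that are not yet full (i.e. len(sublist) < 32)
- Let i = len(the speaker's utterance list)
- Use random.sample to draw a random sample of size i from the unfilled sublists
- Append the speaker's utterances to the chosen lists, one per list.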

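This works in most cases; you may end up with a single sublist that has 2 open slots while the final speaker has 2 utterances left to place. In real life, a simple swap would resolve this anomaly.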
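
The steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not a definitive implementation: the function name `assign_to_sublists`, the `seed` parameter, and the input dictionary `utts_by_spk` (speaker id to list of utterance ids) are all made up here. Instead of the swap mentioned above, this sketch simply drops utterances that cannot be placed, so some sublists may end up shorter than 32.

```python
import random


def assign_to_sublists(utts_by_spk, block_size=32, seed=None):
    """Distribute utterances so that no sublist holds a speaker twice."""
    rng = random.Random(seed)
    total = sum(len(utts) for utts in utts_by_spk.values())
    k = total // block_size  # number of sublists
    sublists = [[] for _ in range(k)]

    # Speakers with the most utterances are placed first.
    speakers = sorted(utts_by_spk, key=lambda s: len(utts_by_spk[s]), reverse=True)
    for spk in speakers:
        utts = list(utts_by_spk[spk])
        rng.shuffle(utts)  # so the dropped utterances, if any, are random
        open_idx = [j for j, sub in enumerate(sublists) if len(sub) < block_size]
        # One utterance per sublist at most, so a speaker can never place
        # more utterances than there are open sublists (simplification:
        # the remainder is dropped rather than swapped in).
        i = min(len(utts), len(open_idx))
        for j, utt in zip(rng.sample(open_idx, i), utts):
            sublists[j].append(utt)
    return sublists
```

Because each speaker contributes at most one utterance to any sublist, uniqueness holds by construction; sublists that did not reach 32 elements can be discarded afterwards.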

Does that get you moving?

【Comments】:

Thanks for your answer, this definitely helps.

【Solution 2】:

This should do it.

from random import shuffle

# first map the id of each speaker to a list of the indices that correspond to that speaker
spk_indices_map = {}
for i, spk_id in enumerate(spkIDList):
    if spk_id not in spk_indices_map:
        spk_indices_map[spk_id] = []
    spk_indices_map[spk_id].append(i)

# next shuffle the order of the indices for each speaker
# this still preserves which indices correspond to each speaker
for spk_id in spk_indices_map:
    shuffle(spk_indices_map[spk_id])

# the shuffled utterance and speaker lists with the desired properties
shuffled_uttIDList = []
shuffled_spkIDList = []

done = False
while not done:
    # while every speaker has at least one utterance not in the shuffled lists
    for spk_id in spk_indices_map:
        # add an utterance from each speaker to the shuffled lists
        if not spk_indices_map[spk_id]:
            done = True
            break
        else:
            index = spk_indices_map[spk_id].pop()
            shuffled_uttIDList.append(uttIDList[index])
            shuffled_spkIDList.append(spkIDList[index])

print(shuffled_uttIDList)
print(shuffled_spkIDList)

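We keep track of all the indices of the utterances that belong to each speaker. Then, for each speaker, we shuffle the order of their indices. Finally, going through the speakers in order, we take one utterance at a time from each speaker's shuffled index list.

If not every speaker has the same number of utterances, the final sublist of the shuffled list will be smaller than the required size and can be ignored.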
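
Cutting the shuffled list into blocks and ignoring an incomplete final one might look like the sketch below. The helper name `split_into_blocks` is hypothetical. Note also that the round-robin order only yields 32 distinct speakers per block if there are at least 32 distinct speakers to begin with.

```python
def split_into_blocks(shuffled_utts, block_size=32):
    """Split a list into consecutive full blocks, dropping a short tail."""
    n_full = len(shuffled_utts) // block_size  # number of complete blocks
    return [
        shuffled_utts[b * block_size:(b + 1) * block_size]
        for b in range(n_full)
    ]
```

With a block size of 4, a list of 10 items gives two blocks and the last 2 items are ignored.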

【Comments】: