Is there a friendly name for this data structure? (Answers)

【Question Title】: Is there a friendly name for this data structure?
【Posted】: 2012-03-06 01:33:57
【Question】:

While developing a feature selector for a machine learning algorithm in Python, I generated a data structure with the following code:

# Perform set partitioning on the results
groups = []
for t in results:
    (jthName,kthName) = t
    jthGroup = -1
    kthGroup = -1

    # Just a simple list of hashes with online merging
    for idx,group in enumerate(groups):
        if jthName in group:
            jthGroup = idx
        if kthName in group:
            kthGroup = idx
    if jthGroup == kthGroup:
        if jthGroup == -1: # Implicit: "and kthGroup == -1"
            groups.append(set((jthName,kthName)))
    elif jthGroup != kthGroup:
        if kthGroup == -1:
            # Merge kthName into jthGroup
            groups[jthGroup].add(kthName)
        elif jthGroup == -1:
            # Merge jthName into kthGroup (redundant if naturally-ordered)
            groups[kthGroup].add(jthName)
        else:
            # Merge jthGroup and kthGroup, since we have a connecting pair
            merged = set()
            merged.update(groups[jthGroup])
            merged.update(groups[kthGroup])
            # Delete the higher index first so the lower index stays valid
            for idx in sorted((jthGroup, kthGroup), reverse=True):
                del groups[idx]
            groups.append(merged)

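My input, results, is a list of 2-tuples, and groups is a list of sets. Please note that my code is not necessarily efficient here; it is provided for illustrative purposes only.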

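My data structure, groups, has the following properties:

For each (jthName,kthName):
- If neither element of (jthName,kthName) is found in any contained set, create set((jthName,kthName)) in our list of sets.
- If exactly one of (jthName,kthName) is found in a contained set, merge the element that was not found into that set.
- If the two elements of (jthName,kthName) are found in different sets, merge the two referenced sets into a single set.
Loop invariant: neither jthName nor kthName can be contained in more than one set.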

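My rationale for this data structure is to create a flat decomposition of an unknown set of connected node graphs, where each unique element name is a node and each unique pair is an edge. My reasoning is that my graphs are incomplete, and I need this view to select only the known members of each graph to feed into an algorithm that will regressively determine the graphs' connectivity and the directionality of their edges (i.e., the complete set of DAGs expressed by the data). But I digress.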
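To make the graph view concrete: if every pair is read as an undirected edge, the sets in groups correspond to the connected components of the graph. The following is only a sketch of that reading, not the code from the question; the function name connected_components and the sample pairs are made up for illustration.

```python
from collections import defaultdict

def connected_components(pairs):
    """Group node names into the connected components of an undirected graph."""
    # Build an adjacency map; each pair is treated as an undirected edge.
    adjacency = defaultdict(set)
    for a, b in pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Iterative depth-first traversal from a node not visited yet.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        components.append(component)
    return components

# Hypothetical sample input, not taken from the question.
sample = [("a", "b"), ("c", "d"), ("b", "c"), ("e", "f")]
components = connected_components(sample)
print(components)
```

This visits every node and edge once, so it runs in time linear in the number of pairs, but it needs all pairs up front rather than merging them one at a time.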

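Is there a friendly name for the data structure represented by the variable groups? Whether or not there is one, is there a more time- or space-efficient way to perform this decomposition?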

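【Comments】:

This might be better suited to cstheory.stackexchange.com. I didn't post it there because, as far as I can tell, this is an undergraduate-level question from an untrained theorist.

Tags: python algorithm data-structures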

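【Solution 1】:

I think what you are looking for is something called a Disjoint-set data structure.

It is often used when running Kruskal's algorithm, because if you implement the disjoint-set data structure with path compression, it lets you do n lookups in amortized O(n log* n) time. The bound is actually tighter than that: adding union by rank brings it down to O(n α(n)), where α is the inverse Ackermann function.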
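As a rough illustration of path compression, here is a minimal sketch that assumes the forest is stored as a plain dict mapping each name to its parent, with a root pointing to itself. The dict layout and the sample chain are assumptions made for this example, not something the answer specifies.

```python
def find(parent, x):
    """Return the root of x's set, compressing the path along the way."""
    # First pass: walk up until we reach the root.
    root = x
    while parent[root] != root:
        root = parent[root]
    # Second pass: point every node on the path directly at the root.
    while parent[x] != root:
        next_node = parent[x]
        parent[x] = root
        x = next_node
    return root

# Hypothetical chain d -> c -> b -> a, where "a" is the root.
parent = {"a": "a", "b": "a", "c": "b", "d": "c"}
root = find(parent, "d")
print(root, parent)
```

After the call, "b", "c" and "d" all point straight at "a", so later lookups on any of them take a single step.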

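It is quite reasonable to implement, and I think the pseudocode on the wiki page translates well to Python. If you need more help, this SO question might help.

If you use a disjoint-set data structure, your code will look like this: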

for t in results:
   (jName, kName) = t

   union(jName, kName)
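The loop above relies on a union function that is not defined here. One possible sketch follows, assuming module-level dicts for the forest, union by size, and path halving (a one-pass variant of path compression). The names parent, size and groups, as well as the sample results, are hypothetical and chosen only for this example.

```python
parent = {}  # maps each name to its parent in the forest
size = {}    # maps each root to the number of names in its set

def find(name):
    """Return the root of name's set, registering unseen names on the fly."""
    if name not in parent:
        parent[name] = name
        size[name] = 1
    # Path halving: make each visited node point to its grandparent.
    while parent[name] != name:
        parent[name] = parent[parent[name]]
        name = parent[name]
    return name

def union(j_name, k_name):
    """Merge the sets containing j_name and k_name."""
    j_root, k_root = find(j_name), find(k_name)
    if j_root == k_root:
        return  # already in the same set
    # Union by size: attach the smaller tree under the larger one.
    if size[j_root] < size[k_root]:
        j_root, k_root = k_root, j_root
    parent[k_root] = j_root
    size[j_root] += size[k_root]

def groups():
    """Rebuild the list of sets from the forest."""
    by_root = {}
    for name in parent:
        by_root.setdefault(find(name), set()).add(name)
    return list(by_root.values())

# Hypothetical sample input, not taken from the question.
results = [("a", "b"), ("c", "d"), ("b", "c"), ("e", "f")]
for j_name, k_name in results:
    union(j_name, k_name)
print(groups())
```

Unlike the list-of-sets approach, each pair costs only two near-constant-time lookups instead of a scan over every existing group; the explicit sets are built once at the end by groups().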

【Comments】: