Python: Combinations of parent-child hierarchy (answers)

【Question title】: Python: Combinations of parent-child hierarchy
【Posted】: 2015-01-31 08:54:07
【Question description】:

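For a child-parent relationship table (csv), I am trying to gather all possible parent-to-child relationship chains using all of the data in the table. I am struggling with a problem where, if multiple sub-parents exist (see rows 3 and 4), the second sub-parent combination (row 4) is not included in the iteration.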

Data example:

Child,Parent

A,B
A,C
B,D
B,C
C,D

Expected chain results:

D|B|A
D|C|B|A
D|C|A

Actual chain results:

D|B|A
D|C|A

Code:

find= 'A' #The child for which the code should find all possible parent relationships
sequence = ''
with open('testing.csv','r') as f:     #testing.csv = child,parent table (above example)
    for row in f:
        if row.strip().startswith(find):
            parent = row.strip().split(',')[1]
            sequence = parent + '|' + find
            f1 = open('testing.csv','r')
            for row in f1:
                if row.strip().startswith(parent):
                    parent2 = row.strip().split(',')[1]
                    sequence = parent2 + '|' + sequence
                    parent = parent2
        else:
            continue
        print(sequence)

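【Question discussion】:

I don't understand this: "I am trying against a problem where if multiple sub-parents exist (see rows 3 & 4), the second sub-parent combination (row 4) is not included in the iteration", yet you list D|C|B|A as expected. If row 4 (the B,C pair) were excluded, I don't think that result could occur.
The current code does not take row 4 into account. That is exactly the problem.
As an aside, Sarah: looking at your profile, you have asked many questions that people have answered. If someone has provided an answer you find acceptable, you should accept it by clicking the check mark next to their answer. So far you have not accepted any.

Tags: python for-loop iteration hierarchy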

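【Solution 1】:

Have you seen this excellent essay? It is essential reading for really understanding patterns in Python. Your problem can be treated as a graph problem: finding the relationships amounts to finding all paths between a child node and a parent node.

Since there can be an arbitrary depth of nesting (child->parent1->parent2...), you need a recursive solution to find all paths. Your code has 2 for loops, which, as you found out, can only produce paths of at most 3 levels.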
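To make the recursion point concrete, here is a minimal sketch that walks upward from the child through a child-to-parents mapping. The names `rows`, `parents_of` and `chains_up` are assumptions made for illustration (the in-memory `rows` list stands in for testing.csv); they do not appear in the original post.

```python
# Hypothetical in-memory stand-in for the rows of testing.csv (child, parent).
rows = [("A", "B"), ("A", "C"), ("B", "D"), ("B", "C"), ("C", "D")]

# Map each child to the list of its direct parents.
parents_of = {}
for child, parent in rows:
    parents_of.setdefault(child, []).append(parent)


def chains_up(node, seen=()):
    """Return every chain from a top-level ancestor down to `node`.

    `seen` holds the nodes already on the current chain, so a cycle in
    the data cannot cause infinite recursion.
    """
    seen = seen + (node,)
    # Ignore parents that are already on this chain (cycle guard).
    parents = [p for p in parents_of.get(node, []) if p not in seen]
    if not parents:
        # No unvisited parent left: `node` is the top of this chain.
        return [[node]]
    chains = []
    for parent in parents:
        # Each recursive call handles one more level, however deep it goes.
        for chain in chains_up(parent, seen):
            chains.append(chain + [node])
    return chains


for chain in chains_up("A"):
    print("|".join(chain))
```

Because each call recurses once per parent, a node with two parents (such as B here) branches into two chains instead of being overwritten, which is what the nested for loops could not do.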

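The code below is adapted from the essay linked above to solve your problem. The function find_all_paths takes a graph as input.

Let's build the graph from your file: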

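graph = {}  # Maps each parent to the list of its direct children.
with open('testing.csv', 'r') as f:
    for row in f:
        row = row.strip()  # drop the trailing newline so it does not end up in the parent name
        if not row:
            continue  # skip blank lines
        child, parent = row.split(',')
        graph.setdefault(parent, []).append(child)

print(graph)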

With your example, this should print (the key order may vary with the Python version):

{'C': ['A', 'B'], 'B': ['A'], 'D': ['B', 'C']}
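As a variation on the graph-building step, the following sketch uses the standard `csv` module and `collections.defaultdict`. The helper name `build_graph` and the `CSV_TEXT` constant (an in-memory stand-in for testing.csv) are assumptions for illustration only.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for the contents of testing.csv (child,parent per line).
CSV_TEXT = "A,B\nA,C\nB,D\nB,C\nC,D\n"


def build_graph(lines):
    """Build a parent -> [children] mapping from child,parent CSV rows."""
    graph = defaultdict(list)
    for row in csv.reader(lines):
        if len(row) != 2:
            # Skip blank or malformed lines instead of failing on unpacking.
            continue
        child, parent = (field.strip() for field in row)
        graph[parent].append(child)
    return dict(graph)


graph = build_graph(io.StringIO(CSV_TEXT))
print(graph)
```

For a real file, the same function should accept an open file object, for example `build_graph(open('testing.csv', newline=''))`.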

The following code is taken from the essay:

def find_all_paths(graph, start, end, path=[]):
    # path + [start] builds a new list, so the shared default list is never mutated.
    path = path + [start]
    if start == end:
        return [path]

    if start not in graph:  # dict.has_key() no longer exists in Python 3
        return []

    paths = []

    for node in graph[start]:
        if node not in path:  # skip nodes already on this path to avoid cycles
            newpaths = find_all_paths(graph, node, end, path)
            for newpath in newpaths:
                paths.append(newpath)
    return paths

for path in find_all_paths(graph, 'D', 'A'):
    print('|'.join(path))

Output:

D|B|A
D|C|A
D|C|B|A
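If the hierarchy could become deep enough to hit Python's recursion limit, the same search can be written with an explicit stack. This is only a sketch; the name `find_all_paths_iter` is made up here and is not from the essay.

```python
def find_all_paths_iter(graph, start, end):
    """Iterative depth-first search returning all simple paths start -> end."""
    paths = []
    stack = [[start]]  # each stack entry is a partial path
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == end:
            paths.append(path)
            continue
        # Push children in reverse so they are explored in their listed order.
        for child in reversed(graph.get(node, [])):
            if child not in path:  # never revisit a node on the same path
                stack.append(path + [child])
    return paths


# Graph from the example above: parent -> list of children.
graph = {'B': ['A'], 'C': ['A', 'B'], 'D': ['B', 'C']}
for path in find_all_paths_iter(graph, 'D', 'A'):
    print('|'.join(path))
```

For the example graph this yields the same three paths as the recursive version.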

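【Discussion】:

Thanks for the essay, vikramis. This code is exactly what I was looking for.

【Solution 2】:

I'm not sure this is the most efficient way to do it (but reading the file again for every row would be worse).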

find = 'A'  # The child for which the code should find all possible parent relationships
# Note: this approach assumes single-character node names (see c[0] and g2[2:] below).
sequences = set(find)

# we'll build up a chain for every relationship, then strip out un-needed ones later
with open('testing.csv','r') as f:     #testing.csv = child,parent table (above example)
    for row in f:
        child, parent = row.strip().split(',')
        sequences.add(parent + '|' + child)
        for c in sequences.copy():  
            if c[0] == child:
                sequences.add(parent + '|' + c)


# remove any that don't end with our child:
sequences = set(s for s in sequences if s.endswith(find))

# get all shorter chains when we have a longer one
extra = set()
for g1 in sequences:
    for g2 in sequences:
        if g2[2:] == g1:
            extra.add(g1)

# remove the shorter chains
sequences.difference_update(extra)

for chain in sequences:
    print(chain)

Result (a set is unordered, so the line order may differ):

D|C|A
D|C|B|A
D|B|A
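The string slicing above only works for one-character names. A possible generalization, sketched here with a hypothetical function `chains_for` and tuples in place of '|'-joined strings, keeps the same single-pass idea. Like the original, it assumes the rows are ordered bottom-up, meaning the rows listing a node's children come before the rows listing that node's parents.

```python
def chains_for(find, rows):
    """Single-pass chain builder using tuples instead of '|'-joined strings.

    Assumes bottom-up row order: rows where a node is the parent appear
    before rows where that same node is the child.
    """
    sequences = {(find,)}
    for child, parent in rows:
        sequences.add((parent, child))
        # Prepend the parent to every known chain that starts at this child.
        for chain in list(sequences):
            if chain[0] == child:
                sequences.add((parent,) + chain)
    # Keep only chains that end at the requested child.
    sequences = {s for s in sequences if s[-1] == find}
    # A chain is redundant if a longer chain extends it by one more parent.
    extra = {s[1:] for s in sequences if len(s) > 1}
    return sequences - extra


# Hypothetical in-memory stand-in for testing.csv (child, parent).
rows = [("A", "B"), ("A", "C"), ("B", "D"), ("B", "C"), ("C", "D")]
for chain in sorted(chains_for("A", rows)):
    print("|".join(chain))
```

Sorting the result only makes the printed order repeatable; the set itself is unordered.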

【Discussion】: