Posted: 2020-10-07 00:21:58
Question:
I have some gene sequencing data like this:
data = [{'sequence': 'gene1__gene2__gene3', 'occurrence': 10},
{'sequence': 'gene2__gene3', 'occurrence': 5},
{'sequence': 'gene2', 'occurrence': 2},
{'sequence': 'gene4', 'occurrence': 4}
]
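Each sequence string joins its genes with a double underscore, so the first step in any approach is to split it into a list of gene names. This is a minimal sketch. It assumes `'__'` is always the separator and never appears inside a gene name:

```python
data = [{'sequence': 'gene1__gene2__gene3', 'occurrence': 10},
        {'sequence': 'gene2__gene3', 'occurrence': 5},
        {'sequence': 'gene2', 'occurrence': 2},
        {'sequence': 'gene4', 'occurrence': 4}]

# Split each sequence on the '__' separator into a list of gene names,
# keeping its occurrence count alongside it.
parsed = [(row['sequence'].split('__'), row['occurrence']) for row in data]

for genes, count in parsed:
    print(genes, count)
```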
I want to convert it into the following tree-like dictionary structure, where any sub-path gives me the co-occurrence count for that set of genes:
tree_dict = {
    'gene1': {'occurrence': 10, 'self': 0, 'children': {
        'gene2': {'occurrence': 10, 'self': 0, 'children': {
            'gene3': {'occurrence': 10, 'self': 10, 'children': {}}}},
        'gene3': {'occurrence': 10, 'self': 0, 'children': {
            'gene2': {'occurrence': 10, 'self': 10, 'children': {}}}}}},
    'gene2': {'occurrence': 17, 'self': 2, 'children': {
        'gene1': {'occurrence': 10, 'self': 0, 'children': {
            'gene3': {'occurrence': 10, 'self': 10, 'children': {}}}},
        'gene3': {'occurrence': 15, 'self': 5, 'children': {
            'gene1': {'occurrence': 10, 'self': 10, 'children': {}}}}}},
    'gene3': {'occurrence': 15, 'self': 0, 'children': {
        'gene1': {'occurrence': 10, 'self': 0, 'children': {
            'gene2': {'occurrence': 10, 'self': 10, 'children': {}}}},
        'gene2': {'occurrence': 15, 'self': 5, 'children': {
            'gene1': {'occurrence': 10, 'self': 10, 'children': {}}}}}},
    'gene4': {'occurrence': 4, 'self': 4, 'children': {}},
}
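With a tree of this shape, you read the co-occurrence count for a set of genes by following the `children` dicts one gene at a time. Below is a minimal sketch using a hypothetical helper, `lookup`. It assumes exactly the node layout of tree_dict, and it inlines a copy of tree_dict so the snippet runs on its own:

```python
tree_dict = {
    'gene1': {'occurrence': 10, 'self': 0, 'children': {
        'gene2': {'occurrence': 10, 'self': 0, 'children': {
            'gene3': {'occurrence': 10, 'self': 10, 'children': {}}}},
        'gene3': {'occurrence': 10, 'self': 0, 'children': {
            'gene2': {'occurrence': 10, 'self': 10, 'children': {}}}}}},
    'gene2': {'occurrence': 17, 'self': 2, 'children': {
        'gene1': {'occurrence': 10, 'self': 0, 'children': {
            'gene3': {'occurrence': 10, 'self': 10, 'children': {}}}},
        'gene3': {'occurrence': 15, 'self': 5, 'children': {
            'gene1': {'occurrence': 10, 'self': 10, 'children': {}}}}}},
    'gene3': {'occurrence': 15, 'self': 0, 'children': {
        'gene1': {'occurrence': 10, 'self': 0, 'children': {
            'gene2': {'occurrence': 10, 'self': 10, 'children': {}}}},
        'gene2': {'occurrence': 15, 'self': 5, 'children': {
            'gene1': {'occurrence': 10, 'self': 10, 'children': {}}}}}},
    'gene4': {'occurrence': 4, 'self': 4, 'children': {}},
}

def lookup(tree, path):
    """Follow `path` (a list of gene names) through the nested `children`
    dicts and return the node reached, or None if the path is empty or
    some gene along it is missing."""
    node = None
    children = tree
    for gene in path:
        node = children.get(gene)
        if node is None:
            return None
        children = node['children']
    return node

print(lookup(tree_dict, ['gene2', 'gene3'])['occurrence'])  # 15
print(lookup(tree_dict, ['gene4', 'gene1']))                # None
```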
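In the tree_dict above:
- self is the number of times the (sub)path occurs on its own, i.e. as a complete sequence with no other genes. For example, gene3 never occurs alone, so its self is 0, whereas gene2 occurs alone 2 times, so its self is 2.
- occurrence is the number of times the genes on the (sub)path occur together, either as the whole sequence or as part of a longer one.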
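Both definitions can be stated directly in terms of gene sets. In the expected tree, the order of genes along a path does not matter: gene3 → gene1 → gene2 also has occurrence 10. So this sketch treats a path as a set. `counts_for_path` is a hypothetical helper, useful for checking any node of the tree against the raw data:

```python
data = [{'sequence': 'gene1__gene2__gene3', 'occurrence': 10},
        {'sequence': 'gene2__gene3', 'occurrence': 5},
        {'sequence': 'gene2', 'occurrence': 2},
        {'sequence': 'gene4', 'occurrence': 4}]

def counts_for_path(data, path):
    """Return (occurrence, self) for a path of genes, computed from the data.

    The path is compared as a set, since the tree stores every ordering."""
    wanted = set(path)
    occurrence = 0
    self_count = 0
    for row in data:
        genes = set(row['sequence'].split('__'))
        if wanted <= genes:      # every gene on the path is in this sequence
            occurrence += row['occurrence']
            if wanted == genes:  # ...and the sequence has no other genes
                self_count += row['occurrence']
    return occurrence, self_count

print(counts_for_path(data, ['gene2']))           # (17, 2)
print(counts_for_path(data, ['gene3']))           # (15, 0)
print(counts_for_path(data, ['gene3', 'gene2']))  # (15, 5)
```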
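What I have tried:
I know the solution has to be a recursive function, but so far I have only tried an iterative approach, and it fails. What I need is something along the lines of this question: How to transform a list into a hierarchy dict. However, I have not been able to make any progress in that direction.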
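Here is one possible recursive sketch. For each record, recurse over the genes not yet used on the current path, so that every ordering of every subset of the sequence becomes a path. Each node visited gets the record's count added to occurrence. The node where no genes remain also gets it added to self. The helper names `new_node`, `insert` and `build_tree` are hypothetical, and the sketch assumes a gene appears at most once per sequence:

```python
data = [{'sequence': 'gene1__gene2__gene3', 'occurrence': 10},
        {'sequence': 'gene2__gene3', 'occurrence': 5},
        {'sequence': 'gene2', 'occurrence': 2},
        {'sequence': 'gene4', 'occurrence': 4}]

def new_node():
    return {'occurrence': 0, 'self': 0, 'children': {}}

def insert(children, genes, count):
    """Add `count` along every ordering of `genes` below the `children` dict.

    Each level picks one remaining gene as the next node, so every distinct
    path is visited exactly once per record (no double counting)."""
    for i, gene in enumerate(genes):
        node = children.setdefault(gene, new_node())
        node['occurrence'] += count
        rest = genes[:i] + genes[i + 1:]
        if rest:
            insert(node['children'], rest, count)
        else:
            # The path now covers the whole sequence: a "self" occurrence.
            node['self'] += count

def build_tree(data):
    tree = {}
    for row in data:
        insert(tree, row['sequence'].split('__'), row['occurrence'])
    return tree

tree = build_tree(data)
print(tree['gene2']['occurrence'], tree['gene2']['self'])  # 17 2
```

On the data above, this reproduces tree_dict. Because every ordering of every subset becomes a path, a sequence of n distinct genes adds at least n! nodes. That is fine for short sequences like these, but the tree grows quickly for longer ones.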
Tags: python python-3.x recursion data-structures tree