【Question title】: Search and replace multiple specific sequences of elements in Python list/array
【Posted】: 2016-08-16 08:27:34
【Question description】:

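I currently have 6 separate for loops that iterate over a list of numbers, looking for specific sequences of numbers within a larger sequence and replacing them like this: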

[...0,1,0...] => [...0,0,0...]
[...0,1,1,0...] => [...0,0,0,0...]
[...0,1,1,1,0...] => [...0,0,0,0,0...]

And their inverses:

[...1,0,1...] => [...1,1,1...]
[...1,0,0,1...] => [...1,1,1,1...]
[...1,0,0,0,1...] => [...1,1,1,1,1...]

My existing code looks like this:

for i in range(len(output_array)-2):
    if output_array[i] == 0 and output_array[i+1] == 1 and output_array[i+2] == 0:
        output_array[i+1] = 0

for i in range(len(output_array)-3):
    if output_array[i] == 0 and output_array[i+1] == 1 and output_array[i+2] == 1 and output_array[i+3] == 0:
        # chained assignment: unpacking a single 0 into two targets raises TypeError
        output_array[i+1] = output_array[i+2] = 0

In total, I iterate over the same output_array 6 times using brute-force checks. Is there a faster way?

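【Comments】:

  • Your question is unclear. Share some code and examples of input and output.
  • Regarding time complexity, I would start by measuring the different implementations with the timeit module. If you need more detail about which functions take the most time, use a profiler. Theoretical complexity is often a poor guide, because it depends heavily on the assumed implementation and its complexity. Python's built-in data structures are highly optimized and may use a completely different implementation from the one you imagine.
  • Can you add some examples? For instance, what should 01011010 become? Is it 00011110?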
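
The timeit suggestion in the comments above could look roughly like the sketch below. The function names `loop_replace` and `string_replace`, the benchmark input, and the repetition count are all assumptions made for illustration; only the single pattern 0,1,0 is handled, to keep the comparison small.

```python
import timeit

def loop_replace(values):
    """Index-based scan for the single pattern 0,1,0, as in the question."""
    out = list(values)
    for i in range(len(out) - 2):
        if out[i] == 0 and out[i + 1] == 1 and out[i + 2] == 0:
            out[i + 1] = 0
    return out

def string_replace(values):
    """Join to a string, replace '010', then convert back to integers."""
    text = "".join(map(str, values)).replace("010", "000")
    return [int(ch) for ch in text]

# Hypothetical benchmark input: 7000 elements with no overlapping matches.
data = [0, 1, 0, 0, 1, 1, 0] * 1000

loop_time = timeit.timeit(lambda: loop_replace(data), number=50)
string_time = timeit.timeit(lambda: string_replace(data), number=50)
print("loop:   %.4f s" % loop_time)
print("string: %.4f s" % string_time)
```

The absolute numbers depend on the machine and the Python version, so they are only meaningful relative to each other. Note that the two functions are not equivalent on every input: `str.replace` only replaces non-overlapping matches, so an input such as 0,1,0,1,0 gives different results.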

Tags: python arrays list design-patterns iterator


【Solution 1】:
# I would create a map between the string searched and the new one.

patterns = {}
patterns['010'] = '000'
patterns['0110'] = '0000'
patterns['01110'] = '00000'

# I would loop over the lists

lists = [[0,1,0,0,1,1,0,0,1,1,1,0]]

for lista in lists:

    # I would join the list elements as a string
    string_list = ''.join(map(str,lista))

    # we loop over the patterns
    for pattern,value in patterns.items():

        # if a pattern is detected, we replace it
        string_list = string_list.replace(pattern, value)

    # convert the characters back to integers once all patterns have been applied
    lista = [int(ch) for ch in string_list]
    print(lista)
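
As a variant of the string approach above, the three patterns and their inverses can be collapsed into two regular expressions with lookarounds. This is only a sketch under stated assumptions: the list contains nothing but 0 and 1, runs of up to `max_run` elements are flipped when bounded on both sides by the opposite value, and the two directions are applied one after the other (ones first). The function name `flip_short_runs` is hypothetical.

```python
import re

def flip_short_runs(values, max_run=3):
    """Flip runs of 1..max_run equal bits that are bounded by the other bit.

    Assumes `values` contains only the integers 0 and 1.
    """
    text = "".join(map(str, values))
    # Runs of ones bounded by zeros become zeros (0,1,0 / 0,1,1,0 / 0,1,1,1,0).
    text = re.sub(r"(?<=0)1{1,%d}(?=0)" % max_run,
                  lambda m: "0" * len(m.group()), text)
    # Runs of zeros bounded by ones become ones (the inverse patterns).
    text = re.sub(r"(?<=1)0{1,%d}(?=1)" % max_run,
                  lambda m: "1" * len(m.group()), text)
    return [int(ch) for ch in text]

print(flip_short_runs([0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0]))
```

Because the lookarounds do not consume the bounding elements, adjacent matches that share a boundary (such as 0,1,0,1,0) are all replaced, which plain `str.replace` would miss. Under these assumptions, the commenter's example 01011010 comes out as all zeros.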

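【Comments】:

  • I like this approach; it seems workable and more efficient.
  • Consider iterating over patterns.items() to avoid looking up patterns[pattern] explicitly inside the loop.
【Solution 2】: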

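Although this question is related to the questions Here and Here, the OP's question is about searching for multiple sequences quickly in a single pass. While the accepted answer works well, we may not want to loop over all the search sequences for every sub-iteration of the base sequence.

Below is an algorithm that checks a sequence of i integers only if the sequence of its first (i-1) integers exists in the base sequence.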

# This is the driver function which takes in a) the search sequences and 
# replacements as a dictionary and b) the full sequence list in which to search 

def findSeqswithinSeq(searchSequences,baseSequence):
    seqkeys = [[int(i) for i in elem.split(",")] for elem in searchSequences]
    maxlen = max([len(elem) for elem in seqkeys])
    decisiontree = getdecisiontree(seqkeys)
    i = 0
    while i < len(baseSequence):
        (increment,replacement) = get_increment_replacement(decisiontree,baseSequence[i:i+maxlen])
        if replacement != -1:
            baseSequence[i:i+len(replacement)] = searchSequences[",".join(map(str,replacement))]
        i +=increment
    return  baseSequence

#the following function gives the dictionary of intermediate sequences allowed
def getdecisiontree(searchsequences):
    dtree = {}
    for elem in searchsequences:
        for i in range(len(elem)):
            if i+1 == len(elem):
                dtree[",".join(map(str,elem[:i+1]))] = True
            else:
                dtree[",".join(map(str,elem[:i+1]))] = False
    return dtree

# the following function does most of the work, giving us a) how many
# positions we can skip in the search and b) whether a search sequence was found
def get_increment_replacement(decisiontree,sequence):
    if str(sequence[0]) not in decisiontree:
        return (1,-1)
    for i in range(1,len(sequence)):
        key = ",".join(map(str,sequence[:i+1]))
        if key not in decisiontree:
            return (1,-1)
        elif decisiontree[key] == True:
            key = [int(i) for i in key.split(",")]
            return (len(key),key)
    return 1, -1

You can test the code above with this snippet:

if __name__ == "__main__":
    inputlist = [5,4,0,1,1,1,0,2,0,1,0,99,15,1,0,1]
    patternsandrepls = {'0,1,0':[0,0,0],
                        '0,1,1,0':[0,0,0,0],
                        '0,1,1,1,0':[0,0,0,0,0],
                        '1,0,1':[1,1,1],
                        '1,0,0,1':[1,1,1,1],
                        '1,0,0,0,1':[1,1,1,1,1]}

    print(findSeqswithinSeq(patternsandrepls,inputlist))

The proposed solution represents the sequences to be searched as a decision tree.
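
To make the decision tree idea concrete, here is a small sketch of the prefix table for two of the patterns. It is an illustration, not the answer's exact code: the name `build_prefix_table` is hypothetical, tuples are used as keys instead of comma-joined strings, and a complete pattern is never downgraded when it is also a prefix of a longer one.

```python
def build_prefix_table(patterns):
    """Map every prefix of every pattern to True if it is a complete pattern."""
    table = {}
    for pattern in patterns:
        for end in range(1, len(pattern) + 1):
            prefix = tuple(pattern[:end])
            # Keep True if this prefix was already recorded as a complete pattern.
            table[prefix] = table.get(prefix, False) or end == len(pattern)
    return table

table = build_prefix_table([[0, 1, 0], [0, 1, 1, 0]])
for prefix, complete in table.items():
    print(prefix, complete)
```

A window of the base sequence is extended one element at a time. As soon as the current prefix is missing from the table the search at that position stops, and a prefix mapped to True means a full pattern has been found.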

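Because many search positions are skipped, this approach should do better than O(m*n), where m is the number of search sequences and n is the length of the base sequence.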

Edit: changed the answer to reflect the clarifications in the edited question.

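【Comments】:

  • "The fastest way to achieve this" - achieve what? The OP is unclear about the intent of the replacements, but clearly states that they should be applied to another list.
  • @ugotchi, does the code above do what you need?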