【Question title】: Subsequence match for a simple list and a list of sets?
【Posted】: 2023-03-17 15:21:02
【Question】:

This is a question involving subsequences. The ordinary subsequence problem has a list l1 and asks whether another list l2 is a subsequence of it. For example:

l1 = [3, 1, 4, 1, 5, 9]

l2 = [3, 1, 5]            => True (remove 4, 1, 9)

l2 = [4, 3, 5]            => False
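
For reference, the ordinary check can be sketched with a single iterator over l1. The name is_subsequence is chosen here for illustration and does not come from the question:

```python
def is_subsequence(l2, l1):
    """Return True if l2 can be obtained from l1 by deleting elements."""
    it = iter(l1)
    # `e in it` advances the iterator until it finds e, so each search
    # resumes right after the previous match and order is respected.
    return all(e in it for e in l2)

print(is_subsequence([3, 1, 5], [3, 1, 4, 1, 5, 9]))  # True
print(is_subsequence([4, 3, 5], [3, 1, 4, 1, 5, 9]))  # False
```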

In my problem, l2 has sets as values. For example:

l1 = [3, 1, 4, 1, 5, 9]
l2 = [{2, 3}, {1, 5, 2}, {9}]
=> True, because:
     {2, 3} matches 3
     {1, 5, 2} matches 1 (or 5)
     {9} matches 9

Here, l2 can be thought of as conceptually expanding into different possibilities by picking one element from each set:

l2exp = [[2, 1, 9], [2, 5, 9], [2, 2, 9], [3, 1, 9], [3, 5, 9], [3, 2, 9]]

This means we have a successful match as long as one of the six possible lists represented by l2 matches l1. Because [3, 1, 9] matches l1, the whole l2 matches.
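
The conceptual expansion can be reproduced with itertools.product, as a small sketch. Set iteration order is arbitrary, so the result is sorted here only to make the printout stable:

```python
from itertools import product

l2 = [{2, 3}, {1, 5, 2}, {9}]

# product(*l2) picks one element from each set, in every combination.
l2exp = sorted(list(p) for p in product(*l2))

print(len(l2exp))  # 6
print(l2exp)
```

The number of combinations is the product of the set sizes (here 2 * 3 * 1), which is why explicit expansion grows quickly.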

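So one solution could be to first expand l2 into the new l2exp as above, and then, for each sublist_l2 in l2exp, use the ordinary subsequence check all(e in iter_of_l1 for e in sublist_l2), with a fresh iterator over l1 for each sublist.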

How can the match be done without explicitly expanding l2 into a list of lists?

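【Comments】:

  • In your example, does l1 match l2 because the elements [1,3] are in [1,2,3,4]? If so, [2,3] would succeed too, right? Does the match depend on order? What I mean is, please clarify what you mean by a match.
  • [2,3] matches, of course.
  • What does "l2 can be conceptually expanded" mean? Why does [1, 3] match l1? This is just unclear.
  • By subsequence, do you mean that the numbers in the sublist appear in order in the larger list? Please edit your question with information clarifying what matching means. Perhaps provide examples that do not match.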

Tags: python subsequence


【Solution 1】:

Solution

Instead of checking l1 and l2 elements for equality, check l1 elements for membership in the l2 elements.

Some possible implementations (Try it online!):

def subseq(l1, l2):
    it1 = iter(l1)
    return all(any(x in pool for x in it1)
               for pool in l2)

def subseq(l1, l2):
    it1 = iter(l1)
    return all(not pool.isdisjoint(it1)
               for pool in l2)

def subseq(l1, l2):
    it1 = iter(l1)
    return not any(pool.isdisjoint(it1)
                   for pool in l2)

from itertools import repeat

def subseq(l1, l2):
    return not any(map(set.isdisjoint, l2, repeat(iter(l1))))

def subseq(l1, l2):
    it1 = iter(l1)
    for pool in l2:
        for x in it1:
            if x in pool:
                break
        else:
            return False  # l1 ran out before this pool found a match
    return True
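
A quick run of the first variant on the question's example and on a case raised in the comments. The function is restated so the snippet runs on its own:

```python
def subseq(l1, l2):
    it1 = iter(l1)  # shared iterator: each pool resumes where the last one stopped
    return all(any(x in pool for x in it1)
               for pool in l2)

l1 = [3, 1, 4, 1, 5, 9]
print(subseq(l1, [{2, 3}, {1, 5, 2}, {9}]))    # True: 3, then 1, then 9
print(subseq(l1, [{9}, {3}]))                  # False: no 3 after the 9
print(subseq([1, 2, 3, 4], [{5, 1}, {7, 3}]))  # True: 1 is in {5, 1}, 3 is in {7, 3}
```

The last call returns True because a pool only needs to contain one element that occurs in l1; the other members of the set (5 and 7) are simply alternatives that were not used.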

Benchmark

Testing with

l1 = [0, 1, 2, 3, ..., 39]
l2 = [{0, 1}, {2, 3}, ..., {40, 41}]

Times:

5254.183 ms  subseq_product
   0.020 ms  subseq_all_any
   0.006 ms  subseq_all_not_disjoint
   0.005 ms  subseq_not_any_disjoint
   0.005 ms  subseq_map_disjoint
   0.003 ms  subseq_loops
1279.553 ms  subseq_Alain
   0.018 ms  subseq_Alain_faster
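  • subseq_product takes the idea from the question: it goes through all 2^21 possibilities of l2 and does an ordinary subsequence check for each.
  • subseq_Alain tries 2^20 (?) recursive paths. The _faster version greedily takes the first match for each l2-element and does not try further.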
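
The number of candidate lists that the product-based approach has to consider is the product of the set sizes. A small sketch for the first test input (math.prod needs Python 3.8 or later):

```python
from math import prod

# The 21 two-element sets from the first benchmark input.
l2 = [{i, i + 1} for i in range(0, 42, 2)]

candidates = prod(len(pool) for pool in l2)
print(len(l2), candidates)  # 21 2097152
```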

Trying the fast solutions with longer inputs:

l1 = [0, 1, 2, 3, ..., 999]
l2 = [{0, 1}, {2, 3}, ..., {1000, 1001}]

Times:

   0.285 ms  subseq_all_any
   0.058 ms  subseq_all_not_disjoint
   0.057 ms  subseq_not_any_disjoint
   0.031 ms  subseq_map_disjoint
   0.044 ms  subseq_loops
   1.488 ms  subseq_Alain_faster

Full benchmark code (Try it online!):

from timeit import timeit
from itertools import product, repeat

def subseq_product(l1, l2):

    def is_subseq(x, y):
        # Ordinary subsequence test,
        # see https://stackoverflow.com/a/24017747
        it = iter(y)
        return all(c in it for c in x)

    return any(is_subseq(p2, l1)
               for p2 in product(*l2))

def subseq_all_any(l1, l2):
    it1 = iter(l1)
    return all(any(x in pool for x in it1)
               for pool in l2)

def subseq_all_not_disjoint(l1, l2):
    it1 = iter(l1)
    return all(not pool.isdisjoint(it1)
               for pool in l2)

def subseq_not_any_disjoint(l1, l2):
    it1 = iter(l1)
    return not any(pool.isdisjoint(it1)
                   for pool in l2)

def subseq_map_disjoint(l1, l2):
    return not any(map(set.isdisjoint, l2, repeat(iter(l1))))

def subseq_loops(l1, l2):
    it1 = iter(l1)
    for pool in l2:
        for x in it1:
            if x in pool:
                break
        else:
            return False  # l1 ran out before this pool found a match
    return True

def subseq_Alain(A, S):
    if not S: return True          # all sets matched
    for i,n in enumerate(A):       # check for membership in 1st set    
        if n in S[0] and subseq_Alain(A[i+1:],S[1:]): # and matching rest
            return True            # found full match
    return False

def subseq_Alain_faster(A, S):
    if not S: return True          # all sets matched
    for i,n in enumerate(A):       # check for membership in 1st set    
        if n in S[0]:
            return subseq_Alain_faster(A[i+1:],S[1:]) # and matching rest
    return False

def benchmark(funcs, args, number):
    for _ in range(3):
        for func in funcs:
            t = timeit(lambda: func(*args), number=number) / number
            print('%8.3f ms ' % (t * 1e3), func.__name__)
        print()

l1 = list(range(40))
l2 = [{i, i+1} for i in range(0, 42, 2)]
funcs = [
    subseq_product,
    subseq_all_any,
    subseq_all_not_disjoint,
    subseq_not_any_disjoint,
    subseq_map_disjoint,
    subseq_loops,
    subseq_Alain,
    subseq_Alain_faster,
]
benchmark(funcs, (l1, l2), 1)

l1 = list(range(1000))
l2 = [{i, i+1} for i in range(0, 1002, 2)]
funcs = [
    subseq_all_any,
    subseq_all_not_disjoint,
    subseq_not_any_disjoint,
    subseq_map_disjoint,
    subseq_loops,
    subseq_Alain_faster,
]
benchmark(funcs, (l1, l2), 500)
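
The benchmark measures speed only. As a rough sanity check that greedy matching gives the same answers as trying every expansion, the two can be compared on small random inputs. This is a sketch: the names brute and greedy, the seed, and the input sizes are arbitrary choices made here, not part of the answer.

```python
import random
from itertools import product

def brute(l1, l2):
    # Try every way of picking one element per set (exponential).
    def is_subseq(x, y):
        it = iter(y)
        return all(c in it for c in x)
    return any(is_subseq(p, l1) for p in product(*l2))

def greedy(l1, l2):
    # Take the first element of l1 that fits each pool, never backtrack.
    it1 = iter(l1)
    return all(any(x in pool for x in it1) for pool in l2)

rng = random.Random(0)  # fixed seed so the run is repeatable
for _ in range(1000):
    l1 = [rng.randrange(5) for _ in range(rng.randrange(7))]
    l2 = [set(rng.sample(range(5), rng.randint(1, 3)))
          for _ in range(rng.randrange(4))]
    assert brute(l1, l2) == greedy(l1, l2), (l1, l2)
print("greedy and brute force agree")
```

Greedy is safe here because taking the earliest possible match for a pool never leaves less of l1 available for the remaining pools than any later match would.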

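【Comments】:

  • This does not seem to be what was asked... any(x in pool for x in it1) will match, for example, [1, 5] while 5 is not even in the list, and you do not check order...
  • @Tomerikoo What are l1 and l2 in your example?
  • @KellyBundy It seems to work! I am testing more examples.
  • @Tomerikoo, you need to specify how you would get [1,5] from the list of sets.
  • @marlon How does it work? With l1 = [1, 2, 3, 4] and l2 = [{5, 1}, {7, 3}] it returns True... How does that make sense?
【Solution 2】: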

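Witness versions

Besides knowing whether there is a match, one might also want to know what that match is. Maybe to actually use it, or maybe just to be able to verify correctness. Here are a few versions that return a list of the actual matched elements, or None if there is no match. The first two are probably the best.

Perhaps the simplest: assume that every pool from l2 does have a match in l1, and catch the exception otherwise: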

def subseq1(l1, l2):
    it1 = iter(l1)
    try:
        return [next(x for x in it1 if x in pool)
                for pool in l2]
    except StopIteration:
        pass

Basic loops:

def subseq2(l1, l2):
    witness = []
    it1 = iter(l1)
    for pool in l2:
        for x in it1:
            if x in pool:
                witness.append(x)
                break
        else:
            return
    return witness

A slight modification of my original all(any(... solution to append each witness element:

def subseq3(l1, l2):
    witness = []
    it1 = iter(l1)
    if all(any(x in pool and not witness.append(x)
               for x in it1)
           for pool in l2):
        return witness

Pre-append a spot for the next witness element:

def subseq4(l1, l2):
    witness = []
    it1 = iter(l1)
    if all(any(witness[-1] in pool
               for witness[-1] in it1)
           for pool in l2
           if not witness.append(None)):
        return witness

Pre-allocate spots for all witness elements:

def subseq5(l1, l2):
    witness = [None] * len(l2)
    it1 = iter(l1)
    if all(any(witness[i] in pool
               for witness[i] in it1)
           for i, pool in enumerate(l2)):
        return witness

Check for a false witness at the end:

def subseq6(l1, l2):
    it1 = iter(l1)
    false = object()
    witness = [next((x for x in it1 if x in pool), false)
               for pool in l2]
    if false not in witness:
        return witness

Testing code (cases taken from Alain's answer):

funcs = [
    subseq1,
    subseq2,
    subseq3,
    subseq4,
    subseq5,
    subseq6,
]

for func in funcs:
    l1 = [1, 2, 3, 4]
    l2 = [{2, 1}, {1, 3}]
    print(func(l1,l2)) # match: prints [1, 3]

    l1 = [1, 2, 3, 4]
    l2 = [{2, 1}, {1, 3}, {3, 4}]
    print(func(l1,l2)) # match: prints [1, 3, 4]

    l1 = [1, 2, 4, 3]
    l2 = [{2, 1}, {1, 3}, {3, 4}]
    print(func(l1,l2)) # no match: prints None

    print()
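
Since one stated purpose of a witness is verifying correctness, a returned witness can be checked independently of the function that produced it. A minimal sketch; check_witness is a hypothetical helper name introduced here, not part of the answer:

```python
def check_witness(l1, l2, witness):
    """Return True if witness is a valid match of l2 against l1."""
    if len(witness) != len(l2):
        return False
    # Each witness element must come from the corresponding pool ...
    if not all(w in pool for w, pool in zip(witness, l2)):
        return False
    # ... and the witness must be an ordinary subsequence of l1.
    it1 = iter(l1)
    return all(w in it1 for w in witness)

print(check_witness([1, 2, 3, 4], [{2, 1}, {1, 3}], [1, 3]))  # True
print(check_witness([1, 2, 4, 3], [{3, 4}, {1, 2}], [4, 1]))  # False
```

The second call fails on the order check: 4 and 1 are each in their pools, but 1 does not occur after 4 in l1.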

【Comments】:

  • These are all nice solutions.
【Solution 3】:

If your list of sets can contain more than two items, a recursive function is probably best:

def subMatch(A,S):
    if not S: return True          # all sets matched
    for i,n in enumerate(A):       # check for membership in 1st set    
        if n in S[0] and subMatch(A[i+1:],S[1:]): # and matching rest
            return True            # found full match
    return False

Output:

l1 = [1, 2, 3, 4]

l2 = [{2, 1}, {1, 3}]

print(subMatch(l1,l2)) # True

l1 = [1, 2, 3, 4]

l2 = [{2, 1}, {1, 3}, {3, 4}]

print(subMatch(l1,l2)) # True

l1 = [1, 2, 4, 3]

l2 = [{2, 1}, {1, 3}, {3, 4}]

print(subMatch(l1,l2)) # False

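【Comments】:

  • Can be very slow, e.g., l1 = list(range(40)) and l2 = [{i, i+1} for i in range(0, 42, 2)] takes more than a second.
  • Agreed, I like yours better, although it took me a second to understand why it works.
  • I guess it helped that I have known the short solution for the ordinary subsequence problem for years :-). You could speed yours up by moving the recursive call down, replacing the True, but it seems that trying multiple paths is why you did this recursively in the first place?
  • Path searching was indeed my idea, but I realize it is not very efficient for this kind of problem.
  • I have now done some benchmarks in my answer, including yours and a greedy version of it.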