【Question title】: How to compare elements of a list in a Python map's value, and check to see if at least n number of elements match?
【Posted】: 2019-05-24 21:32:51
【Question】:

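I want to iterate over the values of a map and compare the elements of the lists to see whether at least 3 elements match in the same order, and then return a list containing the keys that satisfy the condition.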

prefs = {
    's1': ["a", "b", "c", "d", "e"],
    's2': ["c", "d", "e", "a", "b"],
    's3': ["a", "b", "c", "d", "e"],
    's4': ["c", "d", "e", "b", "e"],
    's5': ["c", "d", "e", "a", "b"]
}

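This is a sample map. In this example, keys s1 and s3 have at least three matching elements in their list values: "a", "b", "c". So s1 and s3 should be returned like this: s1 -- s3. Likewise, s2 and s4 match, so they should be returned as well, but s2 has multiple matches because it also matches s5, so s2 -- s5 should be returned too. I want to return all possible matches for every key-value pair in the map. The returned output should look like this: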

[[s1--s3], [s2--s4], [s2--s5], [s4--s5]]

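I can't figure out how to iterate over every value in the map, but here is a snippet for the element comparison. I wonder whether I could set up a counter, check whether match_cnt > 3, and then return the keys in a list.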

a = ["a", "b", "c", "d", "e"]
b = ["a", "c", "b", "d", "e"]
match_cnt = 0

if len(a) == len(b):
    for i in range(len(a)):
        if a[i] == b[i]:
            print(a[i], b[i])

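Also, I would like to understand the running time of this algorithm. A complete code solution would be much appreciated. Someone suggested that I open a new question here.

【Comments】:

Tags: python python-3.x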


【Solution 1】:

You can iterate over the map with .items(); after that it is just a matter of using slices to compare the first 3 list items:

prefs = {
    's1': ["a", "b", "c", "d", "e"],
    's2': ["c", "d", "e", "a", "b"],
    's3': ["a", "b", "c", "d", "e"],
    's4': ["c", "d", "e", "b", "e"],
    's5': ["c", "d", "e", "a", "b"]
}

results = []
for ki, vi in prefs.items():
    for kj, vj in prefs.items():
        if ki == kj:  # skip checking same values on same keys !
            continue

        if vi[:3] == vj[:3]:  # slice the lists to compare the first 3 items
            match = tuple(sorted([ki, kj]))  # sort results to eliminate duplicates
            results.append(match)

print (set(results))  # print a unique set

This returns:

set([('s1', 's3'), ('s4', 's5'), ('s2', 's5'), ('s2', 's4')])
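The double loop above visits every pair twice and then removes the duplicates by sorting and building a set. As a minimal sketch of an alternative (not part of the original answer), itertools.combinations can yield each unordered pair of keys exactly once; this assumes Python 3.7+, where dicts keep insertion order:

```python
from itertools import combinations

prefs = {
    's1': ["a", "b", "c", "d", "e"],
    's2': ["c", "d", "e", "a", "b"],
    's3': ["a", "b", "c", "d", "e"],
    's4': ["c", "d", "e", "b", "e"],
    's5': ["c", "d", "e", "a", "b"]
}

# combinations(prefs, 2) yields each unordered pair of keys exactly once,
# so no sorting or set() is needed to remove duplicates afterwards.
results = [
    (ki, kj)
    for ki, kj in combinations(prefs, 2)
    if prefs[ki][:3] == prefs[kj][:3]  # compare the first 3 items only
]

print(results)
# [('s1', 's3'), ('s2', 's4'), ('s2', 's5'), ('s4', 's5')]
```

This halves the number of comparisons but still only looks at the first three items of each list.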

Edit:
To check all possible combinations, you can use combinations() from itertools. iCombinations/jCombinations hold the order-preserving groups of 3 list items:

from itertools import combinations

prefs = {
    's1': ["a", "b", "c", "d", "e"],
    's2': ["c", "d", "e", "a", "b"],
    's3': ["a", "b", "c", "d", "e"],
    's4': ["c", "d", "e", "b", "e"],
    's5': ["c", "d", "e", "a", "b"]
}

results = []
for ki, vi in prefs.items():
    for kj, vj in prefs.items():
        if ki == kj:  # skip checking same values on same keys !
            continue

        # Option 1: consecutive runs of 3 items (sliding windows)
        # iCombinations = [vi[n:n+3] for n in range(len(vi)-2)]
        # jCombinations = [vj[n:n+3] for n in range(len(vj)-2)]

        # Option 2: every order-preserving combination of 3 items.
        # Stored as sets, because an iterator can only be consumed once.
        iCombinations = set(combinations(vi, 3))
        jCombinations = set(combinations(vj, 3))

        if iCombinations & jCombinations:  # at least one shared combination
            match = tuple(sorted([ki, kj]))
            results.append(match)

print (set(results))  # print a unique set

This returns:

set([('s1', 's3'), ('s2', 's5'), ('s3', 's5'), ('s2', 's3'), ('s2', 's4'), ('s1', 's4'), ('s1', 's5'), ('s3', 's4'), ('s4', 's5'), ('s1', 's2')])
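For the consecutive case on its own, here is a small sketch that assumes a run of three may start at a different index in each list. The helper names windows and share_run and the two sample lists are made up for illustration; converting each window to a tuple makes it hashable, so the windows can be stored in a set:

```python
def windows(items, n=3):
    """Return the set of all runs of n consecutive items, as tuples."""
    return {tuple(items[i:i + n]) for i in range(len(items) - n + 1)}


def share_run(a, b, n=3):
    """True if some run of n consecutive items occurs in both lists."""
    return not windows(a, n).isdisjoint(windows(b, n))


x = ["c", "d", "a", "b", "c"]
y = ["d", "c", "a", "b", "c"]
print(share_run(x, y))  # True: both contain the run a, b, c
```

Note that under this looser rule every pair in the sample prefs matches, because all five lists contain the run c, d, e somewhere.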

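【Comments】:

  • This solution only checks whether the first three elements match and then returns the set, but I want to return the keys wherever three consecutive elements match anywhere in the lists. For example, I could have 's1': ["c", "d", "a", "b", "c"] and 's3': ["d", "c", "a", "b", "c"], which should count as a match because three consecutive elements match in both lists. Is there a way to modify the list comprehension so that it checks the whole list for 3 consecutive matching elements?
  • @Sumanth M, added matching on 3-item combinations: ['a', 'b', 'c'], ['b', 'c', 'd'], ['c', 'd', 'e'] in the case of s1.
  • Thanks for the easy-to-follow code. Could the code above be modified so that it returns matches where at least three elements match, but the elements do not have to be consecutive? They can be in any order. For example, s1: [a,b,c,d,e] would match s2: [e,a,f,b,q], because [b,a,e] is common to both keys. I am looking at jCombinations to see whether it can be adapted; could you chime in?
  • Sure, take a look at itertools, in particular itertools.combinations(), which creates all possible combinations: itertools.combinations(vi, 3) ...
  • You might want to ask a new question; comments are not the right place for modifying code :)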
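For the any-order case raised in the comments, a set intersection may be enough. This is only a sketch: the function name share_at_least is made up, and it assumes that repeated items within one list should count once.

```python
def share_at_least(a, b, n=3):
    """True if at least n distinct items occur in both lists, in any order."""
    return len(set(a) & set(b)) >= n


s1 = ["a", "b", "c", "d", "e"]
s2 = ["e", "a", "f", "b", "q"]
print(share_at_least(s1, s2))  # True: a, b and e are common to both
```

If duplicates should count separately, collections.Counter supports the same & operator as a multiset intersection.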
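【Solution 2】:

I tried to be as detailed as possible. This should serve as an example of how you can usually solve this kind of problem: insert lots of print messages to create a log of what is happening.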

prefs = {
    's1': ["a", "b", "c", "d", "e"],
    's2': ["c", "d", "e", "a", "b"],
    's3': ["a", "b", "c", "d", "e"],
    's4': ["c", "d", "e", "b", "e"],
    's5': ["c", "d", "e", "a", "b"]
}

# Get all items of prefs and sort them by key. (Sorting might not be
# necessary, that's something you'll have to decide.)
items_a = sorted(prefs.items(), key=lambda item: item[0])

# Make a copy of the items where we can delete the processed items.
items_b = items_a.copy()

# Set the length for each compared slice.
slice_length = 3

# Calculate how many comparisons will be necessary per item.
max_shift = len(items_a[0][1]) - slice_length

# Create an empty result list for all matches.
matches = []

# Loop all items
print("Comparisons:")
for key_a, value_a in items_a:
    # We don't want to check items against themselves, so we have to
    # delete the first item of items_b every loop pass (which would be
    # the same as key_a, value_a).
    del items_b[0]
    # Loop remaining other items
    for key_b, value_b in items_b:
        print("- Compare {} to {}".format(key_a, key_b))
        # We have to shift the compared slice
        for shift in range(max_shift + 1):
            # Start the slice at 0, then shift it
            start = 0 + shift
            # End the slice at slice_length, then shift it
            end = slice_length + shift
            # Create the slices
            slice_a = value_a[start:end]
            slice_b = value_b[start:end]
            print("  - Compare {} to {}".format(slice_a, slice_b), end="")
            if slice_a == slice_b:
                print(" -> Match!", end="")
                matches += [(key_a, key_b, shift)]
            print("")

print("Matches:")
for key_a, key_b, shift in matches:
    print("- At positions {} to {} ({} elements), {} matches with {}".format(
        shift + 1, shift + slice_length, slice_length, key_a, key_b))

Which prints:

Comparisons:
- Compare s1 to s2
  - Compare ['a', 'b', 'c'] to ['c', 'd', 'e']
  - Compare ['b', 'c', 'd'] to ['d', 'e', 'a']
  - Compare ['c', 'd', 'e'] to ['e', 'a', 'b']
- Compare s1 to s3
  - Compare ['a', 'b', 'c'] to ['a', 'b', 'c'] -> Match!
  - Compare ['b', 'c', 'd'] to ['b', 'c', 'd'] -> Match!
  - Compare ['c', 'd', 'e'] to ['c', 'd', 'e'] -> Match!
- Compare s1 to s4
  - Compare ['a', 'b', 'c'] to ['c', 'd', 'e']
  - Compare ['b', 'c', 'd'] to ['d', 'e', 'b']
  - Compare ['c', 'd', 'e'] to ['e', 'b', 'e']
- Compare s1 to s5
  - Compare ['a', 'b', 'c'] to ['c', 'd', 'e']
  - Compare ['b', 'c', 'd'] to ['d', 'e', 'a']
  - Compare ['c', 'd', 'e'] to ['e', 'a', 'b']
- Compare s2 to s3
  - Compare ['c', 'd', 'e'] to ['a', 'b', 'c']
  - Compare ['d', 'e', 'a'] to ['b', 'c', 'd']
  - Compare ['e', 'a', 'b'] to ['c', 'd', 'e']
- Compare s2 to s4
  - Compare ['c', 'd', 'e'] to ['c', 'd', 'e'] -> Match!
  - Compare ['d', 'e', 'a'] to ['d', 'e', 'b']
  - Compare ['e', 'a', 'b'] to ['e', 'b', 'e']
- Compare s2 to s5
  - Compare ['c', 'd', 'e'] to ['c', 'd', 'e'] -> Match!
  - Compare ['d', 'e', 'a'] to ['d', 'e', 'a'] -> Match!
  - Compare ['e', 'a', 'b'] to ['e', 'a', 'b'] -> Match!
- Compare s3 to s4
  - Compare ['a', 'b', 'c'] to ['c', 'd', 'e']
  - Compare ['b', 'c', 'd'] to ['d', 'e', 'b']
  - Compare ['c', 'd', 'e'] to ['e', 'b', 'e']
- Compare s3 to s5
  - Compare ['a', 'b', 'c'] to ['c', 'd', 'e']
  - Compare ['b', 'c', 'd'] to ['d', 'e', 'a']
  - Compare ['c', 'd', 'e'] to ['e', 'a', 'b']
- Compare s4 to s5
  - Compare ['c', 'd', 'e'] to ['c', 'd', 'e'] -> Match!
  - Compare ['d', 'e', 'b'] to ['d', 'e', 'a']
  - Compare ['e', 'b', 'e'] to ['e', 'a', 'b']
Matches:
- At positions 1 to 3 (3 elements), s1 matches with s3
- At positions 2 to 4 (3 elements), s1 matches with s3
- At positions 3 to 5 (3 elements), s1 matches with s3
- At positions 1 to 3 (3 elements), s2 matches with s4
- At positions 1 to 3 (3 elements), s2 matches with s5
- At positions 2 to 4 (3 elements), s2 matches with s5
- At positions 3 to 5 (3 elements), s2 matches with s5
- At positions 1 to 3 (3 elements), s4 matches with s5

It is still unclear what exactly your output should be. However, I think you will have no trouble adapting the code above to your needs.
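One possible adaptation, as a sketch: drop the logging and keep only the key pairs. Like the code above, it compares windows at the same index in both lists. The function name matching_pairs is made up for illustration. On running time: for k keys with lists of length m, there are k(k-1)/2 pairs and at most m - n + 1 window comparisons per pair, so the work grows roughly as O(k² · m) for a fixed window length n.

```python
from itertools import combinations


def matching_pairs(prefs, n=3):
    """Return key pairs whose lists share n consecutive items at the same index."""
    pairs = []
    # Sorting the items gives a stable key order; each pair is visited once.
    for (key_a, a), (key_b, b) in combinations(sorted(prefs.items()), 2):
        # Only start indices where both lists still hold a full window.
        last_start = min(len(a), len(b)) - n
        if any(a[i:i + n] == b[i:i + n] for i in range(last_start + 1)):
            pairs.append((key_a, key_b))
    return pairs


prefs = {
    's1': ["a", "b", "c", "d", "e"],
    's2': ["c", "d", "e", "a", "b"],
    's3': ["a", "b", "c", "d", "e"],
    's4': ["c", "d", "e", "b", "e"],
    's5': ["c", "d", "e", "a", "b"]
}

print(matching_pairs(prefs))
# [('s1', 's3'), ('s2', 's4'), ('s2', 's5'), ('s4', 's5')]
```

The any() call stops at the first matching window, so each pair is reported once, however many windows match.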

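【Comments】:

  • Perfect - thank you
  • I have updated the question. Is it possible to go through the list-type values of all given keys and check whether at least one element of the first value is present in the second value? Let's assume every value has at least one element; I only want to check whether at least one element matches. I already tried the intersection function, but no luck.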
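For the at-least-one-element check asked about in the last comment, set.isdisjoint is one option. This is a sketch with made-up sample data and a made-up helper name, have_common_item; it assumes the list items are hashable.

```python
from itertools import combinations


def have_common_item(a, b):
    """True if at least one item of list a also occurs in list b."""
    return not set(a).isdisjoint(b)


# Hypothetical sample data, not taken from the question.
data = {
    'k1': ["a", "b"],
    'k2': ["b", "c"],
    'k3': ["x"]
}

pairs = [
    (ka, kb)
    for ka, kb in combinations(data, 2)
    if have_common_item(data[ka], data[kb])
]
print(pairs)  # [('k1', 'k2')]
```

isdisjoint accepts any iterable and can stop at the first shared item, so it avoids building the full intersection.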