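Comparing strings in a list? Answers

【Question title】: Comparing strings in a list?
【Posted】: 2023-04-06 09:48:01
【Question description】:

I have a set of strings in a list, given in the code below. I want to compare each string with the strings that come before it. Obviously, the string in the first position is not compared with anything, because nothing precedes it. The logic is basically: the string in the 2nd position is compared with the 1st, the string in the 3rd position is compared with the 1st and 2nd, and so on.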

s = ["avocado", "banana", "carrot", "avocado", "carrot", "grapes", "orange"]
for i in range(2,len(s)):
    for j in range(i,2, -1):
        if s[i] == s[j]:
            print (s[i])

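Now, if a match is found, the string should be displayed along with its positions, e.g. avocado found in position 4 and 1. I am stuck with this code. How should I proceed?

【Question comments】:

What is the expected output for the list ["avocado", "avocado", "avocado"]?

Tags: python list

【Solution 1】:

Another approach is to build a dictionary that maps each item to its positions: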

from collections import defaultdict

d = defaultdict(list)
for position, item in enumerate(s, start=1):  # start=1 since positions are counted from 1
    d[item].append(position)

for item, positions in d.items():
    if len(positions) > 1:
        print("{} found at positions {}".format(item, positions))

【Comments】:

The best of all the suggested solutions.

【Solution 2】:

s = ["avocado", "banana", "carrot", "avocado", "carrot", "grapes", "orange"]
for i in range(1, len(s)):
    for j in range(i):
        if s[i] == s[j]:
            print(s[i])

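You were close. Use range(i) to count from 0 up to i - 1 (i itself is excluded). Use index 1 to get the second item in the list (list indices start at 0).

【Comments】:

Hi Michael, thanks. I have one doubt though: will range(i) compare against the preceding strings or against the whole list? For example, the string in the 3rd position, "carrot", should be compared with the two strings before it, not with the whole list. Please clear up my doubt.
Yes, only the preceding strings. When i is 2 in the outer loop, j loops from 0 to 1 (the end of a range is exclusive).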
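
To make the index ranges in this answer concrete, the sketch below wraps the same two loops in a function and reports matches with the 1-based positions the question asks for. The helper name `earlier_matches` is hypothetical and not part of the original answer.

```python
def earlier_matches(items):
    """Compare each item only with the items before it (hypothetical helper)."""
    found = []
    for i in range(1, len(items)):      # start at the second item
        for j in range(i):              # j takes 0 .. i-1, never i itself
            if items[i] == items[j]:
                # +1 converts 0-based indices to the 1-based positions in the question
                found.append((items[i], i + 1, j + 1))
    return found

s = ["avocado", "banana", "carrot", "avocado", "carrot", "grapes", "orange"]
for word, pos, earlier in earlier_matches(s):
    print(f"{word} found in position {pos} and {earlier}")
# avocado found in position 4 and 1
# carrot found in position 5 and 3
```

The nested loops still perform a quadratic number of comparisons in the worst case, so this is mainly useful for seeing exactly which pairs are checked.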

【Solution 3】:

This seems to work for the general use case:

a = ['a','b','c','d','e','a','b','e','d']

for i in list(set(a)):
    b = [j for j, e in enumerate(a) if e == i]
    if len(b) > 1:
        print(i," found in positions:",b )

Output:

b  found in positions: [1, 6]
a  found in positions: [0, 5]
d  found in positions: [3, 8]
e  found in positions: [4, 7]
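
Because a `set` is unordered, the lines above may appear in a different order from one run to the next. A minimal sketch, assuming first-appearance order is what is wanted, replaces `set(a)` with `dict.fromkeys(a)`, which keeps insertion order in Python 3.7 and later. The function name `duplicate_positions` is made up for illustration.

```python
def duplicate_positions(items):
    """Return (item, positions) pairs for repeated items, in first-appearance order."""
    result = []
    for item in dict.fromkeys(items):   # unique items, insertion order preserved
        positions = [j for j, e in enumerate(items) if e == item]
        if len(positions) > 1:
            result.append((item, positions))
    return result

a = ['a', 'b', 'c', 'd', 'e', 'a', 'b', 'e', 'd']
for item, positions in duplicate_positions(a):
    print(item, "found in positions:", positions)
# a found in positions: [0, 5]
# b found in positions: [1, 6]
# d found in positions: [3, 8]
# e found in positions: [4, 7]
```

Like the answer's code, this rescans the list once per unique item; only the output order changes.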

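【Comments】:

Hi Celius, thanks a lot. I have a silly question. The code above searches from position 0, right? Since I want to search from position 1 ('b'), shouldn't I use range? Because the output I get starts from a: a found in positions: [0, 5] b found in positions: [1, 6] e found in positions: [4, 7] d found in positions: [3, 8]
Yes and no. Technically, it uses the set of the list to find the items that are repeated in list a, scanning all items (from position 0 through the last one). For the items that are repeated, we show their positions. In your example you start from a position, but avocado is printed at positions 0 and 3 (4 and 1 are off, because list indices start at 0, not 1).
Thanks for the explanation. But the problem is that I don't want to search the whole list for every string. I only want to compare the current string with the strings before it. That is the logic I want to apply.
What are you ultimately trying to achieve? This seems like a rather difficult way to get the same result.
Searching the complete list for every string will take a lot of time. To reduce the search complexity, I am trying to implement a logic where the current string is compared only with the strings before it, and its position is reported if a match is found. If the same string is repeated again after position 20, then we will output 3 matching strings, which will be printed after some time. But yes, that is the logic.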
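
The goal described in the last comment, checking each string only against what came before it without rescanning the list, can be sketched with a dictionary of the positions seen so far. This is one possible reading of that comment rather than something the thread settled on, and `report_repeats` is a hypothetical name.

```python
from collections import defaultdict

def report_repeats(items):
    """Single pass: for each item, look up the earlier positions of the same string."""
    seen = defaultdict(list)            # string -> 1-based positions seen so far
    messages = []
    for pos, word in enumerate(items, start=1):
        if word in seen:                # membership test does not create a key
            earlier = ", ".join(str(p) for p in seen[word])
            messages.append(f"{word} at position {pos} also found at {earlier}")
        seen[word].append(pos)
    return messages

s = ["avocado", "banana", "carrot", "avocado", "carrot", "grapes", "orange"]
print("\n".join(report_repeats(s)))
# avocado at position 4 also found at 1
# carrot at position 5 also found at 3
```

Each string costs one dictionary lookup instead of a scan over all earlier items, and a third occurrence is reported together with both earlier positions.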

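【Solution 4】:

First, you have to determine the positions of all the words, e.g. in a dictionary. Then you can print the words with their positions: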

from collections import defaultdict
positions = defaultdict(list)
s = ["avocado", "banana", "carrot", "avocado", "carrot", "grapes", "orange"]
for position, word in enumerate(s):
    positions[word].append(position)

for word, found in positions.items():
    # str.join() needs strings, so convert the integer positions first
    print(f"{word} found at positions {' and '.join(map(str, found))}")

【Comments】:

【Solution 5】:

You can use the combinations function from itertools to efficiently iterate over all pairs:

s = ["avocado", "banana", "carrot", "avocado", "carrot", "grapes", "orange"]
from itertools import combinations
matches = [(w1,p1,p2) for (p1,w1),(p2,w2) in combinations(enumerate(s),2) if w1==w2]
print("\n".join(f"{word} found in position {p1} and {p2}" for word,p1,p2 in matches))


# avocado found in position 0 and 3
# carrot found in position 2 and 4

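【Comments】:

The formatting and the use of f-strings are great! But what if there are more than 2 matches?
Based on the OP's wording, I assumed that each pair would be reported separately (but each pair only once), so this works. In any case, the matches variable allows other kinds of use or reporting if needed.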
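
To illustrate the reply above, here is what the pairwise approach produces when a string occurs three times: every pair of positions is reported exactly once. The function name `matching_pairs` is hypothetical; the comprehension is the same one used in the answer.

```python
from itertools import combinations

def matching_pairs(items):
    """Return (word, p1, p2) for every pair of 0-based positions holding equal strings."""
    return [(w1, p1, p2)
            for (p1, w1), (p2, w2) in combinations(enumerate(items), 2)
            if w1 == w2]

for word, p1, p2 in matching_pairs(["avocado", "avocado", "avocado"]):
    print(f"{word} found in position {p1} and {p2}")
# avocado found in position 0 and 1
# avocado found in position 0 and 2
# avocado found in position 1 and 2
```

Note that combinations visits every pair of elements, so the number of comparisons grows quadratically with the length of the list.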