Python: Compare first n characters of item in list to first n characters of all other items in same list

【Question title】: Python: Compare first n characters of item in list to first n characters of all other items in same list
【Posted】: 2019-09-07 03:21:28
【Question】:

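I need to compare the first n characters of each item in a list to the first n characters of the other items in the same list, then either remove or keep one of the items.

In the example list below, "AB2222_100" and "AB2222_P100" would be considered duplicates (even though they are technically unique) because their first 6 characters match. When the two values are compared, the one where x[-4:] == "P100" stays in the list and the one without the "P" is removed. The other items in the list are kept because they have no duplicate, regardless of whether they end in the "P100" or the "100" suffix. In this scenario a prefix never has more than one duplicate (one with "P", one without).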

AB1111_100
AB2222_100
AB2222_P100
AB3333_P100
AB4444_100
AB5555_P100

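I understand slicing and comparing, but everything I have found assumes unique values. I was hoping to use a list comprehension instead of a long for loop, but I also want to understand what I am looking at. I got lost trying to figure out collections, sets, zip, etc. for this non-unique scenario.

Slicing and comparing alone does not retain the suffix that has to be kept in the final list.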

newList = [x[:6] for x in myList]

This is how it should start and end.

myList = ['ABC1111_P100', 'ABC2222_100', 'ABC2222_P100', 'ABC3333_P100', 'ABC4444_100', 'ABC5555_P100']

newList = ['ABC1111_P100', 'ABC2222_P100', 'ABC3333_P100', 'ABC4444_100', 'ABC5555_P100']

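【Comments】:

As you have already noticed, you cannot do this with a single expression: the slice you compare is not the string you want to keep. Separate those two concepts and try again. You may want to search this site for more solutions; you will likely want to maintain a set of "seen" prefixes.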
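One way to act on this comment, as a minimal sketch rather than a definitive implementation: use the sliced prefix only as a lookup key and store the full string as the value. The function name `dedupe_by_prefix` and the helper names `best` and `order` are hypothetical, and the sketch assumes the question's rule that an item ending in "P100" wins over one that does not. The separate `order` list preserves first-seen order without relying on dict ordering, so it also behaves the same on Python 2.7.

```python
def dedupe_by_prefix(items, n):
    """Keep one item per n-character prefix, preferring items ending in 'P100'."""
    best = {}    # prefix -> full string chosen so far
    order = []   # prefixes in the order they were first seen
    for item in items:
        key = item[:n]  # the part we compare
        if key not in best:
            best[key] = item  # the part we keep
            order.append(key)
        elif item.endswith("P100") and not best[key].endswith("P100"):
            # A "P" variant replaces a previously stored non-"P" variant.
            best[key] = item
    return [best[key] for key in order]


my_list = ['ABC1111_P100', 'ABC2222_100', 'ABC2222_P100',
           'ABC3333_P100', 'ABC4444_100', 'ABC5555_P100']
new_list = dedupe_by_prefix(my_list, 7)
print(new_list)
```

Here n is 7 because the prefixes in this list ("ABC1111", "ABC2222", ...) are seven characters long; for the "AB..." list shown earlier in the question it would be 6.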

Tags: python python-2.7 duplicates comparison list-comprehension

【Solution 1】:

As the comments on your question state, you cannot do this in a one-liner. You can do it in O(n) time, at the cost of some extra space:

myList = ['ABC1111_P100', 'ABC2222_100', 'ABC2222_P100', 'ABC3333_P100', 'ABC4444_100', 'ABC5555_P100']
seen = dict()

print(myList)
for x in myList:
    # grab the start and end of the string
    start, end = x.split('_')
    if start in seen:  # We have already seen this prefix
        if seen[start] != 'P100':  # The stored ending is not the "P" variant
            seen[start] = end  # so replace it with the current ending
    else:
        # If we have not seen this before then add it to our dict.
        seen[start] = end

final_list = ["{}_{}".format(key, value) for key, value in seen.items()]
print(final_list)
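Since the question asked for a list comprehension, here is a possible two-step variant, offered as a sketch under the question's stated assumptions: every item has the form prefix_suffix, and a prefix has at most one duplicate, which differs only by the "P". The names `prefix` and `p_prefixes` are hypothetical. It first collects the prefixes that have a "P100" item, then filters the original list, so the result keeps the input order. The dict-based loop above depends on dict ordering, which is only guaranteed from Python 3.7 onward.

```python
my_list = ['ABC1111_P100', 'ABC2222_100', 'ABC2222_P100',
           'ABC3333_P100', 'ABC4444_100', 'ABC5555_P100']


def prefix(item):
    """Return everything before the last underscore."""
    return item.rsplit('_', 1)[0]


# Prefixes for which a "P100" variant exists somewhere in the list.
p_prefixes = {prefix(x) for x in my_list if x.endswith('_P100')}

# Keep every "P100" item, and keep a non-"P" item only if its prefix
# has no "P100" counterpart.
new_list = [x for x in my_list
            if x.endswith('_P100') or prefix(x) not in p_prefixes]
print(new_list)
```

This is still two expressions rather than one, because the filter needs to know about the whole list before it can decide on any single item. Note that two non-"P" items sharing a prefix would both be kept, which the question says cannot happen.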

【Comments】: