Replace identical elements in a list without a loop: answers

【Question title】: Replace identical elements in a list without loop
【Posted】: 2021-10-22 01:22:08
【Question】:

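I am trying to replace all identical elements in a list with a new string, and I am also trying to get away from using loops for everything.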

# My aim is to turn:
list = ["A", "", "", "D"]
# into:
list = ["A", "???", "???", "D"]
# but without using a for-loop

I started with various comprehensions:

# e.g. 1
['' = "???"(i) for i in list]
# e.g. 2
list = [list[i] .replace '???' if ''(i) for i in range(len(lst))]

Then I tried Python's map function, as described here:

list[:] = map(lambda i: "???", list)
# I couldn't work out where to add the '""' to be replaced.

Finally, I mangled a third solution:

list[:] = ["???" if ''(i) else i for i in list]

I feel I am drifting further and further from a sensible line of attack; I just want a tidy way to do a simple task.

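【Question comments】:

Does this answer your question? In-place replacement of all occurrences of an element in a list in python
Yes, thank you, but I also got plenty of novel solutions to my question, including one that uses Python's map function correctly.
Note: a list comprehension *is* actually a for loop...
@PierreD Is it faster, or just neater for humans to read?
Also: please do not redefine list as a variable.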
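
To illustrate the last comment, here is a minimal sketch of what goes wrong when the name `list` is rebound. The function and variable names are hypothetical and exist only for this illustration; the shadowing is confined to a function so that the built-in stays usable afterwards.

```python
items = ["A", "", "", "D"]

def replace_with_shadowed_list():
    # Rebinding the name `list` hides the built-in type inside this scope.
    list = ["A", "", "", "D"]
    try:
        # This now tries to call the list object, not the built-in type.
        return list(map(lambda s: s or "???", list))
    except TypeError as exc:
        return f"TypeError: {exc}"

shadow_result = replace_with_shadowed_list()

# With a different variable name, the built-in list() works as expected.
safe_result = list(map(lambda s: s or "???", items))

print(shadow_result)
print(safe_result)  # ['A', '???', '???', 'D']
```

Choosing any other name (`items`, `data`, `lst`) avoids the problem entirely.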

Tags: python list list-comprehension

【Solution 1】:

If you want to avoid loops, as the title implies, you can use np.where instead of a list comprehension; it is also faster for large arrays:

import numpy as np

data = np.array(["A", "", "", "D"], dtype='object')
index = np.where(data == '')[0]  # positions of the empty strings
data[index] = "???"
data.tolist()

Result:

['A', '???', '???', 'D']

Speed test

# Run in IPython or Jupyter: %timeit is an IPython magic, not plain Python.
import numpy as np

for rep in [1, 10, 100, 1000, 10000]:
    data = ["A", "", "", "D"] * rep
    print(f'array of length {4 * rep}')
    print('np.where:')
    %timeit data2 = np.array(data, dtype='object'); index = np.where(data2 == '')[0]; data2[index] = "???"; data2.tolist()
    print('list-comprehension:')
    %timeit ['???' if i == '' else i for i in data]

Results:

array of length 4
np.where:
The slowest run took 11.79 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 10.7 µs per loop
list-comprehension:
The slowest run took 5.75 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 5: 487 ns per loop
array of length 40
np.where:
The slowest run took 7.08 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 13 µs per loop
list-comprehension:
100000 loops, best of 5: 2.99 µs per loop
array of length 400
np.where:
The slowest run took 4.83 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 5: 31 µs per loop
list-comprehension:
10000 loops, best of 5: 26 µs per loop
array of length 4000
np.where:
1000 loops, best of 5: 225 µs per loop
list-comprehension:
1000 loops, best of 5: 244 µs per loop
array of length 40000
np.where:
100 loops, best of 5: 2.27 ms per loop
list-comprehension:
100 loops, best of 5: 2.63 ms per loop

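For lists of about 4000 elements or more, np.where is faster in this test (225 µs vs. 244 µs at length 4000); for shorter lists the list comprehension wins.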
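
The measurements above rely on IPython's `%timeit`. Outside IPython, a rough equivalent for the two pure-Python variants that come up in this thread can be sketched with the standard-library `timeit` module. This is only a sketch: the function names are made up for illustration, NumPy is deliberately left out to keep it dependency-free, and the numbers it prints will differ from machine to machine.

```python
import timeit

def conditional(data):
    # Replace only elements equal to the empty string.
    return ["???" if e == "" else e for e in data]

def short_circuit(data):
    # Replace every falsy element; same result for lists of strings.
    return [e or "???" for e in data]

data = ["A", "", "", "D"] * 1000

# Both variants must agree on this input before timing them.
assert conditional(data) == short_circuit(data)

timings = {}
for fn in (conditional, short_circuit):
    # Best of 5 repeats, 100 calls per repeat, reported per call.
    best = min(timeit.repeat(lambda: fn(data), number=100, repeat=5))
    timings[fn.__name__] = best / 100
    print(f"{fn.__name__}: {timings[fn.__name__] * 1e6:.1f} µs per call")
```

Taking the minimum over several repeats reduces noise from other processes, which is roughly what `%timeit` does with its "best of 5" figure.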

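【Discussion】:

This is one of the slowest methods for short lists; for the four-element list in the OP's question it takes 7.89 µs ± 237 ns per loop, 23.8 times slower than a simple list comprehension. For large lists (that are not already an np.array), the relative difference shrinks; it gradually settles at about 1.9 times slower.
@PierreD See the updated post; for large arrays this method is faster.
You used the wrong list comprehension. What I suggested is [e or '???' for e in data]. In a %timeit loop, this ends up 1.9 times faster than np.where: np.where: 1.83 ms ± 1.43 µs; list comprehension: 959 µs ± 735 ns. Before writing my comment I had tested with up to 100 million random elements. That is why I assert an asymptotic 1.9x speedup over np.where.
What do you mean by wrong? The list comprehension I compared is the solution for identical elements that the OP's title implies (and that can be seen in other answers). Yours only works for empty elements.
You used %timeit ['???' if i == '' else i for i in data]. That replaces only empty elements, like most answers here. For the empty-element case I suggested [e or '???' for e in data], which is between 28 and 1.9 times faster than np.array plus np.where. That is why I said you used the wrong list comprehension. As for handling duplicates, the other parts of my answer address that. I note that it seems to be the only answer so far that does.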

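【Solution 2】:

For brevity (if we allow list comprehensions, which are a form of loop). Also, as @ComteHerappait rightly points out, this replaces empty strings with '???', consistent with the example in the question.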

>>> l = ["A", "", "", "D"]
>>> [e or '???' for e in l]
['A', '???', '???', 'D']

If we instead focus on replacing repeated elements (keeping the first occurrence), then:

>>> seen = set()
>>> newl = ['???' if e in seen or seen.add(e) else e for e in l]
>>> newl
['A', '', '???', 'D']

Finally, the following replaces every element that occurs more than once in the list:

>>> from collections import Counter
>>> c = Counter(l)
>>> newl = [e if c[e] < 2 else '???' for e in l]
>>> newl
['A', '???', '???', 'D']
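
The `e in seen or seen.add(e)` expression in the second snippet is compact but easy to misread. It works because `set.add` returns `None`, which is falsy, so the condition is true only when the element was already seen. A spelled-out sketch of the same logic, with a hypothetical function name and an explicit loop purely for readability, might look like this:

```python
def replace_repeats(items, placeholder="???"):
    """Keep the first occurrence of each element; replace later repeats."""
    seen = set()
    result = []
    for item in items:
        if item in seen:
            result.append(placeholder)
        else:
            seen.add(item)  # set.add returns None, hence the `or` trick
            result.append(item)
    return result

items = ["A", "", "", "D"]
first_kept = replace_repeats(items)

# The one-liner from the answer, for comparison.
seen = set()
one_liner = ["???" if e in seen or seen.add(e) else e for e in items]

print(first_kept)               # ['A', '', '???', 'D']
print(first_kept == one_liner)  # True
```

Both versions require the elements to be hashable, since they are stored in a set.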

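【Discussion】:

This works well for replacing empty strings, but I think the question is about duplicates.
You are right; the question is ambiguous, see my comment.
Just FWIW, this updated answer covers every reading of the OP's question: replacing empty strings, replacing duplicates (from the first repeat onward), or replacing all duplicated elements. The list comprehension (first code snippet) is also the fastest solution so far, for both short and long lists.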

【Solution 3】:

How about this:

myList = ['A', '', '', 'D']
myMap = map(lambda i: '???' if i == '' else i, myList)
print(list(myMap))

...which results in:

['A', '???', '???', 'D']
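
The question's own `map` attempt assigned to `list[:]`, which suggests an in-place update was wanted. A minimal sketch of that variant follows; the names `my_list`, `alias`, and `lazy` are illustrative only. It relies on two facts: `map` returns a lazy iterator, and slice assignment consumes the right-hand side before modifying the list, at least in CPython.

```python
my_list = ["A", "", "", "D"]
alias = my_list  # second reference to the same list object

# map() builds a lazy iterator; nothing has been replaced yet.
lazy = map(lambda s: "???" if s == "" else s, my_list)

# Slice assignment replaces the contents without rebinding the name.
my_list[:] = lazy

print(alias)             # ['A', '???', '???', 'D']
print(alias is my_list)  # True
print(list(lazy))        # [] because the iterator is already exhausted
```

Plain assignment (`my_list = list(lazy)`) would create a new list instead, leaving other references such as `alias` pointing at the old contents.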

【Discussion】:

This looks a lot like my solution
You are right. We were evidently writing the code at the same time by coincidence

【Solution 4】:

You can try this:

list1 = ["A", "", "", "D"]

list2 = list(map(lambda x: "???" if not x else x, list1))

print(list2)

Here is a longer version of the above:

list1 = ["A", "", "", "D"]
def check_string(string):
    if not string:
        return "???"
    return string

list2 = list(map(check_string, list1))
print(list2)

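This takes advantage of the fact that the empty string "" is falsy: you can rely on its implicit boolean value and return the appropriate value. Output: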

['A', '???', '???', 'D']
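
One caveat with the truthiness test is that it treats every falsy value as empty, not just `""`. The following sketch uses a hypothetical mixed list (not from the question) to show where the truthiness check and an explicit comparison diverge:

```python
# Hypothetical input containing falsy values other than the empty string.
mixed = ["A", "", 0, None, "D"]

# Truthiness: replaces "", 0 and None alike.
by_truthiness = [x or "???" for x in mixed]

# Equality: replaces only the empty string.
by_equality = ["???" if x == "" else x for x in mixed]

print(by_truthiness)  # ['A', '???', '???', '???', 'D']
print(by_equality)    # ['A', '???', 0, None, 'D']
```

For a list that contains only strings, as in the question, both forms give the same result.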

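【Discussion】:

【Solution 5】:

You can use a list comprehension, but what you need to do is compare each element and, if it matches, replace it with a different string; otherwise just keep the original element.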

>>> data = ["A", "", "", "D"]
>>> ['???' if i == '' else i for i in data]
['A', '???', '???', 'D']

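【Discussion】:

This works, but it contains an explicit "for" loop, which is what the OP wanted to avoid
@DarkKnight What do you think map does under the hood ;) There is no solution to this problem that doesn't involve an explicit or implicit loop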
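
The point about implicit loops can be checked with a small sketch that records each call made through `map`. The helper `replace_empty` and the `call_log` list are hypothetical names introduced only for this demonstration.

```python
call_log = []

def replace_empty(s):
    # Record every invocation so the hidden iteration becomes visible.
    call_log.append(s)
    return "???" if s == "" else s

data = ["A", "", "", "D"]

lazy = map(replace_empty, data)
calls_before = len(call_log)  # map is lazy, so nothing has run yet

result = list(lazy)           # consuming the iterator visits each element
calls_after = len(call_log)

print(calls_before)  # 0
print(calls_after)   # 4
print(result)        # ['A', '???', '???', 'D']
```

The function runs once per element, so the iteration still happens; it is simply driven by `map` and `list` rather than by a `for` statement in the source.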