There are many ways to do this. If you are using numpy, you can use np.count_nonzero:
>>> import numpy as np
>>> a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> b = np.array([1, 2, 3, 7, 4, 6, 5, 8, 9])
>>> a != b
array([False, False, False, True, True, False, True, False, False], dtype=bool)
>>> np.count_nonzero(a != b)
3
Note that a != b returns an array of True and False values, depending on how the condition evaluates at each index.
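As a minimal sketch of why this works: np.count_nonzero treats each True in the boolean array as a nonzero entry, so the count equals the number of mismatched positions. The variable names mask, n_diff, n_diff_sum and positions are illustrative only and not part of the original answer.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
b = np.array([1, 2, 3, 7, 4, 6, 5, 8, 9])

# Element-wise comparison: True wherever the two arrays disagree.
mask = a != b

# True counts as nonzero, so this is the number of differing elements.
n_diff = np.count_nonzero(mask)

# Summing the boolean array gives the same count (True is treated as 1).
n_diff_sum = int(mask.sum())

# Indices of the differing positions, if you need to know where they are.
positions = np.flatnonzero(mask)

print(n_diff, n_diff_sum, positions.tolist())
```

If only the count is needed, np.count_nonzero is the more direct choice; np.flatnonzero is useful when the locations matter as well.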
Here is a speed comparison:
>>> %timeit np.count_nonzero(a != b)
The slowest run took 40.59 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 752 ns per loop
>>> %timeit sum(i != j for i, j in zip(a, b))
The slowest run took 5.86 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 18.5 µs per loop
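Caching obscures the timings, but 40.59 * 0.752 µs = 30.52 µs, while 5.86 * 18.5 µs = 108.41 µs, so even numpy's slowest run appears to be considerably faster than pure Python's slowest run.
This becomes clearer with larger arrays: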
>>> n = 10000
>>> a = np.arange(n)
>>> b = np.arange(n)
>>> k = 50
>>> ids = np.random.randint(0, n, k)
>>> a[ids] = 0
>>> ids = np.random.randint(0, n, k)
>>> b[ids] = 0
>>> %timeit np.count_nonzero(a != b)
The slowest run took 20.50 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 11.5 µs per loop
>>> %timeit sum(i != j for i, j in zip(a, b))
100 loops, best of 3: 15.6 ms per loop
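The difference is much more pronounced: numpy takes at most about 235 µs (its slowest run, 20.50 * 11.5 µs), while pure Python needs 15.6 ms even in its best run!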
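For readers without IPython, the comparison can be reproduced roughly with the standard timeit module. This is only a sketch: the helper names make_arrays, count_numpy and count_python are made up for illustration, the seed is arbitrary, and absolute timings will differ by machine and numpy version.

```python
import timeit

import numpy as np


def make_arrays(n=10_000, k=50, seed=0):
    """Build two arange arrays and zero out up to k random positions in each."""
    rng = np.random.default_rng(seed)  # seeded so the run is repeatable
    a = np.arange(n)
    b = np.arange(n)
    a[rng.integers(0, n, k)] = 0
    b[rng.integers(0, n, k)] = 0
    return a, b


def count_numpy(a, b):
    # Vectorized: compare element-wise, then count the True entries.
    return int(np.count_nonzero(a != b))


def count_python(a, b):
    # Pure-Python loop over pairs of elements.
    return int(sum(i != j for i, j in zip(a, b)))


a, b = make_arrays()

# Both approaches must agree before timing them is meaningful.
assert count_numpy(a, b) == count_python(a, b)

# Best of 3 repeats, reported per call (mirrors %timeit's "best of 3").
t_np = min(timeit.repeat(lambda: count_numpy(a, b), number=100, repeat=3)) / 100
t_py = min(timeit.repeat(lambda: count_python(a, b), number=3, repeat=3)) / 3

print(f"numpy:       {t_np * 1e6:.1f} µs per call")
print(f"pure Python: {t_py * 1e6:.1f} µs per call")
```

Taking the minimum over repeats reduces the influence of one-off slow runs such as the cached first call that %timeit warns about.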