It should be fast.
Timings for a Series with 100,000,000 rows:
In [84]: s = pd.Series(np.random.choice([1,0,-1], 10**8), dtype=np.int8)
In [85]: s.shape
Out[85]: (100000000,)
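The setup above stores a "missing" marker as -1 in an int8 Series, so each element takes one byte. Here is a minimal self-contained sketch of the same setup. It assumes the sentinel is -1, as in the session, and the name SENTINEL is hypothetical. It uses a seeded NumPy Generator and a smaller size (10**6) so that it runs quickly.

```python
import numpy as np
import pandas as pd

SENTINEL = -1  # hypothetical name for the "missing" marker used in the session
rng = np.random.default_rng(0)  # seeded generator for reproducibility

# Same idea as In [84], but with 10**6 elements instead of 10**8
s = pd.Series(rng.choice([1, 0, SENTINEL], 10**6), dtype=np.int8)

print(s.dtype)          # int8
print(s.values.nbytes)  # 1000000: one byte per element
```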
Emulating series.isnull():
In [86]: %timeit s==-1
87 ms ± 3.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [87]: %timeit s.values==-1
84.1 ms ± 2.88 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [88]: %timeit np.where(s==-1)
546 ms ± 14.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [89]: %timeit np.where(s.values==-1)
531 ms ± 2.78 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
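The comparison s == -1 yields a boolean mask, which is the analogue of isnull(). Called with a single argument, np.where is equivalent to np.nonzero and returns a tuple of index arrays. It therefore does extra work to materialize the positions, which is consistent with its slower timings. The sketch below checks on a tiny example that the sentinel mask matches isnull() on an equivalent float Series holding real NaN. The series contents and the name SENTINEL are illustrative assumptions.

```python
import numpy as np
import pandas as pd

SENTINEL = -1  # hypothetical name for the "missing" marker
s = pd.Series([1, -1, 0, -1, 1], dtype=np.int8)

mask = s == SENTINEL  # boolean Series, plays the role of isnull()
positions = np.where(s.values == SENTINEL)[0]  # integer positions of the "missing" values

# Same data as float64 with real NaN in place of the sentinel
f = s.astype("float64").where(s != SENTINEL)
assert (mask == f.isnull()).all()  # the sentinel mask agrees with isnull()

print(positions)  # [1 3]
```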
Emulating series.isnull().sum():
In [90]: %timeit (s==-1).sum()
1.39 s ± 38.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [91]: %timeit (s.values==-1).sum()
181 ms ± 1.88 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
PS: note how large the difference between (s==-1).sum() and (s.values==-1).sum() is when counting (summing) the matches: 1.39 s vs. 181 ms.
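To reproduce the counting comparison on your own machine, you can use a small timing harness like the one below. It times the two expressions from the session and adds np.count_nonzero as a third NumPy-only way to count True values, which the original answer does not include. The function names and the 10**6 size are illustrative assumptions, and no particular timings are implied. I haven't verified why the pandas sum is slower. One plausible factor is the extra overhead of pandas' reduction path compared with a direct NumPy sum.

```python
import timeit

import numpy as np
import pandas as pd

SENTINEL = -1  # hypothetical name for the "missing" marker
rng = np.random.default_rng(42)
s = pd.Series(rng.choice([1, 0, SENTINEL], 10**6), dtype=np.int8)

def count_pandas():
    # pandas reduction on a boolean Series, as in In [90]
    return int((s == SENTINEL).sum())

def count_numpy():
    # plain NumPy reduction on the underlying array, as in In [91]
    return int((s.values == SENTINEL).sum())

def count_nonzero():
    # counts True entries without summing them
    return int(np.count_nonzero(s.values == SENTINEL))

for fn in (count_pandas, count_numpy, count_nonzero):
    # best of 3 repeats, each running the function 10 times
    t = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {t * 1e3:.2f} ms per call")
```

All three functions return the same count. Only their speed differs.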