Speeding up Python code with Cython

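【Question title】: Speeding up python code with cython
【Posted】: 2023-03-21 02:10:01
【Question description】:

I have a function that essentially just makes a large number of calls to a simply defined hash function and tests when it finds a duplicate. I need to run many simulations with it, so I would like it to be as fast as possible. I am trying to use Cython for this. The Cython code is currently called with a normal Python list of integers with values in the range 0 to m^2.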

import math, random
cdef int a,b,c,d,m,pos,value, cyclelimit, nohashcalls   
def h3(int a,int b,int c,int d, int m,int x):
    return (a*x**2 + b*x+c) %m    
def floyd(inputx):
    dupefound, nohashcalls = (0,0)
    m = len(inputx)
    loops = int(m*math.log(m))
    for loopno in xrange(loops):
        if (dupefound == 1):
            break
        a = random.randrange(m)
        b = random.randrange(m)
        c = random.randrange(m)
        d = random.randrange(m)
        pos = random.randrange(m)
        value = inputx[pos]
        listofpos = [0] * m
        listofpos[pos] = 1
        setofvalues = set([value])
        cyclelimit = int(math.sqrt(m))
        for j in xrange(cyclelimit):
            pos = h3(a,b, c,d, m, inputx[pos])
            nohashcalls += 1    
            if (inputx[pos] in setofvalues):
                if (listofpos[pos]==1):
                    dupefound = 0
                else:
                    dupefound = 1
                    print "Duplicate found at position", pos, " and value", inputx[pos]
                break
            listofpos[pos] = 1
            setofvalues.add(inputx[pos])
    return dupefound, nohashcalls

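How can I convert inputx and listofpos to use C-typed arrays and access them at C speed? Are there other speedups I can use? Can setofvalues be sped up?

So that there is something to compare against: 50 calls to floyd() with m = 5000 currently take about 30 seconds on my computer.

Update: an example code snippet showing how floyd is called.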

m = 5000
inputx = random.sample(xrange(m**2), m)
(dupefound, nohashcalls) = edcython.floyd(inputx)

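【Comments】:

Have you thought about adding a mechanism to memoize past results? I see potential for overlapping calls to the hash method, and memoizing could speed up your algorithm considerably at the cost of memory.
Do you mean storing the results of h3? The function stops as soon as it finds a duplicate, so that does not seem like it would help. I suspect the main speedup will come from using C-typed arrays, but I don't know how to do that.
What exactly is the input to floyd? I assume just a list of integers?
Here is an example: m = 5000; inputx = random.sample(xrange(m**2), m); (dupefound, nohashcalls) = edcython.floyd(inputx).

Tags: python optimization cython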

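【Solution 1】:

First, it looks like you have to type the variables inside the function. A good example of it is here.

Second, cython -a (for "annotate") gives you a really excellent breakdown of the code generated by the Cython compiler, with color coding that indicates how dirty (read: Python-API heavy) each line is. This output is essential when trying to optimize anything.

Third, the well-known page on working with Numpy explains how to get fast, C-style access to Numpy array data. Unfortunately, it is verbose and annoying. We are in luck, however, because recent Cython provides Typed Memory Views, which are both easy to use and awesome. Read that entire page before trying to do anything else.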
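As a rough pure-Python picture of what a typed memoryview wraps, the sketch below uses the standard library's `array` and the built-in `memoryview`. It is only an analogy, not Cython code: the names `data` and `view` are illustrative, and the Cython `int[:]` syntax itself only compiles inside a `.pyx` file.

```python
from array import array

# A contiguous block of C ints, analogous to what `int[:]` expects.
data = array('i', [10, 20, 30, 40])
view = memoryview(data)

# Element type code, bytes per element, and dimensions of the buffer.
print(view.format, view.itemsize, view.shape)

# Writes through the view land in the original buffer: no copy is made.
view[1] = 99
print(data[1])
```

The point of the analogy is that the view carries the element type and shape along with a pointer to the data, which is what lets compiled code index it without going through Python objects.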

After ten minutes or so, I came up with this:

# cython: infer_types=True

# Use the C math library to avoid Python overhead.
from libc cimport math
# For boundscheck below.
import cython
# We're lazy so we'll let Numpy handle our array memory management.
import numpy as np
# You would normally also import the Numpy pxd to get faster access to the Numpy
# API, but it requires some fancier compilation options so I'll leave it out for
# this demo.
# cimport numpy as np

import random

# This is a small function that doesn't need to be exposed to Python at all. Use
# `cdef` instead of `def` and inline it.
cdef inline int h3(int a,int b,int c,int d, int m,int x):
    return (a*x**2 + b*x+c) % m

# If we want to live fast and dangerously, we tell cython not to check our array
# indices for IndexErrors. This means we CAN overrun our array and crash the
# program or screw up our stack. Use with caution. Profiling suggests that we
# aren't gaining anything in this case so I leave it on for safety.
# @cython.boundscheck(False)
# `cpdef` so that calling this function from another Cython (or C) function can
# skip the Python function call overhead, while still allowing us to use it from
# Python.
cpdef floyd(int[:] inputx):
    # Type the variables in the scope of the function.
    cdef int a,b,c,d, value, cyclelimit
    cdef unsigned int dupefound = 0
    cdef unsigned int nohashcalls = 0
    cdef unsigned int loopno, pos, j

    # `m` has type int because inputx is already a Cython memory view and
    # `infer_types` is on.
    m = inputx.shape[0]

    cdef unsigned int loops = int(m*math.log(m))

    # Again using the memory view, but letting Numpy allocate an array of zeros.
    cdef int[:] listofpos = np.zeros(m, dtype=np.int32)

    # Keep this random sampling out of the loop
    cdef int[:, :] randoms = np.random.randint(0, m, (loops, 5)).astype(np.int32)

    for loopno in range(loops):
        if (dupefound == 1):
            break

        # From our precomputed array
        a = randoms[loopno, 0]
        b = randoms[loopno, 1]
        c = randoms[loopno, 2]
        d = randoms[loopno, 3]
        pos = randoms[loopno, 4]

        value = inputx[pos]

        # Unfortunately, Memory View does not support "vectorized" operations
        # like standard Numpy arrays. Otherwise we'd use listofpos *= 0 here.
        for j in range(m):
            listofpos[j] = 0

        listofpos[pos] = 1
        setofvalues = set((value,))
        cyclelimit = int(math.sqrt(m))
        for j in range(cyclelimit):
            pos = h3(a, b, c, d, m, inputx[pos])
            nohashcalls += 1
            if (inputx[pos] in setofvalues):
                if (listofpos[pos]==1):
                    dupefound = 0
                else:
                    dupefound = 1
                    print("Duplicate found at position %d and value %d" % (pos, inputx[pos]))
                break
            listofpos[pos] = 1
            setofvalues.add(inputx[pos])
    return dupefound, nohashcalls

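There are no tricks here that aren't explained on docs.cython.org, which is where I learned them myself, but it helps to see them all come together.

The most important changes to the original code are described in the comments, but they all amount to giving Cython hints about how to generate code that doesn't use the Python API.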
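One practical consequence of the `int[:] inputx` signature is that the function expects an object exposing a buffer of C ints rather than a plain Python list. Below is a minimal sketch of preparing such input with the standard library. It assumes Python 3 and a platform where C `int` is at least 32 bits, so that values below m**2 = 25,000,000 fit. The commented-out call assumes the compiled module is named `edcython`, as in the question.

```python
import random
from array import array

m = 5000

# Draw m distinct values from [0, m**2), as in the question.
values = random.sample(range(m ** 2), m)

# Pack them into a contiguous buffer of C ints ('i' typecode).
inputx = array('i', values)

# Requires the compiled Cython module, so it is left commented out here:
# dupefound, nohashcalls = edcython.floyd(inputx)
```

A NumPy array created with dtype=np.int32 should work the same way on typical platforms, which matches how the code above allocates listofpos.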

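As an aside: I really don't know why infer_types isn't on by default. It lets the compiler implicitly use C types instead of Python types where possible, which means less work for you.

If you run cython -a on this, you'll see that the only lines that call into Python are your calls to random.sample, and building or adding to the Python set().

On my machine, your original code runs in 2.1 seconds. My version runs in 0.6 seconds.

~~The next step is to get random.sample out of that loop, but I'll leave that to you.~~

I have edited my answer to demonstrate how to precompute the random samples. This brings the time down to 0.4 seconds.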
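To reproduce comparisons like the ones above on your own machine, a small timing harness along these lines may help. It is a sketch: `time_calls` and `fake_floyd` are hypothetical names introduced here, and the stand-in function would be replaced by the real `floyd` from whichever version is being measured.

```python
import random
import time

def time_calls(func, make_input, repeats=50):
    """Return total wall-clock seconds spent in `repeats` calls of func."""
    total = 0.0
    for _ in range(repeats):
        data = make_input()          # build the input outside the timed region
        start = time.perf_counter()
        func(data)
        total += time.perf_counter() - start
    return total

def fake_floyd(inputx):
    # Stand-in so the sketch runs without the compiled module.
    return 0, len(inputx)

m = 5000
elapsed = time_calls(fake_floyd, lambda: random.sample(range(m ** 2), m), repeats=5)
print("5 calls took %.4f s" % elapsed)
```

Building the input outside the timed region keeps the cost of random.sample from being counted against the function under test.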

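【Comments】:

Thanks, this is really helpful. The a, b, c, d variables need to be resampled on every iteration of the for loop, so they can't be precomputed, but perhaps C's rand() could be used instead.
Right, but m is fixed and you know you need one sample per iteration of loops. I would probably use numpy.random.randint(0, m, size=loops) and then just index into it.
Some benchmarking also shows that cython.boundscheck(False) doesn't speed anything up, so I've commented it out for safety. Under normal circumstances you do want bounds checking. Only turn it off once your code is finished and tested, and even then only if benchmarking shows it has a real impact.
Thanks. I also tried your code in cython 0.15, but it doesn't seem to understand the [:] notation, which appears to be new in 0.16.
Yes, I believe they're a relatively new feature. They're preferred because they only require that the incoming data implement the buffer interface specified in PEP 3118, but if you are only using numpy arrays you can use the numpy API and be done with it. cdef int[:] foo = ... becomes cdef np.ndarray[int, ndim=1] foo = ...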
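The precomputation idea from the comments above can be sketched in plain Python: all per-iteration coefficients are drawn up front, and the loop only indexes into the table. This uses the standard library's `random` instead of NumPy to stay dependency-free. The names `loops` and `randoms` follow the answer, while the small `m` is chosen here just for illustration.

```python
import math
import random

m = 100
loops = int(m * math.log(m))

# One row per outer iteration: a, b, c, d and the starting position.
randoms = [[random.randrange(m) for _ in range(5)] for _ in range(loops)]

for loopno in range(loops):
    a, b, c, d, pos = randoms[loopno]
    # ... the inner cycle-detection loop would use a, b, c, d, pos here ...
```

Each iteration still sees its own freshly drawn values, so the requirement that a, b, c, d be resampled per iteration is met. Only the moment at which they are drawn changes.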

【Solution 2】:

Do you need to use this particular hashing algorithm? Why not use the hashing built into dicts? For example:

from collections import Counter
cnt = Counter(inputx)
dupes = [k for k, v in cnt.items() if v > 1]

【Comments】:

I do. In fact, I will be using a number of home-made hash functions.
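For home-made polynomial hashes such as h3, one thing that may be worth checking once the arithmetic is done in C ints is overflow: with inputs up to m**2 = 25,000,000, x**2 alone is far beyond the range of a 32-bit int. The sketch below is written in Python so it can be checked against exact integer arithmetic, and it reduces modulo m after each multiplication. `h3_direct` and `h3_reduced` are hypothetical names, the unused parameter d is omitted, and the bound in the comment assumes nonnegative inputs with a, b, c < m.

```python
def h3_direct(a, b, c, m, x):
    # Exact in Python because its integers are arbitrary precision.
    return (a * x ** 2 + b * x + c) % m

def h3_reduced(a, b, c, m, x):
    # Reduce x first so every intermediate stays below 2*m*m + m,
    # which fits in 32 bits for m = 5000.
    r = x % m
    return ((a * r % m) * r + b * r + c) % m

m = 5000
for x in (0, 1, 4999, 5000, 24999999):
    assert h3_direct(3, 7, 11, m, x) == h3_reduced(3, 7, 11, m, x)
```

The same reduction pattern could be carried over to a `cdef inline` function, though whether it matters in practice depends on the size of C `int` on the target platform and on the range of the inputs.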