【Question title】: Difference between C types and compile-time types in Cython
【Posted】: 2014-05-03 01:01:24
【Question】:

I am new to Cython and am trying to learn how to use it with NumPy to speed up code. I have been following the linked tutorial.

I have copied its code here:

from __future__ import division
import numpy as np


# "cimport" is used to import special compile-time information
# about the numpy module (this is stored in a file numpy.pxd which is
# currently part of the Cython distribution).
cimport numpy as np
# We now need to fix a datatype for our arrays. I've used the variable
# DTYPE for this, which is assigned to the usual NumPy runtime
# type info object.
DTYPE = np.int
# "ctypedef" assigns a corresponding compile-time type to DTYPE_t. For
# every type in the numpy module there's a corresponding compile-time
# type with a _t-suffix.
ctypedef np.int_t DTYPE_t
# The builtin min and max functions work with Python objects, and are
# therefore very slow. So we create our own.
#  - "cdef" declares a function which has much less overhead than a normal
#    def function (but it is not Python-callable)
#  - "inline" is passed on to the C compiler which may inline the functions
#  - The C type "int" is chosen as return type and argument types
#  - Cython allows some newer Python constructs like "a if x else b", but
#    the resulting C file compiles with Python 2.3 through to Python 3.0 beta.
cdef inline int int_max(int a, int b): return a if a >= b else b
cdef inline int int_min(int a, int b): return a if a <= b else b
# "def" can type its arguments but not have a return type. The type of the
# arguments for a "def" function is checked at run-time when entering the
# function.
#
# The arrays f, g and h are typed as "np.ndarray" instances. The only effect
# this has is to a) insert checks that the function arguments really are
# NumPy arrays, and b) make some attribute access like f.shape[0] much
# more efficient. (In this example this doesn't matter though.)
cimport cython
@cython.boundscheck(False)
def naive_convolve(np.ndarray[DTYPE_t, ndim=2] f, np.ndarray[DTYPE_t, ndim=2] g):
    if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
        raise ValueError("Only odd dimensions on filter supported")
    assert f.dtype == DTYPE and g.dtype == DTYPE
    # The "cdef" keyword is also used within functions to type variables. It
    # can only be used at the top indentation level (there are non-trivial
    # problems with allowing them in other places, though we'd love to see
    # good and thought out proposals for it).
    #
    # For the indices, the "int" type is used. This corresponds to a C int,
    # other C types (like "unsigned int") could have been used instead.
    # Purists could use "Py_ssize_t" which is the proper Python type for
    # array indices.
    cdef int vmax = f.shape[0]
    cdef int wmax = f.shape[1]
    cdef int smax = g.shape[0]
    cdef int tmax = g.shape[1]
    cdef int smid = smax // 2
    cdef int tmid = tmax // 2
    cdef int xmax = vmax + 2*smid
    cdef int ymax = wmax + 2*tmid
    cdef np.ndarray[DTYPE_t, ndim=2] h = np.zeros([xmax, ymax], dtype=DTYPE)
    cdef int s, t
    cdef unsigned int x, y, v, w
    # It is very important to type ALL your variables. You do not get any
    # warnings if not, only much slower code (they are implicitly typed as
    # Python objects).
    cdef int s_from, s_to, t_from, t_to
    # For the value variable, we want to use the same data type as is
    # stored in the array, so we use "DTYPE_t" as defined above.
    # NB! An important side-effect of this is that if "value" overflows its
    # datatype size, it will simply wrap around like in C, rather than raise
    # an error like in Python.
    cdef DTYPE_t value
    for x in range(xmax):
        for y in range(ymax):
            s_from = int_max(smid - x, -smid)
            s_to = int_min((xmax - x) - smid, smid + 1)
            t_from = int_max(tmid - y, -tmid)
            t_to = int_min((ymax - y) - tmid, tmid + 1)
            value = 0
            for s in range(s_from, s_to):
                for t in range(t_from, t_to):
                    v = <unsigned int>(x - smid + s)
                    w = <unsigned int>(y - tmid + t)
                    value += g[<unsigned int>(smid - s), <unsigned int>(tmid - t)] * f[v, w]
            h[x, y] = value
    return h

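There is one thing I do not understand. I know from the linked page on Cython language basics that cdef defines a C type. However, the example above also defines a compile-time type called np.int_t. For example, in the line cdef DTYPE_t value, DTYPE_t is actually np.int_t.

My question is: what is the difference between np.int and np.int_t? Is it analogous to Python's int versus ctypes.c_int, but specific to NumPy? If so, would it be the same if I simply used cdef int instead of cdef np.int_t?

I also tested what happens when cdef DTYPE_t value is replaced with cdef int value. The timings show no difference between the two.

Original version, cdef DTYPE_t value: 1 loops, best of 10: 93.9 ms per loop

Modified version, cdef int value: 1 loops, best of 10: 93.8 ms per loop

Any help would be appreciated. Thanks!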

【Comments】:

    Tags: python numpy cython


    【Answer 1】:

    np.int is a Python object that refers to the integer dtype in Python code. np.int_t is a C typedef that exists only in Cython. (I believe it corresponds to C long, not int.)
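
    The point about C long versus C int can be checked from plain Python without compiling anything. The sketch below uses the standard-library ctypes module as a stand-in for Cython's C-level declarations; this is an assumption made purely for illustration, since Cython does not go through ctypes. The sizes printed depend on the platform: many 64-bit Linux and macOS builds report 8 bytes for long and 4 for int, while 64-bit Windows typically reports 4 for both.

```python
import ctypes

# Widths, in bytes, of the two C types discussed above on this platform.
int_size = ctypes.sizeof(ctypes.c_int)
long_size = ctypes.sizeof(ctypes.c_long)

print("C int :", int_size, "bytes")
print("C long:", long_size, "bytes")

# The C standard guarantees that long is at least as wide as int.
assert long_size >= int_size
```

    If the two sizes differ on your machine, cdef int and cdef long are not interchangeable for large values, even though they time the same in the benchmark above.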

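    【Comments】:

    • Thanks a lot! So cdef np.int_t is equivalent to saying cdef long, right?
    • I think I was mainly confused because the example code uses cdef int s, t and then cdef np.int_t value. I wanted to know whether there is a functional difference between the two notations, or whether the author was just showing different ways of doing the same thing.
    • Pretty much, yes. When working with NumPy, the NumPy types are preferred to avoid confusion.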
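
    Where the two notations can differ functionally is in range: as the tutorial's own comment on "value" notes, a C-typed variable wraps around on overflow instead of raising an error. A rough illustration follows, again using ctypes as a stand-in for a cdef int variable. ctypes integer types do no overflow checking and simply truncate to the type's width, which is an approximation of what the compiled Cython code does. The helper name wrap_to_c_int is hypothetical.

```python
import ctypes


def wrap_to_c_int(n):
    """Hypothetical helper: store a Python int in a C int and read it back."""
    return ctypes.c_int(n).value


# Largest value a signed C int can hold on this platform.
bits = 8 * ctypes.sizeof(ctypes.c_int)
int_max = 2 ** (bits - 1) - 1

print(wrap_to_c_int(int_max))      # fits, so it comes back unchanged
print(wrap_to_c_int(int_max + 1))  # one past the maximum wraps to the minimum
```

    A wider type such as C long (where it is wider) pushes that limit out, which is one reason to match the accumulator's type to the array's element type rather than hard-coding int.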