【发布时间】:2016-07-20 02:12:54
【问题描述】:
为了使事情更笼统,以gsl_integration_qag 为例(因为它需要一个工作区来管理间隔)。
这是一个特意设计的简单示例(problem.pyx):
from libc.math cimport cos
from cython.parallel import prange
from cython_gsl cimport *
cdef double mean(double[:] arr):
cdef Py_ssize_t i
cdef double sum_ = 0.0
for i in range(arr.shape[0]):
sum_ += arr[i]
return sum_ / arr.shape[0]
cdef double integrand(double x, void *params) nogil:
cdef double a = (<double*>params)[0]
cdef double b = (<double*>params)[1]
return a*x + b
def use_gsl_integration_sequential():
cdef double result, error
cdef Py_ssize_t i, j
cdef double result_sum=0.0
cdef double results[3]
cdef double a=0.0, b=0.0
cdef double da = 0.1
cdef size_t space_size = 1000
cdef gsl_function gsl_func
cdef double params[2]
gsl_func.function = &integrand
cdef gsl_integration_workspace *ws
ws= gsl_integration_workspace_alloc(space_size)
for i in range(3):
a += da
result_sum = 0.0 # reset the sum(thank J.J.Hakala for pointing out this)
for j in range(4): # here may be range(100000) in practice!
b = a + j
params[0] = a; params[1] = b
gsl_func.params = params
gsl_integration_qag(&gsl_func, 0.0, 1.0, 1e-8, 1e-8, space_size, GSL_INTEG_GAUSS15, ws, &result, &error)
result_sum += result
results[i] = result_sum
gsl_integration_workspace_free(ws)
# do some other stuff with the 'results', for example:
return mean(results)
# Now, I want to parallel the inner loop
def use_gsl_integration_parallel():
cdef double result, error
cdef Py_ssize_t i, j
cdef double result_sum=0.0
cdef double results[3]
cdef double a, b
cdef double da = 0.1
cdef size_t space_size = 1000
cdef gsl_function gsl_func
cdef double params[2]
gsl_func.function = &integrand
cdef gsl_integration_workspace *ws
ws= gsl_integration_workspace_alloc(space_size)
for i in range(3):
# a should be read-only in the follow prange block.
a += da
# here may be prange(100000,...) in practice!
for j in prange(4, nogil=True):
# b should be local private.
b = a + j
# params also should be local private
params[0] = a; params[1] = b
# gsl_func is shared, only its params differ.
# According to the gsl manual(see follow link), workspaces should be allocated on a per-thread basis, but how?
gsl_func.params = params
gsl_integration_qag(&gsl_func, 0.0, 1.0, 1e-8, 1e-8, space_size, GSL_INTEG_GAUSS15, ws, &result, &error)
result_sum += result
results[i] = result_sum
gsl_integration_workspace_free(ws)
return mean(results)
代码有点长,只是为了完整(复制),但我觉得比较简单(阅读)(๑•ᴗ•๑)
然后编译链接:
cython -a -2 problem.pyx
gcc -O3 -fopenmp -c problem.c -o problem.o -IC:\gsl2.1x86\include -IC:\python-2.7.10\include
gcc -shared -fPIC -fopenmp -o problem.pyd -LC:\gsl2.1x86\libs -LC:\python-2.7.10\libs problem.o -l:libgsl.a -l:libgslcblas.dll.a -l:libpython27.dll.a
在 IPython 中:
In [1]: from problem import *
In [2]: use_gsl_integration_sequential()
Out[2]: 7.2
use_gsl_integration_parallel()的结果是未定义的,我试了好几次,最多的时候会崩溃,运气好,得到了未定义的值,所以肯定有问题!我只是找不到这样一个平行的例子。有人能帮帮我吗?
环境:
win10-64bit、gcc (tdm-1) 5.1.0 32bit、python-2.7.10 32bit、cython0.24、gsl-2.1
一些有用的参考资料?:
https://github.com/twiecki/CythonGSL
https://www.gnu.org/software/gsl/manual/html_node/Thread_002dsafety.html#Thread_002dsafety
(对不起,无法发布更多信息,两个链接是我的限制......
【问题讨论】:
-
我认为
gsl_func也需要是本地的。我也不是 100% 确定它会在这里将result视为本地(但我不知道它不是!) -
确实,让我很困惑的是在哪里“分配”这些东西(gsl_function、params、workspace)。乍一看,这似乎很容易——容易并行,但我错了。此外,omp's cython support 相当弱且隐含(违反“显式优于隐式”)。我开始怀疑这个问题在 cython 中是否可行。事实上,任何 C 代码示例都会有所帮助(尽管我不太熟悉在 C 中使用 omp。)
-
我认为你使用了一个并行块,在那里分配它。然后在并行块内使用一个 prange。不过,我现在实际上无法测试它,所以我不能保证它是正确的。
-
@DavidW 出现了更奇怪的现象:在我原来的函数中,我添加了一个
int num_threads参数,并将其传递给prange的num_threadskwarg,翻译成C时,cython会报错:optimization.c: In function '__pyx_f_12optimization_target_func0': error: '__pyx_v_num_threads' undeclared (first use in this function) #pragma omp parallel reduction(+:__pyx_v_integrate_sum) num_threads(__pyx_v_num_threads).但我无法在上面的简单示例中重现它。这是由于您提到的嵌套块吗?我完全糊涂了,我会做更多的研究。 -
ps:如果我在函数级别添加一个局部变量,比如
num_therads = num_therads,那么 cython 不会抱怨...