【Posted】:2017-11-23 05:32:55
【Problem description】:
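I wrote a Python program and Cythonized it. The speed-up from Cython (30%) is not satisfactory. There must be room to optimize it, either by restructuring the code or by changing how it is Cythonized.
The program takes a digital elevation model (DEM) raster and an excess-water map of the same shape. For each pixel in the excess-water map it examines the four adjacent pixels and determines whether the pixel is lower than, level with, or higher than its neighbours. Based on that, it either raises the water level at the pixel or distributes its excess water to the neighbours with lower elevation. The code keeps running until all the excess water has been spread over the terrain. Here is the Cython version of the code.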
import numpy as np
cimport numpy as np
cdef unravel(np.ndarray[np.int_t, ndim = 1] idx, int shape0, int shape1):
    return idx//shape0, idx%shape1
cdef find_lower_neighbours(int i, int j, np.ndarray[np.double_t, ndim=2] water_level, double friction_head_loss):
    cdef double current_water_el, minlevels, deltav_total, deltav_min
    current_water_el = water_level[i,j]
    cdef np.ndarray[np.double_t, ndim = 1] levels = np.zeros(4, dtype = np.double)
    levels[:] = water_level[i - 1, j], water_level[i, j + 1], water_level[i + 1, j], water_level[i, j - 1]
    minlevels = levels.min()
    if current_water_el - minlevels < 0:
        return 0, minlevels, 0
    elif np.absolute(current_water_el - minlevels) < 0.0001:
        res = np.where(np.absolute(levels - current_water_el) < 0.0001)
        return 1, res[0], 0
    else:
        levels = current_water_el - levels
        low_values_flags = levels < 0
        levels[low_values_flags] = 0
        deltav_total = np.sum(levels)
        deltav_min = levels[levels > 0].min()
        return 2, levels / (deltav_total + deltav_min), deltav_min / (deltav_total + deltav_min)
cpdef np.ndarray[np.double_t, ndim=2] new_algorithm(np.ndarray[np.double_t, ndim=2] DEM, np.ndarray[np.double_t, ndim=2] extra_volume_map, double nodata, double pixel_area, double friction_head_loss, int von_neuman):
    cdef int terminate = 0
    cdef int iteration = 1
    cdef double sum_extra_volume_map
    index_dic_von_neuman = [[-1, 0], [0, 1], [1, 0], [0, -1]]
    cdef int DEMshape0 = DEM.shape[0]
    cdef int DEMshape1 = DEM.shape[1]
    cdef np.ndarray[np.double_t, ndim = 2] water_levels
    water_levels = np.copy(DEM)
    cdef np.ndarray[np.double_t, ndim = 2] temp_water_levels
    temp_water_levels = np.copy(water_levels)
    cdef np.ndarray[np.double_t, ndim = 2] temp_extra_volume_map
    temp_extra_volume_map = np.copy(extra_volume_map)
    cdef np.ndarray[np.int_t, ndim = 2] wetcells
    wetcells = np.zeros((DEMshape0,DEMshape1), dtype= np.int)
    wetcells[extra_volume_map > 0] = 1
    cdef np.ndarray[np.int_t, ndim = 2] temp_wetcells
    temp_wetcells = np.zeros((DEMshape0,DEMshape1), dtype= np.int )
    cdef double min_excess = friction_head_loss * pixel_area
    cdef np.ndarray[np.int_t, ndim = 1] fdx
    cdef int i, j , k, condition, have_any_dry_cells_in_neghbors
    cdef double water_level_difference, n , volume_to_each_neghbour, weight, w0
    if von_neuman == 0:
        index_dic = index_dic_von_neuman
    while terminate != 1:
        fdx = np.flatnonzero(extra_volume_map > min_excess)
        extra_volume_locations = unravel(fdx, DEMshape1,DEMshape1)
        if not extra_volume_locations[0].size:
            terminate = 1
            return water_levels
        for item in zip(*extra_volume_locations):
            i = item[0]
            j = item[1]
            if DEM[i,j] == nodata:
                print "warningggggg", i, j
                temp_extra_volume_map[i,j] = 0.
            else:
                condition, wi, w0 = find_lower_neighbours(i, j, water_levels, friction_head_loss)
                if condition == 0:
                    water_level_difference = wi - water_levels[i,j]
                    temp_water_levels[i,j] = wi
                    temp_extra_volume_map[i,j] -= water_level_difference * pixel_area
                elif condition == 1:
                    n = len(wi)
                    volume_to_each_neghbour = (extra_volume_map[i, j] - friction_head_loss * pixel_area * .02)/ (n * 1. )
                    for itemm in wi:
                        temp_extra_volume_map[i + index_dic[itemm][0], j + index_dic[itemm][1]] += volume_to_each_neghbour
                        temp_wetcells[i + index_dic[itemm][0], j + index_dic[itemm][1]] += 1
                    temp_extra_volume_map[i,j] -= extra_volume_map[i,j]
                    temp_water_levels[i,j] += friction_head_loss * .02
                elif condition == 2:
                    temp_wetcells[i,j] += 1
                    have_any_dry_cells_in_neghbors = 0 # means "no"
                    for k, weight in enumerate(wi):
                        if weight > 0:
                            temp_extra_volume_map[i + index_dic[k][0], j + index_dic[k][1]] += weight * (extra_volume_map[i,j])
                            if wetcells[i + index_dic[k][0], j + index_dic[k][1]] == 0:
                                have_any_dry_cells_in_neghbors = 1
                            temp_wetcells[i + index_dic[k][0], j + index_dic[k][1]] += 1
                    if have_any_dry_cells_in_neghbors == 1:
                        temp_water_levels[i,j] += friction_head_loss
                        temp_extra_volume_map[i, j] -= (1. - w0) * (extra_volume_map[i, j])
                    else:
                        temp_extra_volume_map[i, j] -= (1. - w0) * (extra_volume_map[i, j])
        wetcells = np.copy(temp_wetcells)
        water_levels = np.copy(temp_water_levels)
        extra_volume_map = np.copy(temp_extra_volume_map)
        iteration += 1
        if iteration*1. %500. == 0.:
            sum_extra_volume_map = np.sum(extra_volume_map)
            print "iteration", iteration, "volume left =", sum_extra_volume_map
    print "finished at iteration =", iteration - 1
    return water_levels
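For reference, the per-pixel neighbour test described at the top of the question can also be written with scalar operations only, so that no temporary 4-element array is created per pixel. The following is a minimal plain-Python sketch of that idea, not the code from the question: the name `classify_pixel` and the `tol` parameter are hypothetical, the neighbour order (up, right, down, left) and the 0.0001 tolerance are taken from `find_lower_neighbours`, and `(i, j)` is assumed to be an interior pixel. Whether this form is faster once compiled has not been measured here.

```python
def classify_pixel(water_level, i, j, tol=1e-4):
    """Classify interior pixel (i, j) against its four neighbours.

    Returns (condition, payload, w0), mirroring find_lower_neighbours:
      0 -> pixel is below every neighbour; payload is the lowest neighbour level
      1 -> pixel is level with the lowest neighbour; payload lists the
           indices of the neighbours at the same level
      2 -> pixel is higher; payload holds one weight per neighbour and
           w0 is the share kept by the pixel itself
    """
    current = water_level[i][j]
    # Same neighbour order as the question: up, right, down, left.
    levels = (water_level[i - 1][j], water_level[i][j + 1],
              water_level[i + 1][j], water_level[i][j - 1])
    lowest = min(levels)

    if current < lowest:
        return 0, lowest, 0.0
    if current - lowest < tol:
        same = [k for k, lv in enumerate(levels) if abs(lv - current) < tol]
        return 1, same, 0.0

    # Head difference towards each lower neighbour, zero for higher ones.
    drops = [max(current - lv, 0.0) for lv in levels]
    smallest = min(d for d in drops if d > 0.0)
    denom = sum(drops) + smallest
    return 2, [d / denom for d in drops], smallest / denom
```

In the third case the neighbour weights and `w0` add up to 1, so the excess volume is conserved when it is split between the neighbours and the pixel itself.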
Here is the profiling output:
         6712150 function calls in 29.005 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   29.005   29.005 <string>:1(<module>)
  2017444    0.497    0.000    4.414    0.000 _methods.py:28(_amin)
   289757    0.070    0.000    0.543    0.000 _methods.py:31(_sum)
   289757    0.437    0.000    1.098    0.000 fromnumeric.py:1730(sum)
   506055    0.203    0.000    0.705    0.000 function_base.py:1453(copy)
   168685    0.140    0.000    0.300    0.000 numeric.py:859(flatnonzero)
        1    0.000    0.000    0.000    0.000 test.py:22(print_dem_for_excel)
        1    0.000    0.000   29.005   29.005 test.py:27(test)
   289757    0.119    0.000    0.119    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {len}
       20    0.000    0.000    0.000    0.000 {map}
       20    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
       20    0.000    0.000    0.000    0.000 {method 'join' of 'str' objects}
   168685    0.106    0.000    0.106    0.000 {method 'nonzero' of 'numpy.ndarray' objects}
   168685    0.054    0.000    0.054    0.000 {method 'ravel' of 'numpy.ndarray' objects}
  2307201    4.390    0.000    4.390    0.000 {method 'reduce' of 'numpy.ufunc' objects}
   506056    0.501    0.000    0.501    0.000 {numpy.core.multiarray.array}
        1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.zeros}
        1    0.000    0.000    0.000    0.000 {time.time}
        1   22.488   22.488   29.005   29.005 {version3Cython.new_algorithm}
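To read these figures, it can help to turn the `tottime` column into shares of the total run time. The short sketch below does only that arithmetic; the numbers are copied from the profile above, while the dictionary and function names are hypothetical. Note that cProfile only sees calls that go through Python, so whatever happens inside the compiled `new_algorithm` body is lumped into its own `tottime`.

```python
# Figures copied from the profile above (seconds and call counts).
TOTAL_SECONDS = 29.005
TOTTIME = {
    "new_algorithm (own time)": 22.488,
    "numpy.ufunc.reduce": 4.390,
    "numpy.core.multiarray.array": 0.501,
    "_amin wrapper": 0.497,
}
REDUCE_CALLS = 2307201


def shares(tottime, total):
    """Return each entry's own time as a fraction of the total run time."""
    return {name: seconds / total for name, seconds in tottime.items()}


def microseconds_per_call(seconds, ncalls):
    """Average own time per call, in microseconds."""
    return seconds / ncalls * 1e6


if __name__ == "__main__":
    for name, share in sorted(shares(TOTTIME, TOTAL_SECONDS).items(),
                              key=lambda kv: -kv[1]):
        print("%-30s %5.1f%%" % (name, 100.0 * share))
    print("reduce: %.2f us per call"
          % microseconds_per_call(TOTTIME["numpy.ufunc.reduce"], REDUCE_CALLS))
```

Read this way, roughly three quarters of the time is spent inside the compiled function body itself, and about 15% goes to roughly 2.3 million `reduce` calls, which come from the `min` and `sum` calls on tiny arrays.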
【Comments】:
-
Don't expect SO to do your work for you.....
-
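I'm sorry if my question came across that way. I thought that by posting the whole algorithm, experts could spot areas for improvement that I'm not even aware of.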
-
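cython -a is a good first port of call. Things like the untyped index_dic_von_neuman = ... will hurt your performance.
-
@Veedrac The untyped items in my code are Python objects (lists, dicts, etc.). Should I try to avoid them, or is there a specific way to declare their types? I'm having trouble interpreting the -a HTML file; I should learn how to use it. Thanks for the tip.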
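On the question of declaring types for the offset table: one common approach is to replace the nested list with two flat, fixed-size integer arrays, so that each lookup is a single integer index. The sketch below is plain Python/NumPy so that it runs as written; the names `DI`, `DJ` and `add_to_neighbours` are hypothetical and do not appear in the question. In a .pyx file the same two tables could presumably be declared as C arrays (for example `cdef int di[4]`) or as typed memoryviews, which is an assumption about how one might adapt it rather than a tested change.

```python
import numpy as np

# Row and column offsets in the question's neighbour order:
# up, right, down, left (same as index_dic_von_neuman).
DI = np.array([-1, 0, 1, 0], dtype=np.intp)
DJ = np.array([0, 1, 0, -1], dtype=np.intp)


def add_to_neighbours(grid, i, j, weights, amount=1.0):
    """Add weights[k] * amount to each neighbour k of interior pixel (i, j).

    Neighbours with a zero weight are skipped, as in the question's loop.
    The grid is modified in place and also returned for convenience.
    """
    for k in range(4):
        if weights[k] > 0:
            grid[i + DI[k], j + DJ[k]] += weights[k] * amount
    return grid
```

A call such as `add_to_neighbours(np.zeros((3, 3)), 1, 1, [0.5, 0.0, 0.25, 0.25])` puts 0.5 in the cell above the centre and 0.25 in the cells below and to the left.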
Tags: python performance cython cellular-automata