【Posted】: 2017-05-08 23:24:56
【Question】:
I am having some difficulty converting OpenMP code to TBB. Can anyone help me?
I have the following code in OpenMP, and it performs well:
# pragma omp parallel \
  shared ( b, count, count_max, g, r, x_max, x_min, y_max, y_min ) \
  private ( i, j, k, x, x1, x2, y, y1, y2 )
{
# pragma omp for
  for ( i = 0; i < m; i++ )
  {
    for ( j = 0; j < n; j++ )
    {
      //cout << omp_get_thread_num() << " thread\n";
      x = ( ( double ) ( j - 1 ) * x_max
          + ( double ) ( m - j ) * x_min )
          / ( double ) ( m - 1 );
      y = ( ( double ) ( i - 1 ) * y_max
          + ( double ) ( n - i ) * y_min )
          / ( double ) ( n - 1 );
      count[i][j] = 0;
      x1 = x;
      y1 = y;
      for ( k = 1; k <= count_max; k++ )
      {
        x2 = x1 * x1 - y1 * y1 + x;
        y2 = 2 * x1 * y1 + y;
        if ( x2 < -2.0 || 2.0 < x2 || y2 < -2.0 || 2.0 < y2 )
        {
          count[i][j] = k;
          break;
        }
        x1 = x2;
        y1 = y2;
      }
      if ( ( count[i][j] % 2 ) == 1 )
      {
        r[i][j] = 255;
        g[i][j] = 255;
        b[i][j] = 255;
      }
      else
      {
        c = ( int ) ( 255.0 * sqrt ( sqrt ( sqrt (
          ( ( double ) ( count[i][j] ) / ( double ) ( count_max ) ) ) ) ) );
        r[i][j] = 3 * c / 5;
        g[i][j] = 3 * c / 5;
        b[i][j] = c;
      }
    }
  }
}
The TBB version is 10 times slower than the OpenMP one.
The TBB code is:
tbb::parallel_for ( int(0), m, [&](int i)
{
  for ( j = 0; j < n; j++)
  {
    x = ( ( double ) ( j - 1 ) * x_max
        + ( double ) ( m - j ) * x_min )
        / ( double ) ( m - 1 );
    y = ( ( double ) ( i - 1 ) * y_max
        + ( double ) ( n - i ) * y_min )
        / ( double ) ( n - 1 );
    count[i][j] = 0;
    x1 = x;
    y1 = y;
    for ( k = 1; k <= count_max; k++ )
    {
      x2 = x1 * x1 - y1 * y1 + x;
      y2 = 2 * x1 * y1 + y;
      if ( x2 < -2.0 || 2.0 < x2 || y2 < -2.0 || 2.0 < y2 )
      {
        count[i][j] = k;
        break;
      }
      x1 = x2;
      y1 = y2;
    }
    if ( ( count[i][j] % 2 ) == 1 )
    {
      r[i][j] = 255;
      g[i][j] = 255;
      b[i][j] = 255;
    }
    else
    {
      c = ( int ) ( 255.0 * sqrt ( sqrt ( sqrt (
        ( ( double ) ( count[i][j] ) / ( double ) ( count_max ) ) ) ) ) );
      r[i][j] = 3 * c / 5;
      g[i][j] = 3 * c / 5;
      b[i][j] = c;
    }
  }
});
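One difference between the two listings may be worth checking. The OpenMP version marks i, j, k, x, x1, x2, y, y1, y2 as private, whereas the TBB lambda captures everything with [&] and uses j, k, x, y, x1, x2, y1, y2 and c without declaring them inside the lambda. Those variables are therefore shared by all worker threads, which is a data race, and contended writes to shared variables can also cost a lot of time. Whether this accounts for the 10x gap would need to be measured. A minimal sketch of keeping the per-pixel scratch state local follows; the helper names escape_count and shade are hypothetical and do not appear in the question.

```cpp
#include <cmath>

// Hypothetical helper: escape-time iteration for the point (x, y).
// Returns the first k at which the orbit leaves [-2, 2] x [-2, 2], or 0 if it
// stays inside for count_max steps, mirroring how count[i][j] is set above.
int escape_count(double x, double y, int count_max) {
    double x1 = x;  // locals: every call, and so every thread, gets its own copy
    double y1 = y;
    for (int k = 1; k <= count_max; k++) {
        double x2 = x1 * x1 - y1 * y1 + x;
        double y2 = 2 * x1 * y1 + y;
        if (x2 < -2.0 || 2.0 < x2 || y2 < -2.0 || 2.0 < y2) {
            return k;
        }
        x1 = x2;
        y1 = y2;
    }
    return 0;
}

// Hypothetical helper: the grey level c computed in the else-branch above.
int shade(int count, int count_max) {
    double ratio = static_cast<double>(count) / static_cast<double>(count_max);
    return static_cast<int>(255.0 * std::sqrt(std::sqrt(std::sqrt(ratio))));
}
```

Inside the lambda, the loop would then declare its own index and coordinates (for example for ( int j = 0; j < n; j++ ) with double x and double y computed in the loop body) and call count[i][j] = escape_count ( x, y, count_max ), so that nothing except the output arrays is shared between threads.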
【Discussion】:
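-
TBB's default partitioner is auto_partitioner, which performs recursive work subdivision down to the level of one loop iteration per thread, and that can cause a huge overhead. With many compilers the default schedule of the OpenMP for worksharing construct is static, so you should pass an instance of static_partitioner to the parallel_for algorithm to get the same work distribution in TBB as in OpenMP.
-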
I have changed the parallel_for to:
-
tbb::parallel_for(tbb::blocked_range2d<int>(0, m, 0, n), [&](tbb::blocked_range2d<int> s, static_partitioner()) { for( int i = s.rows().begin(); i
Tags: multithreading parallel-processing openmp intel tbb