【Question title】: C++ OpenMP parallel matrix multiplication
【Posted】: 2013-05-18 17:38:24
【Question】:

What is wrong with my OpenMP code? It always uses only 1 thread and takes the same time as the non-parallel version.

template <typename T>
Matrix<T>* Matrix<T>::OMPMultiplication(Matrix<T>* A, Matrix<T>* B){ 

    if(A->ySize != B->xSize)
      throw;

    Matrix<T>* C = new Matrix<T>(A->xSize, B->ySize);

    sizeType i, j, k;
    T element;

    #pragma omp parallel for private(i, j)
    {
      #pragma omp for private(i, j)
      for( i = 0; i < A->xSize; i++ )
          cout<<"There are "<<omp_get_num_threads()<<" threads"<<endl;

          for(j = 0; j < B->ySize; j++){

              C->matrix[i][j] = 0;
              for(k = 0; k < A->ySize; k++){
                  C->matrix[i][j] += A->matrix[i][k] * B->matrix[k][j]; 
              }   

      }   
    }   
    return C;
}

【Comments】:

  • The first pragma contains "for" but is not the for pragma (that is the second one).
  • @VictorSand Combining #pragma omp parallel with #pragma omp for is not nested parallelism.

Tags: c++ openmp


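【Solution 1】:

First, you are missing some {} around the body of the i loop, and the variable k needs to be private to each iteration of the i loop. However, I think you are also mixing up how the parallel and for pragmas are combined. To parallelize a for loop successfully, the loop has to sit inside a parallel pragma and then inside a for pragma. To do this, you could change your code to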

#pragma omp parallel private(i, j, k)
{
    #pragma omp for
    for( i = 0; i < A->xSize; i++ ) {
        cout<<"There are "<<omp_get_num_threads()<<" threads"<<endl;

        for(j = 0; j < B->ySize; j++) {

            C->matrix[i][j] = 0;
            for(k = 0; k < A->ySize; k++){
                C->matrix[i][j] += A->matrix[i][k] * B->matrix[k][j]; 
            }   

        }
    }
}

or use the combined parallel for notation

#pragma omp parallel for private(i, j, k)
for( i = 0; i < A->xSize; i++ ) {
    ...
}
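
As a self-contained illustration of the combined form, here is a minimal sketch. It assumes a plain std::vector<std::vector<double>> in place of the question's Matrix<T> class, and the names Mat and multiply are hypothetical, not part of the original code. Because the loop counters and the accumulator are declared inside the parallel loop, each thread gets its own copies and no private clause is needed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's Matrix<T>: a vector of rows.
using Mat = std::vector<std::vector<double>>;

// Sketch: C = A * B with the outer loop shared among OpenMP threads.
// Assumes A is n x m and B is m x p (the caller checks the dimensions).
Mat multiply(const Mat& A, const Mat& B) {
    const long n = static_cast<long>(A.size());
    const long m = static_cast<long>(B.size());
    const long p = B.empty() ? 0 : static_cast<long>(B[0].size());
    Mat C(static_cast<std::size_t>(n),
          std::vector<double>(static_cast<std::size_t>(p), 0.0));

    // i, j, k and sum live inside the loop body, so they are private per thread.
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        for (long j = 0; j < p; ++j) {
            double sum = 0.0;
            for (long k = 0; k < m; ++k) {
                sum += A[i][k] * B[k][j];
            }
            C[i][j] = sum;  // each iteration of i writes only to row i of C
        }
    }
    return C;
}
```

With GCC or Clang this would be compiled with -fopenmp; without that flag the pragma is ignored and the loop simply runs on one thread.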

Also, make sure you are telling OpenMP to use more than 1 thread here. This can be done either by calling omp_set_num_threads(<number of threads here>) or by setting an environment variable such as OMP_NUM_THREADS.
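
To check what the runtime actually does, a small sketch like the following may help. The helper name team_size is hypothetical. It requests a team size, opens a parallel region, and reports how many threads were used. The requested value is an upper bound, so the runtime may use fewer threads. The _OPENMP guard lets the same file build without OpenMP support, in which case it reports 1; a build without the OpenMP compiler flag is one common reason for seeing a single thread.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Sketch: request a team size and report how many threads the runtime used.
// team_size is a hypothetical helper name.
int team_size(int requested) {
    int used = 1;  // value reported when compiled without OpenMP support
    (void)requested;  // avoids an unused-parameter warning in that case
#ifdef _OPENMP
    omp_set_num_threads(requested);  // takes precedence over OMP_NUM_THREADS
    #pragma omp parallel
    {
        // Outside a parallel region omp_get_num_threads() returns 1,
        // so the query has to happen in here. One thread records the answer.
        #pragma omp single
        used = omp_get_num_threads();
    }
#endif
    return used;
}
```

Calling team_size(4) on a machine with at least four cores would typically return 4 when the program is built with -fopenmp, though the runtime is free to choose a smaller team.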

Hope this gets you parallelized. :)


    【Solution 2】:

    With this code, I got slightly faster results on my 4 cores:

        omp_set_num_threads(4);
        // j is declared outside the loop, so it must be private to avoid a data race
        #pragma omp parallel for private(j)
        for (i = 0; i < n; i++) {
            for (j = 0; j < n; j++) {
                c[i] += b[j] * a[j][i];
            }
        }
    

    Full program

    #include <stdio.h>
    #include <time.h>
    #include <omp.h>
    #include <stdlib.h>
    
    
    int main() {
        int i, j, n, a[719][719], b[719], c[719];
    
        clock_t start = clock();
    
        n = 100; //Max 719
    
        printf("Matrix A\n");
    
        for (i = 0; i < n; ++i) {
            for (j = 0; j < n; ++j) {
                a[i][j] = 10;
                printf("%d ", a[i][j]);
            }
            printf("\n");
        }
    
        printf("\nMatrix B\n");
    
    #pragma omp parallel private(i) shared(b)
        {
    #pragma omp for
            for (i = 0; i < n; ++i) {
                b[i] = 5;
                printf("%d\n", b[i]);
            }
        }
    
        printf("\nA * B\n");
    
    #pragma omp parallel private(i) shared(c)
        {
    #pragma omp for
            for (i = 0; i < n; ++i) {
                c[i] = 0;
            }
        }
    
    #pragma omp parallel private(i,j) shared(n,a,b,c)
        {
    #pragma omp for schedule(dynamic)
            for (i = 0; i < n; ++i) {
                for (j = 0; j < n; ++j) {
                    c[i] += b[j] * a[j][i];
                }
            }
        }
    
    
    #pragma omp parallel private(i) shared(c)
        {
    #pragma omp for
            for (i = 0; i < n; ++i) {
                printf("%d\n", c[i]);
            }
        }
    
        clock_t stop = clock();
        double elapsed = (double) (stop - start) / CLOCKS_PER_SEC;
        printf("\nTime elapsed: %.5f\n", elapsed);
        start = clock();
        printf("Matrix A\n");
    
        for (i = 0; i < n; ++i) {
            for (j = 0; j < n; ++j) {
                a[i][j] = 10;
                printf("%d ", a[i][j]);
            }
            printf("\n");
        }
    
        printf("\nMatrix B\n");
    
    #pragma omp parallel private(i) shared(b)
        {
    #pragma omp for
            for (i = 0; i < n; ++i) {
                b[i] = 5;
                printf("%d\n", b[i]);
            }
        }
        printf("\nA * B\n");
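        omp_set_num_threads(4);
        // j is declared outside the loop, so it must be private to avoid a data race
    #pragma omp parallel for private(j)
        for (i = 0; i < n; i++) {
            c[i] = 0; // reset: c still holds the result of the first method
            for (j = 0; j < n; j++) {
                c[i] += b[j] * a[j][i];
            }
        }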
        stop = clock();
        elapsed = (double) (stop - start) / CLOCKS_PER_SEC;
        printf("\nTime elapsed: %.5f\n", elapsed);
        return 0;
    }
    

    First method

    Time elapsed: 0.03442

    Second method

    Time elapsed: 0.02630
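
These two numbers should be read with some caution. Both timed regions include the printf calls, and the first one also prints c, so I/O is part of the measurement. In addition, clock() reports processor time, which on POSIX systems is summed over all threads and can therefore hide a parallel speedup. A minimal sketch that times only the multiplication with a wall clock could look like this. The helper name timed_transposed_matvec is hypothetical, and std::vector replaces the fixed-size arrays of the program above.

```cpp
#include <chrono>
#include <vector>

// Sketch: compute c[i] = sum over j of b[j] * a[j][i], as in the program above,
// and return the elapsed wall-clock time in seconds.
// timed_transposed_matvec is a hypothetical helper name.
double timed_transposed_matvec(const std::vector<std::vector<int>>& a,
                               const std::vector<int>& b,
                               std::vector<int>& c) {
    const int n = static_cast<int>(b.size());
    c.assign(b.size(), 0);  // start from zero on every call

    const auto start = std::chrono::steady_clock::now();
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        int sum = 0;                   // local accumulator, private per thread
        for (int j = 0; j < n; ++j) {  // j is declared here, so it is private too
            sum += b[j] * a[j][i];
        }
        c[i] = sum;
    }
    const auto stop = std::chrono::steady_clock::now();

    return std::chrono::duration<double>(stop - start).count();
}
```

For n = 100 the work is small enough that thread start-up cost can dominate, so a larger n or several repetitions would likely give a more meaningful comparison.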
