OpenMP：如何保护数组免受竞争条件的影响答案

【问题标题】：OpenMP: how to protect an array from race conditionOpenMP：如何保护数组免受竞争条件的影响
【发布时间】：2018-03-19 02:44:23
【问题描述】：

这是对问题 36182486、41421437 和其他几个问题的跟进。我想通过使用多个处理器并行处理单个元素来加快 FEM 计算的偏度和质量矩阵的组装。这个小 MWE 展示了操作的胆量。

!! compile with gfortran -fopenmp -o FEMassembly FEMassembly.f90
Program FEMassembly
  use, intrinsic :: iso_c_binding
  implicit none
  real (c_double) :: arrayM(3,3)=reshape((/2.d0,1.d0,1.d0,1.d0,&
       &2.d0,1.d0,1.d0,1.d0,2.d0/),(/3,3/)) ! contrib from one element
  integer (c_int) :: ke,ne=4,kx,nx=6,nodes(3)
  real (c_double) :: L(6,6)
  integer (c_int) :: t(4,3)=reshape((/1,2,5,6,2,3,4,5,4,5,2,3/),(/4,3/))
  !!   first, no OMP

  do ke=1,ne ! for each triangular element
     nodes=t(ke,:)
     L(nodes,nodes)=L(nodes,nodes)+arrayM  
  end do
  print *,'L  no OMP'
  write(*,fmt="(6(1x,f3.0))")(L(kx,1:6),kx=1,nx)
  L=0

  !$omp parallel do private (nodes)
  do ke=1,ne ! for each triangular element
     nodes=t(ke,:)
!!     !$omp atomic
     L(nodes,nodes)=L(nodes,nodes)+arrayM  
!!     !$omp end atomic
  end do
  !$omp end parallel do
  print *,'L  with OMP and race'
  write(*,fmt="(6(1x,f3.0))")(L(kx,1:6),kx=1,nx)        
End Program FEMassembly

在注释掉原子指令后，数组 L 包含几个错误的值，大概是因为我试图用原子指令避免竞争条件。结果是：

L  no OMP
2.  1.  0.  1.  0.  0.
1.  6.  1.  2.  2.  0.
0.  1.  4.  0.  2.  1.
1.  2.  0.  4.  1.  0.
0.  2.  2.  1.  6.  1.
0.  0.  1. -0.  1.  2.
L  with OMP and race
2.  1.  0.  1.  0.  0.
1.  6.  1.  2.  2.  0.
0.  1.  2.  0.  2.  1.
1.  2.  0.  4.  1.  0.
0.  2.  2.  1.  6.  1.
0.  0.  1.  0.  1.  2.

如果未注释“原子”指令，编译器会返回错误：错误：!$OMP ATOMIC 语句必须在 (1) 处设置一个内在类型的标量变量其中 (1) 指向行 L(nodes,nodes) 中的 arrayM .....

我希望实现的是每个元素（这里是普通数组M）的耗时贡献并行发生，但是由于多个线程处理相同的矩阵元素，因此必须做一些事情才能使总和发生在一个有秩序的时尚。谁能建议一种方法来做到这一点？

【问题讨论】：

你的程序没有声明 kx 或 nx，也没有设置 nx。如果您说出您期望的输出结果也会有所帮助。

标签： fortran openmp race-condition

【解决方案1】：

在 Fortran 中，最简单的方法是使用归约。这是因为 OpenMP for Fortran 支持对数组进行归约。以下是我认为您正在尝试做的事情，但要加一点盐，因为

您没有提供正确的输出，因此难以测试

对于这么小的数组，有时很难找到竞争条件

!! compile with gfortran -fopenmp -o FEMassembly FEMassembly.f90
Program FEMassembly
  use, intrinsic :: iso_c_binding
   Use omp_lib, Only : omp_get_num_threads
  implicit none
  real (c_double) :: arrayM(3,3)=reshape((/2.d0,1.d0,1.d0,1.d0,&
       &2.d0,1.d0,1.d0,1.d0,2.d0/),(/3,3/)) ! contrib from one element
  integer (c_int) :: ke,ne=4,nodes(3)
  real (c_double) :: L(6,6)
  integer (c_int) :: t(4,3)=reshape((/1,2,5,6,2,3,4,5,4,5,2,3/),(/4,3/))
  ! Not declared in original program
  Integer :: nx, kx
  ! Not set in original program
  nx = Size( L, Dim = 1 )

  !$omp parallel default( none ) private ( ke, nodes ) shared( ne, t, L, arrayM )
  !$omp single
  Write( *, * ) 'Working on ', omp_get_num_threads(), ' threads'
  !$omp end single
  !$omp do reduction( +:L )
  do ke=1,ne ! for each triangular element
     nodes=t(ke,:)
     L(nodes,nodes)=L(nodes,nodes)+arrayM  
 end do
 !$omp end do
 !$omp end parallel
 write(*,fmt="(6(1x,f3.0))")(L(kx,1:6),kx=1,nx)        
End Program FEMassembly

【讨论】：

BTW 在 OpenMP 4 中也可以减少 C 和 C++ 中的数组。
干杯 - 不知道。我不喜欢 C 的另一个原因就在窗外......
@Ian Bush 非常感谢，我已经编辑了原始问题以包含结果，我已经验证您的答案给出了正确的 L 值，即使 L 的大小非常大。我留下的问题是代码的 OMP 版本比非 OMP 版本运行得慢得多。由于这是一个完全不同的问题，一旦我完成了一些测试，我会提出一个新问题。再次非常感谢