How to avoid code replication in the following example? C++ / CUDA

【Question title】: How to avoid code replication in the following example? C++ / CUDA
【Posted】: 2019-04-05 16:20:21
【Question description】:

Edit: This code works, but it contains a lot of replicated code, and I can't find a way to get rid of it.

In the MatrixDevice class I want to call kernel functions from kernel.cu. I have simplified the MatrixDevice class so that it shows only the concept of how I actually do this.

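MatrixDevice has several functions that add a MatrixDevice to another MatrixDevice or to a number. This should work for different types (float and double in this example), which should be no problem with templates. However, I have to declare the overloaded MatrixCudaOperations functions as extern, because I cannot include the .cu file in the .h/.cpp files.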

MatrixDevice.h

extern void MatrixCudaOperations(const float* a, const float* b, float* result, size_t rows, size_t cols, EOperation operation);
extern void MatrixCudaOperations(const float* a, float b, float* result, size_t rows, size_t cols, EOperation operation);
extern void MatrixCudaOperations(const double* a, const double* b, double* result, size_t rows, size_t cols, EOperation operation);
extern void MatrixCudaOperations(const double* a, double b, double* result, size_t rows, size_t cols, EOperation operation);


template<class T>
class MatrixDevice{

    T* data;
    size_t rows;
    size_t cols;

    MatrixDevice& Add(const MatrixDevice &other);
    MatrixDevice& Add(T &other);
};

//Operations with MatrixDevice
//Add MatrixDevice to this
template<class T>
MatrixDevice<T>& MatrixDevice<T>::Add(const MatrixDevice &other){
    MatrixCudaOperations(data, other.data, data, rows, cols, EOperation::ADD);
    return *this;
} 

//Add two MatrixDevice and return the result as new MatrixDevice
template<class T>
MatrixDevice<T> Add(const MatrixDevice<T> &a, const MatrixDevice<T> &b){
    MatrixDevice<T> result(a);
    result.Add(b);
    return result;
}

//Add two MatrixDevice to result MatrixDevice
template<class T>
void Add(const MatrixDevice<T> &a, const MatrixDevice<T> &b, MatrixDevice<T> &result){
    MatrixCudaOperations(a.data, b.data, result.data, a.rows, a.cols, EOperation::ADD);
}


//Operations with Number

//Add T number to this
template<class T>
MatrixDevice<T>& MatrixDevice<T>::Add(T &other){
    MatrixCudaOperations(data, other, data, rows, cols, EOperation::ADD);
    return *this;
} 

//Add T number to MatrixDevice and return the result as new MatrixDevice
template<class T>
MatrixDevice<T> Add(const MatrixDevice<T> &a, T &b){
    MatrixDevice<T> result(a);
    result.Add(b);
    return result;
}

//Add T number with MatrixDevice to result MatrixDevice
template<class T>
void Add(const MatrixDevice<T> &a, T &b, MatrixDevice<T> &result){
    MatrixCudaOperations(a.data, b, result.data, a.rows, a.cols, EOperation::ADD);
}

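In the kernel file I define the overloaded MatrixCudaOperations functions, and the code in each of them is identical. I tried to do this with a template, but that does not work if I need to declare the functions extern for the MatrixDevice class.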

kernel.cu

template<class T> __global__
void d_Add(const T* a, const T* b, T* result){
    //code
}

template<class T> __global__
void d_Add(const T* a, T b, T* result){
    //code
}

void MatrixCudaOperations(const float* a, const float* b, float* result, size_t rows, size_t cols, EOperation operation){
    dim3 blocksize(rows, cols);

    switch(operation){
        case ADD:
            d_Add<<<1,blocksize>>>(a, b, result);
            break;
        //other cases, subtract, multiply...
    }
}

void MatrixCudaOperations(const float* a, float b, float* result, size_t rows, size_t cols, EOperation operation){
    dim3 blocksize(rows, cols);

    switch(operation){
        case ADD:
            d_Add<<<1,blocksize>>>(a, b, result);
            break;
        //other cases, subtract, multiply...
    }
}

void MatrixCudaOperations(const double* a, const double* b, double* result, size_t rows, size_t cols, EOperation operation){
    dim3 blocksize(rows, cols);

    switch(operation){
        case ADD:
            d_Add<<<1,blocksize>>>(a, b, result);
            break;
        //other cases, subtract, multiply...
    }
}

void MatrixCudaOperations(const double* a, double b, double* result, size_t rows, size_t cols, EOperation operation){
    dim3 blocksize(rows, cols);

    switch(operation){
        case ADD:
            d_Add<<<1,blocksize>>>(a, b, result);
            break;
        //other cases, subtract, multiply...
    }
}
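
One possible way around the extern problem described above is to declare the host wrappers as function templates in the header and explicitly instantiate them for float and double in kernel.cu, so the switch is written only once. The sketch below is a CPU-only stand-in and not the code from the question: `detail::dispatch`, `detail::rhs_at`, the `SUBTRACT` enumerator and `example` are hypothetical names, `EOperation` is assumed to be a scoped enum, and the plain loop stands in for the `d_Add<<<...>>>` kernel launches.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed shape of the enum; the question does not show its definition.
enum class EOperation { ADD, SUBTRACT };

// ---- would go in the header (e.g. MatrixDevice.h): declarations only ----
template <class T>
void MatrixCudaOperations(const T* a, const T* b, T* result,
                          std::size_t rows, std::size_t cols, EOperation op);
template <class T>
void MatrixCudaOperations(const T* a, T b, T* result,
                          std::size_t rows, std::size_t cols, EOperation op);

// ---- would go in kernel.cu: one shared body plus explicit instantiations ----
namespace detail {
// Read element i of the right-hand operand: index a pointer, broadcast a scalar.
template <class T> T rhs_at(const T* p, std::size_t i) { return p[i]; }
template <class T> T rhs_at(T v, std::size_t) { return v; }

// Hypothetical shared dispatcher. In CUDA code the loop would be replaced by
// kernel launches in the style of d_Add<<<1, blocksize>>>(a, b, result).
template <class T, class Rhs>
void dispatch(const T* a, Rhs b, T* result,
              std::size_t rows, std::size_t cols, EOperation op) {
    assert(a != nullptr && result != nullptr);
    for (std::size_t i = 0; i < rows * cols; ++i) {
        switch (op) {
            case EOperation::ADD:      result[i] = a[i] + rhs_at<T>(b, i); break;
            case EOperation::SUBTRACT: result[i] = a[i] - rhs_at<T>(b, i); break;
        }
    }
}
}  // namespace detail

template <class T>
void MatrixCudaOperations(const T* a, const T* b, T* result,
                          std::size_t rows, std::size_t cols, EOperation op) {
    detail::dispatch(a, b, result, rows, cols, op);
}
template <class T>
void MatrixCudaOperations(const T* a, T b, T* result,
                          std::size_t rows, std::size_t cols, EOperation op) {
    detail::dispatch(a, b, result, rows, cols, op);
}

// Explicit instantiations: these emit the symbols that other translation
// units link against when they only see the declarations.
template void MatrixCudaOperations<float>(const float*, const float*, float*,
                                          std::size_t, std::size_t, EOperation);
template void MatrixCudaOperations<float>(const float*, float, float*,
                                          std::size_t, std::size_t, EOperation);
template void MatrixCudaOperations<double>(const double*, const double*, double*,
                                           std::size_t, std::size_t, EOperation);
template void MatrixCudaOperations<double>(const double*, double, double*,
                                           std::size_t, std::size_t, EOperation);

// Usage: matrix + matrix, then matrix + scalar in place.
inline std::vector<float> example() {
    std::vector<float> a{1.f, 2.f, 3.f, 4.f}, b{10.f, 20.f, 30.f, 40.f}, out(4);
    MatrixCudaOperations(a.data(), b.data(), out.data(), 2, 2, EOperation::ADD);
    MatrixCudaOperations(out.data(), 0.5f, out.data(), 2, 2, EOperation::ADD);
    return out;
}
```

With this layout, supporting another element type would only need two more explicit instantiation lines, and a new operation would only need one more case in the shared switch.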

【Question comments】:

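Why doesn't it work? If you have compile errors, please include them in the question.
This code works, but in my view there is a lot of code replication just to add values, and I can't find a way to get rid of it.
I'm voting to close this question as off-topic because it should be asked on codereview.stackexchange.com.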

Tags: c++ templates cuda refactoring replication

【Solution 1】:

Start from the top.

template<class T>
class MatrixDevice;

template<class T>
static T const& to_matrix_data( T const& t ) { return t; }
template<class T>
static T const* to_matrix_data( MatrixDevice<T> const& m ) { return m.data; }

template<class T, class Rhs>
void AddInto(MatrixDevice<T>& target, MatrixDevice<T> const& src, Rhs const& rhs) {
  MatrixCudaOperations(src.data, to_matrix_data<T>(rhs), target.data, src.rows, src.cols, EOperation::ADD );
}

template<class T>
class MatrixDevice{
  T* data;
  size_t rows;
  size_t cols;

  template<class Rhs>
  MatrixDevice& operator+=(const Rhs &other)& {
    AddInto( *this, *this, other );
    return *this;
  }

  template<class Rhs>
  friend MatrixDevice operator+(MatrixDevice lhs, Rhs const& rhs) {
    lhs += rhs;
    return lhs;
  }
};

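Using the word Add for three different operations is a bad idea. One is "increment by", one is "add", and the last is "add into".

So I wrote a free template function AddInto, and then built increment-by and add on top of it.

My add does at most one more move than yours, and depending on the internals of your matrix, a move can be free.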
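
To make the pattern above easier to try out, here is a minimal self-contained sketch that compiles without CUDA. It is an illustration under assumptions, not the answerer's exact code: the two `MatrixCudaOperations` templates are CPU stand-ins for the real kernel wrappers, `std::vector<T>` replaces the device pointer, the members are public so the free functions can reach them, and the three-argument constructor and `example` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class EOperation { ADD };  // assumed scoped enum, as in the answer

// CPU stand-ins (assumption): the real overloads launch CUDA kernels.
template <class T>
void MatrixCudaOperations(const T* a, const T* b, T* r,
                          std::size_t rows, std::size_t cols, EOperation) {
    for (std::size_t i = 0; i < rows * cols; ++i) r[i] = a[i] + b[i];
}
template <class T>
void MatrixCudaOperations(const T* a, T b, T* r,
                          std::size_t rows, std::size_t cols, EOperation) {
    for (std::size_t i = 0; i < rows * cols; ++i) r[i] = a[i] + b;
}

template <class T> class MatrixDevice;

// A scalar passes through unchanged; a matrix yields a pointer to its data.
template <class T>
T const& to_matrix_data(T const& t) { return t; }
template <class T>
T const* to_matrix_data(MatrixDevice<T> const& m) { return m.data.data(); }

// "add into": target = src + rhs, where rhs is a matrix or a scalar.
template <class T, class Rhs>
void AddInto(MatrixDevice<T>& target, MatrixDevice<T> const& src, Rhs const& rhs) {
    assert(target.rows == src.rows && target.cols == src.cols);
    MatrixCudaOperations(src.data.data(), to_matrix_data<T>(rhs),
                         target.data.data(), src.rows, src.cols, EOperation::ADD);
}

template <class T>
class MatrixDevice {
public:
    std::vector<T> data;  // stand-in for the device pointer T* data
    std::size_t rows;
    std::size_t cols;

    // Hypothetical constructor: fills a rows x cols matrix with init.
    MatrixDevice(std::size_t r, std::size_t c, T init)
        : data(r * c, init), rows(r), cols(c) {}

    // "increment by": built on AddInto with *this as source and target.
    template <class Rhs>
    MatrixDevice& operator+=(Rhs const& other) & {
        AddInto(*this, *this, other);
        return *this;
    }

    // "add": takes lhs by value, so callers can copy or move into it.
    template <class Rhs>
    friend MatrixDevice operator+(MatrixDevice lhs, Rhs const& rhs) {
        lhs += rhs;
        return lhs;
    }
};

inline float example() {
    MatrixDevice<float> a(2, 2, 1.0f), b(2, 2, 2.0f);
    a += b;                         // every element of a is now 3.0
    a += 0.5f;                      // every element of a is now 3.5
    MatrixDevice<float> c = a + b;  // every element of c is 5.5
    MatrixDevice<float> d(2, 2, 0.0f);
    AddInto(d, a, 10.0f);           // every element of d is 13.5
    return c.data[0] + d.data[3];   // 5.5 + 13.5
}
```

Taking `lhs` by value in `operator+` is what lets the caller decide between copying and moving, which is where the "at most one more move" remark comes from.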

【Comments】:

Thanks! This helped me a lot!