CUDA: Linking a shared library .so using separate files (answers)

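【Question title】: CUDA: Linking a shared library .so using separate files
【Posted】: 2014-07-01 12:45:45
【Question description】:

I am trying to compile a .so library with nvcc 6.0 from separate .cu files. I managed to compile each file separately using -rdc=true. When I try to link my library with the host compiler (CC, i.e. g++, in the makefile below), I get a bunch of errors. I have already managed to build the library from a single file. I read here, in a question about nvcc 5.0, that this is not supported. I went through the nvcc 6.0 manual, but I could not find out (or understand) whether that is still the case. Below is my makefile (I am not very experienced at writing makefiles, so any suggestion is very welcome). The errors are pasted after it.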

NCC = /usr/local/cuda-6.0/bin/nvcc
CC = g++

LCUDA = -L/usr/local/cuda/lib64 -lcuda -lcudart
LNUM = -lm

OOP = -arch=sm_30 -rdc=true --shared -Xcompiler -fPIC -c

all: cuda_ddm.so

cuda_ddm.so : wfpt.o stationary.o
	$(CC) -Wall -shared -include ./c_cuda_ddm.h -o $@ $^ $(LCUDA)

wfpt.o : wfpt.cu
	$(NCC) $(OOP) $@ $^

test.o : test.cu
	$(NCC) $(OOP) $@ $^

Errors:

(Edit: I changed the compiler errors to reflect the current situation.)

/usr/local/cuda-6.0/bin/nvcc -arch=sm_30 -rdc=true --shared -Xcompiler -fPIC -c wfpt.o wfpt.cu 
/usr/local/cuda-6.0/bin/nvcc -arch=sm_30 -rdc=true --shared -Xcompiler -fPIC -c test.o test.cu 
g++ -Wall -shared -o cuda_ddm.so wfpt.o test.o -L/usr/local/cuda/lib64 -lcuda -lcudart 
test.o: In function `big_random_block(int)':
tmpxft_00003c24_00000000-3_test.cudafe1.cpp:(.text+0x5e): multiple definition of `big_random_block(int)'
wfpt.o:tmpxft_00003bc7_00000000-3_wfpt.cudafe1.cpp:(.text+0x5e): first defined here
test.o: In function `big_random_block_int(int)':
tmpxft_00003c24_00000000-3_test.cudafe1.cpp:(.text+0xde): multiple definition of `big_random_block_int(int)'
wfpt.o:tmpxft_00003bc7_00000000-3_wfpt.cudafe1.cpp:(.text+0xde): first defined here
test.o: In function `value(float, float, int)':
tmpxft_00003c24_00000000-3_test.cudafe1.cpp:(.text+0x169): multiple definition of `value(float, float, int)'
wfpt.o:tmpxft_00003bc7_00000000-3_wfpt.cudafe1.cpp:(.text+0x169): first defined here
test.o: In function `__device_stub__Z14float_to_colorPhPKf(unsigned char*, float const*)':
tmpxft_00003c24_00000000-3_test.cudafe1.cpp:(.text+0x788): multiple definition of `__device_stub__Z14float_to_colorPhPKf(unsigned char*, float const*)'
wfpt.o:tmpxft_00003bc7_00000000-3_wfpt.cudafe1.cpp:(.text+0xb30): first defined here
test.o: In function `float_to_color(unsigned char*, float const*)':
tmpxft_00003c24_00000000-3_test.cudafe1.cpp:(.text+0x7f9): multiple definition of `float_to_color(unsigned char*, float const*)'
wfpt.o:tmpxft_00003bc7_00000000-3_wfpt.cudafe1.cpp:(.text+0xba1): first defined here
test.o: In function `__device_stub__Z14float_to_colorP6uchar4PKf(uchar4*, float const*)':
tmpxft_00003c24_00000000-3_test.cudafe1.cpp:(.text+0x81e): multiple definition of `__device_stub__Z14float_to_colorP6uchar4PKf(uchar4*, float const*)'
wfpt.o:tmpxft_00003bc7_00000000-3_wfpt.cudafe1.cpp:(.text+0xbc6): first defined here
test.o: In function `float_to_color(uchar4*, float const*)':
tmpxft_00003c24_00000000-3_test.cudafe1.cpp:(.text+0x88f): multiple definition of `float_to_color(uchar4*, float const*)'
wfpt.o:tmpxft_00003bc7_00000000-3_wfpt.cudafe1.cpp:(.text+0xc37): first defined here
collect2: error: ld returned 1 exit status
make: *** [cuda_ddm.so] Error 1

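Edit:

To clarify the situation, I changed the code to make 100% sure that there is no overlapping code in the two files. Both files contain # include "c_cuda_ddm.hcu", which has the following content: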

# ifndef DDM_HEADER
# define DDM_HEADER

#include "book.h"
#include "math.h"

# define TOL 1e-7
# define PI 3.1415926535

# define DIM_X 0
# define DIM_Y 2
# define DIM_U 2
# define DIM_THETA 3
# define DIM_PTHETA 0

# define INDEX_V 0
# define INDEX_A 1
# define INDEX_W 2

# define CUE_LEFT 1
# define CUE_RIGHT 0
# define ANTISACCADE_TYPE 0
# define PROSACCADE_TYPE 1

// Number of threads for the predictive posterior
# define DDMBLOCKS 256
# define PPBLOCKS 1024
# define LLHBLOCKS 16 
# endif

__device__ double lp_ddm(double t, double v, double a, double w);

extern "C"
int llh_ddm(double *t, double *v, double *a, double *w, int ny,
    double *llh);


extern "C"
int llh_stationary_antisaccades(double *x, double *y, double *u,
    double *theta, double *ptheta, int ny, double *llh);

extern "C"
int lpp_stationary_antisaccades(double *x, double *y, double *u,
    double *theta, double *ptheta, int ny, int ns, double *llh);

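【Question discussion】:

The error is obvious: you have compiled the same code twice, once in wfpt.o and once in stationary.o (probably by including the same code in both source files). The problem is probably unrelated to the makefile you have shown.
@talonmies I checked whether the problem is the one you mention. I am almost certain it is unrelated, but please see my edit.
Your makefile is indeed missing a device-link step. When you compile with -dc, you cannot simply go to g++ to link things together; a separate, intermediate device-link step is required. However, that is not the source of the problem shown in this question. @talonmies' response is correct, and your edit changes nothing. Include guards do not prevent the same code from being included into two separate files. They only prevent multiple inclusion within a single file. A simplified test case would be needed to sort all of this out for you, but you have not provided one as SO expects.
Sorry @RobertCrovella, I'm slow... I don't get it. I looked at both files and made sure there is no overlapping code (I can merge one into the other and the code compiles, provided I change my makefile). So I'm not sure why the code ends up in both files. I suppose I am still getting a compile error, but it may be a different one. I will update the errors I receive.
@eaponte: Things would be much simpler if you just read the error message. Whatever else is going on, you are compiling big_random_block, big_random_block_int, value and float_to_color twice: once in wfpt.o and once in test.o. That is indisputable. The linker is telling you exactly what the problem is. None of your edits tell us why this is happening. Go through your code and find where those functions are defined. Then work out how they get compiled into each of the object files you are trying to link. Only you can diagnose and fix this. Over to you.....

Tags: cuda shared-libraries nvcc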

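【Solution 1】:

big_random_block(), float_to_color (a kernel), and probably all of your other duplicate definitions come from book.h.

This header differs from other headers (in what I believe is common practice) in that it contains not only function prototypes but also actual function definitions.

Therefore book.h can only be (successfully/correctly/safely) included in a single file (i.e. compilation unit) of the whole project. If you include it in multiple files, the same functions are defined in multiple modules, which causes problems when you try to link those modules together.

The solution is to include book.h in only one file or, better, to take just what you need from it and create your own well-organized header. book.h is a header intended to accompany the book CUDA by Example. While I'm sure it works for all the projects in that book, clearly you cannot just pick it up and sprinkle it through any project at will. Some headers may work that way. This one won't.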

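As an aside, I want to reiterate that separate compilation (and linking) cannot be done with a compile-only step (-rdc=true -c). It also requires a device-link step. It may be that if your two object files (wfpt.o and stationary.o) don't actually share or need any CUDA symbols or entry points from each other, it doesn't matter. But if there are CUDA entry points shared between modules, the device-link step is necessary. However, this is not the crux of your question, and if it does become necessary you will certainly discover that the compile sequence shown in this question is not correct.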

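【Discussion】:

I started learning CUDA last week, tried to extend the examples from CUDA by Example, and completely overlooked the header files... I did not manage to identify that header as the source of the problem. @RobertCrovella, to perform the device-link step, should I simply link with nvcc?
If a device-link step is needed, it means nvcc is needed to do it (g++ knows nothing about device code or device-code linking). Without covering all the nuances of the commands, if you simply switch the final assembly step from g++ to nvcc, any required device linking can be done in that step, along with the creation of the .so. The nvcc manual covers this.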
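
The build sequence discussed above might look roughly like the following sketch. The nvcc path, -arch=sm_30, and the file names are taken from the question; the run helper and the DRY_RUN variable are hypothetical conveniences added here, so that the commands are only printed unless DRY_RUN=0 is set. It assumes the duplicate definitions from book.h have already been resolved.

```shell
#!/bin/sh
# Sketch only: paths and file names follow the question and may differ on your system.
DRY_RUN=${DRY_RUN:-1}                       # hypothetical switch: 1 = print only
NVCC=${NVCC:-/usr/local/cuda-6.0/bin/nvcc}
ARCH=-arch=sm_30

# Hypothetical helper: print the command, execute it only when DRY_RUN is not 1.
run() {
    echo "$*"
    [ "$DRY_RUN" = 1 ] || "$@"
}

# 1. Compile each .cu file to relocatable device code
#    (-rdc=true together with -c is equivalent to -dc). Note the explicit -o.
run "$NVCC" $ARCH -rdc=true -Xcompiler -fPIC -c wfpt.cu -o wfpt.o
run "$NVCC" $ARCH -rdc=true -Xcompiler -fPIC -c test.cu -o test.o

# 2a. Option A: let nvcc do the final step. It performs the device link
#     and then creates the shared library.
run "$NVCC" $ARCH -shared -o cuda_ddm.so wfpt.o test.o

# 2b. Option B (use instead of 2a): explicit device link with -dlink,
#     then a host link with g++ that includes the device-link object.
run "$NVCC" $ARCH -Xcompiler -fPIC -dlink wfpt.o test.o -o dlink.o
run g++ -shared -o cuda_ddm.so wfpt.o test.o dlink.o -L/usr/local/cuda/lib64 -lcudart
```

Option A is the switch from g++ to nvcc described in the reply above. Option B keeps g++ for the final link, which can be useful when the host link line is controlled by another build system.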