Learning CUDA: Chapter 2 (5) Querying GPU Information

Chapter 2

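After so much complicated and unpleasant material, let's turn to something lighter.
In the earlier examples there were three lines of code that may or may not have caught your attention: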

int dev = 0;
cudaDeviceProp deviceProp;
CHECK(cudaGetDeviceProperties(&deviceProp, 0));

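So what is this cudaDeviceProp? From the program output we can tell that one member of this struct holds the GPU's model name. Let's use a concrete example to see what information cudaDeviceProp actually provides: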
Example 2-10

#include <math.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <stdio.h>

/*
 * Display a variety of information on the first CUDA device in this system,
 * including driver version, runtime version, compute capability, bytes of
 * global memory, etc.
 */
#define CHECK(call) \
{ \
    const cudaError_t error=call; \
    if(error!=cudaSuccess) \
      { \
        printf("Error: %s:%d, ", __FILE__, __LINE__); \
        printf("code:%d, reason: %s\n",error,cudaGetErrorString(error)); \
        exit(-10*error); \
	} \
}
int main(int argc, char **argv)
{
	printf("%s Starting...\n", argv[0]);

	int deviceCount = 0;
	cudaGetDeviceCount(&deviceCount);

	if (deviceCount == 0)
	{
		printf("There are no available device(s) that support CUDA\n");
		exit(EXIT_FAILURE);
	}
	else
	{
		printf("Detected %d CUDA Capable device(s)\n", deviceCount);
	}

	int dev = 0, driverVersion = 0, runtimeVersion = 0;
	CHECK(cudaSetDevice(dev));
	cudaDeviceProp deviceProp;
	CHECK(cudaGetDeviceProperties(&deviceProp, dev));
	printf("Device %d: \"%s\"\n", dev, deviceProp.name);

	cudaDriverGetVersion(&driverVersion);
	cudaRuntimeGetVersion(&runtimeVersion);
	printf("  CUDA Driver Version / Runtime Version          %d.%d / %d.%d\n",
		driverVersion / 1000, (driverVersion % 100) / 10,
		runtimeVersion / 1000, (runtimeVersion % 100) / 10);
	printf("  CUDA Capability Major/Minor version number:    %d.%d\n",
		deviceProp.major, deviceProp.minor);
	printf("  Total amount of global memory:                 %.2f GBytes (%llu "
		"bytes)\n", (float)deviceProp.totalGlobalMem / pow(1024.0, 3),
		(unsigned long long)deviceProp.totalGlobalMem);
	printf("  GPU Clock rate:                                %.0f MHz (%0.2f "
		"GHz)\n", deviceProp.clockRate * 1e-3f,
		deviceProp.clockRate * 1e-6f);
	printf("  Memory Clock rate:                             %.0f MHz\n",
		deviceProp.memoryClockRate * 1e-3f);
	printf("  Memory Bus Width:                              %d-bit\n",
		deviceProp.memoryBusWidth);

	if (deviceProp.l2CacheSize)
	{
		printf("  L2 Cache Size:                                 %d bytes\n",
			deviceProp.l2CacheSize);
	}

	printf("  Max Texture Dimension Size (x,y,z)             1D=(%d), "
		"2D=(%d,%d), 3D=(%d,%d,%d)\n", deviceProp.maxTexture1D,
		deviceProp.maxTexture2D[0], deviceProp.maxTexture2D[1],
		deviceProp.maxTexture3D[0], deviceProp.maxTexture3D[1],
		deviceProp.maxTexture3D[2]);
	printf("  Max Layered Texture Size (dim) x layers        1D=(%d) x %d, "
		"2D=(%d,%d) x %d\n", deviceProp.maxTexture1DLayered[0],
		deviceProp.maxTexture1DLayered[1], deviceProp.maxTexture2DLayered[0],
		deviceProp.maxTexture2DLayered[1],
		deviceProp.maxTexture2DLayered[2]);
	printf("  Total amount of constant memory:               %zu bytes\n",
		deviceProp.totalConstMem);
	printf("  Total amount of shared memory per block:       %zu bytes\n",
		deviceProp.sharedMemPerBlock);
	printf("  Total number of registers available per block: %d\n",
		deviceProp.regsPerBlock);
	printf("  Warp size:                                     %d\n",
		deviceProp.warpSize);
	printf("  Maximum number of threads per multiprocessor:  %d\n",
		deviceProp.maxThreadsPerMultiProcessor);
	printf("  Maximum number of threads per block:           %d\n",
		deviceProp.maxThreadsPerBlock);
	printf("  Maximum sizes of each dimension of a block:    %d x %d x %d\n",
		deviceProp.maxThreadsDim[0],
		deviceProp.maxThreadsDim[1],
		deviceProp.maxThreadsDim[2]);
	printf("  Maximum sizes of each dimension of a grid:     %d x %d x %d\n",
		deviceProp.maxGridSize[0],
		deviceProp.maxGridSize[1],
		deviceProp.maxGridSize[2]);
	printf("  Maximum memory pitch:                          %zu bytes\n",
		deviceProp.memPitch);

	exit(EXIT_SUCCESS);
}

The output is:
[Screenshot: output of Example 2-10]
Let's go through the items one by one.
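1. cudaGetDeviceCount: returns the number of CUDA-capable GPUs. This machine has only one, a GTX 1050.
2. deviceProp.name: the name of the GPU. With multiple GPUs, the name returned is that of the device whose index was passed to cudaGetDeviceProperties.
3. cudaDriverGetVersion: returns the CUDA version supported by the installed driver. On this machine it is 10.0.
4. cudaRuntimeGetVersion: returns the version of the CUDA runtime library. On this machine it is 9.2.
5. deviceProp.major, deviceProp.minor: the major and minor numbers of the GPU's compute capability, which identifies the hardware generation and its feature set (it is not a CUDA runtime version). The GTX 1050 has compute capability 6.1.
6. deviceProp.totalGlobalMem: the size of the GPU's global memory in bytes (what is commonly called the video memory of a discrete card).
7. deviceProp.clockRate: the GPU clock frequency, reported in kHz, which is why the code multiplies by 1e-3 to print MHz.
8. deviceProp.memoryClockRate: the peak memory clock frequency, also in kHz.
9. deviceProp.memoryBusWidth: the bus width of the GPU's global memory, in bits.
10. deviceProp.l2CacheSize: the size of the GPU's L2 cache in bytes (if one exists).
11. deviceProp.maxTexture1D: the maximum size of a 1D texture supported by the GPU.
12. deviceProp.maxTexture2D: the maximum dimensions of a 2D texture supported by the GPU.
13. deviceProp.maxTexture3D: the maximum dimensions of a 3D texture supported by the GPU.
14. deviceProp.totalConstMem: the total amount of constant memory available on the GPU, in bytes.
15. deviceProp.sharedMemPerBlock: the maximum amount of shared memory a block can use, in bytes.
16. deviceProp.regsPerBlock: the maximum number of registers a block can use.
17. deviceProp.warpSize: the number of threads in a warp. At run time a thread block is split into smaller units called warps, and all threads in a warp execute the same instruction on different data.
18. deviceProp.maxThreadsPerMultiProcessor: the maximum number of threads that can be resident on one multiprocessor (an SM, which is a hardware unit and not the same thing as a grid).
19. deviceProp.maxThreadsPerBlock: the maximum number of threads in a block.
20. deviceProp.maxThreadsDim: the maximum number of threads along each dimension (x, y, z) of a block.
21. deviceProp.maxGridSize: the maximum number of blocks along each dimension (x, y, z) of a grid.
22. deviceProp.memPitch: the maximum pitch, in bytes, allowed for memory copies.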
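
Several of these values are raw integers that need a little arithmetic before they mean much. The following is a minimal host-only sketch of that arithmetic, kept free of CUDA calls so it can be compiled anywhere. The names CudaVersion, decodeCudaVersion, bytesToGiB and warpsPerBlock are hypothetical helpers introduced here for illustration; they are not part of the CUDA runtime. The version decoding mirrors the division and modulo used in Example 2-10.

```cpp
#include <cstddef>

// Hypothetical helper type: a CUDA version split into its two parts.
struct CudaVersion {
    int majorNumber;
    int minorNumber;
};

// Decode the integer written by cudaDriverGetVersion / cudaRuntimeGetVersion.
// The value is encoded as 1000 * major + 10 * minor, so 9020 means 9.2.
CudaVersion decodeCudaVersion(int encoded) {
    return CudaVersion{encoded / 1000, (encoded % 100) / 10};
}

// Convert a byte count such as totalGlobalMem to GiB (1024^3 bytes).
double bytesToGiB(std::size_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
}

// Number of warps needed to hold a block of threadsPerBlock threads.
// The division rounds up because a partially filled warp still
// occupies a whole warp.
int warpsPerBlock(int threadsPerBlock, int warpSize) {
    return (threadsPerBlock + warpSize - 1) / warpSize;
}
```

For instance, decodeCudaVersion(9020) gives 9 and 2, matching the runtime version 9.2 reported above, and with a warp size of 32 a block of 33 threads needs 2 warps.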

How do we choose the best GPU when the system has more than one?

int numDevices = 0;
cudaGetDeviceCount(&numDevices);
if (numDevices > 1)
{
    int maxMultiprocessors = 0, maxDevice = 0;
    for (int device = 0; device < numDevices; device++)
    {
        // Query the multiprocessor count of each device in turn
        cudaDeviceProp props;
        cudaGetDeviceProperties(&props, device);
        if (maxMultiprocessors < props.multiProcessorCount)
        {
            maxMultiprocessors = props.multiProcessorCount;
            maxDevice = device;
        }
    }
    cudaSetDevice(maxDevice); // Run on the GPU with the most multiprocessors
}
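
The selection rule itself, picking the device with the most multiprocessors, can be separated from the CUDA calls, which makes it easy to check on a machine without a GPU. The sketch below assumes the multiprocessor counts have already been collected into a vector. The function name pickDeviceWithMostSMs and the convention of returning -1 for an empty list are choices made for this illustration, not anything defined by the CUDA runtime.

```cpp
#include <cstddef>
#include <vector>

// Return the index of the device with the most multiprocessors.
// smCounts[i] is assumed to hold props.multiProcessorCount for device i.
// Ties keep the lowest index; an empty list returns -1.
int pickDeviceWithMostSMs(const std::vector<int>& smCounts) {
    int bestDevice = -1;
    int bestCount = -1;
    for (std::size_t i = 0; i < smCounts.size(); ++i) {
        if (smCounts[i] > bestCount) {
            bestCount = smCounts[i];
            bestDevice = static_cast<int>(i);
        }
    }
    return bestDevice;
}
```

In a real program the vector would be filled by calling cudaGetDeviceProperties for each device index, and a non-negative result would then be passed to cudaSetDevice.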