Copy GpuMat to CUDA Tensor

【Question title】: Copy GpuMat to CUDA Tensor
【Posted】: 2023-03-02 21:12:02
【Question description】:

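I am trying to run model inference in C++.
I successfully traced the model in Python with torch.jit.trace.
I can load the model in C++ using torch::jit::load().
I am able to run inference on both the CPU and the GPU, but the starting point is always the torch::from_blob method, which seems to create a CPU-side tensor.
For efficiency, I would like to convert/copy a cv::cuda::GpuMat directly to a CUDA tensor. I have been digging through the PyTorch tests and docs looking for a convenient example, but could not find one.

Question: How do I create a CUDA tensor from a cv::cuda::GpuMat?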

【Question discussion】:

Tags: c++ opencv cuda pytorch

【Solution 1】:

Here is an example:

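#include <iostream>
#include <vector>
#include <opencv2/core/cuda.hpp>
#include <torch/torch.h>

// No-op deleter: the tensor only borrows the GpuMat's device memory,
// so gImage must outlive every tensor created from it.
void deleter(void* /*arg*/) {}

// Inside your conversion function:

cv::cuda::GpuMat gImage;

// Build or load your image here. The code below assumes gImage holds
// 32-bit floats (CV_32FC1, CV_32FC3, ...).

std::vector<int64_t> sizes = {1, static_cast<int64_t>(gImage.channels()),
                              static_cast<int64_t>(gImage.rows),
                              static_cast<int64_t>(gImage.cols)};

// gImage.step is the row pitch in bytes; convert it to a count of floats.
int64_t step = static_cast<int64_t>(gImage.step / sizeof(float));

// GpuMat rows are pitched and channels are interleaved (HWC), so the NCHW
// view uses: channel stride 1, row stride step, column stride channels.
// The batch stride is never used because the batch size is 1.
std::vector<int64_t> strides = {1, 1, step, static_cast<int64_t>(gImage.channels())};

auto tensor_image = torch::from_blob(gImage.data, sizes, strides, deleter, torch::kCUDA);
std::cout << "output tensor image : " << tensor_image << std::endl;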

【Discussion】:

Thanks! I managed to run this code, but there is one problem. I think the strides should be: {1, 1, step, channels}.
I have fixed the incorrect strides. See github.com/pytorch/pytorch/issues/19786
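
The stride layout discussed here can be sanity-checked on the host, without CUDA, OpenCV, or libtorch. The sketch below is only an illustration under stated assumptions: a plain std::vector stands in for the pitched, interleaved memory of a float GpuMat, and the helper names (strided_offset, nchw_strides_for_pitched_hwc, make_pitched_hwc, nchw_view_matches) are hypothetical and not part of any library. It uses rows * step as the batch stride, which makes no difference while the batch size is 1.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Element offset of (n, c, y, x) in a strided 4-D view: the usual
// "sum of index * stride" arithmetic of a strided tensor.
inline std::int64_t strided_offset(const std::array<std::int64_t, 4>& strides,
                                   std::int64_t n, std::int64_t c,
                                   std::int64_t y, std::int64_t x) {
    return n * strides[0] + c * strides[1] + y * strides[2] + x * strides[3];
}

// Strides (in elements) for viewing pitched, interleaved HWC memory as NCHW.
// `step` is the row pitch in elements, i.e. step_in_bytes / sizeof(float).
inline std::array<std::int64_t, 4> nchw_strides_for_pitched_hwc(
        std::int64_t rows, std::int64_t step, std::int64_t channels) {
    return {rows * step, 1, step, channels};
}

// Host stand-in (assumption) for a float GpuMat: every row holds
// cols * channels interleaved floats, then padding up to `step`.
// Pixel (y, x), channel c stores 100*c + 10*y + x; padding stores -1.
inline std::vector<float> make_pitched_hwc(std::int64_t rows, std::int64_t cols,
                                           std::int64_t channels,
                                           std::int64_t step) {
    std::vector<float> buf(static_cast<std::size_t>(rows * step), -1.0f);
    for (std::int64_t y = 0; y < rows; ++y)
        for (std::int64_t x = 0; x < cols; ++x)
            for (std::int64_t c = 0; c < channels; ++c)
                buf[static_cast<std::size_t>(y * step + x * channels + c)] =
                    static_cast<float>(100 * c + 10 * y + x);
    return buf;
}

// True if reading the buffer through the NCHW strides recovers every value.
inline bool nchw_view_matches(std::int64_t rows, std::int64_t cols,
                              std::int64_t channels, std::int64_t step) {
    const std::vector<float> buf = make_pitched_hwc(rows, cols, channels, step);
    const auto strides = nchw_strides_for_pitched_hwc(rows, step, channels);
    for (std::int64_t c = 0; c < channels; ++c)
        for (std::int64_t y = 0; y < rows; ++y)
            for (std::int64_t x = 0; x < cols; ++x) {
                const auto off = strided_offset(strides, 0, c, y, x);
                if (buf[static_cast<std::size_t>(off)] !=
                    static_cast<float>(100 * c + 10 * y + x))
                    return false;
            }
    return true;
}
```

For a 3x4 image with 3 channels and a pitch of 16 floats, nchw_view_matches(3, 4, 3, 16) returns true, and element (c=2, y=1, x=3) sits at offset 2*1 + 1*16 + 3*3 = 27. The padding floats at the end of each row are never touched by the view, which is why the row stride must be the pitch and not cols * channels.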