【Question title】:Does libtorch leak?
【Posted】:2021-10-08 12:10:23
【Question】:

I am using the PyTorch C++ library (libtorch) in my project. When I run it under valgrind, it looks like some memory is not freed properly.

Main.cpp:

#include <cstdlib>   // EXIT_SUCCESS
#include <iostream>  // std::cout, std::endl

int main() {
    // Do something with libtorch here...
    std::cout << "end of main" << std::endl;
    return EXIT_SUCCESS;
}

Valgrind command: valgrind --leak-check=full ./myapp

Valgrind output:

==385785== Memcheck, a memory error detector
==385785== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==385785== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==385785== Command: ./btai
==385785== 
==385785== Warning: set address range perms: large range [0x48f5000, 0x17ecd000) (defined)
end of main
==385785== Conditional jump or move depends on uninitialised value(s)
==385785==    at 0x8181EE8: torch::jit::deregisterOperator(c10::FunctionSchema const&) (in /home/tmac3/Desktop/Branching_Time_Active_Inference/libs/torch/lib/libtorch_cpu.so)
==385785==    by 0x8361AB5: torch::jit::(anonymous namespace)::RegistrationListener::onOperatorDeregistered(c10::OperatorHandle const&) (in /home/tmac3/Desktop/Branching_Time_Active_Inference/libs/torch/lib/libtorch_cpu.so)
==385785==    by 0x583ACAC: c10::Dispatcher::deregisterDef_(c10::OperatorHandle const&, c10::OperatorName const&) (in /home/tmac3/Desktop/Branching_Time_Active_Inference/libs/torch/lib/libtorch_cpu.so)
==385785==    by 0x58733D4: c10::RegisterOperators::~RegisterOperators() (in /home/tmac3/Desktop/Branching_Time_Active_Inference/libs/torch/lib/libtorch_cpu.so)
==385785==    by 0x1827F15D: __cxa_finalize (cxa_finalize.c:83)
==385785==    by 0x574C6E2: ??? (in /home/tmac3/Desktop/Branching_Time_Active_Inference/libs/torch/lib/libtorch_cpu.so)
==385785==    by 0x4011F5A: _dl_fini (dl-fini.c:138)
==385785==    by 0x1827EA26: __run_exit_handlers (exit.c:108)
==385785==    by 0x1827EBDF: exit (exit.c:139)
==385785==    by 0x1825C0B9: (below main) (libc-start.c:342)
==385785== 
==385785== Conditional jump or move depends on uninitialised value(s)
==385785==    at 0x81820FD: torch::jit::deregisterOperator(c10::FunctionSchema const&) (in /home/tmac3/Desktop/Branching_Time_Active_Inference/libs/torch/lib/libtorch_cpu.so)
==385785==    by 0x8361AB5: torch::jit::(anonymous namespace)::RegistrationListener::onOperatorDeregistered(c10::OperatorHandle const&) (in /home/tmac3/Desktop/Branching_Time_Active_Inference/libs/torch/lib/libtorch_cpu.so)
==385785==    by 0x583ACAC: c10::Dispatcher::deregisterDef_(c10::OperatorHandle const&, c10::OperatorName const&) (in /home/tmac3/Desktop/Branching_Time_Active_Inference/libs/torch/lib/libtorch_cpu.so)
==385785==    by 0x58733D4: c10::RegisterOperators::~RegisterOperators() (in /home/tmac3/Desktop/Branching_Time_Active_Inference/libs/torch/lib/libtorch_cpu.so)
==385785==    by 0x1827F15D: __cxa_finalize (cxa_finalize.c:83)
==385785==    by 0x574C6E2: ??? (in /home/tmac3/Desktop/Branching_Time_Active_Inference/libs/torch/lib/libtorch_cpu.so)
==385785==    by 0x4011F5A: _dl_fini (dl-fini.c:138)
==385785==    by 0x1827EA26: __run_exit_handlers (exit.c:108)
==385785==    by 0x1827EBDF: exit (exit.c:139)
==385785==    by 0x1825C0B9: (below main) (libc-start.c:342)
==385785== 
==385785== 
==385785== HEAP SUMMARY:
==385785==     in use at exit: 724,686 bytes in 11,651 blocks
==385785==   total heap usage: 481,322 allocs, 469,671 frees, 59,519,540 bytes allocated
==385785== 
==385785== 256 bytes in 1 blocks are possibly lost in loss record 11,294 of 11,400
==385785==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==385785==    by 0xB7CB7A3: mm_account_ptr_by_tid..0 (in /home/tmac3/Desktop/Branching_Time_Active_Inference/libs/torch/lib/libtorch_cpu.so)
==385785==    by 0xB7CAE79: mkl_serv_malloc (in /home/tmac3/Desktop/Branching_Time_Active_Inference/libs/torch/lib/libtorch_cpu.so)
==385785==    by 0x9AB5B86: mkl_serv_domain_get_max_threads (in /home/tmac3/Desktop/Branching_Time_Active_Inference/libs/torch/lib/libtorch_cpu.so)
==385785==    by 0x5791278: at::init_num_threads() (in /home/tmac3/Desktop/Branching_Time_Active_Inference/libs/torch/lib/libtorch_cpu.so)
==385785==    by 0x8E1C68B: at::native::(anonymous namespace)::min_all_kernel_impl(at::Tensor&, at::Tensor const&) (in /home/tmac3/Desktop/Branching_Time_Active_Inference/libs/torch/lib/libtorch_cpu.so)
==385785==    by 0x5B19042: at::native::min(at::Tensor const&) (in /home/tmac3/Desktop/Branching_Time_Active_Inference/libs/torch/lib/libtorch_cpu.so)
==385785==    by 0x633CD3B: c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&), &at::(anonymous namespace)::(anonymous namespace)::wrapper__min>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&> >, at::Tensor (at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&) (in /home/tmac3/Desktop/Branching_Time_Active_Inference/libs/torch/lib/libtorch_cpu.so)
==385785==    by 0x6154390: at::Tensor c10::Dispatcher::redispatch<at::Tensor, at::Tensor const&>(c10::TypedOperatorHandle<at::Tensor (at::Tensor const&)> const&, c10::DispatchKeySet, at::Tensor const&) const [clone .isra.165] (in /home/tmac3/Desktop/Branching_Time_Active_Inference/libs/torch/lib/libtorch_cpu.so)
==385785==    by 0x615DA3E: at::redispatch::min(c10::DispatchKeySet, at::Tensor const&) (in /home/tmac3/Desktop/Branching_Time_Active_Inference/libs/torch/lib/libtorch_cpu.so)
==385785==    by 0x7AC1310: torch::autograd::VariableType::(anonymous namespace)::min(c10::DispatchKeySet, at::Tensor const&) (in /home/tmac3/Desktop/Branching_Time_Active_Inference/libs/torch/lib/libtorch_cpu.so)
==385785==    by 0x7AC17EE: c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::DispatchKeySet, at::Tensor const&), &torch::autograd::VariableType::(anonymous namespace)::min>, at::Tensor, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&> >, at::Tensor (c10::DispatchKeySet, at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&) (in /home/tmac3/Desktop/Branching_Time_Active_Inference/libs/torch/lib/libtorch_cpu.so)
==385785== 
==385785== 69,664 bytes in 1 blocks are possibly lost in loss record 11,400 of 11,400
==385785==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==385785==    by 0xB7CBA27: mm_account_ptr_by_tid..0 (in /home/tmac3/Desktop/Branching_Time_Active_Inference/libs/torch/lib/libtorch_cpu.so)
==385785==    by 0xB7CAE79: mkl_serv_malloc (in /home/tmac3/Desktop/Branching_Time_Active_Inference/libs/torch/lib/libtorch_cpu.so)
==385785==    by 0x9AB5B86: mkl_serv_domain_get_max_threads (in /home/tmac3/Desktop/Branching_Time_Active_Inference/libs/torch/lib/libtorch_cpu.so)
==385785==    by 0x5791278: at::init_num_threads() (in /home/tmac3/Desktop/Branching_Time_Active_Inference/libs/torch/lib/libtorch_cpu.so)
==385785==    by 0x8E1C68B: at::native::(anonymous namespace)::min_all_kernel_impl(at::Tensor&, at::Tensor const&) (in /home/tmac3/Desktop/Branching_Time_Active_Inference/libs/torch/lib/libtorch_cpu.so)
==385785==    by 0x5B19042: at::native::min(at::Tensor const&) (in /home/tmac3/Desktop/Branching_Time_Active_Inference/libs/torch/lib/libtorch_cpu.so)
==385785==    by 0x633CD3B: c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&), &at::(anonymous namespace)::(anonymous namespace)::wrapper__min>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&> >, at::Tensor (at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&) (in /home/tmac3/Desktop/Branching_Time_Active_Inference/libs/torch/lib/libtorch_cpu.so)
==385785==    by 0x6154390: at::Tensor c10::Dispatcher::redispatch<at::Tensor, at::Tensor const&>(c10::TypedOperatorHandle<at::Tensor (at::Tensor const&)> const&, c10::DispatchKeySet, at::Tensor const&) const [clone .isra.165] (in /home/tmac3/Desktop/Branching_Time_Active_Inference/libs/torch/lib/libtorch_cpu.so)
==385785==    by 0x615DA3E: at::redispatch::min(c10::DispatchKeySet, at::Tensor const&) (in /home/tmac3/Desktop/Branching_Time_Active_Inference/libs/torch/lib/libtorch_cpu.so)
==385785==    by 0x7AC1310: torch::autograd::VariableType::(anonymous namespace)::min(c10::DispatchKeySet, at::Tensor const&) (in /home/tmac3/Desktop/Branching_Time_Active_Inference/libs/torch/lib/libtorch_cpu.so)
==385785==    by 0x7AC17EE: c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::DispatchKeySet, at::Tensor const&), &torch::autograd::VariableType::(anonymous namespace)::min>, at::Tensor, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&> >, at::Tensor (c10::DispatchKeySet, at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&) (in /home/tmac3/Desktop/Branching_Time_Active_Inference/libs/torch/lib/libtorch_cpu.so)
==385785== 
==385785== LEAK SUMMARY:
==385785==    definitely lost: 0 bytes in 0 blocks
==385785==    indirectly lost: 0 bytes in 0 blocks
==385785==      possibly lost: 69,920 bytes in 2 blocks
==385785==    still reachable: 654,766 bytes in 11,649 blocks
==385785==                       of which reachable via heuristic:
==385785==                         stdstring          : 359,526 bytes in 4,879 blocks
==385785==         suppressed: 0 bytes in 0 blocks
==385785== Reachable blocks (those to which a pointer was found) are not shown.
==385785== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==385785== 
==385785== Use --track-origins=yes to see where uninitialised values come from
==385785== For lists of detected and suppressed errors, rerun with: -s
==385785== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 0 from 0)

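As you can see, valgrind reports errors and memory leaks after the main function has returned. They appear to come from libtorch. Where do these errors come from, and how can I get rid of them?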

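【Comments】:

  • "// Do something with libtorch here..." That seems to be the interesting part worth looking at :-/
  • The project is quite large, so I cannot show all of it. The code mostly creates tensors and applies operations to them (no backpropagation is used).
  • Write a minimal program with libtorch (init/deinit only), then add things step by step until the problem shows up.
  • It would be better to state the libtorch version when presenting your findings.
  • Note that valgrind only reports "possible" leaks, not definite ones. So it is entirely possible (even likely) that libtorch does not leak, at least not in a way that causes problems in practice. The "Conditional jump or move depends on uninitialised value(s)" warnings are more worrying, because they may indicate undefined behaviour, but that is a separate issue from memory leaks.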

Tags: c++ memory-leaks valgrind libtorch


【Answer 1】:

As far as we know, libtorch's torch::jit::Module leaks memory on mobile devices: after loading a TorchScript file, you cannot release the memory held by the torch::jit::Module.

【Comments】:

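    【Answer 2】:

    We ran into the same leak at exactly the same place, at::init_num_threads(). The leak seems to be reachable from various libtorch code paths, but it always ends up in at::init_num_threads(). We hope you have made some progress on this issue.

    【Comments】:

    • I wonder whether the problem comes from valgrind itself, which may not track memory perfectly in multithreaded applications... (pure speculation)
    • I added mkl_free_buffers() at the end of the libtorch inference. It fixed the issue above, but a new possible leak appeared... ``` ==50790== by 0x59C0821: ideep::tensor::reorder_to(ideep::tensor&, ideep::attr_t const&) const (in /opt/e2e-streamingASR/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)```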