【Question title】: How to debug the error "Program received signal SIGSEGV: Segmentation fault"
【Posted】: 2019-01-27 18:25:13
【Question description】:

I am running a Fortran executable and get this error:

 set_nml_output Echo NML values to log file only
 Trying to open namelist log dart_log.nml
 PE 0: initialize_mpi_utilities:  Running with            8  MPI processes.

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

I then tried to investigate with gdb, which reported:

[New LWP 9883]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Failed to read a valid object file image from memory.
Core was generated by `./filter'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00002af8e021390c in netcdf::nf90_open (
    path=<error reading variable: value requires 57959040 bytes, which is more than max-value-size>, mode=0, 
    ncid=<error reading variable: Cannot access memory at address 0x7ffe439346b0>, 
    chunksize=<error reading variable: Cannot access memory at address 0x0>, 
    cache_size=<error reading variable: Cannot access memory at address 0x7ffe43934530>, 
    cache_nelems=<error reading variable: Cannot access memory at address 0x7ffe43934528>, 
    cache_preemption=<error reading variable: Cannot access memory at address 0x7ffe439345a0>, 
---Type <return> to continue, or q <return> to quit---
    comm=<error reading variable: Cannot access memory at address 0x7ffe439345a8>, 
    info=<error reading variable: Cannot access memory at address 0x7ffe439345b0>, 
    _path=<error reading variable: Cannot access memory at address 0x7ffe439345b8>) at netcdf4_file.f90:39
39  netcdf4_file.f90: No such file or directory.
(gdb) bt
#0  0x00002af8e021390c in netcdf::nf90_open (
    path=<error reading variable: value requires 57959040 bytes, which is more than max-value-size>, mode=0, 
    ncid=<error reading variable: Cannot access memory at address 0x7ffe439346b0>, 
    chunksize=<error reading variable: Cannot access memory at address 0x0>, 
    cache_size=<error reading variable: Cannot access memory at address 0x7ffe43934530>, 
    cache_nelems=<error reading variable: Cannot access memory at address 0x7ffe43934528>, 
    cache_preemption=<error reading variable: Cannot access memory at address 0x7ffe439345a0>, 
    comm=<error reading variable: Cannot access memory at address 0x7ffe439345a8>, 
    info=<error reading variable: Cannot access memory at address 0x7ffe439345b0>, 
    _path=<error reading variable: Cannot access memory at address 0x7ffe439345b8>) at netcdf4_file.f90:39
Backtrace stopped: Cannot access memory at address 0x7ffe43934598

The code around netcdf4_file.f90:39 looks like this:

if (present(cache_size) .or. present(cache_nelems) .or. &
       present(cache_preemption)) then
     ret = nf_get_chunk_cache(size_in, nelems_in, preemption_in)
     if (ret .ne. nf90_noerr) then
        nf90_open = ret
        return
     end if
     if (present(cache_size)) then
        size_out = cache_size     ! line 39
     else
        size_out = size_in
     end if
     if (present(cache_nelems)) then
        nelems_out = cache_nelems
     else
        nelems_out = nelems_in
     end if

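Is the netCDF version related to the problem, or do I need to change some settings?

Can anyone give me some advice on how to solve this? I am not familiar with any of it. Thanks in advance.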

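【Comments】:

  • Please enable all of your compiler's debugging options, e.g. gfortran -g -fbacktrace -fcheck=all -Wall. Ideally the library would be compiled the same way for the debugging test, but that is often too awkward. We usually need more code; see minimal reproducible example.
  • Your critical error, at least in GDB, seems to be here: path=<error reading variable: value requires 57959040 bytes, which is more than max-value-size>. The rest of the backtrace seems to follow from it. See this link for more information. Edit: also make sure your NetCDF build was configured with the parallel I/O option at compile time. That may be another way this error can occur.
  • This looks like a call into the netcdf library. You compare ret with nf90_noerr, which suggests you are using the Fortran 90 library. But you also assign a value to nf90_open, as if it were a variable or the name of the enclosing function, and I am fairly sure nf90_open is a function in that library. Could there be a naming conflict?
  • @chw21, I would guess that the source quoted there is from the NetCDF library itself.
  • @DanielR.Livingston, after I set max-value-size unlimited, the error on that line goes away but the others do not. I need to learn more about how to debug this. Thanks; do you have any suggestions?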
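
The gdb setting mentioned in the last comment can be applied non-interactively. This is only a sketch: the executable name ./filter comes from the gdb output in the question, but the core file path ./core is an assumption, since the question does not say where the core dump was written. The script runs gdb only when gdb, the executable and the core file are all present.

```shell
#!/bin/sh
# Executable name taken from the gdb output in the question.
EXE=./filter
# Assumption: the core file path is not given in the question.
CORE=./core

if command -v gdb >/dev/null 2>&1 && [ -x "$EXE" ] && [ -f "$CORE" ]; then
  # -batch runs the given commands and exits; each -ex is one gdb command.
  gdb -batch \
      -ex "set max-value-size unlimited" \
      -ex "bt" \
      "$EXE" "$CORE"
else
  echo "skipping: need gdb, $EXE and $CORE"
fi
```

Lifting max-value-size only lets gdb print the oversized path argument; as the comment notes, it does not by itself explain the remaining "Cannot access memory" lines.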

Tags: linux fortran gfortran netcdf


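【Solution 1】:

Segmentation faults can be really hard to debug, but here are a few things I would do:

Compile with debugging symbols and run-time checks. The flags depend on the compiler; these are the ones for gfortran and Intel Fortran: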

gfortran     ifort         effect
------------------------------------------------------
-g           -g            Stores debugging symbols in the binary
-O0          -O0           Disables optimisation
-fbacktrace  -traceback    More informative stack trace
-Wall        -warn all     Enables compile-time warnings
-fcheck=all  -check all    Enables run-time checks

With any luck, when your program crashes after being compiled like this, it will be easier to work out what went wrong.
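
As a minimal sketch of the gfortran column in use, the script below writes a small demo program and compiles it with those flags when gfortran is on the PATH. The program oob_demo.f90 is hypothetical and not taken from the question; it deliberately writes one element past the end of an array, which is the kind of mistake -fcheck=all is meant to report with a file name and line number instead of a bare segmentation fault.

```shell
#!/bin/sh
# Debug flags from the gfortran column of the table.
DEBUG_FLAGS="-g -O0 -fbacktrace -Wall -fcheck=all"

# Hypothetical demo program: writes one element past the end of an array.
workdir=$(mktemp -d)
cat > "$workdir/oob_demo.f90" <<'EOF'
program oob_demo
  implicit none
  integer :: a(5), i, n
  n = 6                      ! one more than the array size
  do i = 1, n
     a(i) = i                ! i = 6 is out of bounds
  end do
  print *, a(1)
end program oob_demo
EOF

if command -v gfortran >/dev/null 2>&1; then
  # Word splitting of $DEBUG_FLAGS is intentional here.
  gfortran $DEBUG_FLAGS -o "$workdir/oob_demo" "$workdir/oob_demo.f90"
  # The run is expected to abort on the bounds check; report the status instead of failing.
  "$workdir/oob_demo" || echo "demo exited with status $?"
else
  echo "gfortran not found; would run: gfortran $DEBUG_FLAGS -o oob_demo oob_demo.f90"
fi
```

For the program in the question, the same flags would go on the command that builds ./filter. As the first comment points out, checks inside the NetCDF library itself only take effect if the library is built with them too.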
