Segmentation fault when run as root? Answers

【Question title】: Segmentation fault when run as root?
【Posted】: 2011-09-08 04:27:33
【Question】:

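My C++ program segfaults when I run it as root from my computer, but not when I start a remote session. From my computer it only runs as a regular user. What might be the problem? I wrote the program for an embedded device, and I compile it with: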

gcc -Werror notify.cc -o notify `pkg-config --libs --cflags gtk+-2.0 hildon-notifymm hildonmm hildon-fmmm`

I don't get any errors. Could it be a problem with the flags? I can post my code.

EDIT: When I start my program under gdb, I get this:

Program received signal SIGSEGV, Segmentation fault.
0x40eed060 in strcmp () from /lib/libc.so.6
0x40eed060 <strcmp+0>:  ldrb    r2, [r0], #1

The backtrace gives this:

(gdb) backtrace
 #0  0x40eed060 in strcmp () from /lib/libc.so.6
 #1  0x40b7f190 in dbus_set_g_error ()
 from /usr/lib/libdbus-glib-1.so.2
 #2  0x40b7d060 in dbus_g_bus_get () from /usr/lib/libdbus-glib-1.so.2
 #3  0x400558ec in notify_init () from /usr/lib/libnotify.so.1
 #4  0x4004a240 in Notify::init(Glib::ustring const&) ()
 from /usr/lib/libnotifymm-1.0.so.7
 #5  0x40033794 in Hildon::notify_init(Glib::ustring const&) ()
 from /usr/lib/libhildon-notifymm-1.0.so.1

This is my code:

#include <hildonmm.h>
#include <hildon-notifymm.h>
#include <hildon/hildon-notification.h>
#include <libnotifymm/init.h>
#include <gtkmm/stock.h>
#include <dbus/dbus.h>
#include <dbus/dbus-glib.h>
#include <dbus/dbus-glib-lowlevel.h>
#include <iostream>

int main(int argc, char *argv[])
{
    // Initialize gtkmm and maemomm:
    Hildon::init();
    Hildon::notify_init("Notification Example");

    // Initialize D-Bus (needed by hildon-notify):
    DBusConnection* conn = dbus_bus_get(DBUS_BUS_SESSION, NULL);
    dbus_connection_setup_with_g_main(conn, NULL);

    // Create a new notification:
    Glib::RefPtr<Hildon::Notification> notification =
        Hildon::Notification::create("Something Happened", "A thing has just happened.", Gtk::Stock::OPEN);

    // Show the notification:
    std::auto_ptr<Glib::Error> ex;
    notification->show(ex);
    if(ex.get())
    {
        std::cerr << "Notification::show() failed: " << ex->what() << std::endl;
    }
    return 0;
}

EDIT: Problem solved. The program needed the D-Bus session address (DBUS_SESSION_BUS_ADDRESS) to be set in the terminal's environment.
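
The fix above can be turned into an early check. This is a minimal sketch, assuming the variable in question is the standard D-Bus one, DBUS_SESSION_BUS_ADDRESS; the helper name session_bus_address_is_set is hypothetical and not part of Hildon, libnotify or D-Bus.

```cpp
#include <cstdlib>

// Hypothetical helper: returns true when the D-Bus session bus address
// is present and non-empty in the environment of the current process.
bool session_bus_address_is_set()
{
    const char* address = std::getenv("DBUS_SESSION_BUS_ADDRESS");
    return address != nullptr && address[0] != '\0';
}
```

Calling this at the top of main(), before Hildon::notify_init(), and exiting with a message on std::cerr when it returns false would replace the crash inside strcmp() with a readable error. A root shell often does not inherit the variable from the user session, which may be why only the root run failed.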

【Comments】:

Tags: c++ linux gdb maemo

【Solution 1】:

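The problem is that you are invoking undefined behavior somewhere. Undefined behavior can act differently on different machines, on different runs on the same machine, and so on. You have to find the place where you let a wild pointer happen and take care of it.

Most likely you are simply getting "lucky" when you run as a limited user: either the page permissions for your process happen to allow whatever invalid memory access you are making, or you have some root-specific code that is never reached when running as a regular user.

【Comments】:

【Solution 2】:

You may wish to run your program under valgrind. I wrote a tiny program that writes outside an allocated array: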

$ valgrind ./segfault
==11830== Memcheck, a memory error detector
==11830== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==11830== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info
==11830== Command: ./segfault
==11830== 
==11830== Invalid write of size 1
==11830==    at 0x4004BF: main (in /tmp/segfault)
==11830==  Address 0x7feff65bf is not stack'd, malloc'd or (recently) free'd
==11830== 
==11830== 
==11830== Process terminating with default action of signal 11 (SIGSEGV)
==11830==  Access not within mapped region at address 0x7FEFF65BF
==11830==    at 0x4004BF: main (in /tmp/segfault)
==11830==  If you believe this happened as a result of a stack
==11830==  overflow in your program's main thread (unlikely but
==11830==  possible), you can try to increase the size of the
==11830==  main thread stack using the --main-stacksize= flag.
==11830==  The main thread stack size used in this run was 8388608.
==11830== 
==11830== HEAP SUMMARY:
==11830==     in use at exit: 0 bytes in 0 blocks
==11830==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==11830== 
==11830== All heap blocks were freed -- no leaks are possible
==11830== 
==11830== For counts of detected and suppressed errors, rerun with: -v
==11830== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 4 from 4)
Segmentation fault

The most important part of this output is here:

==11830== Invalid write of size 1
==11830==    at 0x4004BF: main (in /tmp/segfault)

The "write of size 1" may help you work out which line is involved:

int main(int argc, char *argv[]) {
    char f[1];
    f[-40000]='c';
    return 0;
}
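
For contrast, here is a sketch of the same one-byte write with an explicit bounds check, so that a bad index is rejected instead of reaching memory the program does not own. The function name checked_write and the use of std::vector are illustrative choices, not part of the original test program.

```cpp
#include <cstddef>
#include <vector>

// Writes value at index only when the index lies inside the buffer.
// Returns false, leaving the buffer untouched, for any out-of-range index.
bool checked_write(std::vector<char>& buffer, std::ptrdiff_t index, char value)
{
    if (index < 0 || static_cast<std::size_t>(index) >= buffer.size()) {
        return false;
    }
    buffer[static_cast<std::size_t>(index)] = value;
    return true;
}
```

With a one-element buffer, checked_write(f, -40000, 'c') returns false rather than triggering the invalid write that valgrind reported.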

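Another very useful tool is gdb. If you set your rlimits to allow dumping core (see setrlimit(2) for details on the limits, and your shell's manual, probably bash(1), for details on the ulimit built-in command), then you can get a core file to use with gdb: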

$ ulimit -c 1000
$ ./segfault 
Segmentation fault (core dumped)
$ gdb --core=core ./segfault
GNU gdb (GDB) 7.2-ubuntu
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /tmp/segfault...(no debugging symbols found)...done.
[New Thread 11951]

warning: Can't read pathname for load map: Input/output error.
Reading symbols from /lib/libc.so.6...Reading symbols from /usr/lib/debug/lib/libc-2.12.1.so...done.
done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug/lib/ld-2.12.1.so...done.
done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Core was generated by `./segfault'.
Program terminated with signal 11, Segmentation fault.
#0  0x00000000004004bf in main ()
(gdb) bt
#0  0x00000000004004bf in main ()
(gdb) quit

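Depending on the size of your program, you may need to allow more than 1000 blocks for the core file. If the program were even remotely complicated, knowing the call chain that leads to the segfault could be vital information.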
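
The core-file limit can also be raised from inside the program through the setrlimit(2) interface mentioned above, which helps when you cannot control the shell that launches it. The following is a rough sketch; the function name enable_core_dumps is hypothetical.

```cpp
#include <sys/resource.h>

// Raises the soft core-file size limit to the hard limit for this process.
// Returns true when both the query and the update succeed.
bool enable_core_dumps()
{
    struct rlimit limit;
    if (getrlimit(RLIMIT_CORE, &limit) != 0) {
        return false;
    }
    limit.rlim_cur = limit.rlim_max;  // the soft limit may go up to the hard limit
    return setrlimit(RLIMIT_CORE, &limit) == 0;
}
```

Call it early in main(). If the hard limit itself is 0, the call succeeds but still no core file will be written; the hard limit then has to be raised by a privileged process or in the system configuration.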

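【Comments】:

【Solution 3】:

It is hard to say anything specific without seeing any code, so I will give you some general advice: learn to use your debugger (probably gdb), and try to reproduce the failure under it. If you are lucky, the segfault will still occur under the debugger and you will get a stack trace showing where it fails, which gives you a starting point for working back to the real source of the problem.

If you are unlucky, the problem may disappear when you compile with debugging support or run under gdb. In that case you will have to resort to code inspection and scrub your code for any undefined behavior (for example, wild or uninitialized pointers, as Billy ONeal suggested).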

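【Comments】:

【Solution 4】:

Set ulimit -c unlimited.

Run your program and let it crash. It should now dump core.

Run gdb <program-name> core

If you use the bt (backtrace) command, it should give you a good idea of where the crash happened. That should help you fix it.

【Comments】: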