【Question title】: Intercepting the Linux pthread_create function leads to a JVM/SSH crash
【Posted】: 2017-09-25 08:14:48
【Question】:

I am trying to intercept pthread_create on Ubuntu 14.04. The code looks like this:

struct thread_param{
    void * args;
    void *(*start_routine) (void *);
};

typedef int(*P_CREATE)(pthread_t *thread, const pthread_attr_t *attr,void *
    (*start_routine) (void *), void *arg);

void *intermedia(void * arg){
    struct thread_param *temp;
    temp=(struct thread_param *)arg;
    //do some other things
    return temp->start_routine(temp->args);
}

int  pthread_create(pthread_t  *thread,  const pthread_attr_t  *attr,  void  *
(*start_routine)(void  *),  void  *arg){
    static void *handle = NULL;
    static P_CREATE old_create=NULL;
    if( !handle )
    {
        handle = dlopen("libpthread.so.0", RTLD_LAZY);
        old_create = (P_CREATE)dlsym(handle, "pthread_create");
    }
    struct thread_param temp;
    temp.args=arg;
    temp.start_routine=start_routine;

    int result=old_create(thread,attr,intermedia,(void *)&temp);
//        int result=old_create(thread,attr,start_routine,arg);
    return result;
}

It works with my own pthread_create test case (written in C). But when I use it with Hadoop on the JVM, I get an error report like this:

Starting namenodes on [ubuntu]
ubuntu: starting namenode, logging to /home/yangyong/work/hadooptrace/hadoop-2.6.5/logs/hadoop-yangyong-namenode-ubuntu.out
ubuntu: starting datanode, logging to /home/yangyong/work/hadooptrace/hadoop-2.6.5/logs/hadoop-yangyong-datanode-ubuntu.out
ubuntu: /home/yangyong/work/hadooptrace/hadoop-2.6.5/sbin/hadoop-daemon.sh: line 131:  7545 Aborted                 (core dumped) nohup nice -n 
$HADOOP_NICENESS $hdfsScript --config $HADOOP_CONF_DIR $command "$@" > "$log" 2>&1 < /dev/null
Starting secondary namenodes [0.0.0.0
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000000000000000, pid=7585, tid=140445258151680
#
# JRE version: OpenJDK Runtime Environment (7.0_121) (build 1.7.0_121-b00)
# Java VM: OpenJDK 64-Bit Server VM (24.121-b00 mixed mode linux-amd64 compressed oops)
# Derivative: IcedTea 2.6.8
# Distribution: Ubuntu 14.04 LTS, package 7u121-2.6.8-1ubuntu0.14.04.1
# Problematic frame:
# C  0x0000000000000000
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/yangyong/work/hadooptrace/hadoop-2.6.5/hs_err_pid7585.log
#
# If you would like to submit a bug report, please include
# instructions on how to reproduce the bug and visit:
#   http://icedtea.classpath.org/bugzilla
#]
A: ssh: Could not resolve hostname a: Name or service not known
#: ssh: Could not resolve hostname #: Name or service not known
fatal: ssh: Could not resolve hostname fatal: Name or service not known
been: ssh: Could not resolve hostname been: Name or service not known
#: ssh: Could not resolve hostname #: Name or service not known
#: ssh: Could not resolve hostname #: Name or service not known
#: ssh: Could not resolve hostname #: Name or service not known
^COpenJDK: ssh: Could not resolve hostname openjdk: Name or service not known
detected: ssh: Could not resolve hostname detected: Name or service not known
version:: ssh: Could not resolve hostname version:: Name or service not known
JRE: ssh: Could not resolve hostname jre: Name or service not known

Is there something wrong with my code? Or is it caused by something else, such as a protection mechanism in the JVM or SSH? Thanks.

【Comments】:

  • There is another example of a similar error: link

Tags: c linux ssh jvm pthreads


【Solution 1】:

This code gives the child thread an invalid arg value:

    struct thread_param temp;
    temp.args=arg;
    temp.start_routine=start_routine;

    int result=old_create(thread,attr,intermedia,(void *)&temp);
//        int result=old_create(thread,attr,start_routine,arg);
    return result;  // <-- temp and its contents are now invalid

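temp is a local variable of your pthread_create(), so it is not guaranteed to still exist when the new thread runs: the parent's call to your pthread_create() may already have returned, which invalidates temp and the values it holds.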
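One way to avoid the dangling pointer is to give the parameter block a lifetime that does not depend on the caller's stack frame. The sketch below allocates it on the heap and lets the new thread free it. The names create_via_trampoline and increment are hypothetical and introduced only for illustration; the struct and intermedia mirror the question's code. This is a minimal sketch under those assumptions, not a verified fix for the Hadoop crash. The pc=0x0 in the report would be consistent with jumping through a clobbered start_routine pointer, but the report alone does not prove that.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Same layout as the struct in the question. */
struct thread_param {
    void *args;
    void *(*start_routine)(void *);
};

typedef int (*P_CREATE)(pthread_t *, const pthread_attr_t *,
                        void *(*)(void *), void *);

/* Runs in the new thread: copy the parameters out of the heap block,
 * release the block, then call the caller's real entry point. */
static void *intermedia(void *arg)
{
    struct thread_param *p = arg;
    struct thread_param local = *p;
    free(p);
    /* do some other things */
    return local.start_routine(local.args);
}

/* Hypothetical helper: starts a thread through `create`, routing it via
 * intermedia. The parameter block lives on the heap, so it stays valid
 * after this function returns. */
static int create_via_trampoline(P_CREATE create, pthread_t *thread,
                                 const pthread_attr_t *attr,
                                 void *(*start_routine)(void *), void *arg)
{
    struct thread_param *p = malloc(sizeof *p);
    if (p == NULL)
        return EAGAIN;  /* report failure the way pthread_create does */
    p->args = arg;
    p->start_routine = start_routine;

    int result = create(thread, attr, intermedia, p);
    if (result != 0)
        free(p);        /* the thread never started, so p is still ours */
    return result;
}

/* Demo start routine (illustration only): increments the int it is given. */
static void *increment(void *arg)
{
    ++*(int *)arg;
    return arg;
}
```

In the interposer from the question, the body after the lookup would reduce to a call such as create_via_trampoline(old_create, thread, attr, start_routine, arg). Ownership of the block passes to the new thread once the underlying create call succeeds, which is why the parent frees it only on failure.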

【Comments】:

    【Solution 2】:

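    There are a number of problems in your code. I don't know which of them (if any) is causing the problem you are seeing, but you should definitely fix them.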

    First, you can enable core dumps (usually with ulimit -c unlimited) and load the core in GDB. See what the backtrace points to.

    Don't dlopen pthreads. Instead, you should be able to use dlsym(RTLD_NEXT, "pthread_create").
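The RTLD_NEXT lookup can be sketched as follows. On glibc, RTLD_NEXT is only declared by <dlfcn.h> when _GNU_SOURCE is defined before the first #include, so a missing define is one plausible source of a compile-time warning or error (an assumption, since the exact warning text was not posted). The names get_old_create, resolve_old_create and old_create_once are hypothetical; pthread_once is used here so that concurrent first calls resolve the pointer exactly once.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* must come before any #include so RTLD_NEXT is declared */
#endif
#include <dlfcn.h>
#include <pthread.h>
#include <stddef.h>

typedef int (*P_CREATE)(pthread_t *, const pthread_attr_t *,
                        void *(*)(void *), void *);

static P_CREATE old_create;
static pthread_once_t old_create_once = PTHREAD_ONCE_INIT;

/* Looks up the next definition of pthread_create in the search order
 * after the object that contains this call. */
static void resolve_old_create(void)
{
    old_create = (P_CREATE)dlsym(RTLD_NEXT, "pthread_create");
}

/* Hypothetical accessor: returns the underlying pthread_create, or NULL
 * if dlsym could not find one. Safe to call from several threads. */
static P_CREATE get_old_create(void)
{
    pthread_once(&old_create_once, resolve_old_create);
    return old_create;
}
```

An interposer would call get_old_create() in place of the dlopen/dlsym pair and return an error if the result is NULL, rather than calling through a null pointer. When built as a shared library for LD_PRELOAD, older glibc versions also need -ldl at link time.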

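    However, the most likely root of the problem is that you store the original parameters in a global variable. That means that if someone (say, the Java runtime) starts many threads at the same time, you will mix up which one is doing what.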

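    【Comments】:

    • Thanks for your answer. On the first point, I'm not very familiar with debugging in gdb; I did enable core dumps afterwards, but I still couldn't figure out the problem. On the second point, if I only use dlsym(RTLD_NEXT, "pthread_create"), it gives a warning and the JVM still crashes. On the third point, I'm not quite sure which variable is global. Anyway, thanks for the prompt reply.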