Zombie Processes - bw_0927

http://kenby.iteye.com/blog/1174208

http://www.zyfforlinux.cc/2015/01/03/%E5%A4%9A%E8%BF%9B%E7%A8%8B%E7%BC%96%E7%A8%8B%E7%9F%A5%E8%AF%86%E7%82%B9%E5%A4%87%E5%BF%98/

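Ways to avoid zombie processes:

The parent reaps its children with the wait family of functions
The parent installs a SIGCHLD handler and calls wait inside the handler to reap zombies
The parent exits right after fork, so the child is handed over to the init process, which takes care of reaping it
The parent simply ignores the SIGCHLD signal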

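How can one server handle several clients at the same time? Before getting to advanced I/O such as select/poll/epoll, the old-fashioned answer is fork. Network servers commonly use fork to serve multiple clients concurrently: the parent only listens on the port, and every time it accepts a new client connection it forks a child dedicated to that client. When a child exits, however, it becomes a zombie, so the parent must handle SIGCHLD and call wait to clean up; the simplest approach is to ignore SIGCHLD outright.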
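The "ignore SIGCHLD" shortcut can be checked with a small experiment. The sketch below is only an illustration, not code from the original article: the helper name child_is_auto_reaped is hypothetical, and it assumes a system (such as Linux) where a child of a parent that ignores SIGCHLD is discarded by the kernel as soon as it exits.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the article): returns 1 if a child that
 * exits while SIGCHLD is ignored leaves nothing for wait() to collect,
 * 0 if a child was collected after all, -1 on setup failure. */
int child_is_auto_reaped(void)
{
    struct sigaction ignore, previous;
    ignore.sa_handler = SIG_IGN;
    sigemptyset(&ignore.sa_mask);
    ignore.sa_flags = 0;
    if (sigaction(SIGCHLD, &ignore, &previous) == -1)
        return -1;

    int result = -1;
    pid_t pid = fork();
    if (pid == 0)
        _exit(0);                       /* child: terminate immediately */

    if (pid > 0) {
        /* With SIGCHLD ignored, wait() blocks until every child has
         * terminated and then fails with ECHILD: no zombie was kept. */
        pid_t reaped = wait(NULL);
        result = (reaped == -1 && errno == ECHILD);
    }

    sigaction(SIGCHLD, &previous, NULL);    /* restore the old disposition */
    return result;
}
```

Calling child_is_auto_reaped() from main should yield 1 on such a system. The price of this shortcut is that the parent can no longer learn how its children exited.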

====================

Original article: http://coolshell.cn/articles/656.html

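Few people may realize that after a process calls exit it does not vanish immediately; it leaves behind a data structure known as a zombie process (Zombie). Of the five Linux process states, the zombie is a very special one: it has given up almost all of its memory, has no executable code, and cannot be scheduled. It merely keeps a slot in the process table that records its exit status and related information for other processes to collect; apart from that, a zombie occupies no memory.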

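The zombie process goes back to Unix, and the Unix designers did not invent it out of boredom or to look cool. As mentioned above, a zombie holds information that matters a great deal to programmers and system administrators. First, how did the process die? Did it exit normally, hit an error, or get killed by another process? In other words, what is its exit code? Second, how much system CPU time and user CPU time did it consume in total, how many page faults did it incur, and how many signals did it receive? All of this is stored in the zombie. Without zombies we would not know how long a process ran: the moment it exited, everything about it would be wiped from the system, and a parent or administrator who needed that information later would be out of luck.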
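As a rough illustration of the "how did it die" information, the following sketch decodes the status that waitpid hands back from a terminated child. The struct child_report type and the run_and_report helper are hypothetical names introduced only for this example; CPU times and page-fault counts are left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical summary of what the parent learns from a dead child. */
struct child_report {
    int exited;     /* 1 if the child terminated via exit/_exit */
    int exit_code;  /* valid when exited == 1 */
    int signaled;   /* 1 if a signal terminated the child */
    int signal_no;  /* valid when signaled == 1 */
};

/* Fork a child that exits with `code` (when sig == 0) or kills itself
 * with signal `sig`, then reap it and decode its termination status.
 * Returns 0 on success, -1 if fork or waitpid failed. */
int run_and_report(int code, int sig, struct child_report *out)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (sig != 0)
            raise(sig);         /* die from a signal */
        _exit(code);            /* or exit normally */
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    out->exited    = WIFEXITED(status) ? 1 : 0;
    out->exit_code = out->exited ? WEXITSTATUS(status) : 0;
    out->signaled  = WIFSIGNALED(status) ? 1 : 0;
    out->signal_no = out->signaled ? WTERMSIG(status) : 0;
    return 0;
}
```

For instance, run_and_report(3, 0, &r) should report a normal exit with code 3, while run_and_report(0, SIGKILL, &r) should report death by SIGKILL.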

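So, after a process exits, the system changes its state to Zombie and keeps the entry around so that the parent can collect the exit information. The parent may be busy with other work and unable to collect it right away, so the Zombie state means: the process has exited and is waiting for its parent to collect its information.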

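A zombie cannot be removed with the kill command, because the process has already exited. To get rid of one without the parent's cooperation, you have to terminate its parent; init then adopts the zombie and reaps it. A zombie still holds a process ID, so a large number of them can use up the process IDs and process-table slots that new processes need.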
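The claim that kill has no effect on a zombie can be tried out directly. This is a minimal sketch under stated assumptions: zombie_survives_kill is a hypothetical helper, and waitid with WNOWAIT is used only to make sure the child is already a zombie before the signal is sent.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a zombie is unaffected by SIGKILL and
 * is still reaped afterwards with its original exit code, -1 on failure. */
int zombie_survives_kill(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(7);               /* child: exit with a recognizable code */

    /* Block until the child has terminated, but leave it as a zombie:
     * WNOWAIT keeps the exit status available for a later wait. */
    siginfo_t info;
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) == -1)
        return -1;

    kill(pid, SIGKILL);         /* no effect: the process has already exited */

    int status;
    if (waitpid(pid, &status, 0) != pid)    /* only this removes the zombie */
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 7;
}
```

The status still shows a normal exit with code 7 rather than death by signal 9, which suggests the SIGKILL never reached anything that could act on it.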

Let's look at an example:

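/* zombie.c */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid;

    pid = fork();

    if (pid < 0) {          /* fork failed */
        printf("error occurred!\n");
        return 1;
    } else if (pid == 0) {  /* child process */
        exit(0);
    } else {                /* parent process */
        sleep(60);          /* sleep for 60 seconds; the child stays a zombie meanwhile */
        wait(NULL);         /* reap the zombie */
    }
    return 0;
}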

Compile the program:

$ cc zombie.c -o zombie

Run it in the background so that we can type the next command:

$ ./zombie &
[1] 1217

List the processes on the system:

$ ps -ax
... ...

1137   pts/0   S   0:00   -bash

1217   pts/0   S   0:00   ./zombie

1218   pts/0   Z   0:00   [zombie]

1578   pts/0   R   0:00   ps   -ax

The "Z" in the state column marks a zombie: process 1218 is currently a zombie process.
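The same state letter that ps prints can also be read programmatically. The sketch below is Linux-specific and assumes /proc is mounted; proc_state and state_of_unreaped_child are hypothetical helpers written for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (Linux-only): read the one-letter state of `pid`
 * from /proc/<pid>/stat. Returns '?' if the file cannot be read or parsed. */
char proc_state(pid_t pid)
{
    char path[64], line[512];
    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return '?';
    char *ok = fgets(line, sizeof line, fp);
    fclose(fp);
    if (ok == NULL)
        return '?';

    /* Format: "pid (comm) state ..."; comm may itself contain ')',
     * so search for the last closing parenthesis. */
    char *close_paren = strrchr(line, ')');
    if (close_paren == NULL || close_paren[1] != ' ' || close_paren[2] == '\0')
        return '?';
    return close_paren[2];
}

/* Fork a child that exits at once, look up its state before reaping it,
 * then reap it. Returns the observed state letter. */
char state_of_unreaped_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return '?';
    if (pid == 0)
        _exit(0);

    /* Wait for the child to terminate without reaping it. */
    siginfo_t info;
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) == -1)
        return '?';

    char state = proc_state(pid);
    waitpid(pid, NULL, 0);      /* now remove the zombie */
    return state;
}
```

On a Linux system with /proc available, state_of_unreaped_child() should return 'Z', matching the ps output above.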

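To collect a zombie's information and get rid of it, the parent has to call waitpid or wait. Both calls collect the information the zombie left behind and make the process disappear for good.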

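Personal summary:

1 What zombie processes are for

They store the total system CPU time and total user CPU time the process consumed,

the number of page faults and signals received, and the exit status of the process,

for whoever reaps the process to inspect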

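2 What a zombie process is

A zombie is not a real process but a state: a process normally becomes a zombie the moment it exits.

There are only three ways to get rid of a zombie

(1) The parent explicitly calls wait or waitpid to reap it

(2) Kill the parent; the zombie is then handed over to init, and init always reaps zombies

(3) Reboot the system

So if the parent never calls wait or waitpid, the zombie persists for as long as the parent is alive

Note that sending a signal with kill does not help: the zombie has already exited and cannot act on signals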

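3 The harm of zombie processes

A large number of them fills up the process table and uses up process IDs, which can prevent new processes from being created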

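4 Zombie processes versus orphan processes

If the parent exits before the child, the child becomes an orphan process, a child without a parent.

All orphans are sent to the orphanage: the init process adopts them.

An orphan process: my parent died, I am still alive, I am an orphan

A zombie process: I died, my parent is still alive, but it will not collect my body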
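The adoption of orphans can be observed with a double fork. In this sketch (the helper orphan_gets_new_parent is a hypothetical name) a middle process exits immediately, and its child polls getppid until the parent PID changes. Which process adopts the orphan, PID 1 or some other reaper, depends on the system, so the code only checks that the parent changed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a grandchild observes a new parent
 * after its original parent (the middle process) has exited. */
int orphan_gets_new_parent(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t middle = fork();
    if (middle < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (middle == 0) {
        pid_t original_parent = getpid();
        if (fork() == 0) {
            /* Grandchild: poll (at most about 2 s) until re-parented. */
            close(fds[0]);
            struct timespec pause = {0, 10 * 1000 * 1000};  /* 10 ms */
            char adopted = 0;
            for (int i = 0; i < 200 && !adopted; i++) {
                if (getppid() != original_parent)
                    adopted = 1;
                else
                    nanosleep(&pause, NULL);
            }
            if (write(fds[1], &adopted, 1) != 1)
                _exit(1);
            _exit(0);
        }
        _exit(0);   /* middle process exits at once, orphaning the grandchild */
    }

    close(fds[1]);
    waitpid(middle, NULL, 0);   /* reap the middle process: no zombie is left */

    char adopted = 0;
    ssize_t n = read(fds[0], &adopted, 1);  /* blocks until the grandchild reports */
    close(fds[0]);
    return n == 1 && adopted == 1;
}
```

The original process never has to wait for the grandchild: whoever adopted it is responsible for reaping it, which is why this double-fork pattern is a common way to avoid zombies.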

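5 How to prevent zombie processes

(1) Blocking: the parent explicitly calls wait or waitpid to reap the zombie

(2) Asynchronous: the parent catches the SIGCHLD signal and then calls wait or waitpid to reap the zombie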

6 Reaping zombies correctly

If the parent is busy, reap asynchronously: catch the SIGCHLD signal with the following handler:

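C code

void sig_chld(int signo)
{
    pid_t pid;
    int stat;

    pid = wait(&stat);      /* reaps exactly one child */
    /* printf is not async-signal-safe; it is used here only for demonstration */
    printf("child %d terminated\n", pid);
}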

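There is a problem here. If several children exit at the same time, several SIGCHLD signals are generated at once, but SIGCHLD is an unreliable (standard) signal and is not queued, so the handler may run only once. In that case only one child is reaped and the others remain zombies.

The fix is to reap every child that has exited, not just one, each time SIGCHLD arrives.

Calling wait in a loop inside the handler does not work: once no exited child is left while other children are still running, wait blocks and the handler never returns. What we need is waitpid with the WNOHANG option. When no child has exited, waitpid returns 0 immediately (or -1 if there are no children at all), so the loop ends and the handler returns.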

C code

void sig_chld(int signo)
{
    pid_t pid;
    int stat;

    /* reap every child that has already exited, without blocking */
    while ((pid = waitpid(-1, &stat, WNOHANG)) > 0) {
        printf("child %d terminated\n", pid);
    }
}
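To show the loop in context, here is a minimal sketch that installs such a handler with sigaction and counts how many children it reaps. It is an illustration under stated assumptions rather than the article's code: on_sigchld, reaped_count, and spawn_and_reap are hypothetical names, the handler counts instead of calling printf (which is not async-signal-safe), and the parent simply polls for up to about five seconds.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t reaped_count = 0;

/* Reap every child that has already exited; never block. */
static void on_sigchld(int signo)
{
    (void)signo;
    int saved_errno = errno;            /* waitpid may overwrite errno */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped_count++;
    errno = saved_errno;
}

/* Hypothetical helper: fork `n` children that exit immediately and return
 * how many of them the handler reaped, or -1 if the handler could not be
 * installed. */
int spawn_and_reap(int n)
{
    struct sigaction sa, previous;
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    if (sigaction(SIGCHLD, &sa, &previous) == -1)
        return -1;

    reaped_count = 0;
    int started = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);                   /* child: exit right away */
        if (pid > 0)
            started++;
    }

    /* The parent is free to do other work; here it just polls. */
    struct timespec pause = {0, 10 * 1000 * 1000};     /* 10 ms */
    for (int i = 0; i < 500 && reaped_count < started; i++)
        nanosleep(&pause, NULL);        /* returns early when a signal arrives */

    sigaction(SIGCHLD, &previous, NULL);    /* restore the old handler */
    return reaped_count;
}
```

Even if several SIGCHLD signals are merged into one delivery, the loop in the handler collects every child that has exited, so spawn_and_reap(5) is expected to return 5.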

========= Excerpt from APUE, 8.5 exit Functions ======

When we described the fork function, it was obvious that the child has a parent process after the call to fork. Now we're talking about returning a termination status to the parent. But what happens if the parent terminates before the child? The answer is that the init process becomes the parent process of any process whose parent terminates. We say that the process has been inherited by init. What normally happens is that whenever a process terminates, the kernel goes through all active processes to see whether the terminating process is the parent of any process that still exists. If so, the parent process ID of the surviving process is changed to be 1 (the process ID of init). This way, we're guaranteed that every process has a parent.

Another condition we have to worry about is when a child terminates before its parent. If the child completely disappeared, the parent wouldn't be able to fetch its termination status when and if the parent were finally ready to check if the child had terminated. The kernel keeps a small amount of information for every terminating process, so that the information is available when the parent of the terminating process calls wait or waitpid. Minimally, this information consists of the process ID, the termination status of the process, and the amount of CPU time taken by the process. The kernel can discard all the memory used by the process and close its open files. In UNIX System terminology, a process that has terminated, but whose parent has not yet waited for it, is called a zombie. The ps(1) command prints the state of a zombie process as Z. If we write a long-running program that forks many child processes, they become zombies unless we wait for them and fetch their termination status.