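【Posted】: 2020-08-12 23:37:34
【Question】:
I started a pool of worker processes and submitted a bunch of tasks. The system ran out of memory, and the OOM killer killed one of the worker processes. The parent process just hung there waiting for the tasks to finish and never returned.
Here is a runnable example that reproduces the problem. Instead of waiting for the OOM killer to kill a worker, I find the process IDs of all the worker processes and tell the first task to kill one of them. (The call to ps won't work on all operating systems.)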
import os
import signal
from multiprocessing import Pool
from random import choice
from subprocess import run, PIPE
from time import sleep


def run_task(task):
    target_process_id, n = task
    print(f'Processing item {n} in process {os.getpid()}.')
    delay = n + 1
    sleep(delay)
    if n == 0:
        print(f'Item {n} killing process {target_process_id}.')
        os.kill(target_process_id, signal.SIGKILL)
    else:
        print(f'Item {n} finished.')
    return n, delay


def main():
    print('Starting.')
    pool = Pool()
    ps_output = run(['ps', '-opid', '--no-headers', '--ppid', str(os.getpid())],
                    stdout=PIPE, encoding='utf8')
    child_process_ids = [int(line) for line in ps_output.stdout.splitlines()]
    target_process_id = choice(child_process_ids[1:-1])
    tasks = ((target_process_id, i) for i in range(10))
    for n, delay in pool.imap_unordered(run_task, tasks):
        print(f'Received {delay} from item {n}.')
    print('Closing.')
    pool.close()
    pool.join()
    print('Done.')


if __name__ == '__main__':
    main()
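Because the `ps` call is Linux-specific, a more portable way to find the worker PIDs might be `multiprocessing.active_children()`, which returns the live child processes started by the current process. This is a minimal sketch that assumes the pool's workers are the only `multiprocessing` children the parent has started; `worker_pids` is a hypothetical helper name.

```python
import multiprocessing
from multiprocessing import Pool


def worker_pids(pool_size=2):
    """Hypothetical helper: start a pool and return its workers' PIDs.

    Assumes the pool's workers are the only multiprocessing children
    this process has started.
    """
    with Pool(pool_size):
        # active_children() lists the live multiprocessing children of the
        # current process, which includes the pool's worker processes.
        pids = sorted(p.pid for p in multiprocessing.active_children())
    # Leaving the with-block terminates the pool.
    return pids


if __name__ == '__main__':
    print(worker_pids())
```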
When I run the reproduction script on a system with 8 CPUs, I see this output:
Starting.
Processing item 0 in process 303.
Processing item 1 in process 304.
Processing item 2 in process 305.
Processing item 3 in process 306.
Processing item 4 in process 307.
Processing item 5 in process 308.
Processing item 6 in process 309.
Processing item 7 in process 310.
Item 0 killing process 308.
Processing item 8 in process 303.
Received 1 from item 0.
Processing item 9 in process 315.
Item 1 finished.
Received 2 from item 1.
Item 2 finished.
Received 3 from item 2.
Item 3 finished.
Received 4 from item 3.
Item 4 finished.
Received 5 from item 4.
Item 6 finished.
Received 7 from item 6.
Item 7 finished.
Received 8 from item 7.
Item 8 finished.
Received 9 from item 8.
Item 9 finished.
Received 10 from item 9.
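You can see that item 5 never returns, and the pool just waits forever.
How can I get the parent process to notice when a child process is killed?
【Discussion】: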
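As a stopgap that at least keeps the parent from hanging forever, the iterator returned by `imap_unordered` has a `next(timeout)` method (documented for `chunksize=1`) that raises `multiprocessing.TimeoutError` when no result arrives in time. The sketch below assumes a fixed timeout is acceptable and that a timeout means a worker died. `slow_square` and `collect_with_timeout` are hypothetical names, and the self-inflicted `SIGKILL` (not available on Windows) stands in for the OOM killer.

```python
import multiprocessing
import os
import signal
from multiprocessing import Pool
from time import sleep


def slow_square(n):
    """Hypothetical task: item 0 kills its own worker, the others return n * n."""
    if n == 0:
        # Stand-in for the OOM killer: the worker dies without sending a result.
        os.kill(os.getpid(), signal.SIGKILL)
    sleep(0.1)
    return n * n


def collect_with_timeout(items, timeout):
    """Collect results, giving up once no result arrives within `timeout` seconds."""
    results = []
    with Pool(2) as pool:
        # next(timeout) is only documented for chunksize=1.
        it = pool.imap_unordered(slow_square, items, chunksize=1)
        while True:
            try:
                results.append(it.next(timeout))
            except StopIteration:
                break  # every task produced a result
            except multiprocessing.TimeoutError:
                break  # assume a worker died and its result will never arrive
    # Leaving the with-block calls pool.terminate(), so nothing waits on the lost task.
    return sorted(results)


if __name__ == '__main__':
    print(collect_with_timeout(range(4), timeout=3))
```

The drawback is that the timeout has to exceed the slowest legitimate task, so a dead worker is only noticed after that delay.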
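One possible way to get the parent to notice, assuming switching from `multiprocessing.Pool` to `concurrent.futures.ProcessPoolExecutor` is acceptable: that executor watches its workers and, when one terminates abruptly, fails the pending futures with `BrokenProcessPool` instead of waiting forever. A minimal sketch, where `risky_task` and `run_all` are hypothetical names and the self-inflicted `SIGKILL` again simulates the OOM killer on a POSIX system:

```python
import os
import signal
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def risky_task(n):
    """Hypothetical task: item 0 kills its own worker, the others return n * 10."""
    if n == 0:
        # Stand-in for the OOM killer: the worker dies abruptly.
        os.kill(os.getpid(), signal.SIGKILL)
    return n * 10


def run_all(items):
    """Return (results, error); error is set if a worker died abruptly."""
    results = []
    try:
        with ProcessPoolExecutor(max_workers=2) as executor:
            for value in executor.map(risky_task, items):
                results.append(value)
    except BrokenProcessPool as exc:
        # The executor detects the dead worker and fails the pending futures,
        # so the loop raises instead of blocking forever.
        return results, exc
    return results, None


if __name__ == '__main__':
    results, error = run_all(range(4))
    print(results, repr(error))
```

A broken executor cannot accept new work, so a real program would need to create a new `ProcessPoolExecutor` and decide which lost items to resubmit.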