【Title】: Python multiprocessing claims too many open files when no files are even opened
【Posted】: 2016-08-04 10:55:13
【Description】:

I'm trying to speed up an algorithm that operates on a huge matrix. I've parallelized it to work on rows, and put the data matrix in shared memory so the system doesn't get clogged. However, instead of running smoothly, it throws a weird error about files, which I don't understand since I don't even open any files in the program.

The following roughly simulates what's happening in the program; the 1000 iterations also represent what happens in the algorithm:

import multiprocessing
import ctypes
import numpy as np

# a 10x10 matrix of doubles backed by shared memory, viewed as a numpy array
shared_array_base = multiprocessing.Array(ctypes.c_double, 10*10)
shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
shared_array = shared_array.reshape(10, 10)

def my_func(i, shared_array):
    shared_array[i,:] = i

# initializer run in each worker: expose the shared array as a global there
def pool_init(_shared_array, _constans):
    global shared_array, constans
    shared_array = _shared_array
    constans = _constans

def pool_my_func(i):
    my_func(i, shared_array)

if __name__ == '__main__':
    for i in np.arange(1000):
        pool = multiprocessing.Pool(8, pool_init, (shared_array, 4))
        pool.map(pool_my_func, range(10))
    print(shared_array)

This throws the following error (I'm on OS X):

Traceback (most recent call last):
  File "weird.py", line 24, in <module>
    pool = multiprocessing.Pool(8, pool_init, (shared_array, 4))
  File "//anaconda/lib/python3.4/multiprocessing/context.py", line 118, in Pool
    context=self.get_context())
  File "//anaconda/lib/python3.4/multiprocessing/pool.py", line 168, in __init__
    self._repopulate_pool()
  File "//anaconda/lib/python3.4/multiprocessing/pool.py", line 233, in _repopulate_pool
    w.start()
  File "//anaconda/lib/python3.4/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "//anaconda/lib/python3.4/multiprocessing/context.py", line 267, in _Popen
    return Popen(process_obj)
  File "//anaconda/lib/python3.4/multiprocessing/popen_fork.py", line 21, in __init__
    self._launch(process_obj)
  File "//anaconda/lib/python3.4/multiprocessing/popen_fork.py", line 69, in _launch
    parent_r, child_w = os.pipe()
OSError: [Errno 24] Too many open files

I'm confused. I'm not even opening any files here. All I want to do is pass shared_array to the individual processes in a way that won't clog system memory, and I don't even need to modify it within the parallelized processes, if that helps.

Also, in case it matters, the exact error raised by the actual code is slightly different:

Traceback (most recent call last):
  File "tcap.py", line 206, in <module>
  File "tcap.py", line 202, in main
  File "tcap.py", line 181, in tcap_cluster
  File "tcap.py", line 133, in ap_step
  File "//anaconda/lib/python3.4/multiprocessing/context.py", line 118, in Pool
  File "//anaconda/lib/python3.4/multiprocessing/pool.py", line 168, in __init__
  File "//anaconda/lib/python3.4/multiprocessing/pool.py", line 233, in _repopulate_pool
  File "//anaconda/lib/python3.4/multiprocessing/process.py", line 105, in start
  File "//anaconda/lib/python3.4/multiprocessing/context.py", line 267, in _Popen
  File "//anaconda/lib/python3.4/multiprocessing/popen_fork.py", line 21, in __init__
  File "//anaconda/lib/python3.4/multiprocessing/popen_fork.py", line 69, in _launch
OSError: [Errno 24] Too many open files

So yeah, I have no idea how to proceed. Any help would be greatly appreciated. Thanks in advance!

【Question Discussion】:

    Tags: python ctypes python-multiprocessing


    【Solution 1】:

    You are creating 1000 process pools that are never reclaimed (for some reason); together they have consumed all the file descriptors available in your main process, which are used for the pipes that communicate between the main process and its subprocesses.
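
    A quick way to see this (a minimal sketch, assuming a POSIX system such as OS X or Linux, where /dev/fd lists the calling process's open descriptors) is to watch the descriptor count grow with every Pool that is created and never closed:

    import multiprocessing
    import os

    def count_fds():
        # each entry in /dev/fd is one descriptor open in this process
        return len(os.listdir('/dev/fd'))

    if __name__ == '__main__':
        print('fds at start:', count_fds())
        pools = []
        for _ in range(5):
            pools.append(multiprocessing.Pool(2))
            print('fds after another Pool:', count_fds())  # keeps growing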

    Perhaps what you want is:

    # create the pool once, outside the loop, so its pipes are reused
    # instead of being leaked on every iteration
    pool = multiprocessing.Pool(8, pool_init, (shared_array, 4))
    for _ in range(1000):
        pool.map(pool_my_func, range(10))
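
    If you really do need a fresh pool on each iteration, an alternative (a minimal sketch using the context-manager support that Pool has had since Python 3.3, with the same pool_init and pool_my_func as above) is to let each pool be torn down, pipes included, before the next one is created:

    if __name__ == '__main__':
        for _ in range(1000):
            # __exit__ calls terminate(), closing the pool's pipes
            # before the next pool is created
            with multiprocessing.Pool(8, pool_init, (shared_array, 4)) as pool:
                pool.map(pool_my_func, range(10))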
    

    【Discussion】:

      【Solution 2】:

      The operating system limits the number of open file descriptors. I changed my ulimit from 1024 to 4096 and it worked.

      Check your descriptor limit with:

      ulimit -n
      

      For me it was 1024; I updated it to 4096 and it worked fine:

      ulimit -n 4096
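
      Note that ulimit -n 4096 only affects the current shell session (and processes started from it). As an alternative sketch (assuming a Unix-like OS; the standard resource module is unavailable on Windows), the soft limit can also be raised from inside the Python process itself:

      import resource

      # current soft and hard limits on open file descriptors
      soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
      print('soft limit:', soft, 'hard limit:', hard)

      # raise the soft limit; an unprivileged process may go up to,
      # but not beyond, the hard limit
      resource.setrlimit(resource.RLIMIT_NOFILE, (min(4096, hard), hard))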
      

      【Discussion】:
