Python Multiprocessing - ApplyResult.get() NameError: global name 'self' is not defined

【Question title】: Python Multiprocessing - ApplyResult.get() NameError: global name 'self' is not defined
【Posted】: 2017-11-15 12:34:27
【Question description】:

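I'm trying to use Python's multiprocessing package to speed up a CPU-bound process. I have a very large numpy matrix and want to split the work of computing its values using Pool and apply_async. However, when I run a unit test on the function to check that it works, I get the error "NameError: global name 'self' is not defined". I couldn't find anything useful on Google or StackOverflow either. Any idea why this happens?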

Pytest output:

_____________________ TestBuildEMMatrix.test_build_em_matrix_simple _____________________

self = <mixemt_master.mixemt2.preprocess_test.TestBuildEMMatrix testMethod=test_build_em_matrix_simple>

    def test_build_em_matrix_simple(self):
            reads = ["1:A,2:C", "1:T,2:C", "3:T,4:T", "2:A,4:T"]
            in_mat = preprocess.build_em_matrix(self.ref, self.phy,
>                                                                                   reads, self.haps, self.args)

preprocess_test.py:272:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
preprocess.py:239: in build_em_matrix
    results[i] = results[i].get()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <multiprocessing.pool.ApplyResult object at 0x7f4218ea07d0>, timeout = None

    def get(self, timeout=None):
        self.wait(timeout)
        if not self._ready:
            raise TimeoutError
        if self._success:
            return self._value
        else:
>           raise self._value
E           NameError: global name 'self' is not defined

/vol/hpc/apps/python-anaconda2-4.3.1-abat/install/lib/python2.7/multiprocessing/pool.py:567: NameError
--------------------------------- Captured stdout call ----------------------------------
False

And the relevant Python functions:

def build_em_matrix_process(markers, haplogroups, pos_obs, mut_prob, column_length, start_index, end_index):

    columns = [[prob_for_vars(markers, haplogroups[j], pos_obs, mut_prob) for j in xrange(column_length)]
        for i in xrange(start_index, end_index)]

    return columns

def build_em_matrix(refseq, phylo, reads, haplogroups, args):   
    """
    Returns the matrix that describes the probability of each read
    originating in each haplotype.
    """
    hvb_mat = HapVarBaseMatrix(refseq, phylo)
    read_hap_mat = numpy.empty((len(reads), len(haplogroups)))

    if args.verbose:
        sys.stderr.write('Building EM input matrix...\n')

    num_processors = args.p

    pool = Pool(processes=num_processors)
    results = []
    partition_size = int(math.ceil(len(reads) / float(num_processors)))

    for i in xrange(num_processors):
        start_index = i * partition_size
        end_index = (i + 1) * partition_size
        pos_obs = pos_obs_from_sig(reads[i])

        results.append(pool.apply_async(build_em_matrix_process, (hvb_mat.markers, haplogroups, pos_obs, hvb_mat.mut_prob, len(haplogroups), start_index, end_index)))

    column = 0
    for i in xrange(num_processors):
        results[i].wait()
        print results[i].successful()
        results[i] = results[i].get()
        for j in xrange(len(results[i])):
            read_hap_mat[column] = results[i][j]
            column += 1

    if args.verbose:
        sys.stderr.write('Done.\n\n')

    return read_hap_mat

After the call to 'results[i].wait()' I added the statement 'print results[i].successful()', which prints False to stdout. I'm not sure why it doesn't return True, since I can't find any errors in build_em_matrix_process.

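【Comments】:

Where is the unit test code? The error suggests the problem is in TestBuildEMMatrix.test_build_em_matrix_simple, not in the code under test.
The unit test code is fine. This is an existing application that I'm refactoring to take advantage of parallel processing. The unit tests passed before, I haven't changed the method signature, and once it's correct the method's results should be the same.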

Tags: python-2.7 numpy python-multiprocessing

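【Solution 1】:

I looked through the code again and found the answer!

To do this, I had refactored an instance method of a class, which is called by build_em_matrix_process, into a top-level function. It turned out I had accidentally left a reference to self in the function body. When I ran the test, the error appeared to come from the code of ApplyResult.get() itself rather than from the code in the called top-level function.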
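
As the traceback in the question shows, get() re-raises the exception that was stored from the worker (`raise self._value`), which is why the NameError seems to originate in pool.py. Below is a minimal sketch of that behaviour, plus one possible way to see where the error really happened. The names `prob_for_row`, `with_traceback` and `demo` are hypothetical and not from the original code, and ThreadPool is used only so the snippet runs without a `__main__` guard; it exposes the same apply_async()/get() interface as Pool.

```python
import traceback
from multiprocessing.pool import ThreadPool


def prob_for_row(values):
    # Hypothetical stand-in for a method that was moved to module level.
    # The leftover `self` below is intentional and raises NameError.
    return sum(values) * self.scale  # noqa: F821


def with_traceback(func, *args):
    """Run func and return (value, None) or (None, formatted traceback)."""
    try:
        return func(*args), None
    except Exception:
        # Format the traceback where the error happened, so the caller
        # can see the real failing function and line.
        return None, traceback.format_exc()


def demo(values):
    pool = ThreadPool(processes=1)
    try:
        # Plain call: the exception is stored and re-raised by get().
        plain = pool.apply_async(prob_for_row, (values,))
        plain.wait()
        succeeded = plain.successful()
        try:
            plain.get()
            reraised = None
        except NameError as exc:
            reraised = exc

        # Wrapped call: the worker-side traceback comes back as a string.
        wrapped = pool.apply_async(with_traceback, (prob_for_row, values))
        _, worker_tb = wrapped.get()
    finally:
        pool.close()
        pool.join()
    return succeeded, reraised, worker_tb
```

Calling `demo([1, 2, 3])` returns False for `succeeded`, the re-raised NameError, and a traceback string that names `prob_for_row`. With a process Pool the wrapper and the wrapped function would both need to be module-level functions so they can be pickled.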
