Correct way to cache only some methods of a class with joblib (answers)

[Question title]: Correct way to cache only some methods of a class with joblib
[Posted]: 2014-04-09 12:39:19
[Question]:

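I'm writing a class that has some computation-heavy methods and some parameters that the user will want to tweak iteratively and that are independent of the computation.

The actual use is for visualization, but here's a cartoon example: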

class MyClass(object):

    def __init__(self, x, name, mem=None):

        self.x = x
        self.name = name
        if mem is not None:
            # Wrap the heavy method in joblib's disk cache.
            self.square = mem.cache(self.square)

    def square(self, x):
        """This is the 'computation heavy' method."""
        return x ** 2

    def report(self):
        """Use the results of the computation and a tweakable parameter."""
        print("Here you go, %s" % self.name)
        return self.square(self.x)

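The basic idea is that the user may want to create many instances of this class with the same x but different name parameters. I want to let the user provide a joblib.Memory object that caches the computation part, so they can "report" to many different names without recomputing the squared array each time.

(This is a bit odd, I know. The reason the user needs a separate class instance for each name is that they will actually be interacting with an interface function that looks like this: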

def myfunc(x, name, mem=None):
    theclass = MyClass(x, name, mem)
    theclass.report()

But let's ignore that for now.)

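Following the joblib docs, I am caching the square function with the line self.square = mem.cache(self.square). The problem is that, because self differs from instance to instance, the array is recomputed every time even when the argument is the same.

My guess is that the correct way to handle this is to change the line to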
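Why a different self defeats the cache can be illustrated with joblib.hash. Roughly speaking, joblib keys a cached call on a hash of the call's named arguments, and for a bound method the instance counts as the self argument. The sketch below is an approximation of that key, not joblib's internal code path; Params is a hypothetical stand-in for MyClass.

```python
import joblib


class Params(object):
    """Hypothetical stand-in for MyClass: same x, different name."""

    def __init__(self, x, name):
        self.x = x
        self.name = name


alice = Params(42, "Alice")
bob = Params(42, "Bob")

# Approximate cache keys for square(self, x): the whole instance is hashed,
# so a different `name` yields a different key even though x is identical.
key_alice = joblib.hash({"self": alice, "x": alice.x})
key_bob = joblib.hash({"self": bob, "x": bob.x})
print(key_alice == key_bob)  # False

# Leaving the instance out of the key makes the two calls collide, as desired.
print(joblib.hash({"x": alice.x}) == joblib.hash({"x": bob.x}))  # True
```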

self.square = mem.cache(self.square, ignore=["self"])
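A runnable sketch of that change, assuming joblib 0.12 or later (where Memory takes location=; older releases called it cachedir). joblib's ignore list drops the named arguments from the cache key, and for a bound method the instance is treated as the self argument, so instances that differ only in name share one cached result. CALLS is a hypothetical counter added only to show how often the method body runs.

```python
import tempfile

from joblib import Memory

# Hypothetical call counter, used only to show how often square() really runs.
CALLS = {"square": 0}


class MyClass(object):
    def __init__(self, x, name, mem=None):
        self.x = x
        self.name = name
        if mem is not None:
            # Exclude the bound instance from the cache key; only x is hashed.
            self.square = mem.cache(self.square, ignore=["self"])

    def square(self, x):
        """The 'computation heavy' method; it must not read instance state."""
        CALLS["square"] += 1
        return x ** 2

    def report(self):
        print("Here you go, %s" % self.name)
        return self.square(self.x)


memory = Memory(location=tempfile.mkdtemp(), verbose=0)
results = [MyClass(42, name, memory).report() for name in ("Alice", "Bob")]
print(results, CALLS["square"])  # [1764, 1764] 1
```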

But are there any downsides to this approach? Is there a better way to accomplish the caching?
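One concrete downside of ignore=["self"]: the cache no longer sees any instance state, so a cached method that reads self.something silently returns stale results for other instances. The sketch below assumes joblib 0.12 or later; Scaled and its factor attribute are hypothetical, invented only to show the pitfall.

```python
import tempfile

from joblib import Memory


class Scaled(object):
    """Hypothetical class whose 'heavy' method depends on instance state."""

    def __init__(self, factor, mem):
        self.factor = factor
        # Unsafe: the result depends on self.factor, but self is not hashed.
        self.scaled_square = mem.cache(self.scaled_square, ignore=["self"])

    def scaled_square(self, x):
        return self.factor * x ** 2


memory = Memory(location=tempfile.mkdtemp(), verbose=0)
first = Scaled(1, memory).scaled_square(3)    # computed: 1 * 3 ** 2 = 9
second = Scaled(10, memory).scaled_square(3)  # cache hit on x=3: 9, not 90
print(first, second)  # 9 9
```

The approach is therefore only safe while the cached method behaves like a pure function of its explicit arguments.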

[Discussion]:

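Were you able to solve this? Or do we just follow the docs?
Now that I think about it, the docs give the general approach, which has to allow for the possibility that calling square on different instances of MyClass produces different results even with the same arguments. The square method you describe would be a @staticmethod, since calling it with the same argument apparently never changes the result. You can do that by decorating it with @staticmethod and making sure the definition does not take self as a parameter, e.g. @staticmethod on one line followed by def square(x): on the next.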
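A sketch of the @staticmethod suggestion, assuming joblib 0.12 or later. Inside __init__, self.square resolves to the plain function, so caching it puts no instance into the cache key and there is nothing to ignore. CALLS is again a hypothetical counter for illustration only.

```python
import tempfile

from joblib import Memory

# Hypothetical call counter, used only to show how often square() really runs.
CALLS = {"square": 0}


class MyClass(object):
    def __init__(self, x, name, mem=None):
        self.x = x
        self.name = name
        if mem is not None:
            # self.square is the underlying function here (no bound instance),
            # so the cache key depends only on x.
            self.square = mem.cache(self.square)

    @staticmethod
    def square(x):
        """The 'computation heavy' method, with no dependence on self."""
        CALLS["square"] += 1
        return x ** 2

    def report(self):
        print("Here you go, %s" % self.name)
        return self.square(self.x)


memory = Memory(location=tempfile.mkdtemp(), verbose=0)
results = [MyClass(42, name, memory).report() for name in ("Alice", "Bob")]
print(results, CALLS["square"])  # [1764, 1764] 1
```

Compared with ignore=["self"], this makes the "no instance state" requirement explicit in the signature rather than relying on convention.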

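Tags: python memoization joblib

[Answer 1]:

From the docs:

If you want to use cache inside a class, the recommended pattern is to cache a pure function and use the cached function inside your class.

Since you want the memory caching to be optional, I suggest doing this: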

def square_function(x):
    """This is the 'computation heavy' function."""
    print('    square_function is executing, not using cached result.')
    return x ** 2

class MyClass(object):

    def __init__(self, x, name, mem=None):
        self.x = x
        self.name = name
        if mem is not None:
            self._square_function = mem.cache(square_function)
        else:
            self._square_function = square_function

    def square(self, x):
        return self._square_function(x)

    def report(self):
        print("Here you go, %s" % self.name)
        return self.square(self.x)


from tempfile import mkdtemp
cachedir = mkdtemp()

from joblib import Memory
# joblib >= 0.12 calls this parameter `location`; older releases used `cachedir`.
memory = Memory(location=cachedir, verbose=0)

objects = [
    MyClass(42, 'Alice (cache)', memory),
    MyClass(42, 'Bob (cache)', memory),
    MyClass(42, 'Charlie (no cache)')
]

for obj in objects:
    print(obj.report())

Running this prints:

Here you go, Alice (cache)
    square_function is executing, not using cached result.
1764
Here you go, Bob (cache)
1764
Here you go, Charlie (no cache)
    square_function is executing, not using cached result.
1764

[Discussion]: