Memcache 1 MB limit in Google App Engine: answers

【Question title】: Memcache 1 MB limit in Google App Engine
【Posted】: 2011-07-02 04:14:28
【Question】:

How can I store objects larger than 1 MB in memcache? Is there a way to split them up but still access the data with the same key?

【Question comments】:

What is the nature of the object, and why are you trying to cache it in memcache?

Tags: google-app-engine memcached

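【Solution 1】:

A nice workaround is layer_cache.py, a Python class written and used at Khan Academy (open source). Basically, it combines an in-memory cache (the cachepy module) with memcache, where memcache is used to keep the in-memory caches in sync across instances. Find the source here and read Ben Kamens's blog post here.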
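The layering idea can be illustrated with a small read-through cache that checks an instance-local dict before falling back to a shared backend. This is a minimal sketch, not Khan Academy's layer_cache.py: `LayeredCache` and `DictBackend` are hypothetical names, and `DictBackend` stands in for memcache (anything with `get(key)` and `set(key, value)`) so the example runs without the App Engine SDK.

```python
import time


class DictBackend(object):
    """Hypothetical stand-in for a shared cache such as memcache."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value
        return True


class LayeredCache(object):
    """Instance-local dict in front of a shared backend (hypothetical sketch)."""

    def __init__(self, backend, local_ttl=60):
        self.backend = backend
        self.local_ttl = local_ttl
        self._local = {}  # key -> (expires_at, value); lives only in this instance

    def get(self, key):
        entry = self._local.get(key)
        if entry is not None and entry[0] > time.time():
            return entry[1]  # served from instance memory, no backend call
        value = self.backend.get(key)
        if value is not None:
            # Keep a local copy so later reads on this instance skip the backend.
            self._local[key] = (time.time() + self.local_ttl, value)
        return value

    def set(self, key, value):
        self._local[key] = (time.time() + self.local_ttl, value)
        return self.backend.set(key, value)
```

Each instance would hold its own `LayeredCache` while all of them share one backend. Local copies can be up to `local_ttl` seconds stale after another instance writes, and the layering reduces memcache reads without lifting the 1 MB per-item limit on its own.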

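【Comments】:

layer_cache.py seems to address a different problem: the limit on data transferred out of memcache (measured in GB per day). At least that is how it reads in Ben Kamens's blog post that you linked.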

【Solution 2】:

I use the following module ("blobcache") to store values larger than 1 MB in GAE's memcache.

import pickle
import random
from google.appengine.api import memcache


MEMCACHE_MAX_ITEM_SIZE = 900 * 1024


def delete(key):
  chunk_keys = memcache.get(key)
  if chunk_keys is None:
    return False
  chunk_keys.append(key)
  memcache.delete_multi(chunk_keys)
  return True


def set(key, value):
  pickled_value = pickle.dumps(value)

  # delete previous entity with the given key
  # in order to conserve available memcache space.
  delete(key)

  pickled_value_size = len(pickled_value)
  chunk_keys = []
  for pos in range(0, pickled_value_size, MEMCACHE_MAX_ITEM_SIZE):
    # TODO: use memcache.set_multi() for speedup, but don't forget
    # about batch operation size limit (32Mb currently).
    chunk = pickled_value[pos:pos + MEMCACHE_MAX_ITEM_SIZE]

    # the pos is used for reliable distinction between chunk keys.
    # the random suffix is used as a counter-measure for distinction
    # between different values, which can be simultaneously written
    # under the same key.
    # the '.' separators keep, e.g., pos=1/rand=23 distinct from pos=12/rand=3.
    chunk_key = '%s.%d.%d' % (key, pos, random.getrandbits(31))

    is_success = memcache.set(chunk_key, chunk)
    if not is_success:
      return False
    chunk_keys.append(chunk_key)
  return memcache.set(key, chunk_keys)


def get(key):
  chunk_keys = memcache.get(key)
  if chunk_keys is None:
    return None
  chunks = []
  for chunk_key in chunk_keys:
    # TODO: use memcache.get_multi() for speedup.
    # Don't forget about the batch operation size limit (currently 32Mb).
    chunk = memcache.get(chunk_key)
    if chunk is None:
      return None
    chunks.append(chunk)
  pickled_value = ''.join(chunks)
  try:
    return pickle.loads(pickled_value)
  except Exception:
    return None

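【Comments】:

chunk_size is undefined; it should probably be MEMCACHE_MAX_ITEM_SIZE. Otherwise it looks like good code, and it seems to work for me!
Sometimes the chunk_keys value comes out of memcache as a string instead of a list, so to be safe I now use join and split to store it as a string.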
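The workaround in the last comment can be sketched as a pair of helpers that store the chunk-key index as one delimited string and split it again on read. Assumptions: `encode_chunk_keys`, `decode_chunk_keys` and the `|` separator are hypothetical, and no chunk key may contain the separator (the keys built in `set()` start with the caller's key, so that key must avoid `|` as well).

```python
CHUNK_KEY_SEPARATOR = '|'  # hypothetical; must not occur inside any chunk key


def encode_chunk_keys(chunk_keys):
    """Join the chunk-key list into one string before memcache.set(key, ...)."""
    for chunk_key in chunk_keys:
        if CHUNK_KEY_SEPARATOR in chunk_key:
            raise ValueError('chunk key contains separator: %r' % chunk_key)
    return CHUNK_KEY_SEPARATOR.join(chunk_keys)


def decode_chunk_keys(stored):
    """Turn the value read back with memcache.get(key) into a list again."""
    if stored is None:
        return None
    if isinstance(stored, (list, tuple)):
        return list(stored)  # tolerate entries written by the list-based version
    return stored.split(CHUNK_KEY_SEPARATOR) if stored else []
```

In `set()`, the last line would become `memcache.set(key, encode_chunk_keys(chunk_keys))`; in `get()` and `delete()`, the lookup would become `chunk_keys = decode_chunk_keys(memcache.get(key))`. The list check keeps entries written by the unmodified module readable.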

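【Solution 3】:

As others have mentioned, you can add and retrieve multiple values from memcache at once. Interestingly, while the App Engine blog says these batch operations can handle up to 32 MB, the official documentation still says they are limited to 1 MB. So be sure to test it, and maybe pester Google to update their documentation. Also keep in mind that some of your chunks may be evicted from memcache before others.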
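Because the batch limit was in doubt (32 MB per the blog, 1 MB per the docs), one option is to make the limit a parameter and group chunks into batches that stay under it, issuing one `set_multi` call per batch. A sketch under assumptions: `BATCH_LIMIT_BYTES` and `batch_by_size` are hypothetical names, and size is approximated as key length plus value length, ignoring per-item protocol overhead.

```python
BATCH_LIMIT_BYTES = 1024 * 1024  # hypothetical; set to whatever limit testing confirms


def batch_by_size(mapping, limit=BATCH_LIMIT_BYTES):
    """Split a {key: bytes} dict into a list of dicts, each under `limit` bytes.

    Size is approximated as len(key) + len(value); real per-item overhead
    is not counted, so leave some headroom in `limit`.
    """
    batches, current, current_size = [], {}, 0
    for key in sorted(mapping):
        value = mapping[key]
        item_size = len(key) + len(value)
        if item_size > limit:
            raise ValueError('item %r alone exceeds the batch limit' % key)
        if current and current_size + item_size > limit:
            batches.append(current)  # close the full batch, start a new one
            current, current_size = {}, 0
        current[key] = value
        current_size += item_size
    if current:
        batches.append(current)
    return batches
```

Each batch would then go into its own `memcache.set_multi(batch)` call. On App Engine, `set_multi` returns the list of keys it failed to store, so a non-empty result from any batch means the whole value should be treated as not cached.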

I'd suggest googling "python compress string" and considering serializing and compressing your object before sending it to memcache.
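Serializing and compressing before the `set` can be sketched with `pickle` and `zlib` from the standard library. This only helps when the pickled data compresses well (text, repetitive structures); already-compressed data such as images will not shrink much. `pack`, `unpack` and `fits_in_one_item` are hypothetical helper names.

```python
import pickle
import zlib

MEMCACHE_MAX_ITEM_SIZE = 900 * 1024  # same headroom as the blobcache module


def pack(value, level=6):
    """Pickle then zlib-compress `value`; returns bytes ready for memcache.set()."""
    return zlib.compress(pickle.dumps(value, pickle.HIGHEST_PROTOCOL), level)


def unpack(data):
    """Inverse of pack(): decompress then unpickle."""
    return pickle.loads(zlib.decompress(data))


def fits_in_one_item(value):
    """True if the compressed form fits in a single memcache item."""
    return len(pack(value)) <= MEMCACHE_MAX_ITEM_SIZE
```

If `fits_in_one_item` is false, the packed bytes can still be split into chunks as in the other answers; compressing first simply means fewer chunks.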

You might also want to ask this guy what he means about having an extension that lets him store larger objects in memcache.

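【Comments】:

【Solution 4】:

The best way to store a large amount of data in memcache is to split it into chunks and use set_multi and get_multi to store and retrieve the data efficiently.

Note, however, that some chunks may be evicted from the cache while others remain.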
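This approach, including the eviction caveat, can be sketched as follows. Assumptions: chunk keys are derived as `<key>:<index>` and the chunk count is stored under `<key>` itself (a hypothetical scheme); `client` is any object whose `set_multi` returns the keys it failed to store and whose `get_multi` returns a dict of only the keys found, as App Engine's memcache does. `FakeMemcache` is a hypothetical dict-backed stand-in so the sketch runs without the SDK.

```python
import pickle

CHUNK_SIZE = 900 * 1024  # stay under the 1 MB per-item limit


class FakeMemcache(object):
    """Dict-backed stand-in for google.appengine.api.memcache (hypothetical)."""

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value
        return True

    def get(self, key):
        return self.data.get(key)

    def set_multi(self, mapping):
        self.data.update(mapping)
        return []  # keys that failed to store

    def get_multi(self, keys):
        return dict((k, self.data[k]) for k in keys if k in self.data)


def set_chunked(client, key, value, chunk_size=CHUNK_SIZE):
    payload = pickle.dumps(value, pickle.HIGHEST_PROTOCOL)
    chunks = dict(('%s:%d' % (key, i), payload[pos:pos + chunk_size])
                  for i, pos in enumerate(range(0, len(payload), chunk_size)))
    if client.set_multi(chunks):
        return False  # some chunk was not stored
    # Write the count last so a reader never sees it before the chunks exist.
    return client.set(key, len(chunks))


def get_chunked(client, key):
    count = client.get(key)
    if count is None:
        return None
    chunk_keys = ['%s:%d' % (key, i) for i in range(count)]
    found = client.get_multi(chunk_keys)
    if len(found) != count:
        return None  # at least one chunk was evicted: treat as a cache miss
    return pickle.loads(b''.join(found[k] for k in chunk_keys))
```

Sending every chunk in one `set_multi` call is subject to the batch limit discussed in Solution 3. Also, unlike blobcache, this scheme reuses the same chunk keys on every write, so two concurrent writers can interleave their chunks; blobcache's random suffix avoids that.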

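You can also cache data inside the application instance by storing it in a global variable, but this is less ideal: it is not shared across instances and is more likely to disappear.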

The GAE roadmap includes support for uploading to the blobstore from within the app, which you may want to keep an eye on, along with integration with Google Storage.

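【Comments】:

【Solution 5】:

The memcache methods set_multi and get_multi both accept a key prefix (key_prefix): set_multi takes a dictionary of values, and get_multi takes a list of keys.

If you can split your data into a dictionary of chunks, you can use them. Essentially, the prefix becomes your new key name.

You would have to keep track of the chunk names somehow. Also, any chunk can be evicted from memcache at any time, so you also need some way to reconstruct partial data.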
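This scheme can be sketched with the `key_prefix` argument: the dictionary uses short keys ('0', '1', and so on) plus an 'n' entry recording the chunk count, and the prefix plays the role of the cache key. Reconstruction is only possible when each piece can be recomputed independently (for example, pages of a query result), so the sketch takes a caller-supplied `rebuild(index)` function. Assumptions: the 'n' header, `store_pieces`, `load_pieces` and `rebuild` are hypothetical, and `FakeMemcache` is a dict-backed stand-in mirroring App Engine's behavior (`set_multi` returns failed keys, `get_multi` returns found keys without the prefix).

```python
class FakeMemcache(object):
    """Dict-backed stand-in for App Engine memcache's *_multi calls (hypothetical)."""

    def __init__(self):
        self.data = {}

    def set_multi(self, mapping, key_prefix=''):
        for k, v in mapping.items():
            self.data[key_prefix + k] = v
        return []  # keys that could not be stored

    def get_multi(self, keys, key_prefix=''):
        return dict((k, self.data[key_prefix + k])
                    for k in keys if key_prefix + k in self.data)


def store_pieces(client, prefix, pieces):
    """Store independent pieces as '<prefix>0', '<prefix>1', ... plus a count."""
    mapping = dict((str(i), piece) for i, piece in enumerate(pieces))
    mapping['n'] = len(pieces)
    return not client.set_multi(mapping, key_prefix=prefix)


def load_pieces(client, prefix, rebuild):
    """Fetch all pieces; recompute any evicted piece with rebuild(index)."""
    header = client.get_multi(['n'], key_prefix=prefix)
    if 'n' not in header:
        return None  # the chunk count itself is gone: full cache miss
    names = [str(i) for i in range(header['n'])]
    found = client.get_multi(names, key_prefix=prefix)
    missing = dict((name, rebuild(int(name))) for name in names if name not in found)
    if missing:
        client.set_multi(missing, key_prefix=prefix)  # repopulate evicted pieces
        found.update(missing)
    return [found[name] for name in names]
```

For an opaque pickled blob, a missing chunk cannot be rebuilt on its own, so the value has to be treated as a full miss instead. Each piece must still fit in a single memcache item.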

【Comments】: