【Question title】: How are blobs removed in RelStorage pack?
【Posted】: 2020-12-16 04:34:28
【Question】:

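This question is related to How to pack blobstorage with Plone and RelStorage.

I am using a ZODB database with RelStorage and SQLite as its backend, and I am trying to remove unused blobs. Currently db.pack does not remove the blobs from disk. The minimal working example below demonstrates this behaviour: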

import logging
import numpy as np
import os
import persistent
from persistent.list import PersistentList
import shutil
import time
from ZODB import config, blob

connectionString = """
%import relstorage
<zodb main>
<relstorage>
blob-dir ./blob
keep-history false
cache-local-mb 0
<sqlite3>
    data-dir .
</sqlite3>
</relstorage>
</zodb>
"""


class Data(persistent.Persistent):
    def __init__(self, data):
        super().__init__()

        self.children = PersistentList()

        self.data = blob.Blob()
        with self.data.open("w") as f:
            np.save(f, data)


def main():
    logging.basicConfig(level=logging.INFO)
    # Initial cleanup
    for f in os.listdir("."):
        if f.endswith("sqlite3"):
            os.remove(f)

    if os.path.exists("blob"):
        shutil.rmtree("blob", True)

    # Initializing database
    db = config.databaseFromString(connectionString)
    with db.transaction() as conn:
        root = Data(np.arange(10))
        conn.root.Root = root

        child = Data(np.arange(10))
        root.children.append(child)

    # Removing child reference from root
    with db.transaction() as conn:
        conn.root.Root.children.pop()

    db.close()

    print("blob directory:", [[os.path.join(rootDir, f) for f in files] for rootDir, _, files in os.walk("blob") if files])
    db = config.databaseFromString(connectionString)
    db.pack(time.time() + 1)
    db.close()
    print("blob directory:", [[os.path.join(rootDir, f) for f in files] for rootDir, _, files in os.walk("blob") if files])


if __name__ == "__main__":
    main()

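The example above does the following:

  1. Removes any previous database in the current directory, along with the blob directory.
  2. Creates a database/storage from scratch and adds two objects (root and child), with child referenced by root, then commits the transaction.
  3. Removes the reference from root to child and commits the transaction.
  4. Closes the database/storage.
  5. Reopens the database/storage and runs db.pack with a pack time one second in the future.

The output of the minimal working example is the following: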

INFO:ZODB.blob:(23376) Blob directory '<some path>/blob/' does not exist. Created new directory.
INFO:ZODB.blob:(23376) Blob temporary directory './blob/tmp' does not exist. Created new directory.
blob directory: [['blob/.layout'], ['blob/3/.lock', 'blob/3/0.03da352c4c5d8877.blob'], ['blob/6/.lock', 'blob/6/0.03da352c4c5d8877.blob']]
INFO:relstorage.storage.pack:pack: beginning pre-pack
INFO:relstorage.storage.pack:Analyzing transactions committed Thu Aug 27 11:48:17 2020 or before (TID 277592791412927078)
INFO:relstorage.adapters.packundo:pre_pack: filling the pack_object table
INFO:relstorage.adapters.packundo:pre_pack: Filled the pack_object table
INFO:relstorage.adapters.packundo:pre_pack: analyzing references from 7 object(s) (memory delta: 256.00 KB)
INFO:relstorage.adapters.packundo:pre_pack: objects analyzed: 7/7
INFO:relstorage.adapters.packundo:pre_pack: downloading pack_object and object_ref.
INFO:relstorage.adapters.packundo:pre_pack: traversing the object graph to find reachable objects.
INFO:relstorage.adapters.packundo:pre_pack: marking objects reachable: 4
INFO:relstorage.adapters.packundo:pre_pack: finished successfully
INFO:relstorage.storage.pack:pack: pre-pack complete
INFO:relstorage.adapters.packundo:pack: will remove 3 object(s)
INFO:relstorage.adapters.packundo:pack: cleaning up
INFO:relstorage.adapters.packundo:pack: finished successfully
blob directory: [['blob/.layout'], ['blob/3/.lock', 'blob/3/0.03da352c4c5d8877.blob'], ['blob/6/.lock', 'blob/6/0.03da352c4c5d8877.blob']]

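As you can see, db.pack does remove 3 objects ("will remove 3 object(s)"), but the blobs in the filesystem are unchanged.

RelStorage's unit tests do appear to test whether blobs are removed from the filesystem (see here), but in the script above it does not work.

What am I doing wrong? Any hints, links, or help are appreciated.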

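【Comments】:

  • My reply is completely off topic, but: wow, blast from the past! The RelStorage project was conceived when the largest customer of Jarn, a core Plone ecosystem company I worked for, demanded that all data be stored in Oracle. No ifs or buts! So we commissioned Shane Hathaway to create this project based on his PGStorage work, and you'll find I contributed a fair amount of work about a decade ago. Great to see the project is still going!

Tags: python sqlite blob zodb relstorage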


【Solution 1】:

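By default, the blob storage directory is used as a cache, holding blob data that is also stored in the database; the idea is that loading blob data from a local disk cache is faster than loading it from a remote database server. With a cached blob storage, packing a history-free storage does not remove unreachable blob files; it relies instead on the cache size limiter to evict stale cached data when room is needed. However, you have not set a size limit, so the cache grows without bounds and those unreachable blob files stay around forever.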

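Packing can't remove the blob files here because the cache is local to each ZODB client; it is outside the ZODB storage's jurisdiction, so to speak. This may be less obvious when using SQLite as the database layer, but imagine Postgres on a separate server with multiple clients on different machines, and you can see that cleaning up every cache at pack time is not feasible.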

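Note that the other blob storage option is shared blob storage, which may be closer to what you expected: all blob data is stored on disk, not in the database. When used with a remote database server and multiple clients, you would need to place the directory on something like an NFS share. In this case, packing operates on the blobs directly, and unreachable blob files are removed immediately when packing.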

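You have two options:

  • Set a size limit for the blob cache with blob-cache-size. Packing still won't remove blob files, but stale files will be evicted once the cache reaches its limit.

  • Switch to shared blob storage (set shared-blob-dir to true). For a SQLite-backed RelStorage this probably makes more sense than a cached blob storage, the dire warnings in the documentation notwithstanding!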
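
For the first option, the configuration from the question could be extended roughly as follows. This is only a sketch: the 100mb limit is an arbitrary illustrative value that does not come from the question, and the name CACHE_LIMITED_CONFIG is made up for this example.

```python
# Variant of the question's configuration with a blob cache size limit.
# ASSUMPTION: "100mb" is an illustrative value; choose a limit that is
# comfortably larger than the largest blob you store.
CACHE_LIMITED_CONFIG = """
%import relstorage
<zodb main>
<relstorage>
blob-dir ./blob
blob-cache-size 100mb
keep-history false
cache-local-mb 0
<sqlite3>
    data-dir .
</sqlite3>
</relstorage>
</zodb>
"""
```

It would be passed to config.databaseFromString in place of connectionString. Because eviction is driven by the cache filling up and not by packing, the listing printed right after db.pack may still show both blob files.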

The simplest change is to switch the blob storage mode to shared:

connectionString = """
%import relstorage
<zodb main>
<relstorage>
blob-dir ./blob
shared-blob-dir true
keep-history false
cache-local-mb 0
<sqlite3>
    data-dir .
</sqlite3>
</relstorage>
</zodb>
"""

The output then becomes:

INFO:ZODB.blob:(26177) Blob directory '<some path>/blob/' does not exist. Created new directory.
INFO:ZODB.blob:(26177) Blob temporary directory './blob/tmp' does not exist. Created new directory.
blob directory: [['blob/.layout'], ['blob/0x00/0x00/0x00/0x00/0x00/0x00/0x00/0x03/0x03da4f169582cd22.blob', 'blob/0x00/0x00/0x00/0x00/0x00/0x00/0x00/0x03/.lock'], ['blob/0x00/0x00/0x00/0x00/0x00/0x00/0x00/0x06/0x03da4f169582cd22.blob', 'blob/0x00/0x00/0x00/0x00/0x00/0x00/0x00/0x06/.lock']]
INFO:relstorage.storage.pack:pack: beginning pre-pack
INFO:relstorage.storage.pack:Analyzing transactions committed Tue Sep  1 01:22:35 2020 or before (TID 277621285453417864)
INFO:relstorage.adapters.packundo:pre_pack: filling the pack_object table
INFO:relstorage.adapters.packundo:pre_pack: Filled the pack_object table
INFO:relstorage.adapters.packundo:pre_pack: analyzing references from 7 object(s) (memory delta: 0 KB)
INFO:relstorage.adapters.packundo:pre_pack: objects analyzed: 7/7
INFO:relstorage.adapters.packundo:pre_pack: downloading pack_object and object_ref.
INFO:relstorage.adapters.packundo:pre_pack: traversing the object graph to find reachable objects.
INFO:relstorage.adapters.packundo:pre_pack: marking objects reachable: 4
INFO:relstorage.adapters.packundo:pre_pack: finished successfully
INFO:relstorage.storage.pack:pack: pre-pack complete
INFO:relstorage.adapters.packundo:pack: will remove 3 object(s)
INFO:relstorage.adapters.packundo:pack: cleaning up
INFO:relstorage.adapters.packundo:pack: finished successfully
blob directory: [['blob/.layout'], ['blob/0x00/0x00/0x00/0x00/0x00/0x00/0x00/0x03/0x03da4f169582cd22.blob', 'blob/0x00/0x00/0x00/0x00/0x00/0x00/0x00/0x03/.lock']]

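Yes, the blob directory layout has changed, so that it can handle all possible OIDs. The point, however, is that the blob for OID 6 has been removed.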
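
To check which OIDs still have blob files after packing, the directory names in the listing above can be decoded back into OIDs. The following is a minimal sketch that assumes the layout seen in that output (one 0x-prefixed directory per OID byte, most significant byte first). The helper names oid_from_layout_dir and shared_blob_oids are invented for this example and are not part of ZODB or RelStorage.

```python
import os


def oid_from_layout_dir(rel_dir):
    """Decode an OID from a directory path relative to the blob directory.

    ASSUMPTION: every path component looks like "0x00" (one per OID byte),
    as in the output shown above. Returns None for any other directory.
    """
    parts = rel_dir.split(os.sep)
    if not all(p.startswith("0x") for p in parts):
        return None
    # Concatenate the hex digits of each byte and parse them as one integer.
    return int("".join(p[2:] for p in parts), 16)


def shared_blob_oids(blob_dir):
    """Return the set of OIDs that have at least one .blob file on disk."""
    oids = set()
    for root_dir, _, files in os.walk(blob_dir):
        if not any(name.endswith(".blob") for name in files):
            continue
        oid = oid_from_layout_dir(os.path.relpath(root_dir, blob_dir))
        if oid is not None:
            oids.add(oid)
    return oids
```

Calling shared_blob_oids("blob") before and after db.pack and subtracting the second set from the first gives the OIDs whose blob files were removed; for the run above that difference would be {6}.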

The unit test you found only runs when testing with a shared blob directory:

# If the blob directory is a cache, don't test packing,
# since packing can not remove blobs from all caches.
test_packing = shared_blob_dir
