Python - appending to a pickled list

【Question title】: Python - appending to a pickled list
【Posted】: 2015-03-20 14:00:30
【Question】:

I'm struggling to append to a list stored in a pickled file. Here is the code:

#saving high scores to a pickled file

import pickle

first_name = input("Please enter your name:")
score = input("Please enter your score:")

scores = []
high_scores = first_name, score
scores.append(high_scores)

file = open("high_scores.dat", "ab")
pickle.dump(scores, file)
file.close()

file = open("high_scores.dat", "rb")
scores = pickle.load(file)
print(scores)
file.close()

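The first time I run the code, it prints the name and score.

The second time I run the code, it prints 2 names and 2 scores.

The third time I run the code, it prints the first name and score, but it overwrites the second name and score with the third name and score I entered. I just want it to keep adding names and scores. I don't understand why it keeps the first name and overwrites the second.

【Comments】:

Tags: python append pickle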

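【Answer 1】:

If you want to write to and read from a pickled file, you can call dump multiple times, once for each entry in the list. Each dump appends one score to the pickled file, and each load reads the next score.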

>>> import pickle as dill
>>> 
>>> scores = [('joe', 1), ('bill', 2), ('betty', 100)]
>>> nscores = len(scores)
>>> 
>>> with open('high.pkl', 'ab') as f:
...   _ = [dill.dump(score, f) for score in scores]
... 
>>> 
>>> with open('high.pkl', 'ab') as f:
...   dill.dump(('mary', 1000), f)
... 
>>> # we added a score on the fly, so load nscores+1
>>> with open('high.pkl', 'rb') as f:
...     _scores = [dill.load(f) for i in range(nscores + 1)]
... 
>>> _scores
[('joe', 1), ('bill', 2), ('betty', 100), ('mary', 1000)]
>>>
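The session above has to know how many records were dumped (nscores + 1) before it can read them back. A minimal sketch of one way around that, assuming plain pickle and a throwaway temporary file: keep calling load until it raises EOFError. The helper names append_record and load_all are made up for this illustration and are not part of pickle.

```python
import os
import pickle
import tempfile


def append_record(path, record):
    """Append one pickled record to the end of the file (created if missing)."""
    with open(path, "ab") as f:
        pickle.dump(record, f)


def load_all(path):
    """Return every record in the file, in the order it was dumped."""
    records = []
    with open(path, "rb") as f:
        while True:
            try:
                records.append(pickle.load(f))
            except EOFError:
                # No more pickled objects left in the file.
                break
    return records


# Demo in a throwaway directory so no real score file is touched.
demo_path = os.path.join(tempfile.mkdtemp(), "high.pkl")
for entry in [("joe", 1), ("bill", 2), ("betty", 100)]:
    append_record(demo_path, entry)
append_record(demo_path, ("mary", 1000))

# A single load() call returns only the first record in the file.
with open(demo_path, "rb") as f:
    first_only = pickle.load(f)

all_scores = load_all(demo_path)
print(first_only)
print(all_scores)
```

Note that a single load only ever returns the first object in the file, so a file that was opened with "ab" and dumped to several times needs one load per dump.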

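The most likely reason your code fails is that you replace the original scores with the unpickled list of scores. So if any new scores were added after the dump, you blow them away in memory.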

>>> scores
[('joe', 1), ('bill', 2), ('betty', 100)]
>>> f = open('high.pkl', 'wb')
>>> dill.dump(scores, f)
>>> f.close()
>>> 
>>> scores.append(('mary',1000))
>>> scores
[('joe', 1), ('bill', 2), ('betty', 100), ('mary', 1000)]
>>> 
>>> f = open('high.pkl', 'rb')
>>> _scores = dill.load(f)
>>> f.close()
>>> _scores
[('joe', 1), ('bill', 2), ('betty', 100)]
>>> # blow away the old scores list, by pointing to _scores
>>> scores = _scores
>>> scores
[('joe', 1), ('bill', 2), ('betty', 100)]

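So this is more a Python name-reference issue with scores than a pickle issue. Pickle just instantiates a new list and binds the name scores to it (in your case), and whatever scores pointed to before that is then garbage collected.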

>>> scores = 1
>>> f = open('high.pkl', 'rb')
>>> scores = dill.load(f)
>>> f.close()
>>> scores
[('joe', 1), ('bill', 2), ('betty', 100)]
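One way to avoid losing in-memory edits, sketched here under the assumption that the entries are simple (name, score) tuples: load into a different name and compare, instead of rebinding scores. The names saved_scores and unsaved, and the temporary file path, are only for illustration.

```python
import os
import pickle
import tempfile

# Hypothetical throwaway file so the sketch does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "high.pkl")

scores = [("joe", 1), ("bill", 2), ("betty", 100)]
with open(path, "wb") as f:
    pickle.dump(scores, f)

# Edit the in-memory list after the dump.
scores.append(("mary", 1000))

# Load into a *different* name so the edited list is not rebound.
with open(path, "rb") as f:
    saved_scores = pickle.load(f)

# Anything in memory that is not on disk yet is an unsaved edit.
unsaved = [entry for entry in scores if entry not in saved_scores]

print(saved_scores)
print(unsaved)
print(scores)
```

Here scores still holds the new entry after the load, because the unpickled list went to saved_scores.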

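【Comments】:

Thanks for your input as well. dogwynn's solution works well, but I'll take on board what you said.
Why is pickle imported as dill here?
@DheerajMPai: Because I use dill rather than pickle, that's what I used for my solution. pickle works too, so instead of editing my code I changed the import. It's all pickling... dill is just better at it.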

【Answer 2】:

You need to pull the list out of your database (i.e. your pickle file) first, before appending to it.

import pickle
import os

high_scores_filename = 'high_scores.dat'

scores = []

# first time you run this, "high_scores.dat" won't exist
#   so we need to check for its existence before we load 
#   our "database"
if os.path.exists(high_scores_filename):
    # "with" statements are very handy for opening files. 
    with open(high_scores_filename,'rb') as rfp: 
        scores = pickle.load(rfp)
    # Notice that there's no "rfp.close()"
    #   ... the "with" clause calls close() automatically! 

first_name = input("Please enter your name:")
score = input("Please enter your score:")

high_scores = first_name, score
scores.append(high_scores)

# Now we "sync" our database
with open(high_scores_filename,'wb') as wfp:
    pickle.dump(scores, wfp)

# Re-load our database
with open(high_scores_filename,'rb') as rfp:
    scores = pickle.load(rfp)

print(scores)
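The same load, append, and rewrite cycle can be wrapped in small functions. This is only a sketch: load_scores and add_score are hypothetical helper names, the demo writes to a temporary directory, and catching FileNotFoundError is used here as an alternative to the os.path.exists check.

```python
import os
import pickle
import tempfile


def load_scores(path):
    """Return the saved list, or an empty list if nothing was saved yet."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return []


def add_score(path, name, score):
    """Load the list, append one (name, score) tuple, and write it back."""
    scores = load_scores(path)
    scores.append((name, score))
    # "wb" rewrites the file so it always holds exactly one pickled list.
    with open(path, "wb") as f:
        pickle.dump(scores, f)
    return scores


# Demo in a throwaway directory; each call stands in for one run of the script.
demo_path = os.path.join(tempfile.mkdtemp(), "high_scores.dat")
add_score(demo_path, "joe", 1)
add_score(demo_path, "bill", 2)
result = add_score(demo_path, "mary", 1000)
print(load_scores(demo_path))
```

Because the file is rewritten with "wb" each time, a single pickle.load always returns the full list.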

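【Comments】:

Thanks as well for the extra explanation that came with the code.
@dogwynn: Why check whether the file exists, and sidestep the real issue, if it's just a matter of loading into a new variable?
@MikeMcKerns: I guess I don't know how to call pickle.load on a file that doesn't exist. I know you can do that with the DBv2 API modules (e.g. shelve), but I don't know of a file-like object (which pickle.load requires) that can be opened in "create" mode (without actually creating it). Otherwise, my goal was a script that can be run repeatedly from the command line, adding a new (name, score) tuple each time.
@Charlie: My pleasure. Glad to see another Python hacker on the books. :-)
@dogwynn: Mode ab, instead of wb, creates the file if it doesn't exist and appends if it does. Also, I wasn't suggesting that you try to load from a file that doesn't exist. I was just pointing out that the OP's problem is, more generally, one of unpickling an object to the same name as an existing list, and thereby clobbering any edits made to the existing list after the dump.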

【Answer 3】:

This doesn't actually answer the question, but if anyone wants to add single items to a pickle one at a time, you can do it like this...

import pickle
import os

high_scores_filename = '/home/ubuntu-dev/Desktop/delete/high_scores.dat'

scores = []

# first time you run this, "high_scores.dat" won't exist
#   so we need to check for its existence before we load
#   our "database"
if os.path.exists(high_scores_filename):
    # "with" statements are very handy for opening files.
    with open(high_scores_filename,'rb') as rfp:
        scores = pickle.load(rfp)
    # Notice that there's no "rfp.close()"
    #   ... the "with" clause calls close() automatically!

names = ["mike", "bob", "joe"]

for name in names:
    high_score = name
    print(name)
    scores.append(high_score)

# Now we "sync" our database
with open(high_scores_filename,'wb') as wfp:
    pickle.dump(scores, wfp)

# Re-load our database
with open(high_scores_filename,'rb') as rfp:
    scores = pickle.load(rfp)

print(scores)

【Comments】:

As the first five words of the post indicate.

【Answer 4】:

Instead of pickle, use h5py, which can also serve your purpose.

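import h5py

# Assumes X_train_data, X_test_data, Y_train_data and Y_test_data are NumPy
# arrays defined earlier, and that each dataset already exists in the file and
# was created resizable along axis 0 (e.g. with maxshape=(None, ...)).
# Raw string so the backslash in the Windows-style path is not an escape.
with h5py.File(r'.\PreprocessedData.h5', 'a') as hf:
    # Grow each dataset by the number of new rows, then fill the new tail.
    hf["X_train"].resize((hf["X_train"].shape[0] + X_train_data.shape[0]), axis=0)
    hf["X_train"][-X_train_data.shape[0]:] = X_train_data

    hf["X_test"].resize((hf["X_test"].shape[0] + X_test_data.shape[0]), axis=0)
    hf["X_test"][-X_test_data.shape[0]:] = X_test_data

    hf["Y_train"].resize((hf["Y_train"].shape[0] + Y_train_data.shape[0]), axis=0)
    hf["Y_train"][-Y_train_data.shape[0]:] = Y_train_data

    hf["Y_test"].resize((hf["Y_test"].shape[0] + Y_test_data.shape[0]), axis=0)
    hf["Y_test"][-Y_test_data.shape[0]:] = Y_test_data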


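【Comments】:

Why not use hickle or klepto instead? Both are designed to give you simple dump and load, pickle-equivalent syntax for HDF5. If you replace dill with hickle in my answer, I believe it should work, and store as HDF5.
Yes, it works. Thanks for the info. I didn't know about hickle.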