JSON serialization and deserialization within a function callback in Python: answers

【Question title】: JSON serialization and deserialization within a function callback in python
【Posted】: 2018-04-19 16:46:49
【Question】:

For example, suppose I have the following code:

import json
import os

def function_callback(cb):
    cb()

def rand_name_giving_func(i):
    list_test = ['john', 'jim', 'anna', 'cynthia', 'dwight']
    return list_test[i] # not that random

def rand_value(i):
    dict_test = {'0': 'random_string', '1': 'random_string', '2': 'random_string'}
    return dict_test[str(i)]

def example():
    data = {} 

    for i in range(3):
        data['name_' + str(i)] = rand_name_giving_func(i)
        data['value_' + str(i)] = rand_value(i)

    if os.path.isfile('file.json'):
        with open('file.json', 'r') as fp:
            temp = json.load(fp)
            temp.update(data)

        with open('file.json', 'w') as fp:
            json.dump(temp, fp, indent=4, sort_keys=True)
    else:
        with open('file.json', 'w') as fp:
            json.dump(data, fp, indent=4, sort_keys=True)

if __name__ == '__main__':
    for i in range(10000):
        function_callback(example)

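Assume I can only work with the JSON file inside example(), and that the callback fires many times. As I understand it, multiple json.dump() calls on the same file won't work, so I figured that deserializing the file, updating the resulting dict and serializing it again (however inefficient) would work. It doesn't, and I get errors like this one: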

Traceback (most recent call last):
  File "/home/pxcel/example.py", line 90, in function_callback
    temp = json.load(fp)
  File "/usr/lib/python2.7/json/__init__.py", line 291, in load
    **kw)
  File "/usr/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 380, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: end is out of bounds

I also get ValueError: No JSON object could be decoded and ValueError: Extra data: line, all raised at temp = json.load(fp).
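One way an "Extra data" error can appear (an assumption, since the full program behind the traceback isn't shown) is a file that is rewritten in place without being truncated: the new, shorter JSON document is followed by leftover bytes of the old one. The sketch below reproduces that with a temporary file; the helper name rewrite_in_place is hypothetical.

```python
import json
import os
import tempfile

def rewrite_in_place(truncate):
    """Write a longer JSON object, overwrite it in 'r+' mode with a shorter
    one, then load the file back. Returns the decode error message, or
    None if the file still parses."""
    fd, path = tempfile.mkstemp(suffix=".json")
    os.close(fd)
    try:
        with open(path, "w") as fp:
            json.dump({"name_0": "john", "name_1": "jim"}, fp)
        # 'r+' starts writing at offset 0 but does NOT shrink the file,
        # so the tail of the old document survives after the new one.
        with open(path, "r+") as fp:
            json.dump({"name_0": "x"}, fp)
            if truncate:
                fp.truncate()  # cut the file at the current write position
        with open(path) as fp:
            json.load(fp)
        return None
    except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
        return str(exc)
    finally:
        os.remove(path)
```

rewrite_in_place(False) returns a message starting with "Extra data", while rewrite_in_place(True) returns None. Opening with 'w', as example() does, truncates the file on open, so this particular failure cannot come from that code path alone.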

I have looked at alternative JSON modules (ijson, demjson, etc.), but haven't found a useful way to apply them to the problem above. Assume the JSON file is structured like this:

{
    "name_0": "john",
    "value_0": {
         "0": "random_string"
    }
}

Any ideas? Suppose list_test and dict_test each hold 100k values and the callback fires 10k times. Would JSON encoding/decoding still work?
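Re-reading and rewriting the whole file on every callback makes each call cost roughly proportional to the current file size. One common alternative, swapped in here and not used by the question or the answer, is JSON Lines: each callback appends a single JSON object as one line, and the lines are merged only when the data is read back. A minimal sketch, with hypothetical helper names append_record and load_merged:

```python
import json
import os

def append_record(path, record):
    """Append one JSON object as a single line (JSON Lines style).
    Each callback writes only its own data; nothing is re-read."""
    with open(path, "a") as fp:
        fp.write(json.dumps(record, sort_keys=True) + "\n")

def load_merged(path):
    """Read every line back and merge the objects into one dict, later
    lines overriding earlier ones (the same effect as repeated dict.update)."""
    merged = {}
    if os.path.isfile(path):
        with open(path) as fp:
            for line in fp:
                line = line.strip()
                if line:
                    merged.update(json.loads(line))
    return merged
```

Inside example(), the read/update/dump sequence would become a single append_record('file.jsonl', data) call (the file name is illustrative). The trade-off is that the file on disk is no longer one JSON document, so any consumer has to go through something like load_merged.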

【Discussion】:

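Is this a typo in your code? temp. = json.load(fp). Have you tried printing the contents of temp right after the open() call in the with statement?
Please reduce your program to the shortest complete program that shows the error. Please edit your question and copy-paste that short, complete program into it. Readers should be able to copy-paste it from Stack Overflow into a text file and run it. See minimal reproducible example for more information.
@user8212173 Yes, that's a typo. No, I haven't. Could the conversion to a dict be the problem?
It usually isn't, but make sure Python can read the JSON data by printing each line and validating the JSON with JsonLint. Sometimes whitespace in the JSON data can cause encoding/decoding errors. The JSON sample in your question is valid, and I can't reproduce this error.
Also, it's unclear what rand_name_giving_func() and rand_value() do. The error could well arise when their values are merged into the temp dict.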
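The JsonLint check suggested above can also be done locally, which helps when the file changes between calls. A minimal sketch; the helper name check_json_text is hypothetical, and it assumes Python 3.5+ for json.JSONDecodeError and its msg, lineno and colno attributes (the traceback in the question is from Python 2.7, where only a plain ValueError is raised). On Python 3, the Python 2 message "No JSON object could be decoded" corresponds to "Expecting value", which is what an empty file produces.

```python
import json

def check_json_text(text):
    """Return None if text is valid JSON, otherwise a short description
    of where the decoder stopped (a local stand-in for an online linter)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return "%s at line %d, column %d" % (exc.msg, exc.lineno, exc.colno)
    return None
```

Calling it on the raw text of file.json just before json.load(fp) would show whether the file was empty, truncated or followed by trailing data at the moment of the failure.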

Tags: python json callback

【Answer 1】:

file.json contains the sample JSON data:

{
    "name_0": "john",
    "value_0": {
        "0": "random_string"
    }
}

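Since the expected output isn't entirely clear, I modified the two functions so that data gets updated. Note, however, that the file is overwritten every time it is opened in w mode. In case this is what you expect, I've added the output at the bottom.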

import json
import os

def rand_name_giving_func(i):
    list_test = ['john', 'jim', 'anna', 'cynthia', 'dwight']
    return list_test[i] # not that random

def rand_value(i):
    dict_test = {'0': 'random_string', '1': 'random_string', '2': 'random_string'}
    return dict_test[str(i)]

def example():
    data = {} 

    for i in range(3):
        data['name_' + str(i)] = rand_name_giving_func(i)
        data['value_' + str(i)] = rand_value(i)

    if os.path.isfile('file.json'):
        with open('file.json', 'r') as fp:
            temp = json.load(fp)
            temp.update(data)

        with open('file.json', 'w') as fp:
            json.dump(temp, fp, indent=4, sort_keys=True)
    else:
        with open('file.json', 'w') as fp:
            json.dump(data, fp, indent=4, sort_keys=True)

example()  

#file.json:

{
    "name_0": "john",
    "name_1": "jim",
    "name_2": "anna",
    "value_0": "random_string",
    "value_1": "random_string",
    "value_2": "random_string"
}
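Opening file.json with 'w' truncates it to zero bytes before json.dump writes anything, so a reader that opens the file during that window sees an empty or partial document. If anything else may read the file while example() runs (an assumption; the question doesn't say), one common remedy is to write the new content to a temporary file in the same directory and then swap it in with os.replace, which is atomic on POSIX systems when both paths are on the same filesystem. A minimal sketch with a hypothetical helper name:

```python
import json
import os
import tempfile

def update_json_file_atomic(path, data):
    """Hypothetical helper: merge data into the JSON object stored at path,
    writing the result to a temporary file first and then swapping it in."""
    temp = {}
    if os.path.isfile(path):
        with open(path) as fp:
            temp = json.load(fp)
    temp.update(data)
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives next to the target so os.replace stays on
    # one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fp:
            json.dump(temp, fp, indent=4, sort_keys=True)
        os.replace(tmp_path, path)  # readers see either the old or the new file
    except BaseException:
        os.remove(tmp_path)
        raise
```

This only guarantees that readers never see a half-written file. Two writers running at the same time can still overwrite each other's updates, which needs a lock on top.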

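The callback function is probably unnecessary; example can simply be called directly. Here is hypothetical code that makes 10k calls and produces similar, though randomized, output.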

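#everything remains same
import names  # third-party package providing get_first_name() (pip install names)

l = []

def random_name_generator():
    # Appends 1000 new random first names to the global list on every call,
    # so l grows by 1000 each time rand_name_giving_func() runs.
    for _ in range(1000):
        l.append(names.get_first_name())
    return l

def rand_name_giving_func(i):
    random_name_generator()
    list_test = list(l)  # copy of the names generated so far
    return list_test[i] # not that random

def rand_value():
    dict_test = {}
    for i in range(10000):
        dict_test[i] = i
    # After the loop i == 9999, so every call returns 9999.
    return dict_test[i]

def example():
    data = {}

    for i in range(10000):
        data['name_' + str(i)] = rand_name_giving_func(i)
        data['value_' + str(i)] = rand_value()

#everything else remains the same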

#Output: file.json contains about 20k entries:
{
"name_0": "Maria",
"name_1": "Carmen",
"name_10": "Antoinette",
"name_11": "Veronica",
"name_12": "Richard",
"name_13": "Rebecca",
"name_14": "Thomas",
"name_15": "Phillip",
"name_16": "Christopher",
.
.
.
"value_9995": 9999,
"value_9996": 9999,
"value_9997": 9999,
"value_9998": 9999,
"value_9999": 9999
}
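Since the output above is just the merge of every call's data, the same file can be produced without any file I/O inside the loop: merge everything in memory and serialize once at the end, replacing 10k read/write cycles with a single json.dump. A minimal sketch; make_data is a hypothetical stand-in for one call's worth of data, not the answer's random names:

```python
import json
import os
import tempfile

def make_data(call):
    # Hypothetical stand-in for the data one call of example() produces.
    return {"name_" + str(call): "name", "value_" + str(call): call}

def build_all(n_calls):
    """Merge every call's data in memory; no file I/O inside the loop."""
    merged = {}
    for call in range(n_calls):
        merged.update(make_data(call))
    return merged

def write_once(path, merged):
    """Serialize the merged dict a single time at the end."""
    with open(path, "w") as fp:
        json.dump(merged, fp, indent=4, sort_keys=True)

if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "file.json")
    write_once(out, build_all(10000))
```

This only fits when all calls happen in one process and the file is needed only at the end. The question's constraint that the file must be handled inside example() on every call would rule it out.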

【Discussion】:

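Yes, that's what I expected. But suppose list_test and dict_test each hold 100k values and the callback fires 10k times. Would JSON encoding/decoding still work?
That's a different question altogether. Still, I don't see any reason it wouldn't work, given that you overwrite the file in w mode rather than writing over the existing JSON data in the file without truncating it. Of course, that can only be stated with certainty once the scenario has been tested. If this answer helped, you can accept or upvote it. Meanwhile, out of curiosity, I'll try a test case.
Thanks a lot. I'll keep editing until the question is as clear as possible. If you think the question meets the standards, please consider upvoting it so more people see it.
I've tested the question with 10k calls and edited the answer. I'd also suggest using threading to spread the calls across threads to speed up execution.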
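If the calls are spread across threads as suggested, every read-modify-write of file.json has to be serialized; otherwise one thread can call json.load while another has just truncated the file with 'w', which yields exactly the kind of decode errors shown in the question. A minimal sketch using threading.Lock; the lock, update_json_file and demo are hypothetical names, and the lock only coordinates threads within one process:

```python
import json
import os
import tempfile
import threading

# Hypothetical module-level lock; it serializes access only among threads
# of this one process, not across separate processes.
_file_lock = threading.Lock()

def update_json_file(path, data):
    """Read-modify-write of the JSON object at path, done under the lock so
    no thread reads the file while another is rewriting it."""
    with _file_lock:
        temp = {}
        if os.path.isfile(path):
            with open(path) as fp:
                temp = json.load(fp)
        temp.update(data)
        with open(path, "w") as fp:
            json.dump(temp, fp, indent=4, sort_keys=True)

def demo(n_threads=8):
    """Start n_threads workers that each merge one key, then load the result."""
    path = os.path.join(tempfile.mkdtemp(), "file.json")
    workers = [
        threading.Thread(target=update_json_file, args=(path, {"name_%d" % i: i}))
        for i in range(n_threads)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    with open(path) as fp:
        return json.load(fp)
```

Because the lock lets only one thread touch the file at a time, the file work itself does not run in parallel; the lock trades throughput for a file that always parses.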