Improving the efficiency of repetitive code in Python: answers

【Question title】: Improving efficiency of repetitive code in python
【Posted】: 2018-05-31 12:28:29
【Question description】:

I need to do the following in Python.

I have a list of tuples:

data = [("John", 14, 12132.213, "Y", 34), ("Andrew", 23, 2121.21, "N", 66)]

I have a list of fields:

fields = ["name", "age", "vol", "status", "limit"]

The values in each tuple correspond, in order, to the fields.

I have a dictionary that maps each field to its type:

desc = { "name" : "string", "age" : "int", "vol" : "double", "status" : "byte", "limit" : "int" }

I need to generate a message to send in the following format:

[{"column": "name", "value": {"String": "John"}}, {"column": "age", "value": {"Int": 14}}, {"column": "vol", "value": {"Double": 12132.213}}, {"column": "status", "value": {"Byte": 89}}, {"column": "limit", "value": {"Int": 34}},
{"column": "name", "value": {"String": "Andrew"}}, {"column": "age", "value": {"Int": 23}}, {"column": "vol", "value": {"Double":2121.21}}, {"column": "status", "value": {"Byte": 78}}, {"column": "limit", "value": {"Int": 66}}]

I have two functions that generate this:

def get_value(data_type, res):
    if data_type == 'string':
        return {'String': res.strip()}
    elif data_type == 'byte':
        return {'Byte': ord(res[0])}
    elif data_type == 'int':
        return {'Int': int(res)}
    elif data_type == 'double':
        return {'Double': float(res)}

def generate_message(data, fields, desc):
    result = []
    for row in data:
        for field, res in zip(fields, row):
            data_type = desc[field]
            val = {'column': field,
                   'value': get_value(data_type, res)}
            result.append(val)
    return result

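However, the data is very large, with a huge number of tuples (about 200,000). Generating the message format above for every one of them takes a lot of time. Is there an efficient way to do this?

P.S. The message has to be in this form because I send it on a queue, and the consumer is a C++ client that needs the type information.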

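【Question discussion】:

Why not send the first dictionary, desc, once as a schema and then send the data as-is? That is, separate the metadata exchange from the data exchange.
C++ should be faster than Python at this kind of task, then.
Do you really need to build the Python data structure (a list of dictionaries in which one field is another dictionary), or just a formatted string? That makes a big difference.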
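
The first and third comments can be combined into a rough sketch: send desc once as a schema, then ship the rows unchanged, serialized straight to a string. This assumes the C++ consumer can be changed to read such an envelope, which the question does not confirm. The build_payload name and the "schema"/"rows"/"type" keys are hypothetical.

```python
import json

data = [("John", 14, 12132.213, "Y", 34), ("Andrew", 23, 2121.21, "N", 66)]
fields = ["name", "age", "vol", "status", "limit"]
desc = {"name": "string", "age": "int", "vol": "double",
        "status": "byte", "limit": "int"}

def build_payload(data, fields, desc):
    """Serialize the type information once, followed by the raw rows.

    Hypothetical envelope: the consumer would have to apply the
    per-column types itself instead of reading them from every cell.
    """
    schema = [{"column": field, "type": desc[field]} for field in fields]
    # json.dumps writes each tuple as a JSON array.
    return json.dumps({"schema": schema, "rows": data})

payload = build_payload(data, fields, desc)
```

With this layout the per-cell wrapper dictionaries are never built in Python, so the cost per row is mostly the JSON encoding itself.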

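Tags: python performance optimization

【Solution 1】:

A list comprehension should be faster, because it avoids looking up and calling result.append on every iteration. It is also concise and easy to read.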

In [94]: def generate_message_faster(data, fields, desc):
    ...:     return [
    ...:        {'column': field, 'value': get_value(desc[field], res)} 
    ...:        for row in data for field, res in zip(fields, row)
    ...:     ]
    ...:

In [95]: generate_message_faster(data, fields, desc)
Out[95]:
[{'column': 'name', 'value': {'String': 'John'}},
 {'column': 'age', 'value': {'Int': 14}},
 {'column': 'vol', 'value': {'Double': 12132.213}},
 {'column': 'status', 'value': {'Byte': 89}},
 {'column': 'limit', 'value': {'Int': 34}},
 {'column': 'name', 'value': {'String': 'Andrew'}},
 {'column': 'age', 'value': {'Int': 23}},
 {'column': 'vol', 'value': {'Double': 2121.21}},
 {'column': 'status', 'value': {'Byte': 78}},
 {'column': 'limit', 'value': {'Int': 66}}]

In [96]: %timeit(generate_message(data, fields, desc))
100000 loops, best of 3: 7.84 µs per loop

In [97]: %timeit(generate_message_faster(data, fields, desc))
The slowest run took 4.24 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.9 µs per loop
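
Timings on the two-row sample are dominated by fixed overhead, so it may be more informative to repeat the comparison on a larger input. The sketch below does that with the standard timeit module. The input is synthetic (the sample rows repeated), N_ROWS is a hypothetical knob set to 20,000 to keep memory modest and can be raised toward the 200,000 rows mentioned in the question, and results will vary by machine, so none are quoted here.

```python
import timeit

data = [("John", 14, 12132.213, "Y", 34), ("Andrew", 23, 2121.21, "N", 66)]
fields = ["name", "age", "vol", "status", "limit"]
desc = {"name": "string", "age": "int", "vol": "double",
        "status": "byte", "limit": "int"}

def get_value(data_type, res):
    if data_type == "string":
        return {"String": res.strip()}
    elif data_type == "byte":
        return {"Byte": ord(res[0])}
    elif data_type == "int":
        return {"Int": int(res)}
    elif data_type == "double":
        return {"Double": float(res)}

def generate_message(data, fields, desc):
    # Loop-and-append version from the question.
    result = []
    for row in data:
        for field, res in zip(fields, row):
            result.append({"column": field,
                           "value": get_value(desc[field], res)})
    return result

def generate_message_faster(data, fields, desc):
    # List-comprehension version from this answer.
    return [
        {"column": field, "value": get_value(desc[field], res)}
        for row in data for field, res in zip(fields, row)
    ]

# Synthetic input: the two sample rows repeated up to N_ROWS rows.
N_ROWS = 20_000
big_data = data * (N_ROWS // len(data))

for func in (generate_message, generate_message_faster):
    seconds = timeit.timeit(lambda: func(big_data, fields, desc), number=5)
    print(f"{func.__name__}: {seconds / 5:.4f} s per call for {len(big_data)} rows")
```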


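【Solution 2】:

Building on aydow's answer, this speeds things up further by replacing the if/elif chain with a dictionary lookup: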

dt_action = {
  'string': (lambda res: {'String': res.strip()}),
  'byte': (lambda res: {'Byte': ord(res[0])}),
  'int': (lambda res: {'Int': int(res)}),
  'double': (lambda res: {'Double': float(res)}),
}

def generate_message_faster(data, fields, desc):
  return [
    {'column': field, 'value': dt_action[desc[field]](res)}
    for row in data for field, res in zip(fields, row)
  ]

Timings:

Original: 6.44 µs per loop
With dt_action: 5.54 µs per loop
With dt_action and a list comprehension: 4.92 µs per loop
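
One further step in the same direction, offered only as a sketch: since fields and desc are the same for every row, the converter for each column can be resolved once before the loop rather than once per cell. The names CONVERTERS and generate_message_prebound are hypothetical, and this version has not been benchmarked against the timings above.

```python
data = [("John", 14, 12132.213, "Y", 34), ("Andrew", 23, 2121.21, "N", 66)]
fields = ["name", "age", "vol", "status", "limit"]
desc = {"name": "string", "age": "int", "vol": "double",
        "status": "byte", "limit": "int"}

# Each converter wraps the value in a one-key dict naming its type.
CONVERTERS = {
    "string": lambda res: {"String": res.strip()},
    "byte": lambda res: {"Byte": ord(res[0])},
    "int": lambda res: {"Int": int(res)},
    "double": lambda res: {"Double": float(res)},
}

def generate_message_prebound(data, fields, desc):
    # Resolve each column's converter once, not once per cell.
    columns = [(field, CONVERTERS[desc[field]]) for field in fields]
    return [
        {"column": field, "value": convert(res)}
        for row in data
        for (field, convert), res in zip(columns, row)
    ]

message = generate_message_prebound(data, fields, desc)
```

This removes two dictionary lookups per cell (desc[field] and the converter table) at the cost of one small list built up front.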
