【Question Title】: How to use map() of ThreadPoolExecutor in the case of a nested for loop in Python
【Posted】: 2021-09-06 09:03:06
【Question】:

I have two dictionaries,

data1 = {
  "key": [
    {
      "id": "key1",
      "name": "key1"
    },
    {
      "id": "key2",
      "name": "key2"
    },
    {
      "id": "key3",
      "name": "key3"
    },
  ]
}

data2 = {
  "key": [
    {
      "id": "key2",
      "name": "TEST key2"
    },
    {
      "id": "key1",
      "name": "TEST key1"
    },
  ]
}

I am using the following code to build a list of tuples of the objects from the `key` lists of `data1` and `data2` that have a matching `id`:


common_keys = [
    (each_data1_key, each_data2_key)
    for each_data1_key in data1.get("key", [])
    for each_data2_key in data2.get("key", [])
    if each_data1_key.get("id") == each_data2_key.get("id")
]

# Example result = [({"id":"key1", "name": "key1"}, {"id": "key1", "name": "TEST key1"}), ...]

Now I want to use these tuples for further processing in the `map` function of `ThreadPoolExecutor`. Currently, I am using the following code,

def func(object1, object2):
    """
    func is run in a thread to do some task in parallel with object1 and object2
    """
    <SOME CODE HERE> ...

from concurrent.futures import ThreadPoolExecutor

def myfunc(common_keys):
    if common_keys:
        with ThreadPoolExecutor(max_workers=10) as executor:
            executor.map(lambda x: func(*x), common_keys)

# func is a function that accepts 2 objects as parameters;
# we send each tuple of objects to a thread in order to process some task
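As a side note, the lambda can be avoided: `executor.map` accepts multiple argument iterables, so the list of pairs can be unpacked with `zip(*...)`. A minimal, self-contained sketch (the body of `func` here is a placeholder for illustration, not the real task):

```python
from concurrent.futures import ThreadPoolExecutor

def func(object1, object2):
    # placeholder body: just pair up the ids
    return object1["id"], object2["id"]

common_keys = [({"id": "key1", "name": "key1"}, {"id": "key1", "name": "TEST key1"}),
               ({"id": "key2", "name": "key2"}, {"id": "key2", "name": "TEST key2"})]

with ThreadPoolExecutor(max_workers=10) as executor:
    # zip(*pairs) turns [(a1, b1), (a2, b2)] into (a1, a2) and (b1, b2),
    # so map calls func(a1, b1), func(a2, b2), ...
    results = list(executor.map(func, *zip(*common_keys)))

print(results)  # [('key1', 'key1'), ('key2', 'key2')]
```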

My task is to optimize the code by reducing the loops (I have used a nested for loop to find the `common_keys` list).

Can anyone help me find a solution where I do not need a nested loop to get the list of tuples of objects with the same id (or any other optimized approach)?

【Discussion】:

    Tags: python threadpoolexecutor


    【Solution 1】:

    Building on https://stackoverflow.com/a/18554039/9981846, if you have some memory to spare, you can make your ids dictionary keys in order to benefit later from fast set-like operations.

    # Loop once for each list
    dict1 = {item["id"]: item for item in data1.get("key", [])}
    dict2 = {item["id"]: item for item in data2.get("key", [])}
    
    # Set intersection is fast
    common_keys = [(dict1[key], dict2[key])
                   for key
                   in dict1.keys() & dict2.keys()]
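Dictionary `.keys()` views support set operations, so the intersection avoids the quadratic scan; applied to the sample data from the question (with the missing comma in `data2` fixed), it picks out `key1` and `key2`. A runnable sketch:

```python
data1 = {"key": [{"id": "key1", "name": "key1"},
                 {"id": "key2", "name": "key2"},
                 {"id": "key3", "name": "key3"}]}
data2 = {"key": [{"id": "key2", "name": "TEST key2"},
                 {"id": "key1", "name": "TEST key1"}]}

dict1 = {item["id"]: item for item in data1.get("key", [])}
dict2 = {item["id"]: item for item in data2.get("key", [])}

# .keys() returns a set-like view; & computes the intersection of ids
shared = dict1.keys() & dict2.keys()
print(sorted(shared))  # ['key1', 'key2']

common_keys = [(dict1[k], dict2[k]) for k in sorted(shared)]
print(common_keys[0][1]["name"])  # TEST key1
```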
    

    Also, if you pass the dictionaries to `myfunc` instead of `common_keys`, you can avoid creating that list at all by using a generator.

    def func(object1, object2):
        print(f"Got 1: {object1}, and 2: {object2}")
    
    
    def generate_pairs(d1, d2):
        for key in d1.keys() & d2.keys():
            yield d1[key], d2[key]
    
    
    def myfunc(d1, d2):
        if d1 and d2:
            with ThreadPoolExecutor(max_workers=10) as executor:
                executor.map(lambda x: func(*x), generate_pairs(d1, d2))
    
    
    myfunc(dict1, dict2)
    >>> Got 1: {'id': 'key2', 'name': 'key2'}, and 2: {'id': 'key2', 'name': 'TEST key2'}
    >>> Got 1: {'id': 'key1', 'name': 'key1'}, and 2: {'id': 'key1', 'name': 'TEST key1'}
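One caveat worth noting: `executor.map` returns a lazy iterator, and the lines above are only printed as a side effect of `func`. If `func` returns values you need, materialize them (e.g. with `list`) while the executor is still alive. A hedged sketch with a stand-in `func`:

```python
from concurrent.futures import ThreadPoolExecutor

def func(object1, object2):
    # stand-in for the real per-pair work
    return object1["name"] + " / " + object2["name"]

pairs = [({"name": "key1"}, {"name": "TEST key1"}),
         ({"name": "key2"}, {"name": "TEST key2"})]

with ThreadPoolExecutor(max_workers=10) as executor:
    # results come back in input order, not completion order
    results = list(executor.map(lambda x: func(*x), pairs))

print(results)  # ['key1 / TEST key1', 'key2 / TEST key2']
```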
    

    Finally, to keep the speed while sparing memory, you can build a dict for only the smaller of the two inputs, passing the `"key"` lists to the generator:

    def generate_pairs(l1, l2):
        little, big = (l1, l2) if (len(l1) < len(l2)) else (l2, l1)
        d1 = {item["id"]: item for item in little}
    
        # loop once over the second list
        for key_data_2 in big:
            key_data_1 = d1.get(key_data_2["id"], None)  # Average case fast too
            if key_data_1 is not None:
                yield key_data_1, key_data_2
    
    
    # with the same `myfunc` except for the parameter types
    def myfunc(l1, l2):
        if l1 and l2:
            with ThreadPoolExecutor(max_workers=10) as executor:
                executor.map(lambda x: func(*x), generate_pairs(l1, l2))
    
    
    # and you'd call 
    myfunc(data1.get("key", []), data2.get("key", []))
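A quick standalone check of the size-aware generator against the sample data (the helper is reproduced so the snippet runs on its own); note that the first element of each yielded pair comes from whichever list became the lookup dict:

```python
def generate_pairs(l1, l2):
    # build a lookup dict only for the smaller list
    little, big = (l1, l2) if len(l1) < len(l2) else (l2, l1)
    lookup = {item["id"]: item for item in little}
    for entry in big:
        match = lookup.get(entry["id"])
        if match is not None:
            yield match, entry

data1_keys = [{"id": "key1", "name": "key1"},
              {"id": "key2", "name": "key2"},
              {"id": "key3", "name": "key3"}]
data2_keys = [{"id": "key2", "name": "TEST key2"},
              {"id": "key1", "name": "TEST key1"}]

pairs = list(generate_pairs(data1_keys, data2_keys))
print(len(pairs))  # 2 matching ids: key1 and key2
```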
    

    【Comments】:

    • Thanks for the reply. This is a very good solution.