How to speed up a program with a thread pool in Python 3: answers

【Question title】: How to speed up a program with a thread pool in Python 3
【Posted】: 2015-10-18 14:41:36
【Question description】:

What kind of threads, and how many, should I use if I am doing the following?

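My problem is as follows:

Thread 1 does this: whenever the last name in line_1 (the first line of base_file) matches a row in row_1 to row_end (every row of huge_file), write out line_1 together with the matching rows.

Thread 2 does this: whenever the last name in line_2 (the second line of base_file) matches a row in row_1 to row_end (every row of huge_file), write out line_2 together with the matching rows.

Thread 3 does this: whenever the last name in line_3 (the third line of base_file) matches a row in row_1 to row_end (every row of huge_file), write out line_3 together with the matching rows.

........

Thread 100 does this: whenever the last name in line_100 (the 100th line of base_file) matches a row in row_1 to row_end (every row of huge_file), write out line_100 together with the matching rows.

All of these 100 or more threads would start at the same time. Is that possible?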

【Question discussion】:

Tags: multithreading python-3.x threadpool

【Solution 1】:

I have ordinary code that does the work step by step. It is a nested for loop, but it takes a very long time to run.

import codecs
import csv

from fuzzywuzzy import fuzz

count2 = 0  # number of last-name matches found so far

with open("Director_1980.csv", "wt", newline="") as f3:
    writer = csv.writer(f3)

    # Load the big file once so it can be scanned for every director line.
    with open("contribDB_1980final.csv", "rt", newline="") as f2:  # This file is a big file
        f2list = list(csv.reader(f2))

    with codecs.open("director.csv", "r", encoding="utf-8", errors="ignore") as fdata:
        for line in fdata:
            # Drop the line ending so it does not end up inside the last field.
            line = line.rstrip("\r\n").split("|")
            lName = line[5]
            fName = line[1]
            mName = line[2]
            employer = line[6]

            for row in f2list:
                lName2 = row[7]
                fName2 = row[8]
                mName2 = row[9]
                employer2 = row[20]

                # Only rows whose last name matches perfectly are written out.
                if fuzz.token_set_ratio(lName, lName2) == 100:
                    count2 = count2 + 1
                    print(count2)
                    lName_ratio = 100
                    fName_ratio = fuzz.token_set_ratio(fName, fName2)
                    mName_ratio = fuzz.token_set_ratio(mName, mName2)
                    employer_ratio = fuzz.token_set_ratio(employer, employer2)

                    # Merge both records and insert the scores at fixed positions.
                    new_line = line + row
                    new_line.insert(16, lName_ratio)
                    new_line.insert(18, fName_ratio)
                    new_line.insert(20, mName_ratio)
                    new_line.insert(32, employer_ratio)

                    writer.writerow(new_line)
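
For the "one thread per line" idea in the question, a bounded pool is probably a better fit than starting 100 threads at once. The following is only a minimal sketch, not a tuned solution: `match_one_line`, `match_all` and `last_names_match` are hypothetical names, and `last_names_match` is a plain case-insensitive comparison standing in for `fuzz.token_set_ratio(...) == 100` so that the sketch runs without third-party packages. The field positions (5 and 7) are taken from the code above.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def last_names_match(a, b):
    # Hypothetical stand-in for fuzz.token_set_ratio(a, b) == 100.
    return a.strip().lower() == b.strip().lower()


def match_one_line(line, rows):
    """Return line + row for every row whose last name matches the line's."""
    return [line + row for row in rows if last_names_match(line[5], row[7])]


def match_all(base_lines, rows, max_workers=8):
    """Run match_one_line for every base line on a fixed-size thread pool."""
    worker = partial(match_one_line, rows=rows)
    # Tasks queue up; at most max_workers threads run at any moment.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields the per-line results in the order of base_lines.
        per_line = pool.map(worker, base_lines)
        return [merged for matches in per_line for merged in matches]
```

Collecting the merged rows and writing them from the main thread keeps a single writer on the output file. Note that in standard CPython the global interpreter lock prevents threads from running Python bytecode in parallel, so for CPU-bound matching like this a thread pool may bring little or no speedup. `ProcessPoolExecutor` from the same module offers the same `map` interface and may be worth trying instead, provided the pool is created under an `if __name__ == "__main__":` guard.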

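【Discussion】:

You use a lot of memory when you populate f2list. Why not process contribDB_1980final.csv one row at a time instead of reading it all into memory?
What do you mean by taking one row of contribDB_1980final.csv at a time? How can I do that?
Sorry, my mistake. Your loops are nested... I was going to suggest reversing the order of iteration over contribDB_1980final.csv and director.csv, but I just realized that would not work. Sorry.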
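
For reference, reading a CSV file one row at a time, as the first comment suggests, could look like the sketch below. `iter_rows` is a hypothetical helper name, not something from the original code. Because the inner loop scans the big file once per director line, streaming it this way would mean re-reading the file on every outer iteration, which is presumably why the suggestion was withdrawn.

```python
import csv


def iter_rows(path):
    """Yield rows from a CSV file one at a time instead of building a list."""
    with open(path, "rt", newline="") as f:
        for row in csv.reader(f):
            yield row
```

A loop such as `for row in iter_rows("contribDB_1980final.csv"):` then holds only one row in memory at a time.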