【Question title】: Python school / learning
【Posted】: 2020-12-19 04:48:17
【Question description】:

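I wasn't sure whether it's OK to ask for help with learning here... but here goes.

In a course, I'm trying to redo in Python something I did easily in C#. The idea is to load fictional sales data into an array of objects, sort it, write it to a bin file, a JSON file, and an XML file, generate 10 batches of 1000 random order IDs from the data, look up the objects with binary search, and save all of those to Excel files, so that in the end we have one bin file, one JSON file, one XML file, and 10 Excel files with 1000 rows each.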
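For readers following along, the lookup step described above could be sketched roughly like this. It is only an illustration built on the standard `bisect` module; the record layout and the names `find_sale` and `make_batches` are assumptions, not the code from the course project.

```python
import bisect
import random


def find_sale(sorted_sales, sorted_ids, order_id):
    """Binary-search sorted_ids and return the matching record, or None."""
    i = bisect.bisect_left(sorted_ids, order_id)
    if i < len(sorted_ids) and sorted_ids[i] == order_id:
        return sorted_sales[i]
    return None


def make_batches(sorted_sales, batches=10, size=1000, seed=0):
    """Pick `batches` x `size` random order IDs and look each one up."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    ids = [sale["order_id"] for sale in sorted_sales]
    result = []
    for _ in range(batches):
        wanted = rng.sample(ids, size)
        result.append([find_sale(sorted_sales, ids, oid) for oid in wanted])
    return result


if __name__ == "__main__":
    # Hypothetical data: 5000 records already sorted by order_id.
    sales = [{"order_id": i * 3, "country": "FI"} for i in range(5000)]
    found = make_batches(sales)
    print(len(found), len(found[0]))
```

Each inner list would then correspond to the rows of one Excel file.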

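This is easy to do procedurally, but when I try multiprocessing it gets really weird. First, it runs from the top down to the line if __name__ == '__main__', and after that it seems to execute all the same methods again, several times...

Here is my main: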

import multiprocessing
import CsvReader
import DataContainer
import FileWriter
from datetime import datetime
import ExcelHandler

processes = []
container = DataContainer.DataContainer() #csv_reader.readFile())
csv_reader = CsvReader.CsvReader(r'C:\Users\jansse\Desktop\opiskelut\Python\finalCapstoneProj\100000 Sales Records.csv', container)
print(f'objects inside from the csv file: {datetime.now().time()}')
print(f"lines: {csv_reader.readFile()}")
print(f'begin sorting: {datetime.now().time()}')
container.sort_container()
print(f'sort done, multiprocessingStart {datetime.now().time()}')
fWriter = FileWriter.FileWriter(container)
p1 = multiprocessing.Process(target=fWriter.write_to_binary)
processes.append(p1)
p2 = multiprocessing.Process(target=fWriter.write_to_json)
processes.append(p2)
p3 = multiprocessing.Process(target=fWriter.generate_XmlFile)
processes.append(p3)
xlHandler = ExcelHandler.ExcelHandler(container)
p4 = multiprocessing.Process(target=xlHandler.rollExcels)
processes.append(p4)


if __name__ == '__main__':

   for p in processes:
       p.start()

   for p in processes:
       p.join()

Here is an example of one of the methods:

    def write_to_json(self):
        import json
        print(f'json writing start {datetime.now().time()}')

        try:
            jsonData = ''
            with open('objects.json', 'w') as outputfileJson:
                dEncoder = DataModel.DataEncoder
                json.dump(self.container.returnContainer(), outputfileJson ,cls=dEncoder)
        except Exception as e:
            print(f'Something happened: {e.__class__}, {e.args[0]}')
        finally:
            return (f'json writing completed {datetime.now().time()}')

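The output looks like this. Why does it seem to go back to the methods that prepare the data (such as sorting), and why doesn't it wait for the JSON write to finish, even though I join the processes after declaring them? My debug prints: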

objects inside from the csv file: 06:27:33.139741
lines: 100001
begin sorting: 06:27:34.733476
sort done, multiprocessingStart 06:27:34.893071
objects inside from the csv file: 06:27:35.511050
lines: 100001
begin sorting: 06:27:37.111736
sort done, multiprocessingStart 06:27:37.283164
binarywriting start 06:27:38.886788
objects inside from the csv file: 06:27:39.482656
lines: 100001
begin sorting: 06:27:41.124737
sort done, multiprocessingStart 06:27:41.267926
json writing start 06:27:42.855922
objects inside from the csv file: 06:27:43.400854
lines: 100001
begin sorting: 06:27:44.938005
sort done, multiprocessingStart 06:27:45.097263
xml writing start 06:27:46.922682
objects inside from the csv file: 06:27:47.495836
lines: 100001
begin sorting: 06:27:49.225932
sort done, multiprocessingStart 06:27:49.397371
Excel writing start 06:27:51.089208
excelfile generating:0
excelfile generating:1
excelfile generating:2
excelfile generating:3
excelfile generating:4
excelfile generating:5
excelfile generating:6
excelfile generating:7
excelfile generating:8
excelfile generating:9
Excel writing completed 06:27:56.816604
xml writing completed 06:28:27.855510

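【Comments】:

  • Try wrapping those tasks in functions and passing them as arguments to the process. See docs.python.org/3/library/… The spawn section says a new process is started with everything it needs. So maybe, the way this is set up, Python thinks it needs container, and the only way to get it is to run the lines above. If you did Process(target=do_json, args=(fWriter,)) (creating a function do_json that calls your object's method on its argument), that might stop it from re-running.
  • Put all of the main code inside the if __name__ == '__main__': ... block. I believe this relates to the Safe importing of main module note in the Process class documentation: docs.python.org/3/library/…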
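The two suggestions above can be combined in a small sketch. It assumes the spawn start method (the default on Windows), where each child re-imports the main module, so anything at module level runs again in every child. The names `build_records`, `write_json`, and `main` are hypothetical stand-ins, not the asker's classes.

```python
import json
import multiprocessing
import os
import tempfile


def build_records(n):
    """Stand-in for the expensive CSV read and sort (hypothetical data)."""
    records = ({"order_id": i * 7 % n, "units": i} for i in range(n))
    return sorted(records, key=lambda r: r["order_id"])


def write_json(records, path):
    """Worker target: a module-level function, so it can be pickled."""
    with open(path, "w") as fh:
        json.dump(records, fh)


def main(out_dir):
    records = build_records(1000)  # runs once, in the parent only
    paths = [os.path.join(out_dir, f"objects_{i}.json") for i in range(2)]
    procs = [multiprocessing.Process(target=write_json, args=(records, p))
             for p in paths]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return paths


if __name__ == "__main__":
    # Under spawn, children import this file as __mp_main__, so they
    # define the functions above but skip this block.
    with tempfile.TemporaryDirectory() as d:
        print(main(d))
```

With the data preparation moved under the guard, the "objects inside" and "begin sorting" lines should appear only once.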

Tags: python python-multiprocessing python-module file-writing


【Solution 1】:

PYPL: that helped!!! Thanks :)

import multiprocessing
import DataContainer
import FileWriter
from datetime import datetime
import ExcelHandler
import CreateContainer

if __name__ == '__main__':

    creator = CreateContainer.CreateContainer()
    container = creator.createContainer()

    processes = []
    print(f'multiprocessingStart {datetime.now().time()}')
    fWriter = FileWriter.FileWriter(container)
    p1 = multiprocessing.Process(target=fWriter.write_to_binary)
    processes.append(p1)
    p2 = multiprocessing.Process(target=fWriter.write_to_json)
    processes.append(p2)
    p3 = multiprocessing.Process(target=fWriter.generate_XmlFile)
    processes.append(p3)
    xlHandler = ExcelHandler.ExcelHandler(container)
    p4 = multiprocessing.Process(target=xlHandler.rollExcels)
    processes.append(p4)

    for p in processes:
        p.start()

    for p in processes:
        p.join()

The output now looks like this:

objects inside from the csv file: 06:05:59.502160
lines: 100001
begin sorting: 06:06:01.256706
containerReady 06:06:01.415215
multiprocessingStart 06:06:01.428252
binarywriting start 06:06:03.183367
json writing start 06:06:04.988768
xml writing start 06:06:06.725813
Excel writing start 06:06:08.561955
excelfile generating:0
excelfile generating:1
excelfile generating:2
excelfile generating:3
excelfile generating:4
excelfile generating:5
excelfile generating:6
excelfile generating:7
excelfile generating:8
excelfile generating:9
Excel writing completed 06:06:15.105672
xml writing completed 06:06:49.466709
json writing completed 06:08:15.298768

...The reason I didn't see the print from the write_to_json method is that I was returning the string from there and had forgotten to change it to a print call.
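To illustrate that last point: the return value of a Process target is simply dropped, so a status message has to be printed or sent back explicitly. The sketch below is one way to do that with a multiprocessing.Queue; the function name `write_report` is hypothetical and the file writing itself is omitted.

```python
import multiprocessing
from datetime import datetime


def write_report(status_queue):
    """Hypothetical worker: reports its status instead of only returning it."""
    status = f"json writing completed {datetime.now().time()}"
    print(status)             # shows up on the console right away
    status_queue.put(status)  # also handed back to the parent process
    return status             # ignored when run as a Process target


if __name__ == "__main__":
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=write_report, args=(q,))
    p.start()
    message = q.get()  # read before join so the queue is drained
    p.join()
    print(f"parent received: {message}")
```

For a plain progress message, the print call alone is enough; the queue only matters if the parent needs the value.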
