【Question title】: Python parallel computation without pickling
【Posted】: 2018-02-28 00:34:11
【Question】:

I want to parallelize a very simple list comprehension:

nlp = spacy.load(model)
texts = sorted(X['text'])
# TODO: Parallelize
docs = [nlp(text) for text in texts]

However, when I try to use Pool from the multiprocessing module like this:

docs = Pool().map(nlp, texts)

it gives me the following error:

Traceback (most recent call last):
  File "main.py", line 117, in <module>
    main()
  File "main.py", line 99, in main
    docs = parse_docs(X)
  File "main.py", line 81, in parse_docs
    docs = Pool().map(nlp, texts)
  File "C:\Users\james\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Users\james\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\pool.py", line 608, in get
    raise self._value
  File "C:\Users\james\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\pool.py", line 385, in _handle_tasks
    put(task)
  File "C:\Users\james\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "C:\Users\james\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'FeatureExtracter.<locals>.feature_extracter_fwd'

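Is it possible to do this parallel computation without having to make the object picklable? I'm open to examples that use third-party libraries such as joblib.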

Edit: I also tried

docs = Pool().map(nlp.__call__, texts)

That didn't work either.

【Comments on the question】:

    Tags: python multithreading parallel-processing python-multiprocessing joblib


    【Answer 1】:

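    Most likely not. You are probably trying to share something low-level that is unsafe to share across processes, such as an object holding an open file descriptor. There's some discussion here on why it can't be pickled, and they say, somewhat vaguely, that it is for a similar reason. Why not load nlp separately in each process?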

    There is more here; it appears to be a general spaCy issue that they are working on: https://github.com/explosion/spaCy/issues/1045
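
    The traceback in the question names a local object ('FeatureExtracter.<locals>.feature_extracter_fwd'), that is, a function defined inside another function. The sketch below reproduces that class of error with the standard pickle module only. The names make_extractor, extract_fwd and try_pickle are hypothetical stand-ins, not spaCy code.

```python
import pickle


def make_extractor():
    # Hypothetical stand-in for the factory named in the traceback: it returns
    # a function defined inside another function (a "local object").
    def extract_fwd(text):
        return text.split()
    return extract_fwd


def try_pickle(obj):
    """Return None if obj pickles, otherwise the error message."""
    try:
        pickle.dumps(obj)
        return None
    except (AttributeError, pickle.PicklingError) as exc:
        return str(exc)


# A builtin is pickled by reference to its importable name, so this succeeds.
print(try_pickle(len))
# A nested function has no importable name, so pickling it fails.
print(try_pickle(make_extractor()))
```

    Pool.map has to pickle the callable it sends to the workers, so any object that holds such a nested function fails in the same way.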

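    【Comments】:

    • Thanks for the links. I found that spaCy objects can be pickled with the dill module, so to get around the pickling error I did import multiprocessing_on_dill as multiprocessing
    • Ah, so spaCy 2 came out recently. I thought you were using spaCy 1. Nice.
    【Answer 2】: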

    A possible workaround is the following:

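    import multiprocessing as mp

    import spacy

    texts = ["Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season.",
            "The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24\u201310 to earn their third Super Bowl title.",
            "The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California.",
            "As this was the 50th Super Bowl, the league emphasized the"]
    
    def init():
        # Runs once in every worker process: each worker loads its own model,
        # so the nlp object itself is never pickled.
        global nlp
        nlp = spacy.load('en')
    
    def func(text):
        # Uses the per-process nlp created by init(); only the text and the
        # returned Doc are sent between processes.
        return nlp(text)
    
    # The guard is required on Windows, where worker processes re-import this module.
    if __name__ == '__main__':
        with mp.Pool(initializer=init) as pool:
            docs = pool.map(func, texts)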
    

    Printing the tokens of each doc then gives:

    for doc in docs:
        print(list(w.text for w in doc))
    
    ['Super', 'Bowl', '50', 'was', 'an', 'American', 'football', 'game', 'to', 'determine', 'the', 'champion', 'of', 'the', 'National', 'Football', 'League', '(', 'NFL', ')', 'for', 'the', '2015', 'season', '.']
    ['The', 'American', 'Football', 'Conference', '(', 'AFC', ')', 'champion', 'Denver', 'Broncos', 'defeated', 'the', 'National', 'Football', 'Conference', '(', 'NFC', ')', 'champion', 'Carolina', 'Panthers', '24–10', 'to', 'earn', 'their', 'third', 'Super', 'Bowl', 'title', '.']
    ['The', 'game', 'was', 'played', 'on', 'February', '7', ',', '2016', ',', 'at', 'Levi', "'s", 'Stadium', 'in', 'the', 'San', 'Francisco', 'Bay', 'Area', 'at', 'Santa', 'Clara', ',', 'California', '.']
    ['As', 'this', 'was', 'the', '50th', 'Super', 'Bowl', ',', 'the', 'league', 'emphasized', 'the']
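
    To try the initializer pattern without spaCy installed, here is a minimal standard-library sketch. It assumes a hypothetical build_model() that stands in for spacy.load() and deliberately returns a closure, which the standard pickle module cannot serialize. The names build_model, init_worker, process and parse_all are illustrative only.

```python
import multiprocessing as mp

_model = None  # per-process global, filled in by init_worker()


def build_model():
    # Hypothetical stand-in for spacy.load(): returns a nested function,
    # which the standard pickle module refuses to serialize.
    def tokenize(text):
        return text.split()
    return tokenize


def init_worker():
    # Runs once in each worker process, so the model is built locally
    # instead of being pickled and sent from the parent.
    global _model
    _model = build_model()


def process(text):
    # Only `text` and the returned list cross the process boundary.
    return _model(text)


def parse_all(texts, processes=2):
    # Each worker calls init_worker() once before it receives any task.
    with mp.Pool(processes=processes, initializer=init_worker) as pool:
        return pool.map(process, texts)


if __name__ == "__main__":
    print(parse_all(["Super Bowl 50", "Denver Broncos defeated Carolina"]))
```

    The return value of the worker function still has to be picklable, which is why this sketch returns plain lists of strings.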
    

    【Comments】:
