Multiprocessing with a delayed second function: answers

【Question title】: Multiprocessing with delay of second function
【Posted】: 2016-07-12 14:23:51
【Question】:

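I have one function that writes data to a text file, and a second function that pulls data from the same text file and displays a graph. I want to start the second function a few seconds after the first one starts, and have both run together until they finish. That way I get a live graph. The code I wrote below starts both functions at the same time, but the second function cannot see any text file. I need to delay the second function so that the first one has time to create the text file.

However, the second function (live_graph) needs not only the data from the text file but also some parameters from the first function (for example, the title of the graph), so I am not sure this is the right way to proceed. It does not seem possible to get "key" from the other function; I get "key is not defined". Maybe I also have to write the parameters to the text file?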

from multiprocessing import Process
import time


def writing():
    # get_all(), newlista, key and on_data() are defined elsewhere (not shown)
    numentries = 0
    for text in get_all(newlista, "sentence", "text"):
        if text.lower().startswith(key.lower().split(None, 1)[0]):
            pass
        elif len(text) > 500:
            pass
        elif len(text) < 80:
            pass
        else:
            on_data(text)
            numentries += 1


def live_graph():
    # give writing() time to create the text file
    time.sleep(5)
    # pull data from text.txt


if __name__ == '__main__':
    p1 = Process(target=writing)
    p1.start()
    p2 = Process(target=live_graph)
    p2.start()

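【Comments】:

Why not implement the second one so that it blocks until the text file is available?
How? And would I then be able to pull the parameters from the first function?
I have updated the original code with time.sleep and it seems to work. I have also taken the parameters out of the first function, so I can now use them in the second one. One question: on the Python website I read "The actual suspension time may be less than that requested because any caught signal will terminate the sleep() following execution of that signal's catching routine". Does this mean I can play it safe and use, say, 15 seconds, because the second function will start as soon as the first function has generated the text file?
Using a synchronization mechanism between the processes, such as multiprocessing.Queue, is a saner way to handle this.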
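
The blocking approach suggested in the first comment can be sketched roughly as follows. This is only a minimal illustration, not code from the thread: wait_for_file, its timeout and poll_interval parameters, and the path argument of live_graph are hypothetical names. Unlike a fixed time.sleep(15), which does not return early when the file appears, the loop returns as soon as the file exists and is non-empty.

```python
import os
import time


def wait_for_file(path, timeout=15.0, poll_interval=0.1):
    """Block until `path` exists and is non-empty, or `timeout` seconds pass.

    Returns True if the file became available, False on timeout.
    (Hypothetical helper, not part of the original post.)
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if os.path.getsize(path) > 0:
                return True
        except OSError:
            # The file does not exist yet (or cannot be accessed yet)
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


def live_graph(path="text.txt"):
    # Assumption: the writer process creates `path`; a real version
    # would plot the data instead of returning the lines.
    if not wait_for_file(path):
        raise TimeoutError(f"{path} was not written in time")
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()
```

Polling only solves the start-up delay; it does not pass parameters such as the graph title between the processes.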

Tags: python-3.x matplotlib multiprocessing

【Solution 1】:

You want to use multiprocessing.Queue.

For example (from the docs):

from multiprocessing import Process, Queue

def f(q):
    q.put([42, None, 'hello'])

if __name__ == '__main__':
    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    print(q.get())    # prints "[42, None, 'hello']"
    p.join()

You can use it in your code like this:

from multiprocessing import Process, Queue
import time


def writing(q):
    keep_running = True
    numentries = 0
    key = 'something, I assume'
    # First word of the key, computed once instead of on every iteration
    firstword = key.lower().split(None, 1)[0]
    # get_all(), newlista and on_data() come from the asker's code (not shown)
    for text in get_all(newlista, "sentence", "text"):
        # Python lets you do nice chained comparisons like this,
        # which do what they look like and make it more obvious
        # that you only want lines between 80 and 500 characters long
        if 80 <= len(text) <= 500:
            # Skip sentences that start with the first word of the key,
            # as the if/elif chain in the question does
            if not text.lower().startswith(firstword):
                # Note: it is *essential* that in `on_data(text)`
                # after writing to your file you run `file.flush()`
                # or close the file, otherwise the data *may* be
                # buffered and hence, missing, when you go to read
                # the file
                on_data(text)
                numentries += 1
                q.put(keep_running)

    # Tell the reader that there is nothing more to come
    keep_running = False
    q.put(keep_running)


def live_graph(q):
    keep_running = True
    while keep_running:
        # Blocks until the writer reports new data or signals the end
        keep_running = q.get()
        # do the graph updates here


if __name__ == '__main__':
    q = Queue()
    p1 = Process(target=writing, args=(q,))
    p1.start()
    p2 = Process(target=live_graph, args=(q,))
    p2.start()
    p1.join()
    p2.join()
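
The question also asks how live_graph can receive parameters such as the graph title. One possible way, sketched here under assumptions, is to send them through the same queue. The message tuples ("title", ...) and ("data", ...), the DONE sentinel, and the extra arguments sentences, path and title are all hypothetical, and on_data below is a simplified stand-in for the asker's function that shows the flush mentioned in the comments above.

```python
import os
import tempfile
from multiprocessing import Process, Queue

DONE = None  # sentinel telling the reader that the writer has finished


def on_data(text, path):
    """Append one line and flush it so the reader process can see it."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(text + "\n")
        f.flush()  # closing the file also flushes; kept to make the point


def writing(q, sentences, path, title):
    q.put(("title", title))      # parameters travel through the queue
    for text in sentences:
        on_data(text, path)
        q.put(("data", text))    # tell the reader that a new line exists
    q.put(DONE)


def live_graph(q, path):
    title, lines_seen = None, 0
    while True:
        msg = q.get()            # blocks until the writer sends something
        if msg is DONE:
            break
        kind, value = msg
        if kind == "title":
            title = value
        else:
            with open(path, encoding="utf-8") as f:
                lines_seen = len(f.readlines())
            # redraw the graph here using `title` and the file contents
    return title, lines_seen


if __name__ == "__main__":
    q = Queue()
    path = os.path.join(tempfile.mkdtemp(), "text.txt")
    sentences = ["first sentence", "second sentence"]
    p1 = Process(target=writing, args=(q, sentences, path, "My title"))
    p2 = Process(target=live_graph, args=(q, path))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
```

Because the reader blocks on q.get(), no time.sleep delay is needed: it only opens the file after the writer has reported at least one line.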

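【Comments】:

I have edited my code to show the actual first function. I am not sure how to implement your suggestion. Note that the first function calls two other functions: "get_all", which extracts text from a JSON file, and "on_data", which is the function that actually writes to the text file.
@ganesa75 I have updated my code to reflect the code you added. Note that your key was not defined in the code sample you provided (also double-check your indentation; your edit messed it all up), so I did the only thing I could and guessed. As a suggestion: most of the time, if your code cannot be run because it is not a minimal reproducible example, people will not run it, so indentation and undefined variable names really do matter :)
I am having a hard time pasting code here; I keep manually adding the 4 spaces of indentation. I close the file in on_data(), but I do not call file.flush(). Actually, the code has to bin the sentences that start with the first word of key, and I have modified it accordingly. The code works. But: 1. It eats up all my RAM (almost 8 GB); is that because it is multi-process? 2. After all the text has been parsed, the code keeps running until I manually close the graph; is that normal? 3. Sometimes I see duplicates in the Eclipse standard output console, but not in the text file.
You can highlight your code block and press the "code format" button, {}.
As for eating up your memory, that depends on what you are doing. Being multi-process, it should be using more CPU than RAM. I suspect that one of your processes holds some fairly large objects. I do not know what your data structures look like, so I cannot tell you. If you search Google for "python code profiling", you should be able to find some useful tools. If this answer solved your problem, you should probably go ahead and accept it :)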
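
One tool that such a search is likely to turn up is the standard library's tracemalloc module. The sketch below is an illustration only: build_big_list is a hypothetical stand-in for whatever the real processes hold in memory, and measure_peak is a made-up helper name. It reports the peak memory allocated by Python while a function runs, which can help confirm whether one function is responsible for the RAM usage.

```python
import tracemalloc


def build_big_list(n):
    """Hypothetical stand-in for code that keeps a large object in memory."""
    return [str(i) * 10 for i in range(n)]


def measure_peak(func, *args):
    """Run func(*args) and return (result, peak bytes traced by tracemalloc)."""
    tracemalloc.start()
    try:
        result = func(*args)
        # get_traced_memory() returns (current, peak) in bytes
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


if __name__ == "__main__":
    data, peak = measure_peak(build_big_list, 100_000)
    print(f"peak traced memory: {peak / 1e6:.1f} MB")
```

Each process has its own memory, so a measurement like this would have to run inside the process being investigated (for example inside writing or live_graph).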