Guarding a critical section in a multithreaded program: answers

【Question title】: Guarding critical section in a multithreaded program
【Posted】: 2018-03-21 13:39:31
【Question description】:

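I have a multithreaded Python program (financial trading) in which certain threads execute critical sections (for example, while a trade is being executed). The threads that execute critical sections are daemon threads. The program's main thread catches SIGINT and tries to exit gracefully by releasing all the resources held by the child threads. To keep the main thread from terminating the child threads abruptly, the main thread loops through the list of child thread objects and calls their shutdown() function. This function blocks until the thread's critical section has completed, and only then returns.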

Here is the basic implementation:

from threading import Thread


class ChildDaemonThread(Thread):

    def __init__(self):
        super().__init__(daemon=True)  # Thread.__init__ must run before start()
        self._critical_section = False
        # other initialisations

    def shutdown(self):
        # called by parent thread before calling sys.exit(0)

        # busy-wait until the worker reports that it is outside the critical section
        while True:
            if not self._critical_section:
                break

            # add code to prevent entering critical section
            # do resource deallocation

    def do_critical_stuff(self):
        self._critical_section = True
        # do critical stuff
        self._critical_section = False

    def run(self):
        while True:
            self.do_critical_stuff()

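I am not sure whether my implementation will work. While the ChildDaemonThread is executing a critical section via do_critical_stuff(), if the parent thread calls the child's shutdown(), which blocks until the critical section has finished, then at that point two methods of the ChildDaemonThread, run() and do_critical_stuff(), are being called at the same time (I am not sure whether this is even legal). Is this possible? Is my implementation correct? Is there a better way to achieve this?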

【Question comments】:

Tags: python multithreading critical-section

【Solution 1】:

There are some race conditions in this implementation.

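You have no guarantee that the main thread will check the value of _critical_section at the right moment to see it as False. The worker thread may leave and re-enter the critical section before the main thread checks the value again. This probably won't cause any correctness problem, but it can make your program take longer to shut down (because whenever the main thread "misses" a safe moment to shut down, it has to wait for another critical section to finish).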

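Additionally, the worker thread may re-enter the critical section after the main thread has noticed that _critical_section is False but before the main thread has managed to make the process exit. This can cause a real correctness problem, because it effectively defeats your attempt to make sure the critical section completes.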
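This second race can be reproduced deterministically by forcing the unlucky interleaving. The following is only a minimal sketch: `Worker`, `may_enter`, `entered` and `demonstrate_race` are hypothetical names that do not appear in the question, and the two `Event` objects exist purely to pin down the scheduling that would otherwise happen by chance.

```python
import threading


class Worker(threading.Thread):
    """Hypothetical worker that uses the question's flag-based scheme."""

    def __init__(self):
        super().__init__(daemon=True)
        self.in_critical_section = False
        # Test hooks (assumptions, not part of the original design):
        self.may_enter = threading.Event()  # lets the caller pick the interleaving
        self.entered = threading.Event()    # signals that the section was entered

    def run(self):
        self.may_enter.wait()               # pretend the scheduler paused us here
        self.in_critical_section = True
        self.entered.set()
        # ... the critical work would happen here ...


def demonstrate_race():
    worker = Worker()
    worker.start()

    # Step 1: the main thread checks the flag and concludes it is safe to exit.
    looked_safe = not worker.in_critical_section

    # Step 2: before the main thread actually exits, the worker gets to run.
    worker.may_enter.set()
    worker.entered.wait(timeout=5)

    # Step 3: the earlier check is now stale.
    return looked_safe, worker.in_critical_section
```

`demonstrate_race()` returns `(True, True)`: the check said "safe", yet the worker is inside its critical section by the time the process would exit.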

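Of course, the program may also crash because of some other problem. So it may be better to implement a way to recover from an interrupted critical section.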

However, if you want to improve this strategy as far as possible, I'd suggest something like this:

from threading import Thread


class ChildDaemonThread(Thread):

    def __init__(self):
        super().__init__(daemon=True)  # Thread.__init__ must run before start()
        self._keep_running = True
        # other initialisations

    def shutdown(self):
        # called by parent thread before calling sys.exit(0)
        self._keep_running = False

    def do_critical_stuff(self):
        # do critical stuff
        pass

    def run(self):
        # the flag is only checked between critical sections, so a section
        # that has started always runs to completion
        while self._keep_running:
            self.do_critical_stuff()
        # do resource deallocation


workers = [ChildDaemonThread(), ...]  # "..." stands for any further workers

# Install your SIGINT handler which calls shutdown on all of workers
# ...

# Start all the workers
for w in workers:
    w.start()

# Wait for the run method of all the workers to return
for w in workers:
    w.join()

The key here is that join blocks until the thread has finished. This ensures that you never interrupt a critical section partway through.
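The SIGINT handler that the code above leaves as a comment could be filled in roughly as follows. This is a sketch under stated assumptions, not the answerer's code: `Worker`, `make_sigint_handler`, `wait_for` and `run_until_interrupted` are hypothetical names, the `time.sleep` call stands in for the real critical work, and joining with a timeout in a loop is an added precaution for platforms where a plain blocking `join()` may delay signal handling.

```python
import signal
import threading
import time


class Worker(threading.Thread):
    def __init__(self):
        super().__init__(daemon=True)
        self._keep_running = True

    def shutdown(self):
        # Only requests a stop; the current critical section still finishes.
        self._keep_running = False

    def do_critical_stuff(self):
        time.sleep(0.01)  # placeholder for the real critical work

    def run(self):
        while self._keep_running:
            self.do_critical_stuff()
        # resource deallocation would go here


def make_sigint_handler(workers):
    """Build a handler that asks every worker to stop after its current section."""
    def handler(signum, frame):
        for w in workers:
            w.shutdown()
    return handler


def wait_for(workers, poll_interval=0.5):
    """Join every worker, waking up periodically so signals can be handled."""
    for w in workers:
        while w.is_alive():
            w.join(timeout=poll_interval)


def run_until_interrupted(num_workers=2):
    # signal.signal() may only be called from the main thread.
    workers = [Worker() for _ in range(num_workers)]
    signal.signal(signal.SIGINT, make_sigint_handler(workers))
    for w in workers:
        w.start()
    wait_for(workers)
```

Calling `run_until_interrupted()` from the main thread and pressing Ctrl+C should let each worker finish the section it is in before the function returns. Python runs signal handlers in the main thread, so the handler itself never executes inside a worker's critical section.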

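【Comments】:

Hey, thanks a lot. Your answer makes sense. I was thinking that instead of the Boolean flag _keep_running I could use an instance of Event(), which is thread-safe. What are your thoughts on this?
I don't think you need Event in this case. Setting an attribute is thread-safe in Python (one thread can execute self._keep_running = False while another thread reads self._keep_running; you will never get an inconsistent result in this case). It's true that in some multithreaded scenarios something like Event can be useful (and often necessary). But ... not this one, I think. I could be wrong. Reasoning about threads is a challenge.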
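For comparison, the Event-based variant raised in the comments might look like the sketch below. `EventWorker`, `_stop_requested` and `interval` are hypothetical names, and the idle pause between critical sections is an assumption added for illustration; it is not in the original code. For plain stop signalling the flag and the Event behave alike. The one thing the Event adds is `wait()`, which lets a worker pause between sections and still wake up immediately on shutdown.

```python
import threading


class EventWorker(threading.Thread):
    def __init__(self, interval=0.01):
        super().__init__(daemon=True)
        # Not named "_stop": threading.Thread already uses that name internally.
        self._stop_requested = threading.Event()
        self._interval = interval   # idle time between critical sections (assumed)
        self.completed = 0          # number of finished critical sections

    def shutdown(self):
        self._stop_requested.set()

    def do_critical_stuff(self):
        self.completed += 1         # placeholder for the real critical work

    def run(self):
        while not self._stop_requested.is_set():
            self.do_critical_stuff()
            # Idle outside the critical section; returns early once
            # shutdown() sets the event.
            self._stop_requested.wait(self._interval)
        # resource deallocation would go here
```

As with the Boolean flag, the stop request is only observed between critical sections, so the caller still needs `join()` to wait for the section in progress.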