Locking a C++11 std::unique_lock causes a deadlock exception

【Question title】: Locking C++11 std::unique_lock causes deadlock exception
【Posted】: 2013-07-24 06:59:28
【Question】:

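I am trying to use a C++11 std::condition_variable, but when I try to lock the unique_lock associated with it from a second thread, I get an exception: "Resource deadlock avoided". The thread that created it can lock and unlock it, but the second thread cannot, even though I am fairly sure the unique_lock should not already be locked at the point where the second thread tries to lock it.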

FWIW, I am using gcc 4.8.1 on Linux with -std=gnu++11.

I have written a wrapper class around the condition_variable, unique_lock and mutex, so nothing else in my code has direct access to them. Note the use of std::defer_lock; I have already fallen into that trap :-).

class Cond {
private:
    std::condition_variable cCond;
    std::mutex cMutex;
    std::unique_lock<std::mutex> cULock;
public:
    Cond() : cULock(cMutex, std::defer_lock)
    {}

    void wait()
    {
        std::ostringstream id;
        id << std::this_thread::get_id();
        H_LOG_D("Cond %p waiting in thread %s", this, id.str().c_str());
        cCond.wait(cULock);
        H_LOG_D("Cond %p woke up in thread %s", this, id.str().c_str());
    }

    // Returns false on timeout
    bool waitTimeout(unsigned int ms)
    {
        std::ostringstream id;
        id << std::this_thread::get_id();
        H_LOG_D("Cond %p waiting (timed) in thread %s", this, id.str().c_str());
        bool result = cCond.wait_for(cULock, std::chrono::milliseconds(ms))
                == std::cv_status::no_timeout;
        H_LOG_D("Cond %p woke up in thread %s", this, id.str().c_str());
        return result;
    }

    void notify()
    {
        cCond.notify_one();
    }

    void notifyAll()
    {
        cCond.notify_all();
    }

    void lock()
    {
        std::ostringstream id;
        id << std::this_thread::get_id();
        H_LOG_D("Locking Cond %p in thread %s", this, id.str().c_str());
        cULock.lock();
    }

    void release()
    {
        std::ostringstream id;
        id << std::this_thread::get_id();
        H_LOG_D("Releasing Cond %p in thread %s", this, id.str().c_str());
        cULock.unlock();
    }
};

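My main thread creates a RenderContext, which has a thread associated with it. From the main thread's point of view, it uses the Cond to signal the render thread to perform an action, and it can also wait on the Cond for the render thread to complete that action. The render thread waits on the Cond for the main thread to send a render request and, where necessary, uses the same Cond to tell the main thread that it has completed an action. The error occurs when the render thread tries to lock the Cond to check/wait for a render request. At that point it should not be locked at all (because the main thread is waiting on it), let alone by the same thread. Here is the output: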

DEBUG: Created window
DEBUG: OpenGL 3.0 Mesa 9.1.4, GLSL 1.30
DEBUG: setScreen locking from thread 140564696819520
DEBUG: Locking Cond 0x13ec1e0 in thread 140564696819520
DEBUG: Releasing Cond 0x13ec1e0 in thread 140564696819520
DEBUG: Entering GLFW main loop
DEBUG: requestRender locking from thread 140564696819520
DEBUG: Locking Cond 0x13ec1e0 in thread 140564696819520
DEBUG: requestRender waiting
DEBUG: Cond 0x13ec1e0 waiting in thread 140564696819520
DEBUG: Running thread 'RenderThread' with id 140564575180544
DEBUG: render thread::run locking from thread 140564575180544
DEBUG: Locking Cond 0x13ec1e0 in thread 140564575180544
terminate called after throwing an instance of 'std::system_error'
  what():  Resource deadlock avoided

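To be honest, I don't really understand what a unique_lock is for, or why condition_variable needs one instead of using a mutex directly, so that may be the cause of the problem. I can't find a good explanation of it online.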

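【Comments】:

Don't use the same unique_lock for all threads; that is not what it is meant for. Use them as RAII objects in block scope, not as class members. That way, each thread that calls your functions will have its own instance. Also, beware of spurious wakeups.
I see, so every context that wants to wait or send a notification should use its own unique_lock, but they all share the same mutex?
To wait, not to send (cv.notify() doesn't need a lock). But apart from that, yes. I'll try to put together an answer showing you how to use all this properly; I'm a bit busy right now.
I didn't realise notify() doesn't need a lock. I suppose I can remove some of my locking in that case.
@syam Thanks for offering an example, but I think you've already answered the question well enough for me. I've changed my code to use RAII as you suggested and it now works. Is there a way to convert your comments into an answer, or should I write an answer based on your comments?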
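The exception text in the question's log can be reproduced in isolation. The sketch below is a minimal illustration, not the asker's code: `second_lock_throws` is a hypothetical helper name. It relies on the documented behaviour that `std::unique_lock::lock()` throws `std::system_error` with `std::errc::resource_deadlock_would_occur` when the lock object already owns its mutex. That ownership is tracked per `unique_lock` object, not per thread, which is why sharing one object between threads is a problem. A likely reading of the log (an assumption, since it depends on how the library implements wait()) is that the shared cULock still records itself as owning the mutex while the main thread is blocked in wait(), so the render thread's lock() call trips this check.

```cpp
#include <cassert>
#include <mutex>
#include <system_error>

// Hypothetical helper: returns true if locking a unique_lock that already
// owns its mutex throws std::system_error carrying
// std::errc::resource_deadlock_would_occur.
bool second_lock_throws() {
    std::mutex m;
    std::unique_lock<std::mutex> shared_lock(m, std::defer_lock);

    shared_lock.lock();          // first lock(): owns_lock() becomes true
    assert(shared_lock.owns_lock());

    try {
        shared_lock.lock();      // same lock object again: required to throw
    } catch (const std::system_error& e) {
        // On Linux, what() for this code typically reads
        // "Resource deadlock avoided".
        return e.code() == std::errc::resource_deadlock_would_occur;
    }
    return false;                // not reached on a conforming library
}
```

Beyond the exception, `std::unique_lock` itself is not synchronized, so two threads calling lock()/unlock() on the same object is a data race even when no exception is thrown.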

Tags: linux multithreading c++11 mutex condition-variable

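【Answer 1】:

Preface: an important thing to understand about condition variables is that they are subject to random, spurious wakeups. In other words, a CV can return from wait() without anyone having called notify_*() first. Unfortunately there is no way to distinguish such a spurious wakeup from a legitimate one, so the only solution is to have an additional resource (at the very least a boolean) that tells you whether the wake-up condition is actually met.

This additional resource must also be guarded by a mutex, usually the very same one you use as the CV's companion.

Typical usage of a CV/mutex pair looks like this: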

std::mutex mutex;
std::condition_variable cv;
Resource resource;

void produce() {
    // note how the lock only protects the resource, not the notify() call
    // in practice this makes little difference, you just get to release the
    // lock a bit earlier which slightly improves concurrency
    {
        std::lock_guard<std::mutex> lock(mutex); // use the lightweight lock_guard
        make_ready(resource);
    }
    // the point is: notify_*() don't require a locked mutex
    cv.notify_one(); // or notify_all()
}

void consume() {
    std::unique_lock<std::mutex> lock(mutex);
    while (!is_ready(resource))
        cv.wait(lock);
    // note how the lock still protects the resource, in order to exclude other threads
    use(resource);
}

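Compared to your code, note how several threads can call produce()/consume() simultaneously without worrying about a shared unique_lock: the only shared things are mutex/cv/resource, and each thread gets its own unique_lock, which forces the thread to wait its turn if the mutex is already locked by something else.

As you can see, the resource can't really be separated from the CV/mutex pair, which is why I said in a comment that your wrapper class isn't really a good fit IMHO, since it does try to separate them.

The usual approach is not to make a wrapper for the CV/mutex pair as you tried to, but for the whole CV/mutex/resource trio. For example, a thread-safe message queue where consumer threads wait on the CV until the queue has messages ready to be consumed.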
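The while loop in consume() can also be written with the predicate overload of `std::condition_variable::wait`, which re-checks the condition after every wakeup, spurious or not. The following is a small runnable sketch of that variant. The names `ready`, `payload` and `run_once` are made up for illustration, with a plain boolean standing in for the resource.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

std::mutex mtx;
std::condition_variable cv;
bool ready = false;  // the wake-up condition, guarded by mtx
int payload = 0;     // hypothetical data handed from producer to consumer

void produce(int value) {
    {
        std::lock_guard<std::mutex> lock(mtx);  // protects payload and ready
        payload = value;
        ready = true;
    }
    cv.notify_one();  // no lock needs to be held for the notification
}

int consume() {
    // Each call creates its own unique_lock; only mtx and cv are shared.
    std::unique_lock<std::mutex> lock(mtx);
    // Equivalent to: while (!ready) cv.wait(lock);
    cv.wait(lock, [] { return ready; });
    ready = false;    // consume the signal while still holding the lock
    return payload;
}

// Runs one producer and one consumer thread and returns what was received.
int run_once(int value) {
    int received = 0;
    std::thread consumer([&received] { received = consume(); });
    std::thread producer(produce, value);
    producer.join();
    consumer.join();
    return received;
}
```

Because the condition is checked under the mutex before waiting, the result is the same whichever thread happens to run first.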
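As a rough illustration of wrapping the whole trio, here is a minimal sketch of such a message queue. `MessageQueue` and `demo_sum` are hypothetical names chosen for this example, and the class is deliberately bare (unbounded, no shutdown signal, no timeout).

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical wrapper around the CV/mutex/resource trio.
template <typename T>
class MessageQueue {
public:
    void push(T msg) {
        {
            std::lock_guard<std::mutex> lock(mutex_);  // guards queue_ only
            queue_.push(std::move(msg));
        }
        cv_.notify_one();  // wake one waiting consumer, lock already released
    }

    // Blocks until a message is available, then removes and returns it.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);  // local to this call
        cv_.wait(lock, [this] { return !queue_.empty(); });
        T msg = std::move(queue_.front());
        queue_.pop();
        return msg;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<T> queue_;
};

// Usage sketch: one consumer thread sums three messages pushed by the caller.
int demo_sum() {
    MessageQueue<int> q;
    int sum = 0;
    std::thread consumer([&] {
        for (int i = 0; i < 3; ++i) sum += q.pop();
    });
    for (int i = 1; i <= 3; ++i) q.push(i);
    consumer.join();
    return sum;
}
```

Callers never see the mutex, the condition variable or a lock object, so there is nothing for them to share incorrectly.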

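If you really want to wrap only the CV/mutex pair, you should get rid of your lock()/release() methods, which are unsafe from a RAII point of view, and replace them with a single lock() method returning a unique_lock:

std::unique_lock<std::mutex> lock() {
    // unique_lock is movable, so each caller receives its own lock object
    return std::unique_lock<std::mutex>(cMutex);
}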

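This way you can use your Cond wrapper class in much the same way as I showed above (with wait() changed to take the caller's lock by reference and pass it on to the CV):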

Cond cond;
Resource resource;

void produce() {
    {
        auto lock = cond.lock();
        make_ready(resource);
    }
    cond.notify(); // or notifyAll()
}

void consume() {
    auto lock = cond.lock();
    while (!is_ready(resource))
        cond.wait(lock);
    use(resource);
}
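For completeness, a Cond class that supports this usage could look roughly like the sketch below. It is one possible arrangement rather than the asker's final code: the `Lock` typedef, the `wait(Lock&)` and `waitTimeout(Lock&, ms)` signatures and the `render_handshake` demo are assumptions made for illustration, and the logging from the question is left out.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

class Cond {
public:
    typedef std::unique_lock<std::mutex> Lock;

    // Each caller gets its own lock object; only the mutex is shared.
    Lock lock() { return Lock(cMutex); }

    // The caller passes in the lock it obtained from lock().
    void wait(Lock& lock) { cCond.wait(lock); }

    // Returns false on timeout
    bool waitTimeout(Lock& lock, unsigned int ms) {
        return cCond.wait_for(lock, std::chrono::milliseconds(ms))
                == std::cv_status::no_timeout;
    }

    void notify() { cCond.notify_one(); }
    void notifyAll() { cCond.notify_all(); }

private:
    std::condition_variable cCond;
    std::mutex cMutex;
};

// Usage sketch: the caller plays the main thread and requests a render,
// a second thread waits for the request. Returns true once both are done.
bool render_handshake() {
    Cond cond;
    bool requested = false;  // the resource, guarded by cond's mutex

    std::thread render([&] {
        auto lock = cond.lock();
        while (!requested)   // loop guards against spurious wakeups
            cond.wait(lock);
    });

    {
        auto lock = cond.lock();
        requested = true;
    }
    cond.notify();
    render.join();
    return requested;
}
```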

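But honestly, I'm not sure it's worth the trouble: what if you want to use a recursive_mutex instead of a plain mutex? Well, you'd have to turn your class into a template so that you can choose the mutex type (or write a second class altogether, yay for code duplication). And anyway you don't gain much, since you still have to write pretty much the same code to manage the resource. A wrapper class for only the CV/mutex pair is too thin to be really useful IMHO. But as usual, YMMV.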
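To give an idea of what the templated variant involves, here is a hedged sketch. `BasicCond` and `recursive_demo` are hypothetical names. One detail the template forces on you: `std::condition_variable` only works with `std::unique_lock<std::mutex>`, so a version that accepts other mutex types has to use `std::condition_variable_any` instead.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical generalisation of the Cond wrapper over the mutex type.
template <typename Mutex>
class BasicCond {
public:
    typedef std::unique_lock<Mutex> Lock;

    Lock lock() { return Lock(mutex_); }

    // Predicate form, so spurious wakeups are handled inside the wait.
    template <typename Pred>
    void wait(Lock& lock, Pred pred) { cond_.wait(lock, pred); }

    void notify() { cond_.notify_one(); }
    void notifyAll() { cond_.notify_all(); }

private:
    std::condition_variable_any cond_;  // works with any lockable type
    Mutex mutex_;
};

// Usage sketch with a recursive_mutex: the caller waits for a worker thread
// to set a flag. Returns the flag's final value.
bool recursive_demo() {
    BasicCond<std::recursive_mutex> cond;
    bool done = false;

    std::thread worker([&] {
        {
            auto lock = cond.lock();
            done = true;
        }
        cond.notify();
    });

    {
        auto lock = cond.lock();
        cond.wait(lock, [&] { return done; });
    }
    worker.join();
    return done;
}
```

One caveat with recursive mutexes: wait() releases only the single level of ownership held by the lock passed in, so waiting while the same thread holds the mutex more than once would leave it locked and can block the notifier.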

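【Comments】:

Thanks for the detailed answer. But don't you have to pass a reference to the unique_lock when calling cv.wait()?
Oops, you're right, looks like I got a bit carried away. :) Fixing that now.
The reason I'm using a wrapper class is that I'm writing some portable framework code. I originally planned to use SDL's threading API on Windows and Linux, and on Android either leave all thread management to Java or use pthreads. I'm not 100% convinced that C++11 support is stable on every platform I might want to support, but hopefully it should be OK.
I've switched to RAII, but wrapped the API in a different way. I used typedef std::unique_lock<std::mutex> CondLock and added a conversion operator from Cond to std::mutex &, so that I can create a CondLock with a Cond as its constructor argument. I think your idea of having a method in Cond that creates the CondLock is more elegant, so I'll probably change to that.
Yes, I can understand your concerns about C++11 support on various platforms. I'm lucky to be using only GCC right now, but even so I've been held back (I couldn't get a 4.8 cross-compiler to work, so I'm actually stuck with 4.7). Writing correct, standards-compliant, cross-platform C++11 is certainly a pain right now, until the dust settles...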
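The typedef-plus-conversion-operator variant described in the comments above might look something like the following. This is a guess at the shape of that code, not the commenter's actual implementation. The `wait(CondLock&)` signature and the `conversion_demo` function are assumptions.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

typedef std::unique_lock<std::mutex> CondLock;

class Cond {
public:
    // Lets a CondLock be constructed directly from a Cond.
    operator std::mutex&() { return cMutex; }

    void wait(CondLock& lock) { cCond.wait(lock); }
    void notify() { cCond.notify_one(); }
    void notifyAll() { cCond.notify_all(); }

private:
    std::condition_variable cCond;
    std::mutex cMutex;
};

// Usage sketch: returns true once the waiting thread has seen the flag.
bool conversion_demo() {
    Cond cond;
    bool flag = false;  // guarded by cond's mutex
    bool seen = false;

    std::thread waiter([&] {
        CondLock lock(cond);   // Cond converts to std::mutex& here
        while (!flag)
            cond.wait(lock);
        seen = true;
    });

    {
        CondLock lock(cond);
        flag = true;
    }
    cond.notify();
    waiter.join();
    return seen;
}
```

Compared with a lock() method on Cond, the conversion operator exposes the underlying mutex to any caller, which is part of why the method-based version reads as the tidier interface.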