【Question title】: QVector::clear and QFile::close in QThreadPool
【Posted】: 2020-03-06 04:16:33
【Question】:

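I am using QThreadPool to run workers whose job is to build and then clear huge QVectors and to write very large files. However, whenever one worker reaches that line (QVector::clear / QFile::close), all the other threads freeze and only continue once it has finished.

Does anyone have a suggestion for overcoming this, so that the other threads keep running normally while one worker executes either of these two functions? For QFile::close, I tried calling QFile::flush inside the iteration instead of calling close() at the end of the iteration, but that did not help performance.

Here is code in which the threads slow down when the vectors are cleared: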

main.cpp

#include "mainwindow.h"
#include <QApplication>

int main(int argc, char *argv[])
{
    QApplication a(argc, argv);
    MainWindow w;
    w.show();

    return a.exec();
}

mainwindow.h

#ifndef MAINWINDOW_H
#define MAINWINDOW_H

#include <QMainWindow>

namespace Ui {
class MainWindow;
}

class MainWindow : public QMainWindow
{
    Q_OBJECT

public:
    explicit MainWindow(QWidget *parent = nullptr);
    ~MainWindow();

private slots:
    void on_start_pushButton_clicked();

private:
    Ui::MainWindow *ui;
};

#endif // MAINWINDOW_H

mainwindow.cpp

#include "mainwindow.h"
#include "ui_mainwindow.h"
#include "worker.h"

#include <QDebug>
#include <QSharedPointer>
#include <QThread>
#include <QThreadPool>

MainWindow::MainWindow(QWidget *parent) :
    QMainWindow(parent),
    ui(new Ui::MainWindow)
{
    ui->setupUi(this);

    on_start_pushButton_clicked();
}

MainWindow::~MainWindow()
{
    delete ui;
}

void MainWindow::on_start_pushButton_clicked()
{
    int numProcess = 20;
    int numTraces = 10000;
    int numSamps = 8680;

    qDebug() << "main" << QThread::currentThread();
    QThreadPool *pool = QThreadPool::globalInstance();

    for (int i=0; i<numProcess; i++) {
        worker *w= new worker;
        w->setAutoDelete(true);

        w->setData(i+1, numTraces, numSamps);

        pool->start(w);
    }
}

mainwindow.ui

<?xml version="1.0" encoding="UTF-8"?>
<ui version="4.0">
 <class>MainWindow</class>
 <widget class="QMainWindow" name="MainWindow">
  <property name="geometry">
   <rect>
    <x>0</x>
    <y>0</y>
    <width>400</width>
    <height>300</height>
   </rect>
  </property>
  <property name="windowTitle">
   <string>MainWindow</string>
  </property>
  <widget class="QWidget" name="centralWidget">
   <widget class="QPushButton" name="start_pushButton">
    <property name="geometry">
     <rect>
      <x>240</x>
      <y>50</y>
      <width>75</width>
      <height>23</height>
     </rect>
    </property>
    <property name="text">
     <string>Start</string>
    </property>
   </widget>
  </widget>
 </widget>
 <layoutdefault spacing="6" margin="11"/>
 <resources/>
 <connections/>
</ui>

worker.h

#ifndef WORKER_H
#define WORKER_H

#include <QObject>
#include <QRunnable>
#include <QThread>

class worker : public QObject, public QRunnable
{
    Q_OBJECT
public:
    explicit worker(QObject *parent = nullptr) : QObject(parent), QRunnable () {}
    ~worker() {}

    void setData(int id, int numTraces, int numSamps);

    void run() override;

signals:

public slots:

private:
    void clearVector();

    int id, numTraces, numSamps;
};

#endif // WORKER_H

worker.cpp

#include "worker.h"

#include <QCoreApplication>
#include <QDebug>
#include <QVector>

#include <cmath>   // round()

void worker::setData(int id1, int numTraces, int numSamps)
{
    this->id = id1;
    this->numTraces = numTraces;
    this->numSamps = numSamps;

    qDebug() << "setData" << id << numTraces << numSamps;
}

void worker::run()
{
    clearVector();
    qDebug() << "pool finished" << id << numTraces << numSamps << QThread::currentThread();
}

void worker::clearVector()
{
    QVector<QVector<float>> traces1, traces2;

    float progressWaypoint = 0.01f*numTraces;
    int progressPos = 0;
    for (int i=0; i<numTraces; i++) {
        QVector<float> trace1, trace2;
        for (int j=0; j<numSamps; j++) {
            trace1.append(float(j));
            trace2.append(float(numSamps - j));
        }
        traces1.append(trace1);
        traces2.append(trace2);

        if (numTraces <= 100) {
            QCoreApplication::processEvents();
        }
        else {
            if (i + 1 >= round(progressWaypoint*progressPos)) {
                QCoreApplication::processEvents();
                qDebug() << id << QThread::currentThread() << progressPos;
                progressPos++;
            }
        }
    }

    traces1.clear();
    traces2.clear();
}

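【Comments on the question】:

  • I ran into the same problem a long time ago. In the end I found that clearing very large Qt containers and flushing files takes a long time in debug builds, but runs smoothly in release builds. Give it a try, and let me know if I should post this as an answer.
  • I tried it in release mode, but unfortunately the behavior is the same when a worker thread reaches the clear call.
  • Why are you calling QCoreApplication::processEvents()?
  • Another tip: call QThreadPool::globalInstance()->maxThreadCount() to see how many threads are allowed to be created.
  • It hangs just the same w/out the QVector::clear() call, when the vectors go out of scope and are destroyed. BTW, clear() doesn't actually deallocate the memory (check the Qt docs), so in this case it only makes things slower (and that isn't where it hangs anyway; that happens during the actual deallocation).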
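The last comment's point about clear() can be illustrated with the standard library. This is a minimal sketch using std::vector rather than QVector (the helper names are made up for illustration): clear() destroys the elements but keeps the vector's own buffer, while swapping with an empty temporary gives the buffer back. As far as I can tell from the Qt 5 documentation, QVector::clear() has preserved capacity in the same way since Qt 5.7, but treat that as an assumption to check against your Qt version. Note also that for a vector of vectors, clear() still destroys every inner vector, so the inner buffers are freed at that point.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: capacity that remains after clear().
std::size_t capacityAfterClear(std::size_t n)
{
    assert(n > 0);
    std::vector<float> v(n, 1.0f);
    v.clear();                      // size becomes 0, the buffer is kept
    return v.capacity();
}

// Hypothetical helper: capacity that remains after swapping with an
// empty temporary, which takes over (and then frees) the old buffer.
std::size_t capacityAfterSwap(std::size_t n)
{
    assert(n > 0);
    std::vector<float> v(n, 1.0f);
    std::vector<float>().swap(v);   // the temporary dies at the end of this statement
    return v.capacity();
}
```

Calling capacityAfterClear(1000) should report at least 1000, while capacityAfterSwap(1000) should report something much smaller (0 on the common implementations).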

Tags: qt qt5 qthread qfile qvector


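【Answer 1】:

Interesting problem. Tested on Windows with Qt 5.12.4.

One thing I've determined so far is that std::vector seems to perform better in this situation. It still takes quite a long time and definitely affects the other threads on the system, leaving the UI only somewhat responsive, but it is better than QVector.

Also, these numbers are large and need a lot of memory. In my 32-bit MinGW build it crashes with an out-of-memory error when I try more than 2 threads, so the tests were done with 64-bit MSVC2017. The test machine has 8 cores @ 3.? GHz and 64 GB of RAM.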
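To put a number on "a lot of memory", here is a rough back-of-the-envelope sketch. The function name is hypothetical, and it counts only the element payload, ignoring per-vector bookkeeping and allocator overhead.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: payload bytes held by one worker that builds
// numVectors sets of numTraces x numSamps elements of elemSize bytes each.
// Allocator overhead and vector headers are deliberately ignored.
std::uint64_t payloadBytesPerWorker(std::uint64_t numTraces,
                                    std::uint64_t numSamps,
                                    std::uint64_t numVectors,
                                    std::uint64_t elemSize)
{
    assert(elemSize > 0);
    return numTraces * numSamps * numVectors * elemSize;
}
```

With the question's values (10000 traces, 8680 samples, 2 vector sets, 4-byte elements) this gives 694,400,000 bytes, roughly 662 MiB per worker. Three concurrent workers already need about 2.08 GB of payload, which is close to the 2 GiB of user address space a 32-bit Windows process typically gets by default, so an out-of-memory crash beyond 2 threads in the 32-bit build is plausible.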

Here are some timing results (the code used to generate them is below):

1 worker with 2 `std::vector`s:
    Worker 1 finished (ms) 1648
    Last worker finished after 1649 total ms.

5 workers with 2 `std::vector`s:
    Worker 1 finished (ms) 44363
    Worker 2 finished (ms) 44386
    Worker 3 finished (ms) 44388
    Worker 4 finished (ms) 44401
    Worker 5 finished (ms) 44448
    Last worker finished after 44449 total ms.

10 workers with 2 `std::vector`s:
    Worker 4 finished (ms) 84910
    Worker 7 finished (ms) 92701
    Worker 2 finished (ms) 111590
    Worker 8 finished (ms) 144678
    Worker 9 finished (ms) 145378
    Worker 5 finished (ms) 169067
    Worker 3 finished (ms) 211629
    Worker 1 finished (ms) 220098
    Worker 10 finished (ms) 249356
    Worker 6 finished (ms) 253452
    Last worker finished after 253453 total ms.


1 worker with 2 `QVector`s:
    Worker 1 finished (ms) 1871
    Last worker finished after 1872 total ms.

5 workers with 2 `QVector`s:
    Worker 1 finished (ms) 36492
    Worker 3 finished (ms) 58157
    Worker 5 finished (ms) 79132
    Worker 2 finished (ms) 84612
    Worker 4 finished (ms) 84819
    Last worker finished after 84820 total ms.

10 workers with 2 `QVector`s:
    Worker 7 finished (ms) 234770
    Worker 8 finished (ms) 247531
    Worker 9 finished (ms) 261346
    Worker 1 finished (ms) 261924
    Worker 4 finished (ms) 270520
    Worker 2 finished (ms) 275740
    Worker 10 finished (ms) 290605
    Worker 3 finished (ms) 293575
    Worker 6 finished (ms) 296074
    Worker 5 finished (ms) 296249
    Last worker finished after 296361 total ms.

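Somewhere between 5 and 10 threads, even std::vector seems to start "tripping over itself". This is also noticeable in GUI responsiveness (somewhat responsive at 5, barely at 10).

As noted in the comments on the question, the delay occurs during deallocation of the large vectors traces1/traces2, apparently not during clear() (or swap()). But the only way to determine that is with a debugger, because once it reaches the end of the clearVector() function the thread essentially hangs (trying to timestamp this with a timer is useless).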
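One possible way to get a timestamp around the destruction without a debugger is to put the vectors in an inner block, so that they are destroyed before the function returns. The following is only a sketch with made-up names (PhaseTimes, timeFillAndDestroy), using std::chrono instead of QElapsedTimer and a single thread; it is not the code used for the timings in this answer.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical result type: time spent filling vs. destroying the vectors.
struct PhaseTimes {
    std::chrono::steady_clock::duration fill;
    std::chrono::steady_clock::duration destroy;
};

PhaseTimes timeFillAndDestroy(std::size_t numTraces, std::size_t numSamps)
{
    using clock = std::chrono::steady_clock;
    assert(numTraces > 0 && numSamps > 0);

    const clock::time_point t0 = clock::now();
    clock::time_point t1;
    {
        std::vector<std::vector<int>> traces;
        traces.reserve(numTraces);
        for (std::size_t i = 0; i < numTraces; ++i) {
            std::vector<int> trace(numSamps);
            for (std::size_t j = 0; j < numSamps; ++j)
                trace[j] = static_cast<int>(j);
            traces.push_back(std::move(trace));  // one heap buffer per trace
        }
        t1 = clock::now();                       // filling is done
    }                                            // traces and all inner buffers are freed here
    const clock::time_point t2 = clock::now();

    return PhaseTimes{ t1 - t0, t2 - t1 };
}
```

Comparing the two durations for one thread and then for several threads running the same function would show whether the destroy phase is the part that grows, which is what the debugger observation above suggests.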

I also tried using only 1 vector "set" in the Worker (see code). Huge difference:

10 workers with 1 `std::vector`:
    Worker 5 finished (ms) 4125
    Worker 4 finished (ms) 4139
    Worker 1 finished (ms) 4141
    Worker 6 finished (ms) 4153
    Worker 10 finished (ms) 4161
    Worker 9 finished (ms) 4177
    Worker 7 finished (ms) 4197
    Worker 3 finished (ms) 4216
    Worker 8 finished (ms) 4209
    Worker 2 finished (ms) 4221
    Last worker finished after 4222 total ms.

10 workers with 1 `QVector`:
    Worker 10 finished (ms) 4308
    Worker 2 finished (ms) 4358
    Worker 1 finished (ms) 4373
    Worker 3 finished (ms) 4385
    Worker 8 finished (ms) 4391
    Worker 4 finished (ms) 4400
    Worker 6 finished (ms) 4404
    Worker 7 finished (ms) 4401
    Worker 5 finished (ms) 4409
    Worker 9 finished (ms) 4406
    Last worker finished after 4410 total ms.

Here is my test "rig":


#include <QRunnable>
#include <QThread>
#include <QElapsedTimer>
#include <QtWidgets>

#include <cmath>    // round()
#include <vector>   // std::vector

#define USE_QVECTOR 0
#define NUM_VECTORS 2
#define USE_CLEAR   0
#define USE_SWAP    0

class Worker : public QObject, public QRunnable
{
    Q_OBJECT
  public:
#if USE_QVECTOR
    typedef QVector<int> vect_t;
    typedef QVector<vect_t> vectVect_t;
#else
    typedef std::vector<int> vect_t;
    typedef std::vector<vect_t> vectVect_t;
#endif

    explicit Worker(int id, int traces, int samples, QObject *parent = nullptr) :
      QObject(parent), QRunnable(),
      id(id), numTraces(traces), numSamps(samples)
    {}

    void run() override
    {
      qDebug() << "worker starting" << id << numTraces << numSamps << QThread::currentThread();
      emit progressChanged(id, -1);
      tim.start();
      clearVector();
      emit progressChanged(id, tim.elapsed());
    }

  signals:
    void progressChanged(int id, int pos) const;

  private:
    void clearVector()
    {
      vectVect_t traces1, traces2;
      traces1.reserve(numTraces);
      if (NUM_VECTORS > 1)
        traces2.reserve(numTraces);
      float progressWaypoint = 0.01f * numTraces;
      int progressPos = 0;
      for (int i=0; i < numTraces; i++) {
        vect_t trace1, trace2;
        trace1.reserve(numSamps);
        if (NUM_VECTORS > 1)
          trace2.reserve(numSamps);
        for (int j=0; j < numSamps; j++) {
          trace1.push_back(j);
          if (NUM_VECTORS > 1)
            trace2.push_back(numSamps - j);
        }
        traces1.push_back(trace1);
        if (NUM_VECTORS > 1)
          traces2.push_back(trace2);

        if (i + 1 >= round(progressWaypoint * progressPos))
          emit progressChanged(id, progressPos++);
      }
      qDebug() << "Vectors populated in" << tim.elapsed();

      if (USE_CLEAR) {
        // Clearing the vectors slows the process down a bit, but it's not where the delay is.
        traces1.clear();
        if (NUM_VECTORS > 1)
          traces2.clear();
      }
      if (USE_SWAP) {
        // swap is very fast but it doesn't help overall performance.
        // Each vector is swapped with its own empty temporary: reusing a single
        // `blank` for both would hand traces1's old contents to traces2.
        vectVect_t().swap(traces1);
        if (NUM_VECTORS > 1)
          vectVect_t().swap(traces2);
      }
    }

    int id, numTraces, numSamps;
    QElapsedTimer tim;
};


int main(int argc, char *argv[]) {
  QApplication a(argc, argv);
  // UI setup
  QDialog d;
  d.setLayout(new QVBoxLayout());
  QPushButton *pbStart = new QPushButton("Start", &d);
  QSpinBox *sbThreads = new QSpinBox(&d);
  sbThreads->setValue(5);
  QSpinBox *sbTraces = new QSpinBox(&d);
  sbTraces->setMaximum(10000);
  sbTraces->setValue(10000);
  QSpinBox *sbSamps = new QSpinBox(&d);
  sbSamps->setMaximum(10000);
  sbSamps->setValue(8680);
  QHBoxLayout *btnLo = new QHBoxLayout();
  btnLo->setSpacing(6);
  btnLo->addWidget(pbStart);
  btnLo->addWidget(new QLabel("Thrds:", &d));
  btnLo->addWidget(sbThreads, 1);
  btnLo->addWidget(new QLabel("Traces:", &d));
  btnLo->addWidget(sbTraces, 1);
  btnLo->addWidget(new QLabel("Samps:", &d));
  btnLo->addWidget(sbSamps, 1);
  d.layout()->addItem(btnLo);
  // Text box for showing results
  QTextEdit *e = new QTextEdit(&d);
  e->setReadOnly(true);
  e->setTextInteractionFlags(Qt::TextBrowserInteraction);
  d.layout()->addWidget(e);

  QElapsedTimer tim;  // total elapsed timer
  QVector<int> finished;  // keep track of finished workers

  // Set up workers on button click.
  QObject::connect(pbStart, &QPushButton::clicked, &d, [&]()
  {
    const int threads = sbThreads->value(),
        traces = sbTraces->value(),
        samples = sbSamps->value();

    QThreadPool *pool = QThreadPool::globalInstance();
    //pool->setStackSize(samples * 4 * traces * threads);
    qDebug() << "Pool max. threads:" << pool->maxThreadCount() << "Stack size:" << pool->stackSize();
    pbStart->setDisabled(true);
    finished.clear();
    tim.start();

    for (int i=0; i < threads; i++) {
      Worker *w = new Worker(i+1, traces, samples);

      // Show messages on worker progress updates
      QObject::connect(w, &Worker::progressChanged, &d, [e, pbStart, threads, &tim, &finished](int id, int pos)
      {
        const QString msg = QStringLiteral("Worker %1 %2 %3")
            .arg(id)
            .arg(pos < 0 ? "started" : pos > 100 ? "finished (ms)" : "progress")
            .arg(pos);
        e->append(msg);
        if (pos > 100) {
          finished << id;
          if (finished.count() == threads) {
            e->append(QStringLiteral("Last worker finished after %1 total ms.").arg(tim.elapsed()));
            pbStart->setEnabled(true);
          }
        }
        e->ensureCursorVisible();
      }, Qt::QueuedConnection);

      w->setAutoDelete(true);
      pool->start(w);
      qDebug() << "Queued worker" << i+1 << "with active thread count:" << pool->activeThreadCount();
    }
  });

  d.show();
  return a.exec();
}

#include "main.moc"


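Addendum: using fixed-size arrays instead of vectors. Obviously, in real code one needs to take care that the array indices are actually valid. (One could of course also fill the traces1/traces2 arrays directly in the inner loop, without the intermediate trace1/trace2, but NVM for now. :)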

    void clearVector()
    {
      float progressWaypoint = 0.01f * numTraces;
      int progressPos = 0;
      // volatile to help make sure the compiler isn't just optimizing these out.
      volatile int *traces1[10000], *traces2[10000];
      for (int i=0; i < numTraces; i++) {
        volatile int trace1[10000], trace2[10000];
        for (int j=0; j < numSamps; j++) {
          trace1[j] = j;
          trace2[j] = (numSamps - j);
        }
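        // NOTE: trace1/trace2 are stack arrays scoped to this loop iteration, so the
        // pointers stored here dangle once the iteration ends, and the entries typically
        // all refer to the same reused stack storage. Nothing is heap-allocated, so
        // nothing needs delete, but reading through these pointers after the loop is
        // undefined behavior.
        traces1[i] = trace1;
        traces2[i] = trace2;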
        if (i + 1 >= round(progressWaypoint * progressPos))
          emit progressChanged(id, progressPos++);
      }
      // also use a value from the populated arrays to make sure they really exist.
      qDebug() << "Vectors populated in" << tim.elapsed() << traces1[0][0] << traces2[5][5];
    }

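I had to add 100 to the timer value because each thread was finishing in […] (the progress handler only reports a worker as finished when pos > 100):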

    void run() override {
      ...
      clearVector();
      emit progressChanged(id, tim.elapsed() + 100);
    }

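With 20 threads (16 started immediately and 4 queued) and 10K each for "traces" and "samples", I get:

Last worker finished after 332 total ms.

This also runs without problems on my 32-bit MinGW build with 20 threads. Same execution time.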
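For data that has to outlive a single loop iteration, one possible middle ground between per-trace vectors and stack arrays is a single contiguous heap buffer: one allocation and one deallocation per worker instead of one per trace. The sketch below is an assumption-laden illustration (TraceBuffer and makeTraces are hypothetical names), and it has not been timed against the results above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container: numTraces x numSamps values in one flat buffer.
class TraceBuffer {
public:
    TraceBuffer(std::size_t numTraces, std::size_t numSamps)
        : numTraces_(numTraces), numSamps_(numSamps),
          data_(numTraces * numSamps) {}           // single allocation

    // Bounds-checked element access (checked in debug builds only).
    int &at(std::size_t trace, std::size_t samp)
    {
        assert(trace < numTraces_ && samp < numSamps_);
        return data_[trace * numSamps_ + samp];
    }

    std::size_t numTraces() const { return numTraces_; }
    std::size_t numSamps() const { return numSamps_; }

private:
    std::size_t numTraces_;
    std::size_t numSamps_;
    std::vector<int> data_;                        // freed in one step on destruction
};

// Fills the buffer with the same values as trace1 in the test rig.
TraceBuffer makeTraces(std::size_t numTraces, std::size_t numSamps)
{
    TraceBuffer buf(numTraces, numSamps);
    for (std::size_t i = 0; i < numTraces; ++i)
        for (std::size_t j = 0; j < numSamps; ++j)
            buf.at(i, j) = static_cast<int>(j);
    return buf;
}
```

Because the buffer is owned by a std::vector, there is no manual delete, and the index check addresses the "make sure the array indices are actually valid" caveat from the addendum.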

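【Comments】:

  • Thanks for the analysis. From that, it looks like the bottleneck occurs while the vectors are being deallocated. Do you think that is down to how Qt works, or is it OS-related?
  • @zufryy Maybe I'm missing something (like actually using the values in the populated arrays), but with fixed-size arrays instead of vectors it is practically instant. As in 30-100 ms "instant", even with 20 threads requested (some of which get queued). I added an updated clearVector() to my answer to show what I mean.
  • Just wondering: when you declare volatile int *traces1[10000], *traces2[10000];, do we need to delete those pointers so that there is no memory leak?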