How to convert QueryThreadCycleTime() to seconds?

【Question title】: How to convert QueryThreadCycleTime() to seconds?
【Posted】: 2019-08-24 00:29:19
【Question】:

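The Windows function QueryThreadCycleTime() returns the number of "CPU clock cycles" used by a given thread. The Windows documentation boldly states:

Do not attempt to convert the CPU clock cycles returned by QueryThreadCycleTime to elapsed time.

I would like to do exactly that, for most Intel and AMD x86_64 CPUs. It does not need to be very accurate, because you cannot expect cycle counters such as RDTSC to be perfect anyway. I just need some kludgy way to obtain the time factor seconds / QueryThreadCycleTime for a CPU.

First, I assume that QueryThreadCycleTime uses RDTSC internally. I assume that some CPUs use a constant-rate TSC, so that changing the actual clock rate (for example through variable-frequency CPU power management) does not affect the time / TSC factor. On other CPUs the rate may change, so I would have to query the factor periodically.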
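Whether the TSC runs at a constant rate can be checked with CPUID: on Intel and AMD x86 CPUs, bit 8 of EDX in leaf 0x80000007 is the invariant-TSC flag. The following is a minimal sketch of that check, not a definitive implementation. `invariantTscBit` and `hasInvariantTsc` are hypothetical names introduced here, and it remains an assumption that QueryThreadCycleTime is based on the TSC at all.

```cpp
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>
#define TSC_HAVE_CPUID 1
#elif (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define TSC_HAVE_CPUID 1
#else
#define TSC_HAVE_CPUID 0
#endif

// Bit 8 of EDX from CPUID leaf 0x80000007 is the "invariant TSC" flag.
constexpr bool invariantTscBit(std::uint32_t edx) {
    return ((edx >> 8) & 1u) != 0;
}

// Returns true only if the CPU reports an invariant TSC.
// Returns false on non-x86 builds or when the leaf is unsupported.
bool hasInvariantTsc() {
#if TSC_HAVE_CPUID
#if defined(_MSC_VER)
    int regs[4] = {0, 0, 0, 0};
    __cpuid(regs, static_cast<int>(0x80000000u));  // EAX = highest extended leaf
    if (static_cast<std::uint32_t>(regs[0]) < 0x80000007u) return false;
    __cpuid(regs, static_cast<int>(0x80000007u));
    return invariantTscBit(static_cast<std::uint32_t>(regs[3]));  // regs[3] = EDX
#else
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    // __get_cpuid returns 0 when the requested leaf is not supported.
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx)) return false;
    return invariantTscBit(edx);
#endif
#else
    return false;
#endif
}
```

If `hasInvariantTsc()` returns true, a factor measured once should stay valid across frequency changes. Otherwise it would have to be re-measured periodically, as described above.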

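Why do I need this?

Before anyone cites the XY Problem, I should point out that I am not really interested in alternative solutions. I have two hard requirements for profiling that no other method satisfies.

It should measure thread time only, so sleep(1) should not report 1 second, whereas a busy loop lasting 1 second should. In other words, the profiler should not say a task ran for 10 ms when its thread was only active for 1 ms. This is why I cannot use QueryPerformanceCounter().
It needs a precision better than 1/64 second, which is the precision GetThreadTimes() provides. The tasks I am profiling may run for only a few microseconds.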

Minimal reproducible example

As requested by @Ted Lyngmo: the goal is to implement computeFactor().

#include <stdint.h>
#include <stdio.h>
#include <windows.h>

double computeFactor();

int main() {
    uint64_t start, end;
    QueryThreadCycleTime(GetCurrentThread(), &start);
    // insert task here, such as an actual workload or sleep(1)
    QueryThreadCycleTime(GetCurrentThread(), &end);
    printf("%lf\n", (end - start) * computeFactor());
    return 0;
}

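【Question comments】:

"Not really interested in alternative solutions" to what solution? What is your current non-working solution (in code)?
I don't think it is relevant to the question, but github.com/VCVRack/Rack/blob/v1/src/system.cpp#L190-L194 is the current measurement method, and github.com/VCVRack/Rack/blob/v1/src/engine/Engine.cpp#L258-L274 is how the timing is used. The end-user result is cdn.discordapp.com/attachments/199190471258537985/…
The reason I asked for your current non-working solution is that posting it would prompt people to answer. Personally I think it is relevant, and I would not attempt an answer without it.
Fair enough. You can use the Windows build of VCV Rack 1.1.4 (vcvrack.com/Rack.html) and try out the result in production by enabling "Engine > CPU meter".
When an API is documented as unsuitable for a particular use, and the publisher of that API knows the subject very well (MS is not joeblow@mymomsbasement.com), you should accept that information as accurate. Expecting them to know less than you do, so that you can make it work anyway, is somewhat unreasonable, and expecting us to do that work for you is even more so. What you are asking for will not work. Rather than waste time trying, you should change your position and look for an alternative solution.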

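Tags: c++ windows winapi

【Solution 1】:

Do not attempt to convert the CPU clock cycles returned by QueryThreadCycleTime to elapsed time.

I would like to do exactly that.

Your wish is apparently denied!

One workaround that gets close to what you want could be to create a thread that uses a steady_clock to sample QueryThreadCycleTime and/or GetThreadTimes at some specified frequency. Here is an example of how a sampling thread can sample both once per second.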

#include <algorithm>
#include <atomic>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <iomanip>
#include <thread>
#include <vector>

#include <Windows.h>

using namespace std::literals::chrono_literals;

struct FTs_t {
    FILETIME CreationTime, ExitTime, KernelTime, UserTime;
    ULONG64 CycleTime;
};

using Sample = std::vector<FTs_t>;

std::ostream& operator<<(std::ostream& os, const FILETIME& ft) {
    // FILETIME is two 32-bit halves, so the high part must be shifted by 32 bits
    std::uint64_t bft = (std::uint64_t(ft.dwHighDateTime) << 32) | ft.dwLowDateTime;
    return os << bft;
}

std::ostream& operator<<(std::ostream& os, const Sample& smp) {
    size_t tno = 0;
    for (const auto& fts : smp) {
        os << " tno:" << std::setw(3) << tno << std::setw(10) << fts.KernelTime
           << std::setw(10) << fts.UserTime << std::setw(16) << fts.CycleTime << "\n";
        ++tno;
    }
    return os;
}

// the sampling thread
void ft_sampler(std::atomic<bool>& quit, std::vector<std::thread>& threads, std::vector<Sample>& samples) {
    auto tp = std::chrono::steady_clock::now(); // for steady sampling

    FTs_t fts;
    while (quit == false) {
        Sample s;
        s.reserve(threads.size());
        for (auto& th : threads) {
            if (QueryThreadCycleTime(th.native_handle(), &fts.CycleTime) &&
                GetThreadTimes(th.native_handle(), &fts.CreationTime,
                               &fts.ExitTime, &fts.KernelTime, &fts.UserTime)) {
                s.push_back(fts);
            }
        }
        samples.emplace_back(std::move(s));

        tp += 1s; // add a second since we last sampled and sleep until that time_point
        std::this_thread::sleep_until(tp);
    }
}

// a worker thread
void worker(std::atomic <bool>& quit, size_t payload) {
    volatile std::uintmax_t x = 0;
    while (quit == false) {
        for (size_t i = 0; i < payload; ++i) ++x;
        std::this_thread::sleep_for(1us);
    }
}

int main() {
    std::atomic<bool> quit_sampling{false}, quit_working{false}; // brace-init also compiles before C++17
    std::vector<std::thread> threads;
    std::vector<Sample> samples;
    size_t max_threads = std::thread::hardware_concurrency() > 1 ? std::thread::hardware_concurrency() - 1 : 1;

    // start some worker threads
    for (size_t tno = 0; tno < max_threads; ++tno) {
        threads.emplace_back(std::thread(&worker, std::ref(quit_working), (tno + 100) * 100000));
    }

    // start the sampling thread
    auto smplr = std::thread(&ft_sampler, std::ref(quit_sampling), std::ref(threads), std::ref(samples));

    // let the threads work for some time
    std::this_thread::sleep_for(10s);

    quit_sampling = true;
    smplr.join();

    quit_working = true;
    for (auto& th : threads) th.join();

    std::cout << "Took " << samples.size() << " samples\n";

    size_t s = 0;
    for (const auto& smp : samples) {
        std::cout << "Sample " << s << ":\n" << smp << "\n";
        ++s;
    }
}
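Each sample above holds both the CPU time (KernelTime + UserTime, in FILETIME units of 100 ns) and the CycleTime of a thread, so the ratio of their deltas between two samples gives a rough seconds-per-cycle estimate. The following is only a sketch of that reduction. `ThreadSample`, `filetimeTicks` and `secondsPerCycle` are hypothetical names, not part of the answer's code or of the Windows API, and plain integers are used so the arithmetic can be checked without Windows headers.

```cpp
#include <cstdint>

// Plain-integer mirror of one per-thread sample (hypothetical type).
// cpuTicks = KernelTime + UserTime, in FILETIME units of 100 ns.
struct ThreadSample {
    std::uint64_t cpuTicks;
    std::uint64_t cycles;
};

// Combine the two 32-bit halves of a FILETIME into one 64-bit tick count.
constexpr std::uint64_t filetimeTicks(std::uint32_t high, std::uint32_t low) {
    return (static_cast<std::uint64_t>(high) << 32) | low;
}

// Estimate seconds of CPU time per counted cycle between two samples.
// Returns 0.0 when no cycles elapsed or a counter appears to go backwards.
constexpr double secondsPerCycle(const ThreadSample& earlier, const ThreadSample& later) {
    return (later.cycles > earlier.cycles && later.cpuTicks >= earlier.cpuTicks)
        ? (static_cast<double>(later.cpuTicks - earlier.cpuTicks) * 100e-9)
              / static_cast<double>(later.cycles - earlier.cycles)
        : 0.0;
}
```

Because the CPU time from GetThreadTimes is coarse (the question cites 1/64 second), the two samples should be far apart, for example the first and last sample of a thread, rather than adjacent ones.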

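【Discussion】:

This is most likely what I will do. In other words, approximate the limit of steady_clock / QueryThreadCycleTime() as time -> infinity. What I cannot figure out is how to guarantee that the thread computing this approximation is not context-switched by the OS, which would distort the steady_clock measurement. I could compute it several times and take the minimum.
If you want guarantees, you need a real-time OS. What you can do as an observer (without an RTOS) is use steady_clock to compensate. That part should not be too bad.
Hmm, now that I think about it, I can use Windows' GetThreadTimes() to take the measurement. Unlike steady_clock, it operates in the same "time frame of reference" as QueryThreadCycleTime(), so the ratio between those two measurements is the factor I am after! I can start a thread that busy-waits for about 1 second, measure GetThreadTimes() and QueryThreadCycleTime(), divide one by the other, and that is the factor!
That sounds like a better option, but I would recommend using steady_clock or high_resolution_clock together with std::this_thread::sleep_until(<time_point>) to do the actual sampling.
I am not sure how that would work, because if the thread simply yields as its "workload", both GetThreadTimes() and QueryThreadCycleTime() will return ~0. The whole point of these functions is that they do not count while the thread is inactive.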
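The busy-wait idea from the discussion could be sketched as follows, assuming that the GetThreadTimes() / QueryThreadCycleTime() ratio is an acceptable estimate. This is an illustration only, not a verified solution. `ticksToSecondsPerCycle`, `toTicks` and `readThread` are hypothetical helpers, and the default `busyMs` argument on `computeFactor` is an addition to the signature in the question. The Windows-specific part is guarded so that the arithmetic helper compiles anywhere.

```cpp
#include <cstdint>

#ifdef _WIN32
#include <windows.h>
#endif

// CPU time is given in FILETIME units of 100 ns; returns 0.0 if no cycles were counted.
constexpr double ticksToSecondsPerCycle(std::uint64_t cpuTicks100ns, std::uint64_t cycles) {
    return cycles == 0
        ? 0.0
        : (static_cast<double>(cpuTicks100ns) * 100e-9) / static_cast<double>(cycles);
}

#ifdef _WIN32
static std::uint64_t toTicks(const FILETIME& ft) {
    return (static_cast<std::uint64_t>(ft.dwHighDateTime) << 32) | ft.dwLowDateTime;
}

// Reads kernel+user time (100 ns ticks) and the cycle count of the calling thread.
static bool readThread(std::uint64_t& ticks, std::uint64_t& cycles) {
    FILETIME creationTime, exitTime, kernelTime, userTime;
    ULONG64 cycleTime = 0;
    HANDLE self = GetCurrentThread();
    if (!GetThreadTimes(self, &creationTime, &exitTime, &kernelTime, &userTime)) return false;
    if (!QueryThreadCycleTime(self, &cycleTime)) return false;
    ticks = toTicks(kernelTime) + toTicks(userTime);
    cycles = cycleTime;
    return true;
}

// Busy-waits for roughly busyMs of wall time on the calling thread, then
// returns the estimated seconds per cycle (0.0 if a query failed).
double computeFactor(unsigned busyMs = 1000) {
    std::uint64_t t0 = 0, c0 = 0, t1 = 0, c1 = 0;
    if (!readThread(t0, c0)) return 0.0;
    const ULONGLONG stop = GetTickCount64() + busyMs;
    volatile std::uint64_t sink = 0;
    while (GetTickCount64() < stop) sink = sink + 1;  // keep the thread busy
    if (!readThread(t1, c1)) return 0.0;
    return ticksToSecondsPerCycle(t1 - t0, c1 - c0);
}
#endif
```

The result is only as accurate as the GetThreadTimes() granularity relative to the busy-wait length, so a longer busy-wait, or the minimum over several runs as suggested above, should give a steadier value.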