python while 循环的高 CPU 使用率：即使 97% 的时间都在睡觉。为什么？答案

【问题标题】：High CPU usage for a python while loop: even when sleeping 97% of the time. Why?python while 循环的高 CPU 使用率：即使 97% 的时间都在睡觉。为什么？
【发布时间】：2017-06-16 05:54:00
【问题描述】：

我试图找出为什么以下非常简单、最小的示例在我的 i7-5500U CPU、Windows 10 计算机上占用了大约 33% 的 CPU 使用率：

import time
import numpy as np
import scipy.linalg
import cProfile

class CPUTest:
    def __init__(self):
        self.running = True

    def compute_stuff(self):
        dims = 150
        A = np.random.random((dims, dims))
        B = scipy.linalg.inv(np.dot(A.T, A))

    def run(self):
        prev_time = time.time()
        start_time = prev_time
        while self.running:
            time.sleep(0.3)
            st = time.time()
            self.compute_stuff()
            et = time.time()
            print 'Time for the whole iteration, inc. sleep: %.3f (ms), whereas the processing segment took %.3f (ms): ' % ((st - prev_time) * 1000, (et - st) * 1000)
            prev_time = st
            if st - start_time > 10.0:
                break

t = CPUTest()
t.run()
# cProfile.run('t.run()')

compute_stuff 函数只需要 2ms，其余时间程序都在休眠。由于 sleep 不应该使用 CPU，因此该程序理论上应该只占用 0.6% 的 CPU 使用率，但目前占用了大约 30%。

我尝试了一个分析器，它确认程序在 10 秒中有 9.79 秒处于睡眠状态。

有人可以提供一个提示，说明为什么 python 会这样吗？有什么办法可以减少 CPU 使用率。

非常感谢！

编辑

总之，程序在 >97% 的时间内处于休眠状态，而我的 CPU 使用率仍为 33%。我想在不牺牲计算频率的情况下降低 CPU 使用率。

您可以在此处找到程序输出的示例：

Time for the whole iteration, inc. sleep: 302.000 (ms), whereas the processing segment took 1.000 (ms):
Time for the whole iteration, inc. sleep: 301.000 (ms), whereas the processing segment took 2.000 (ms):
Time for the whole iteration, inc. sleep: 303.000 (ms), whereas the processing segment took 3.000 (ms):
Time for the whole iteration, inc. sleep: 303.000 (ms), whereas the processing segment took 2.000 (ms):
Time for the whole iteration, inc. sleep: 302.000 (ms), whereas the processing segment took 1.000 (ms):
Time for the whole iteration, inc. sleep: 302.000 (ms), whereas the processing segment took 2.000 (ms):
Time for the whole iteration, inc. sleep: 302.000 (ms), whereas the processing segment took 2.000 (ms):
Time for the whole iteration, inc. sleep: 303.000 (ms), whereas the processing segment took 1.000 (ms):
Time for the whole iteration, inc. sleep: 301.000 (ms), whereas the processing segment took 2.000 (ms):
Time for the whole iteration, inc. sleep: 303.000 (ms), whereas the processing segment took 1.000 (ms):

这是分析器的输出：

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1    0.000    0.000   10.050   10.050 <string>:1(<module>)
    1    0.019    0.019    0.021    0.021 __init__.py:133(<module>)
    1    0.067    0.067    0.119    0.119 __init__.py:205(<module>)
    1    0.000    0.000    0.000    0.000 _components.py:1(<module>)
    1    0.000    0.000    0.000    0.000 _laplacian.py:3(<module>)
   49    0.000    0.000    0.000    0.000 _methods.py:37(_any)
   49    0.000    0.000    0.001    0.000 _methods.py:40(_all)
   49    0.011    0.000    0.137    0.003 _util.py:141(_asarray_validated)
    1    0.001    0.001    0.001    0.001 _validation.py:1(<module>)
    1    0.000    0.000    0.000    0.000 _version.py:114(_compare)
    1    0.000    0.000    0.000    0.000 _version.py:148(__gt__)
    2    0.000    0.000    0.000    0.000 _version.py:55(__init__)
    1    0.000    0.000    0.000    0.000 _version.py:78(_compare_version)
    1    0.008    0.008    0.009    0.009 base.py:1(<module>)
    1    0.000    0.000    0.000    0.000 base.py:15(SparseWarning)
    1    0.000    0.000    0.000    0.000 base.py:19(SparseFormatWarning)
    1    0.000    0.000    0.000    0.000 base.py:23(SparseEfficiencyWarning)
    1    0.000    0.000    0.000    0.000 base.py:61(spmatrix)
   49    0.000    0.000    0.000    0.000 base.py:887(isspmatrix)
   49    0.043    0.001    0.185    0.004 basic.py:619(inv)
   49    0.000    0.000    0.001    0.000 blas.py:177(find_best_blas_type)
   49    0.001    0.000    0.002    0.000 blas.py:223(_get_funcs)
    1    0.000    0.000    0.000    0.000 bsr.py:1(<module>)
    1    0.000    0.000    0.000    0.000 bsr.py:22(bsr_matrix)
    1    0.012    0.012    0.012    0.012 compressed.py:1(<module>)
    1    0.000    0.000    0.000    0.000 compressed.py:21(_cs_matrix)
    1    0.000    0.000    0.000    0.000 construct.py:2(<module>)
    1    0.000    0.000    0.000    0.000 coo.py:1(<module>)
    1    0.000    0.000    0.000    0.000 coo.py:21(coo_matrix)
   49    0.000    0.000    0.000    0.000 core.py:5960(isMaskedArray)
   49    0.001    0.000    0.242    0.005 cpuTests.py:10(compute_stuff)
    1    0.013    0.013   10.050   10.050 cpuTests.py:15(run)
    1    0.000    0.000    0.000    0.000 csc.py:1(<module>)
    1    0.000    0.000    0.000    0.000 csc.py:19(csc_matrix)
    1    0.008    0.008    0.020    0.020 csr.py:1(<module>)
    1    0.000    0.000    0.000    0.000 csr.py:21(csr_matrix)
   18    0.000    0.000    0.000    0.000 data.py:106(_create_method)
    1    0.000    0.000    0.000    0.000 data.py:121(_minmax_mixin)
    1    0.000    0.000    0.000    0.000 data.py:22(_data_matrix)
    1    0.000    0.000    0.000    0.000 data.py:7(<module>)
    1    0.000    0.000    0.000    0.000 dia.py:1(<module>)
    1    0.000    0.000    0.000    0.000 dia.py:17(dia_matrix)
    1    0.000    0.000    0.000    0.000 dok.py:1(<module>)
    1    0.000    0.000    0.000    0.000 dok.py:29(dok_matrix)
    1    0.000    0.000    0.000    0.000 extract.py:2(<module>)
   49    0.000    0.000    0.001    0.000 fromnumeric.py:1887(any)
   49    0.005    0.000    0.006    0.000 function_base.py:605(asarray_chkfinite)
   49    0.000    0.000    0.000    0.000 getlimits.py:245(__init__)
   49    0.000    0.000    0.000    0.000 getlimits.py:270(max)
   49    0.000    0.000    0.002    0.000 lapack.py:405(get_lapack_funcs)
   49    0.002    0.000    0.003    0.000 lapack.py:447(_compute_lwork)
    1    0.000    0.000    0.000    0.000 lil.py:19(lil_matrix)
    1    0.002    0.002    0.002    0.002 lil.py:2(<module>)
   49    0.000    0.000    0.000    0.000 misc.py:169(_datacopied)
    3    0.000    0.000    0.000    0.000 nosetester.py:181(__init__)
    3    0.000    0.000    0.000    0.000 ntpath.py:174(split)
    3    0.000    0.000    0.000    0.000 ntpath.py:213(dirname)
    3    0.000    0.000    0.000    0.000 ntpath.py:96(splitdrive)
   49    0.000    0.000    0.000    0.000 numeric.py:406(asarray)
   49    0.000    0.000    0.000    0.000 numeric.py:476(asanyarray)
   98    0.000    0.000    0.000    0.000 numerictypes.py:942(_can_coerce_all)
   49    0.000    0.000    0.000    0.000 numerictypes.py:964(find_common_type)
    5    0.000    0.000    0.000    0.000 re.py:138(match)
    2    0.000    0.000    0.000    0.000 re.py:143(search)
    7    0.000    0.000    0.000    0.000 re.py:230(_compile)
    1    0.000    0.000    0.000    0.000 sputils.py:2(<module>)
    1    0.000    0.000    0.000    0.000 sputils.py:227(IndexMixin)
    3    0.000    0.000    0.000    0.000 sre_compile.py:228(_compile_charset)
    3    0.000    0.000    0.000    0.000 sre_compile.py:256(_optimize_charset)
    3    0.000    0.000    0.000    0.000 sre_compile.py:433(_compile_info)
    6    0.000    0.000    0.000    0.000 sre_compile.py:546(isstring)
    3    0.000    0.000    0.000    0.000 sre_compile.py:552(_code)
    3    0.000    0.000    0.000    0.000 sre_compile.py:567(compile)
    3    0.000    0.000    0.000    0.000 sre_compile.py:64(_compile)
    7    0.000    0.000    0.000    0.000 sre_parse.py:149(append)
    3    0.000    0.000    0.000    0.000 sre_parse.py:151(getwidth)
    3    0.000    0.000    0.000    0.000 sre_parse.py:189(__init__)
   16    0.000    0.000    0.000    0.000 sre_parse.py:193(__next)
    3    0.000    0.000    0.000    0.000 sre_parse.py:206(match)
   13    0.000    0.000    0.000    0.000 sre_parse.py:212(get)
    3    0.000    0.000    0.000    0.000 sre_parse.py:268(_escape)
    3    0.000    0.000    0.000    0.000 sre_parse.py:317(_parse_sub)
    3    0.000    0.000    0.000    0.000 sre_parse.py:395(_parse)
    3    0.000    0.000    0.000    0.000 sre_parse.py:67(__init__)
    3    0.000    0.000    0.000    0.000 sre_parse.py:706(parse)
    3    0.000    0.000    0.000    0.000 sre_parse.py:92(__init__)
    1    0.000    0.000    0.000    0.000 utils.py:117(deprecate)
    1    0.000    0.000    0.000    0.000 utils.py:51(_set_function_name)
    1    0.000    0.000    0.000    0.000 utils.py:68(__init__)
    1    0.000    0.000    0.000    0.000 utils.py:73(__call__)
    3    0.000    0.000    0.000    0.000 {_sre.compile}
    1    0.000    0.000    0.000    0.000 {dir}
  343    0.000    0.000    0.000    0.000 {getattr}
    3    0.000    0.000    0.000    0.000 {hasattr}
  158    0.000    0.000    0.000    0.000 {isinstance}
  270    0.000    0.000    0.000    0.000 {len}
   49    0.000    0.000    0.001    0.000 {method 'all' of 'numpy.ndarray' objects}
   49    0.000    0.000    0.000    0.000 {method 'any' of 'numpy.ndarray' objects}
  211    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
   49    0.000    0.000    0.000    0.000 {method 'astype' of 'numpy.ndarray' objects}
    1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
    5    0.000    0.000    0.000    0.000 {method 'end' of '_sre.SRE_Match' objects}
    6    0.000    0.000    0.000    0.000 {method 'extend' of 'list' objects}
    3    0.000    0.000    0.000    0.000 {method 'find' of 'bytearray' objects}
  205    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
    2    0.000    0.000    0.000    0.000 {method 'group' of '_sre.SRE_Match' objects}
   49    0.000    0.000    0.000    0.000 {method 'index' of 'list' objects}
    3    0.000    0.000    0.000    0.000 {method 'items' of 'dict' objects}
    1    0.000    0.000    0.000    0.000 {method 'join' of 'str' objects}
    5    0.000    0.000    0.000    0.000 {method 'match' of '_sre.SRE_Pattern' objects}
   49    0.021    0.000    0.021    0.000 {method 'random_sample' of 'mtrand.RandomState' objects}
   98    0.001    0.000    0.001    0.000 {method 'reduce' of 'numpy.ufunc' objects}
    3    0.000    0.000    0.000    0.000 {method 'replace' of 'str' objects}
    2    0.000    0.000    0.000    0.000 {method 'search' of '_sre.SRE_Pattern' objects}
    2    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
   60    0.000    0.000    0.000    0.000 {method 'startswith' of 'str' objects}
    1    0.000    0.000    0.000    0.000 {method 'update' of 'dict' objects}
    6    0.000    0.000    0.000    0.000 {min}
  147    0.000    0.000    0.000    0.000 {numpy.core.multiarray.array}
   49    0.036    0.001    0.036    0.001 {numpy.core.multiarray.dot}
    4    0.000    0.000    0.000    0.000 {ord}
   18    0.000    0.000    0.000    0.000 {setattr}
    3    0.000    0.000    0.000    0.000 {sys._getframe}
   49    9.794    0.200    9.794    0.200 {time.sleep}
   99    0.000    0.000    0.000    0.000 {time.time}

二次编辑

我已经实现了等效的 C++ 版本（如下）。 C++ 版本确实有我所期望的行为：它只使用了 0.3% 到 0.5% 的 CPU 使用率！

#include <iostream>
#include <chrono>
#include <random>
#include <thread>

// Tune this values to get a computation lasting from 2 to 10ms
#define DIMS 50
#define MULTS 20

/*
This function will compute MULTS times matrix multiplications of transposed(A)*A

We simply want to waste enough time doing computations (tuned to waste between 2ms and 10ms)
*/
double compute_stuff(double A[][DIMS], double B[][DIMS]) {
    double res = 0.0;
    for (int k = 0; k < MULTS; k++) {
        for (int i = 0; i < DIMS; i++) {
            for (int j = 0; j < DIMS; j++) {
                B[i][j] = 0.0;
                for (int l = 0; l < DIMS; l++) {
                    B[i][j] += A[l][j] * A[j][l];
                }
            }
        }
        // We store the result from the matrix B
        for (int i = 0; i < DIMS; i++) {
            for (int j = 0; j < DIMS; j++) {
                A[i][j] = B[i][j];
            }
        }
    }
    for (int i = 0; i < DIMS; i++) {
        for (int j = 0; j < DIMS; j++) {
            res += A[i][j];
        }
    }
    return res;
}


int main() {
    std::cout << "Running main" << std::endl;
    double A[DIMS][DIMS];  // Data buffer for a random matrix
    double B[DIMS][DIMS];  // Data buffer for intermediate computations
    std::default_random_engine generator;
    std::normal_distribution<double> distribution(0.0, 1.0);
    for (int i = 0; i < DIMS; i++) {
        for (int j = 0; j < DIMS; j++) {
            A[i][j] = distribution(generator);
        }
    }
    bool keep_running = true;
    auto prev_time = std::chrono::high_resolution_clock::now();
    auto start_time = prev_time;
    while (keep_running)
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(300));
        auto st = std::chrono::high_resolution_clock::now();
        double res = compute_stuff(A, B);
        auto et = std::chrono::high_resolution_clock::now();
        auto iteration_time = std::chrono::duration_cast<std::chrono::milliseconds>(st - prev_time).count();
        auto computation_time = std::chrono::duration_cast<std::chrono::milliseconds>(et - st).count();
        auto elapsed_time = std::chrono::duration_cast<std::chrono::milliseconds>(et - start_time).count();
        std::cout << "Time for the whole iteration, inc. sleep:" << iteration_time << " (ms), whereas the processing segment took " << computation_time  << "(ms)" << std::endl;
        keep_running = elapsed_time < 10 * 1000;
        prev_time = st;
    }
}

在这里您还可以看到 C++ 等效程序的输出：

Time for the whole iteration, inc. sleep:314 (ms), whereas the processing segment took 7(ms)
Time for the whole iteration, inc. sleep:317 (ms), whereas the processing segment took 7(ms)
Time for the whole iteration, inc. sleep:316 (ms), whereas the processing segment took 8(ms)
Time for the whole iteration, inc. sleep:316 (ms), whereas the processing segment took 7(ms)
Time for the whole iteration, inc. sleep:314 (ms), whereas the processing segment took 10(ms)

似乎发生了一些特定于 python 的事情。相同的行为已在 3 台机器（linux 和 Windows）中得到确认

【问题讨论】：

如果您将sleep 时间增加到 3 秒，只是为了看看 CPU 是否真的保持高位会发生什么？
那么你的 CPU 在程序休眠期间运行的速度有多快？它很可能会被停止，这意味着程序仍然支配运行时间。
3 秒 sleep CPU 使用率为 1.3%。 1秒sleep是4.5%，0.5秒sleep是9%，0.1秒已经是27%。
@kennethfm 我想现在你有你需要的提示了吗？
@YannVernier ，我真的不知道 CPU 的运行速度有多快，因为它每秒会休眠/唤醒几次。我正在从每秒更新一次左右的任务管理器进行监控。

标签： python while-loop cpu sleep

【解决方案1】：

我在为游戏编写程序时发现了这个问题。

我意识到，即使我创建了一个只打印一个 hello world msg 的 while 无限循环，我的程序的 cpu 使用率仍然是 30%。

所以我在我的 while 循环开始和结束时使用 time.sleep(0.05)。

我的问题解决了。只是在你的循环中玩睡眠。我觉得可以。

【讨论】：

【解决方案2】：

我认为您正在测量不同的事物，这会引起一些混乱。

对于初学者来说，切换上下文的成本；如果您有批处理作业，最好让系统决定何时切换到其他任务，而不是自己插入睡眠。每次您的进程休眠时，它都会花一些时间调用系统以重新安排并设置警报以再次唤醒，然后在警报触发后恢复。

传统上，任务管理器使用的 CPU 使用指示也不精确。它们旨在找出使系统保持忙碌的程序，并指示调度程序正在处理什么。例如，一个常见的迹象是系统空闲进程需要大量时间；该过程只是为了保持一致性，因此在无事可做时进入睡眠状态并不是调度程序的特殊情况。

CPU 速度本身现在是可变的。如果您的程序经常很少睡眠，许多计算机会减慢速度以匹配它，旨在使播放视频等工作的功能不需要在运行和睡眠模式之间切换，这本身需要一些时间。特别是，一旦进入睡眠状态，重新启动需要时间，这会使基于时间的调度（睡眠和超时）复杂化并延迟反应。这意味着 CPU 百分比仅在高度相似的负载下与另一个可比较。

您的系统可能有几个其他任务在后台运行，这些任务很少需要 CPU 时间。当睡眠时间很短时，它们可能会插入同一个处理器，但如果该任务睡眠时间较长，则更经常在另一个处理器上运行。由于您的程序只需要一个处理器容量的一小部分，这使得百分比变化很大。

我们看到的另一个方面是时间测量仅以毫秒为单位。工作片表示为 1 到 3 毫秒，我们有一个非常大的相对量化误差。切片太小，无法使用此系统上的任务管理器或 time.time() 可靠地测量。

考虑到所有这些额外的变量，我们真正知道的是，您执行的睡眠次数越多，程序的开销就越大。像 unix time(1) 这样的工具将指示特定任务的分布，通过划分花费在墙上的时间（实际经过的时间）、用户（运行程序本身所花费的时间）和系统（处理程序调用所花费的时间，包括管理睡眠开销，但不是实际睡眠的时间）。

这些睡眠的目标是什么？通过设置线程优先级不是更好吗？

【讨论】：

感谢您冗长而详细的回答！此代码是视频处理应用程序的最小示例，compute_stuff 表示：process_current_frame，因此每秒被调用 30 次。该应用程序至关重要，应具有接近于零的延迟。因此打算有相当高的优先级。与您提出的其他观点相关： - 上下文切换可能代价高昂，但每秒仅进行约 3 次。 - 确实，我不知道该信任任务管理器多少。在 Linux 中，我得到了 120%（多于一个内核）。我必须将 dims 减少到 12 才能获得 2ms 的计算。
继续... CPU 变化是一个非常有趣的观点。但是，我担心最终用户可能不会接受看到应用程序一直占用其 CPU 30% 的时间——您确实可以在后台执行其他任务（浏览器、word 等）——虽然量化错误可能是个问题，睡眠与计算的比率仍然在 100 倍左右，因此......量化可能仅在 95 倍到 105 倍之间变化，因此测量虽然不准确，但可能足以得出结论。
您正在尝试将百分比计算为运行时间与墙上时间的百分比。这可能是您的运行时间超过了 CPU 运行时间，完全不考虑睡眠时间。听起来你的负载应该表现得类似于例如Leap Motion 或 OSVR 跟踪进程。
确实我使用的是挂墙时间，但我希望实际运行时间低于挂墙时间，因此在 CPU 使用率方面甚至更低（我的 CPU 使用率为 100*run_time/total_time ）。在我添加的（第二个）编辑中，您可以看到用 C++ 实现相同的框架时会发生什么。 CPU使用率降低了很多！正如预期的那样。那么python中发生了什么？是的，负载的行为应该类似于对传入数据进行持续处理的系统。

【解决方案3】：

我遇到了同样的问题，并设法通过强制 numpy/scipy 仅使用一个带有 BLAS 的线程来解决它。将线程数定义为环境变量或在导入 numpy 和 scipy 之前添加以下行以强制它使用一个线程。

import os
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"
import numpy as np
import scipy.linalg

我不确定根本问题，但有任何延迟，例如从 time.sleep 或阻塞 I/O 操作与多线程 numpy 冲突，导致它浪费 CPU 周期而不是正常睡眠。诸如 np.mean 之类的非多线程操作似乎不会导致此问题。但是，如果没有延迟，多线程 numpy 在计算上仍然会比强制使用一个线程快很多。

【讨论】：

【解决方案4】：

我注意到同样的事情，从 python 调用 time.sleep 会占用更多 CPU。我没有遇到您的巨大性能问题，但由于我希望我的空闲进程处于空闲状态，而不是占用 CPU，我采用了猴子修补 time.sleep 并用对 usleep 的本机调用替换它的解决方案。（我的主要目标是 Linux）

import time
from ctypes import cdll

glibc = None


def _custom_sleep(t):
    glibc.usleep(int(t * 1000000))


def patch_time():
    global glibc
    try:
        glibc = cdll.LoadLibrary("libc.so.6")

        time.sleep = _custom_sleep
    except Exception as e:
        print(f"Failed to patch time.sleep: {e}. Performance might be worse.")

我在主应用程序中调用patch_time。 CPU 使用率下降。

【讨论】：