Ubuntu - how to tell if AVX or SSE is currently being used by a CPU app? - Answers

【Question title】: Ubuntu - how to tell if AVX or SSE is currently being used by a CPU app?
【Posted】: 2024-01-11 21:20:01
【Question】:

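I am currently running BOINC on a number of servers that have GPUs.

The servers run both GPU and CPU BOINC apps.

Because AVX and SSE lower the CPU frequency when a CPU app uses them, I have to be selective about which CPU and GPU apps I run together: some GPU apps get bottlenecked (their run times get longer), while others do not.

At the moment, some CPU apps are named so that it is clear whether they use AVX, but most are not.

So, is there a command I can run to see whether any of the currently running CPU apps are using AVX or SSE (any version)?

As a side note, should I treat any FMA use the same way (that is, does it slow the CPU frequency by raising CPU temperatures)?

Thanks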

【Comments】:

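"As AVX and SSE slow down the CPU frequency": that is not true for SSE, except through the same thermal/power limits that apply to any other load. An occasional SSE instruction will never hurt (unlike an occasional 256-bit AVX instruction on CPUs like Haswell). And if you use SSE heavily, the clock-speed penalty is probably less bad than running twice as many scalar instructions. Yes, you should treat FMA like any other SIMD FP instruction, just like vmulps or vaddps. 128-bit AVX instructions are fine, though; you can safely compile with gcc -O3 -march=native -mprefer-vector-width=128
There are performance counters for SIMD FP math; those instructions are the main thing that would require reducing max turbo. See "How do I monitor the amount of SIMD instruction usage"; this might be a duplicate.
Please stay on point and don't answer your own questions. The CPU apps run continuously at 100% load for up to 30 days (e.g. CPDN). The ones I know use AVX, e.g. Rosetta@home, which does protein folding, did slow the frequency immediately (before it had time to reach any thermal limit). However, this question is about being able to see whether a given CPU app is using these instruction sets, not about compiling apps or your view on whether an app will reduce the frequency because of temperature.
Yes, like I said, 256-bit AVX instructions can reduce the turbo frequency right away. (The same goes for peak power/current delivery limits, even before temperature limits require a reduction.) But even so, you probably only need to worry about 256-bit AVX, not SSE.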

Tags: gpu sse avx avx2 boinc

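【Solution 1】:

You can use perf top to see, in real time, how many AVX and SSE instructions are being executed, together with the names of the executables and shared libraries executing them: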

perf top -e fp_arith_inst_retired.128b_packed_single -e fp_arith_inst_retired.128b_packed_double -e fp_arith_inst_retired.256b_packed_single -e fp_arith_inst_retired.256b_packed_double
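To check one specific running app rather than the whole machine, the same events can be counted for a single PID with perf stat. The following is only a minimal sketch: `summarize_simd` and `sample_pid` are hypothetical helper names, and it assumes an Intel CPU whose perf list output includes the fp_arith_inst_retired.* events. Note that these events cover floating-point arithmetic only, and that the 128b events count both SSE and 128-bit AVX instructions, so they cannot tell those two apart.

```shell
#!/bin/sh
# Sketch: count packed FP instructions retired by one running process.
# Assumes the fp_arith_inst_retired.* events exist on this CPU (see `perf list`).

EVENTS=fp_arith_inst_retired.128b_packed_single,fp_arith_inst_retired.128b_packed_double,fp_arith_inst_retired.256b_packed_single,fp_arith_inst_retired.256b_packed_double

# summarize_simd (hypothetical helper): reads `perf stat -x,` CSV lines on stdin
# (field 1 = count, field 3 = event name) and totals them per vector width.
# Non-numeric counts such as "<not counted>" are treated as 0 by awk.
summarize_simd() {
    awk -F, '
        $3 ~ /128b_packed/ { w128 += $1 }
        $3 ~ /256b_packed/ { w256 += $1 }
        END {
            printf "128-bit packed FP: %.0f\n", w128
            printf "256-bit packed FP: %.0f\n", w256
        }'
}

# sample_pid (hypothetical helper): attach to PID $1 for $2 seconds (default 10).
# perf stat writes its counts to stderr, hence the 2>&1.
sample_pid() {
    perf stat -x, -e "$EVENTS" -p "$1" -- sleep "${2:-10}" 2>&1 | summarize_simd
}

# Usage (as root, or with perf_event_paranoid set to allow it):
#   sample_pid 12345 10
```

A non-zero "256-bit packed FP" total would suggest the app is executing 256-bit AVX floating-point code during the sampling window; a zero total only says that none was seen in that window.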

Counter descriptions (from perf list output on an Intel Coffee Lake CPU):

floating point:
  fp_arith_inst_retired.128b_packed_double          
       [Number of SSE/AVX computational 128-bit packed double precision floating-point instructions retired. Each count represents 2 computations. Applies to SSE* and AVX*
        packed double precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform
        multiple calculations per element]
  fp_arith_inst_retired.128b_packed_single          
       [Number of SSE/AVX computational 128-bit packed single precision floating-point instructions retired. Each count represents 4 computations. Applies to SSE* and AVX*
        packed single precision floating-point instructions: ADD SUB MUL DIV MIN MAX RCP RSQRT SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they
        perform multiple calculations per element]
  fp_arith_inst_retired.256b_packed_double          
       [Number of SSE/AVX computational 256-bit packed double precision floating-point instructions retired. Each count represents 4 computations. Applies to SSE* and AVX*
        packed double precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform
        multiple calculations per element]
  fp_arith_inst_retired.256b_packed_single          
       [Number of SSE/AVX computational 256-bit packed single precision floating-point instructions retired. Each count represents 8 computations. Applies to SSE* and AVX*
        packed single precision floating-point instructions: ADD SUB MUL DIV MIN MAX RCP RSQRT SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they
        perform multiple calculations per element]
  fp_arith_inst_retired.scalar_double               
       [Number of SSE/AVX computational scalar double precision floating-point instructions retired. Each count represents 1 computation. Applies to SSE* and AVX* scalar double
        precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform multiple calculations per element]
  fp_arith_inst_retired.scalar_single               
       [Number of SSE/AVX computational scalar single precision floating-point instructions retired. Each count represents 1 computation. Applies to SSE* and AVX* scalar single
        precision floating-point instructions: ADD SUB MUL DIV MIN MAX RCP RSQRT SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform multiple calculations
        per element]
  fp_assist.any                                     
       [Cycles with any input/output SSE or FP assist]
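Because each description states how many computations one count represents, the raw counts can be turned into a rough floating-point operation total. The sketch below is an illustration only: `estimate_flops` is a hypothetical helper, the input is assumed to be perf stat -x, CSV (count in field 1, event name in field 3), and the weights are taken directly from the descriptions above. FMA needs no extra weight because, per those descriptions, it already counts twice.

```shell
#!/bin/sh
# estimate_flops (hypothetical helper): reads `perf stat -x,` CSV on stdin and
# weights each fp_arith_inst_retired.* count by its "computations per count".
estimate_flops() {
    awk -F, '
        $3 ~ /scalar_(single|double)/ { flops += 1 * $1 }  # 1 computation each
        $3 ~ /128b_packed_double/     { flops += 2 * $1 }  # 2 doubles per 128 bits
        $3 ~ /128b_packed_single/     { flops += 4 * $1 }  # 4 floats per 128 bits
        $3 ~ /256b_packed_double/     { flops += 4 * $1 }  # 4 doubles per 256 bits
        $3 ~ /256b_packed_single/     { flops += 8 * $1 }  # 8 floats per 256 bits
        END { printf "%.0f\n", flops }'
}

# Usage: perf stat -x, -e <events> -p <PID> -- sleep 10 2>&1 | estimate_flops
```

Comparing the 256-bit share of this total against the 128-bit and scalar shares gives a rough idea of how heavily an app leans on wide vectors.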

【Comments】:

OK, thanks, that is very useful. Thank you for answering the question I asked.