【Question title】: perf stats on AMD 15h
【Posted】: 2015-08-21 07:13:05
【Question description】:

According to the BKDG for AMD Family 15h (page 588), the hardware prefetcher can be disabled by setting some bits of MSRC001_1022:

MSRC001_1022 Data Cache Configuration (DC_CFG)
Bits    -->  Description
63:16   -->  Reserved.
15      -->  DisPfHwForSw. Read-write. Reset: 0. 1=Disable hardware prefetches for software prefetches.
14      -->  Reserved.
13      -->  DisHwPf. Read-write. Reset: 0. 1=Disable the DC hardware prefetcher. 
12:10   -->  Reserved.
9:5     -->  Reserved.
4       -->  DisSpecTlbRld. Read-write. Reset: 0. 1=Disable speculative TLB reloads. 
3:0     -->  Reserved.

To disable all of the prefetch settings, I have to write 0xA008 to that MSR. I do this on all 32 cores with:

[root@tiger exe]# wrmsr -a 0xc0011022 0xA008
[root@tiger exe]# rdmsr -a -x -0 0xc0011022
000000000000a008
...
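
The value written above can be checked against the bit table. The following is a minimal sketch in Python that uses only the bit positions listed in the DC_CFG table; the helper names `decode_dc_cfg` and `encode_dc_cfg` are made up for illustration. Per that table, 0xA008 sets bits 15, 13 and 3, and bit 3 lies in the reserved 3:0 range, whereas DisSpecTlbRld is bit 4 (0x10). Whether this difference has any bearing on the perf counts below is not established here.

```python
# Bit positions copied from the DC_CFG (MSRC001_1022) table above.
DC_CFG_BITS = {
    15: "DisPfHwForSw",
    13: "DisHwPf",
    4: "DisSpecTlbRld",
}


def decode_dc_cfg(value):
    """List the fields set in a 64-bit DC_CFG value, highest bit first.

    Bits that the table does not name are reported as reserved.
    """
    names = []
    for bit in range(63, -1, -1):
        if value & (1 << bit):
            names.append(DC_CFG_BITS.get(bit, f"Reserved(bit {bit})"))
    return names


def encode_dc_cfg(*field_names):
    """Build a DC_CFG value with the named fields set to 1."""
    by_name = {name: bit for bit, name in DC_CFG_BITS.items()}
    value = 0
    for name in field_names:
        value |= 1 << by_name[name]
    return value


if __name__ == "__main__":
    # The value from the wrmsr command above.
    print(decode_dc_cfg(0xA008))
    # All three documented disable bits together.
    print(hex(encode_dc_cfg("DisPfHwForSw", "DisHwPf", "DisSpecTlbRld")))
```

By this table, setting all three documented bits gives 0xA010 rather than 0xA008.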

However, when I run perf with the command below, the prefetch statistics are non-zero!

[root@tiger exe]# perf stat -e L1-dcache-loads:uk,L1-dcache-prefetches:uk,L1-dcache-prefetch-misses:uk ./bzip2_base.amd64-m64-gcc44-nn
spec_init
Tested 64MB buffer: OK!
 Performance counter stats for './bzip2_base.amd64-m64-gcc44-nn':
    55,341,597,193 L1-dcache-loads:uk
     1,047,662,614 L1-dcache-prefetches:uk
                 0 L1-dcache-prefetch-misses:uk
      35.921618464 seconds time elapsed

I expected to see 0 next to L1-dcache-prefetches. Shouldn't that be the case?

How can I debug the counters to find out how they are mapped to MSRs?

【Question discussion】:

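  • Use the -v through -vvv options of perf record to debug the counters; some of them print all the parameters used in the perf_event_open call. The event names may still be synthetic, so check the kernel side of perf_events (what is your kernel version?). The mappings are in arch/x86/events/amd/core.c: L1D OP_PREFETCH RESULT_ACCESS = 0x0267, /* Data Prefetcher :attempts */ and the L1-dcache loads are `0x0040, /* Data Cache Accesses */`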
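
To make sense of the perf_event_open parameters that the verbose output prints, it helps to know how a generic cache event is encoded before the kernel translates it. The sketch below follows the encoding documented for PERF_TYPE_HW_CACHE in include/uapi/linux/perf_event.h (cache id in the low byte, operation in the next byte, result in the byte after that). The dictionary and function names are illustrative, not part of any perf API, and only the three cache ids relevant here are listed.

```python
# Values mirror the enums in include/uapi/linux/perf_event.h.
PERF_TYPE_HW_CACHE = 3

CACHE_ID = {"L1D": 0, "L1I": 1, "LL": 2}
OP_ID = {"READ": 0, "WRITE": 1, "PREFETCH": 2}
RESULT_ID = {"ACCESS": 0, "MISS": 1}


def hw_cache_config(cache, op, result):
    """Return the perf_event_attr.config value for a generic cache event."""
    return CACHE_ID[cache] | (OP_ID[op] << 8) | (RESULT_ID[result] << 16)


if __name__ == "__main__":
    # The three events from the perf stat command in the question.
    events = {
        "L1-dcache-loads": ("L1D", "READ", "ACCESS"),
        "L1-dcache-prefetches": ("L1D", "PREFETCH", "ACCESS"),
        "L1-dcache-prefetch-misses": ("L1D", "PREFETCH", "MISS"),
    }
    for name, key in events.items():
        print(f"{name}: type={PERF_TYPE_HW_CACHE} config={hw_cache_config(*key):#x}")
```

These generic values are what user space hands to the kernel; the CPU-specific constants such as 0x0267 only appear after the kernel looks the event up in its per-vendor table.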

Tags: linux performance perf amd-processor


【Solution 1】:

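The mapping from perf's synthetic event names (as listed by perf list) to hardware counters is defined, for many CPUs, in the kernel source of the perf_events subsystem. For AMD, it lives in arch/x86/events/amd/core.c. In the 4.8 kernel on AMD, the CPU cache events are mapped to CPU-specific constants that are written to the PMC MSRs, as follows: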

http://elixir.free-electrons.com/linux/v4.8/source/arch/x86/events/amd/core.c

static __initconst const u64 amd_hw_cache_event_ids
 ... =  {
 [ C(L1D) ] = {
    [ C(OP_READ) ] = {
        [ C(RESULT_ACCESS) ] = 0x0040, /* Data Cache Accesses        */
        [ C(RESULT_MISS)   ] = 0x0141, /* Data Cache Misses          */
    },
    [ C(OP_WRITE) ] = {
        [ C(RESULT_ACCESS) ] = 0,
        [ C(RESULT_MISS)   ] = 0,
    },
    [ C(OP_PREFETCH) ] = {
        [ C(RESULT_ACCESS) ] = 0x0267, /* Data Prefetcher :attempts  */
        [ C(RESULT_MISS)   ] = 0x0167, /* Data Prefetcher :cancelled */
    },
 },
 [ C(L1I ) ] = {
    [ C(OP_READ) ] = {
        [ C(RESULT_ACCESS) ] = 0x0080, /* Instruction cache fetches  */
        [ C(RESULT_MISS)   ] = 0x0081, /* Instruction cache misses   */
    },
    [ C(OP_WRITE) ] = {
        [ C(RESULT_ACCESS) ] = -1,
        [ C(RESULT_MISS)   ] = -1,
    },
    [ C(OP_PREFETCH) ] = {
        [ C(RESULT_ACCESS) ] = 0x014B, /* Prefetch Instructions :Load */
        [ C(RESULT_MISS)   ] = 0,
    },
 },
 [ C(LL  ) ] = {
    [ C(OP_READ) ] = {
        [ C(RESULT_ACCESS) ] = 0x037D, /* Requests to L2 Cache :IC+DC */
        [ C(RESULT_MISS)   ] = 0x037E, /* L2 Cache Misses : IC+DC     */
    },
    [ C(OP_WRITE) ] = {
        [ C(RESULT_ACCESS) ] = 0x017F, /* L2 Fill/Writeback           */
        [ C(RESULT_MISS)   ] = 0,
    },
    [ C(OP_PREFETCH) ] = {
        [ C(RESULT_ACCESS) ] = 0,
        [ C(RESULT_MISS)   ] = 0,
    },
 },

...
__init int amd_pmu_init(void)
{ ...
    /* Performance-monitoring supported from K7 and later: */
    if (boot_cpu_data.x86 < 6)
        return -ENODEV;

    x86_pmu = amd_pmu;

    ret = amd_core_pmu_init();
    ...

    /* Events are common for all AMDs */
    memcpy(hw_cache_event_ids, amd_hw_cache_event_ids,
           sizeof(hw_cache_event_ids));
    return 0;
}
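
The constants in the table above can be split into an event select and a unit mask, which is the form in which a BKDG lists PMC events. The sketch below assumes the usual AMD PERF_CTL layout (event select in bits 7:0 with its upper four bits in bits 35:32, unit mask in bits 15:8); the function names are hypothetical. It also builds the `rNNN` raw-event string that perf accepts, so the same counter can be requested directly instead of through the synthetic name.

```python
# L1D rows copied from the amd_hw_cache_event_ids excerpt above.
AMD_L1D_EVENTS = {
    ("READ", "ACCESS"): 0x0040,      # Data Cache Accesses
    ("READ", "MISS"): 0x0141,        # Data Cache Misses
    ("PREFETCH", "ACCESS"): 0x0267,  # Data Prefetcher :attempts
    ("PREFETCH", "MISS"): 0x0167,    # Data Prefetcher :cancelled
}


def split_raw_event(config):
    """Split a raw config into (event_select, unit_mask).

    Assumes the AMD PERF_CTL layout: EventSelect[7:0] in bits 7:0,
    EventSelect[11:8] in bits 35:32, UnitMask in bits 15:8.
    """
    event_select = (config & 0xFF) | (((config >> 32) & 0xF) << 8)
    unit_mask = (config >> 8) & 0xFF
    return event_select, unit_mask


def perf_raw_name(config, modifiers=""):
    """Format a config as a perf raw event, e.g. r267 or r267:uk."""
    name = f"r{config:x}"
    return f"{name}:{modifiers}" if modifiers else name


if __name__ == "__main__":
    for (op, result), config in AMD_L1D_EVENTS.items():
        event, umask = split_raw_event(config)
        print(f"L1D {op} {result}: event={event:#04x} umask={umask:#04x} "
              f"perf={perf_raw_name(config, 'uk')}")
```

Since the excerpt's own comment says the table is shared by all AMD CPUs, one way to follow up would be to look up the decoded event select and unit mask in the PMC event list of the BKDG for the specific family and confirm what that counter measures there.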
