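【Posted】: 2021-06-01 03:48:27
【Question】:
I have an Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz (Haswell) processor. AFAIK, mem_load_uops_retired.l3_miss counts the number of demand (i.e., non-prefetch) data read accesses to DRAM. As its name suggests, offcore_response.demand_data_rd.l3_miss.local_dram counts the number of demand data reads targeted at DRAM. So the two events seem equivalent (or at least nearly the same). But according to the following benchmarks, the former event is much less frequent than the latter:
1) Initializing a 1000-element global array in a loop in C: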
Performance counter stats for '/home/ahmad/Simple Progs/loop':
1,363 mem_load_uops_retired.l3_miss
1,543 offcore_response.demand_data_rd.l3_miss.local_dram
0.000749574 seconds time elapsed
0.000778000 seconds user
0.000000000 seconds sys
2) Opening a PDF document in Evince:
Performance counter stats for '/opt/evince-3.28.4/bin/evince':
936,152 mem_load_uops_retired.l3_miss
1,853,998 offcore_response.demand_data_rd.l3_miss.local_dram
4.346408203 seconds time elapsed
1.644826000 seconds user
0.103411000 seconds sys
3) Running Wireshark for 5 seconds:
Performance counter stats for 'wireshark':
5,161,671 mem_load_uops_retired.l3_miss
8,126,526 offcore_response.demand_data_rd.l3_miss.local_dram
15.713828395 seconds time elapsed
0.904280000 seconds user
0.693906000 seconds sys
4) Running a blur filter on an image in Inkscape:
Performance counter stats for 'inkscape':
13,852,121 mem_load_uops_retired.l3_miss
23,475,970 offcore_response.demand_data_rd.l3_miss.local_dram
25.355643897 seconds time elapsed
7.244404000 seconds user
1.019895000 seconds sys
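In all four benchmarks, offcore_response.demand_data_rd.l3_miss.local_dram is more frequent than mem_load_uops_retired.l3_miss: by about 1.1x in the first benchmark and by roughly 1.6x to 2x in the other three. Is this reasonable? Why? Please let me know if the benchmarks are too complex or too coarse-grained!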
【Discussion】:
Tags: intel performancecounter perf memory-access intel-pmu