【Posted】: 2014-11-25 19:43:38
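【Question】:
I can't understand why my "concurrent" implementation of this function, http://www.codeproject.com/Tips/447938/High-performance-Csharp-byte-array-to-hex-string-t, gives only about a 20% performance improvement.
For convenience, here is the code from that site: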
static readonly int[] toHexTable = new int[] {
3145776, 3211312, 3276848, 3342384, 3407920, 3473456, 3538992, 3604528, 3670064, 3735600,
4259888, 4325424, 4390960, 4456496, 4522032, 4587568, 3145777, 3211313, 3276849, 3342385,
3407921, 3473457, 3538993, 3604529, 3670065, 3735601, 4259889, 4325425, 4390961, 4456497,
4522033, 4587569, 3145778, 3211314, 3276850, 3342386, 3407922, 3473458, 3538994, 3604530,
3670066, 3735602, 4259890, 4325426, 4390962, 4456498, 4522034, 4587570, 3145779, 3211315,
3276851, 3342387, 3407923, 3473459, 3538995, 3604531, 3670067, 3735603, 4259891, 4325427,
4390963, 4456499, 4522035, 4587571, 3145780, 3211316, 3276852, 3342388, 3407924, 3473460,
3538996, 3604532, 3670068, 3735604, 4259892, 4325428, 4390964, 4456500, 4522036, 4587572,
3145781, 3211317, 3276853, 3342389, 3407925, 3473461, 3538997, 3604533, 3670069, 3735605,
4259893, 4325429, 4390965, 4456501, 4522037, 4587573, 3145782, 3211318, 3276854, 3342390,
3407926, 3473462, 3538998, 3604534, 3670070, 3735606, 4259894, 4325430, 4390966, 4456502,
4522038, 4587574, 3145783, 3211319, 3276855, 3342391, 3407927, 3473463, 3538999, 3604535,
3670071, 3735607, 4259895, 4325431, 4390967, 4456503, 4522039, 4587575, 3145784, 3211320,
3276856, 3342392, 3407928, 3473464, 3539000, 3604536, 3670072, 3735608, 4259896, 4325432,
4390968, 4456504, 4522040, 4587576, 3145785, 3211321, 3276857, 3342393, 3407929, 3473465,
3539001, 3604537, 3670073, 3735609, 4259897, 4325433, 4390969, 4456505, 4522041, 4587577,
3145793, 3211329, 3276865, 3342401, 3407937, 3473473, 3539009, 3604545, 3670081, 3735617,
4259905, 4325441, 4390977, 4456513, 4522049, 4587585, 3145794, 3211330, 3276866, 3342402,
3407938, 3473474, 3539010, 3604546, 3670082, 3735618, 4259906, 4325442, 4390978, 4456514,
4522050, 4587586, 3145795, 3211331, 3276867, 3342403, 3407939, 3473475, 3539011, 3604547,
3670083, 3735619, 4259907, 4325443, 4390979, 4456515, 4522051, 4587587, 3145796, 3211332,
3276868, 3342404, 3407940, 3473476, 3539012, 3604548, 3670084, 3735620, 4259908, 4325444,
4390980, 4456516, 4522052, 4587588, 3145797, 3211333, 3276869, 3342405, 3407941, 3473477,
3539013, 3604549, 3670085, 3735621, 4259909, 4325445, 4390981, 4456517, 4522053, 4587589,
3145798, 3211334, 3276870, 3342406, 3407942, 3473478, 3539014, 3604550, 3670086, 3735622,
4259910, 4325446, 4390982, 4456518, 4522054, 4587590
};
public static unsafe string ToHex1(byte[] source)
{
    fixed (int* hexRef = toHexTable)
    fixed (byte* sourceRef = source)
    {
        byte* s = sourceRef;
        int resultLen = (source.Length << 1);
        var result = new string(' ', resultLen);
        fixed (char* resultRef = result)
        {
            int* pair = (int*)resultRef;
            while (*pair != 0)
                *pair++ = hexRef[*s++];
            return result;
        }
    }
}
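For readers wondering where the numbers in toHexTable come from: each entry appears to pack the two uppercase hex digits of a byte value into one 32-bit int, first digit in the low 16 bits and second digit in the high 16 bits, so that a single int write stores two UTF-16 chars on a little-endian machine. The following is a minimal sketch of that reading, not the CodeProject author's generator; HexTableSketch, BuildTable and Decode are hypothetical names introduced here for illustration.

```csharp
using System;

public static class HexTableSketch
{
    // Builds a 256-entry table. Entry b holds the two hex digits of b as
    // UTF-16 code units: high-nibble digit in bits 0..15, low-nibble digit
    // in bits 16..31 (assumes a little-endian layout when written as an int).
    public static int[] BuildTable()
    {
        const string digits = "0123456789ABCDEF";
        var table = new int[256];
        for (int b = 0; b < 256; b++)
        {
            char first = digits[b >> 4];    // high nibble, printed first
            char second = digits[b & 0xF];  // low nibble, printed second
            table[b] = first | (second << 16);
        }
        return table;
    }

    // Unpacks one table entry back into its two-character string.
    public static string Decode(int entry)
    {
        return new string(new[] { (char)(entry & 0xFFFF), (char)(entry >> 16) });
    }
}
```

Under this reading, BuildTable()[0] is 3145776 (0x00300030, i.e. "00") and BuildTable()[255] is 4587590 (0x00460046, i.e. "FF"), which matches the first and last entries of the table above.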
Here is my "improvement":
public static unsafe string ToHex1p(byte[] source)
{
    var chunks = Environment.ProcessorCount;
    var n = (int)Math.Ceiling(source.Length / (double)chunks);
    int resultLen = (source.Length << 1);
    var result = new string(' ', resultLen);
    Parallel.For(0, chunks, k =>
    {
        var l = Math.Min(source.Length, (k + 1) * n);
        fixed (char* resultRef = result) fixed (byte* sourceRef = source)
        {
            int from = n * k;
            int to = (int)resultRef + (l << 2);
            int* pair = (int*)resultRef + from;
            byte* s = sourceRef + from;
            while ((int)pair != to)
                *pair++ = toHexTable[*s++];
        }
    });
    return result;
}
Edit 1: This is how I time the functions:
var n = 0xff;
var s = new System.Diagnostics.Stopwatch();
var d = Enumerable.Repeat<byte>(0xce, (int)Math.Pow(2, 23)).ToArray();
s.Start();
for (var i = 0; i < n; ++i)
{
    Binary.ToHex1(d);
}
Console.WriteLine(s.ElapsedMilliseconds / (double)n);
s.Restart();
for (var i = 0; i < n; ++i)
{
    Binary.ToHex1p(d);
}
Console.WriteLine(s.ElapsedMilliseconds / (double)n);
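A common precaution with this kind of measurement is to run each candidate a few times untimed first, so that JIT compilation and first-touch memory costs are not charged to whichever function happens to be measured first. The following is only a sketch of such a harness; TimingSketch, AverageMilliseconds and the default warm-up count of 3 are assumptions made here, not part of the original test.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // Returns the average wall-clock time per call, in milliseconds.
    // The warm-up calls are executed but deliberately not timed.
    public static double AverageMilliseconds(Action action, int iterations, int warmup = 3)
    {
        for (int i = 0; i < warmup; i++)
            action();

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            action();
        sw.Stop();

        // Elapsed.TotalMilliseconds keeps fractional milliseconds,
        // unlike the integer ElapsedMilliseconds property.
        return sw.Elapsed.TotalMilliseconds / iterations;
    }
}
```

It would be called as, for example, TimingSketch.AverageMilliseconds(() => Binary.ToHex1(d), 0xff), ideally from a release build started without a debugger attached.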
【Comments】:
-
Why is this inside the For loop rather than outside? fixed (char* resultRef = result) fixed (byte* sourceRef = source) - see stackoverflow.com/questions/8497018/…
-
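@tolanj: I can't answer for the OP, but I suspect it's because putting the fixed statement inside the anonymous method is much easier than putting it outside, owing to the rules about capturing pointers. For grins, I went ahead and tested it with fixed on the outside and found that it makes no difference. Note that in this case the fixed statement executes only once per thread; the time-consuming loop is inside the fixed statement.
-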
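@PeterDuniho: Agreed on the evaluation order. On debug vs. release, your experience differs a lot from mine. That is especially true when you compare runs with and without the debugger attached. I've seen wildly different relative timings: algorithm A being much faster than B in debug and much slower than B in release.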
-
I'm surprised nobody has asked... what version of .NET are you using? The TPL implementation in 4.0 is poor; in most benchmarks I've seen, it has roughly 100% overhead.
-
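@PanagiotisKanavos: While memory synchronization could indeed be a problem, it's unlikely in this case. The result variable is essentially an array 8 megabytes in size. So with four threads, each thread works on its own 2-megabyte block. The cache line size is typically 64 bytes, so memory contention of any kind between the CPUs is unlikely. They simply won't be modifying the same cache lines at the same time. In the worst case, the only contention is at the edges (the 64 bytes around each 2 MB boundary).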
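The arithmetic behind that last comment can be written out as a small sketch. It takes the commenter's figures (an 8 MB buffer, four threads, 64-byte cache lines) as given rather than as measurements; CacheLineSketch and Partition are hypothetical names introduced here.

```csharp
using System;

public static class CacheLineSketch
{
    // Splits a buffer evenly across threads and counts cache lines.
    // The inputs are rough figures supplied by the caller, not measured values.
    public static (long BytesPerThread, long LinesPerThread, int MaxSharedLines) Partition(
        long bufferBytes, int threads, int cacheLineBytes)
    {
        long bytesPerThread = bufferBytes / threads;
        long linesPerThread = bytesPerThread / cacheLineBytes;

        // Only a cache line that straddles the boundary between two adjacent
        // chunks can be written by two threads, so at most (threads - 1)
        // lines are ever shared.
        int maxSharedLines = threads - 1;

        return (bytesPerThread, linesPerThread, maxSharedLines);
    }
}
```

Partition(8L * 1024 * 1024, 4, 64) gives 2,097,152 bytes and 32,768 cache lines per thread, with at most 3 lines shared between neighbouring threads.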
Tags: c# concurrency parallel-processing task-parallel-library unsafe