What is causing this implementation of GetHashCode to be 20 times slower than .net's implementation?

【Question Title】: What is causing this implementation of GetHashCode to be 20 times slower than .net's implementation?
【Posted】: 2014-11-04 12:05:55
【Question Description】:

I got the idea for a substring struct from this post and this one. The second post has an implementation of .net's String.GetHashCode(). (I'm not sure which version of .net it comes from.)

Here is the implementation. (GetHashCode is taken from the second source listed above.)

public struct Substring
{
    private string String;
    private int Offset;
    public int Length { get; private set; }
    public char this[int index] { get { return String[Offset + index]; } }

    public Substring(string str, int offset, int len) : this()
    {
        String = str;
        Offset = offset;
        Length = len;
    }

    /// <summary>
    /// See http://www.dotnetperls.com/gethashcode
    /// </summary>
    /// <returns></returns>
    public unsafe override int GetHashCode()
    {
        fixed (char* str = String + Offset) // NOTE: string + int is string concatenation (allocates a new string), not pointer arithmetic
        {
            char* chPtr = str;
            int num = 352654597;
            int num2 = num;
            int* numPtr = (int*)chPtr;
            for (int i = Length; i > 0; i -= 4) // NOTE: when Length is odd, the final int read also covers the char at index Length
            {
                num = (((num << 5) + num) + (num >> 27)) ^ numPtr[0];
                if (i <= 2)
                {
                    break;
                }
                num2 = (((num2 << 5) + num2) + (num2 >> 27)) ^ numPtr[1];
                numPtr += 2;
            }
            return (num + (num2 * 1566083941));
        }
    }
}
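One detail in the struct above may matter as much as any JIT setting: in fixed (char* str = String + Offset), String is a string and Offset is an int, so the + is C# string concatenation, not pointer arithmetic. Each call therefore allocates and copies a new string (the parent text followed by the digits of Offset), pins it, and hashes from index 0 of the parent rather than from Offset. The helper below is hypothetical; it only reproduces that one expression so the effect can be checked in isolation.

```csharp
public static class FixedTargetDemo
{
    // Mirrors the expression inside `fixed (char* str = String + Offset)`.
    // string + int compiles to a string.Concat call, so this returns a
    // brand-new string: the parent text followed by the digits of offset.
    public static string WhatGetsPinned(string parent, int offset) => parent + offset;
}
```

For the test's inputs this yields "The quick brown fox1", a fresh 20-char string on every call. If pointer arithmetic was the intent, the usual shape is to pin the parent and offset the pointer afterwards, e.g. fixed (char* p = String) { char* start = p + Offset; ... }, which avoids the per-call allocation.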

Here is a unit test:

    [Test]
    public void GetHashCode_IsAsFastAsString()
    {
        var s = "The quick brown fox";
        var sub = new Substring(s, 1, 5);
        var t = "quick";
        var sum = 0;

        sum += sub.GetHashCode(); // make sure GetHashCode is jitted 

        var count = 100000000;
        var sw = Stopwatch.StartNew();
        for (var i = 0; i < count; ++i)
            sum += t.GetHashCode();
        var t1 = sw.Elapsed;
        sw = Stopwatch.StartNew();
        for (var i = 0; i < count; ++i)
            sum += sub.GetHashCode();
        var t2 = sw.Elapsed;

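        Debug.WriteLine(sum.ToString()); // make sure we use the return value (note: Debug.WriteLine is compiled out of Release builds)
        var m1 = t1.TotalMilliseconds; // TotalMilliseconds, not Milliseconds (which is only the 0-999 ms component)
        var m2 = t2.TotalMilliseconds;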
        Assert.IsTrue(m2 <= m1); // fat chance
    }
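A separate pitfall in this kind of test: TimeSpan.Milliseconds is only the milliseconds component (0 to 999) of the elapsed time, not the whole duration. Once a loop runs longer than a second, comparing Milliseconds values compares remainders; TotalMilliseconds is the property that gives the full duration. The hypothetical helper below simply returns both so the difference is visible.

```csharp
using System;

public static class ElapsedDemo
{
    // Component: the 0-999 millisecond part of the TimeSpan.
    // Total: the entire duration expressed in milliseconds.
    public static (int Component, double Total) Describe(TimeSpan elapsed) =>
        (elapsed.Milliseconds, elapsed.TotalMilliseconds);
}
```

For example, a TimeSpan of 1.5 seconds reports Milliseconds = 500 but TotalMilliseconds = 1500.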

The problem is that m1 is 10 ms while m2 is 190 ms. (Note: this is with 1000000 iterations.) FYI, I'm running this as a .net 4.5 64-bit Release build with optimizations enabled.

【Question Discussion】:

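Unrelated to the question, but did you write this class to save memory?
You are making the classic benchmarking mistakes, like including the jitter's (JIT compiler's) overhead in the measurement, and not actually using the return value, which lets the jitter's optimizer eliminate the code entirely.
Fair enough. So I went back and added another sub.GetHashCode() loop before doing any timing. Same result, to the millisecond.
@bright: with o-: Substring 0.1175266, String 0.0133497; with o+: Substring 0.0225464, String 0.0253571. Whether I test the string method or the Substring method first doesn't seem to make any significant difference.
You still aren't using sum. Add GC.KeepAlive(sum);. The debugger suppresses optimization at startup, so start without the debugger. Also make the test run 10x longer or more.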
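The advice in these comments can be folded into a small timing helper. This is a sketch, not a rigorous benchmark: HashBench and Time are hypothetical names, the warm-up is a single call, and calling through a Func<int> adds the same delegate-call overhead (and blocks inlining) for both candidates, so absolute numbers will be higher than for a hand-written loop. It also prints a warning when a debugger is attached, since, as the accepted answer found, a debugger setting can suppress JIT optimization.

```csharp
using System;
using System.Diagnostics;

public static class HashBench
{
    // Times `count` calls of `hash`. One untimed call runs first so the JIT
    // cost of the target is not measured, and every result is accumulated
    // into `sum`, which is returned to the caller so the results stay observable.
    public static TimeSpan Time(Func<int> hash, int count, out int sum)
    {
        if (Debugger.IsAttached)
            Console.WriteLine("Warning: debugger attached; JIT optimizations may be suppressed.");

        sum = hash(); // warm-up call (untimed)
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < count; ++i)
            sum = unchecked(sum + hash());
        sw.Stop();
        return sw.Elapsed;
    }
}
```

A caller might run HashBench.Time(() => t.GetHashCode(), count, out var s1) and HashBench.Time(() => sub.GetHashCode(), count, out var s2), pass s1 + s2 to GC.KeepAlive, and compare the two TotalMilliseconds values from a Release build started without the debugger.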

Tags: c# performance gethashcode

【Solution 1】:

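Prompted by the comments, I checked again to make sure optimized code was actually running. It turns out an obscure Debugger setting was disabling optimizations, so I unchecked Tools - Options - Debugging - General - Suppress JIT optimization on module load (Managed only). With that unchecked, the optimized code loads properly.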
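Even with optimizations enabled, there is still roughly a 3x to 6x difference. However, this may be because the code above is the 32-bit .net implementation, while I am running 64-bit .net. Porting the 64-bit implementation of string.GetHashCode to Substring is not straightforward, because it relies on the zero terminator at the end of the string (which is actually a bug).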
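On the zero-terminator point: the 32-bit loop in the question depends on it as well. It consumes two chars per int read, and when Length is odd the final read also picks up the char at index Length. For a real string that char is the terminating '\0'; for a Substring whose pointer starts at Offset it is whatever follows in the parent string, so two equal substrings could hash differently. Below is a safe sketch that keeps the question's mixing constants but reads by index and substitutes '\0' past the end. SafeSubstring is a hypothetical name, each 32-bit word is assembled the way an int* read would see it on a little-endian (x86/x64) machine, and bounds-checked indexing is probably not faster than the pointer version; the point is correctness.

```csharp
using System;

public readonly struct SafeSubstring
{
    private readonly string _source;
    private readonly int _offset;
    public int Length { get; }

    public SafeSubstring(string source, int offset, int length)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (offset < 0 || length < 0 || offset + length > source.Length)
            throw new ArgumentOutOfRangeException(nameof(length));
        _source = source;
        _offset = offset;
        Length = length;
    }

    // Char i of the substring, or '\0' at/after Length (what a string's terminator supplies).
    private char CharAt(int i) => i < Length ? _source[_offset + i] : '\0';

    // Little-endian word: first char in the low 16 bits, second in the high 16 bits.
    private int Word(int i) => CharAt(i) | (CharAt(i + 1) << 16);

    public override int GetHashCode()
    {
        unchecked
        {
            int num = 352654597;
            int num2 = num;
            int pos = 0; // char index of the next word (two words = four chars per iteration)
            for (int i = Length; i > 0; i -= 4)
            {
                num = ((num << 5) + num + (num >> 27)) ^ Word(pos);
                if (i <= 2) break;
                num2 = ((num2 << 5) + num2 + (num2 >> 27)) ^ Word(pos + 2);
                pos += 4;
            }
            return num + num2 * 1566083941;
        }
    }
}
```

With this version, "quick" taken from "The quick brown fox" hashes the same as "quick" taken from "quick!" or standing alone, because the neighbouring parent character never enters the hash.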

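At this point I'm disappointed not to get parity performance, but this has been an excellent exercise in learning some of the perils and pitfalls of optimizing C#.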

【Discussion】:

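You are surprised at not getting parity performance, but you are comparing apples and pears. Because of the indexing, the loop does not start on a dword boundary, which results in very slow assembly code. If you want parity, maybe make sure the start of your loop is aligned. Also make sure your strings are larger: right now 90% of the time is overhead in your code, and you don't use sum. To compare performance you need a completely different setup, and you would reach a different conclusion: you can write code that performs the same as most .NET internals.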
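The alignment point can be made concrete with a little arithmetic. A char is 2 bytes, so an int read starting at char index k sits 2k bytes past the first character. Assuming the string's first character is itself 4-byte aligned (an assumption, not something the thread establishes), such a read is on a dword boundary only when k is even; with an Offset of 1, as in the test, every int read would be 2 bytes off. AlignmentSketch is a hypothetical helper name.

```csharp
public static class AlignmentSketch
{
    // Byte distance from the first character to char index charIndex.
    public static int ByteOffset(int charIndex) => charIndex * sizeof(char);

    // True if a 4-byte read starting at charIndex is dword-aligned,
    // ASSUMING the string's first character is 4-byte aligned.
    public static bool IsDwordAligned(int charIndex) => ByteOffset(charIndex) % 4 == 0;
}
```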