Using the default bucket count in unordered_map<> when specifying a hash function

【Question title】: Using default bucket count in unordered_map<> when specifying hash function
【Posted】: 2024-01-16 18:26:02
【Question】:

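I am using unordered_map and noticed that when a hash function is passed as the second constructor argument (as in the code below), the bucket count, size_type n, has to be passed as the first argument. I have read that the default bucket count should be used. Does anyone know how to get the default bucket count while still supplying your own hash function?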

Interestingly, Stroustrup's C++, 4th edition, page 918, constructs an unordered_set without a bucket size, which does not agree with the documented constructor arguments:

explicit unordered_map ( size_type n = /* see below */,
                         const hasher& hf = hasher(),
                         const key_equal& eql = key_equal(),
                         const allocator_type& alloc = allocator_type() );

Example usage:

#include <unordered_map>
#include <functional>
#include <iostream>
#include <string>
using namespace std;

struct X {
    X(string n) : name{n} {}
    string name;
    bool operator==(const X& b0) const { return name == b0.name; }
};

namespace std {
    template<>
    struct hash<X> {
        size_t operator()(const X&) const;
    };
    size_t hash<X>::operator()(const X& a) const
    {
        cout << a.name << endl;
        return hash<string>{}(a.name);
    }
}

size_t hashX(const X& a)
{
    return hash<string>{}(a.name);
}

int main()
{
//    unordered_map<X,int,hash<X>> m(100, hash<X>{});
//    unordered_map<X,int,function<size_t(const X&)>> m(100, &hashX);
    unordered_map<X,int,size_t(*)(const X&)> m(100, &hashX);
    X x{"abc"};
    m[x] = 1;
    int i = m[x];
    cout << i << endl;
}

【Comments】:

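I don't think there is any general guidance on choosing the number of buckets. Ideally the number of buckets would equal the number of elements and every element would land in a different bucket, but that is almost never achieved in practice. If you know the number of elements in advance, though, starting with a higher bucket count may help avoid reallocations of the bucket array (provided you have a high-quality hash function).
I have read elsewhere that a load factor of 70% is desirable, so (n / .7 + 1). I'm curious because people also say to use the default, but with a user-defined container you may need to specify the hash function, which in turn requires specifying the bucket count.
@andre Agreed. The problem is that if I define my own hash function type and pass it as the second argument, I cannot use the default bucket size. The bucket size must be specified. That is my confusion.
Edited the question based on the comments to make it more precise. Feel free to roll back if it changes the intent of the question.
I'm not sure I understand your question. What is wrong with unordered_map<X,int,std::hash<X>> m;, which simply uses the default bucket count?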

Tags: c++ templates unordered-map

【Solution 1】:

It seems we can read the bucket_count value directly. I would run the following code in your environment and check what value it gives you.

#include <iostream>
#include <unordered_map>

int main() {
    std::unordered_map<int, int> m;
    std::cout << m.bucket_count() << std::endl;
    return 0;
}

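On ideone this prints 1. The default bucket count is implementation-defined, so other standard libraries may print a different value.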

【Comments】:

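I wonder whether it adjusts itself when the default of 1 is used.
@notaorb: Absolutely. The hash table grows whenever an insertion would push the load factor (the ratio size() : bucket_count()) above max_load_factor(), which is 1 by default for a newly created unordered_map. So functionally it does not matter what you pick, although insertion may be up to twice as fast if you presize for the expected number of elements.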