Best data structure to store a grouping relation and support lookup

【Question title】: Best data structure to store a grouping relation and support lookup
【Posted】: 2019-11-02 18:04:00
【Question description】:

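I need to create a data structure that keeps track of some grouping information. Assume the elements are just strings. For example, {'a', 'b', 'c'} is one group and {'e', 'f', 'g'} is another. I also need to support lookup by key, and the keys are all strings. Right now, I can think of using a map: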

{a} -> {"a", "b", "c"}
{b} -> {"a", "b", "c"}

{e} -> {"e", "f", "g"}
{f} -> {"e", "f", "g"}

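But in this case I am duplicating a lot of information in the map, and its size will blow up. Is there another good data structure that is compact and also supports fast lookup?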

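【Comments】:

"But in this case I am duplicating a lot of information in the map" - how so? What is duplicated here? Are you worried that arrays/vectors of 3 elements will introduce a performance problem? You could always have a map with std::shared_ptrs to said arrays/vectors as values.
One question. Is the key a called a because its list of elements contains "a"? Is the same true for "b" and "c"?
Why is this tagged [c++]? If you want a C++ answer, please show what you have tried so far. You should read How to Ask and minimal reproducible example.
What counts as a "good data structure" is determined by the requirements, and yours are incomplete. If you just want to look up the group quickly, you could map "a", "b" and "c" to 1, and "e", "f" and "g" to 2. (Since you have no requirements on the groups themselves, all you need to store for them is a value that can be compared for equality. Integers are convenient.)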
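
The std::shared_ptr idea from the first comment could look roughly like the sketch below: every member key points at the same group object, so each group is stored only once. The names Group, GroupPtr, GroupIndex, add_group and find_group are made up for illustration and do not come from the question.

```cpp
#include <memory>
#include <set>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical names, for illustration only.
using Group = std::set<std::string>;
using GroupPtr = std::shared_ptr<const Group>;
using GroupIndex = std::unordered_map<std::string, GroupPtr>;

// Stores the group once and makes every member key point at that single object.
inline GroupPtr add_group(GroupIndex& index, Group members) {
   GroupPtr group = std::make_shared<Group>(std::move(members));
   for (auto const& elem : *group)
      index[elem] = group;
   return group;
}

// Returns the group containing elem, or nullptr if elem is in no group.
inline GroupPtr find_group(const GroupIndex& index, const std::string& elem) {
   auto it = index.find(elem);
   return it == index.end() ? nullptr : it->second;
}
```

The map still holds one entry per element, but each entry costs only a key plus a pointer instead of a full copy of the group.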

Tags: c++ data-structures hashmap

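【Solution 1】:

But in this case I am duplicating a lot of information in the map, and its size will blow up. Is there another good data structure that is compact and also supports fast lookup?

Instead of mapping elements directly to groups, you can eliminate this duplication by introducing an extra level of indirection: map the elements (std::strings) to a group ID, which is an index. You then keep a std::vector of groups and use the group ID retrieved from the map to index into this vector.

As an example of an implementation of this idea: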

#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

class GroupRelation {
public:
   // must be declared before the data members that use it
   using group_id_t = std::size_t;

private:
   // maps each element to the index of its group in groups_
   std::unordered_map<std::string, group_id_t> elem2group_id_;
   std::vector<std::unordered_set<std::string>> groups_;

public:
   auto num_groups() const { return groups_.size(); }

   auto add_group(std::unordered_set<std::string> group) {
      auto grp_id = groups_.size();
      for (auto const& elem: group)
         elem2group_id_[elem] = grp_id;

      groups_.push_back(std::move(group));
      return grp_id; // return group_id_t of just added group
   }

   // for checking whether or not an element is in a group
   bool is_in_group(const std::string& elem) const {
      return elem2group_id_.find(elem) != elem2group_id_.end();
   }

   // returns the group ID where the element belongs;
   // throws std::out_of_range if the element is in no group
   group_id_t group_id(const std::string& elem) const {
      return elem2group_id_.at(elem);
   }

   // no bounds check: group_id must be less than num_groups()
   const std::unordered_set<std::string>& group(group_id_t group_id) const {
      return groups_[group_id];
   }

   std::unordered_set<std::string>& group(group_id_t group_id) {
      return groups_[group_id];
   }
};

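On average, retrieving the group ID of an element takes constant time, because it is a single std::unordered_map lookup.

Example of use:

#include <iostream>

auto main() -> int {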
   GroupRelation grp_rel;

   grp_rel.add_group({"a", "b", "c"});   
   grp_rel.add_group({"e", "f", "g"});

   for (auto const& elem: grp_rel.group(0))
      std::cout << elem << ' ';
   std::cout << '\n';

   for (auto const& elem: grp_rel.group(1))
      std::cout << elem << ' ';
   std::cout << '\n';

}

My output (the order of elements within a group is unspecified, since std::unordered_set is unordered):

b c a 
g f e
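
The example above indexes groups by ID; the lookup by key that the question asks for goes through the element-to-ID map first. Below is a minimal sketch of that path, using a condensed, hypothetical GroupTable struct (same layout as GroupRelation, but not the class from this answer) so that the snippet compiles on its own.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

using Group = std::unordered_set<std::string>;

// Hypothetical condensed variant: element -> index into the vector of groups.
struct GroupTable {
   std::unordered_map<std::string, std::size_t> elem2group_id;
   std::vector<Group> groups;

   std::size_t add_group(Group group) {
      std::size_t id = groups.size();
      for (auto const& elem : group)
         elem2group_id[elem] = id;
      groups.push_back(std::move(group));
      return id;
   }

   // Returns the group containing elem, or nullptr if elem is in no group.
   // The pointer is invalidated by a later add_group (the vector may reallocate).
   const Group* find_group(const std::string& elem) const {
      auto it = elem2group_id.find(elem);
      return it == elem2group_id.end() ? nullptr : &groups[it->second];
   }

   // True if both elements are known and map to the same group ID.
   bool same_group(const std::string& a, const std::string& b) const {
      auto ia = elem2group_id.find(a);
      auto ib = elem2group_id.find(b);
      return ia != elem2group_id.end() && ib != elem2group_id.end()
          && ia->second == ib->second;
   }
};
```

Checking whether two elements belong together only compares two integers, so the groups themselves never have to be copied or compared.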

【Discussion】:

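【Solution 2】:

You already have a fast data structure; you just have to use it wisely.
If you want to make a key out of 3 different strings (s1, s2, s3), do this:

When adding a key/value pair to the map
create a new string s1+"_"+s2+"_"+s3
and use it as the key

When retrieving a value from the map
create a new string s1+"_"+s2+"_"+s3
and use it as the key

The underscore does all the work here.

This is fast enough too.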
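
A minimal sketch of the composite-key steps above. The helper names make_key, put and get, and the int value type, are assumptions made for illustration. The sketch also assumes that none of the strings contains an underscore; otherwise two different triples can produce the same key.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical value type: the answer does not say what the map stores.
using TripleMap = std::unordered_map<std::string, int>;

// Builds the composite key s1_s2_s3.
// Assumption: no part contains '_'; e.g. ("a_b", "c", "d") and ("a", "b_c", "d")
// would both yield "a_b_c_d".
inline std::string make_key(const std::string& s1, const std::string& s2,
                            const std::string& s3) {
   return s1 + "_" + s2 + "_" + s3;
}

// Adds or overwrites the value stored for the triple.
inline void put(TripleMap& m, const std::string& s1, const std::string& s2,
                const std::string& s3, int value) {
   m[make_key(s1, s2, s3)] = value;
}

// Returns a pointer to the stored value, or nullptr if the triple is absent.
inline const int* get(const TripleMap& m, const std::string& s1,
                      const std::string& s2, const std::string& s3) {
   auto it = m.find(make_key(s1, s2, s3));
   return it == m.end() ? nullptr : &it->second;
}
```

Note that the key depends on the order of the parts, so ("a", "b", "c") and ("a", "c", "b") are different keys, and a single element such as "a" cannot be used on its own to find its group.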

【Discussion】: