Finding which bin a value falls into: answers

【Question Title】: Finding which bin a value falls into
【Posted】: 2012-02-14 12:38:21
【Question Description】:

I'm trying to find which category C a double x belongs to. My categories are defined in a file as a string name and a double value, like this:

A 1.0
B 2.5
C 7.0

which should be interpreted like this:

"A": 0 < x <= 1.0
"B": 1.0 < x <= 2.5
"C": 2.5 < x <= 7.0

(The input can be of any length and may need to be sorted by value.) All I need is a function like this:

std::string findCategory(categories_t categories, double x) {
    // ...insert magic here
}

So for this example, I would expect:

findCategory(categories, 0.5) == "A"
findCategory(categories, 1.9) == "B"
findCategory(categories, 6.0) == "C"

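So my question is: a) how should I write the function, and b) what would be the best choice for categories_t (using the STL, pre-C++11)? I've made a few attempts, but none of them were... very successful.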

【Question Discussion】:

Tags: c++ algorithm data-structures stl range

【Solution 1】:

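One option is to use the std::map container, with doubles as keys and, as values, the category assigned to the range whose upper endpoint is that key. For example, given your file, you would have a map like this: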

std::map<double, std::string> lookup;
lookup[1.0] = "A";
lookup[2.5] = "B";
lookup[7.0] = "C";
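Since the question says the categories come from a file in the "name value" format, here is a minimal sketch of building the same map directly from a stream instead of by hand. The helper name readCategories is hypothetical, and it assumes one whitespace-separated name and upper edge per line. Because std::map keeps its keys sorted, the file does not need to be pre-sorted.

```cpp
#include <istream>
#include <map>
#include <sstream>  // std::istringstream, handy for trying this without a file
#include <string>

// Hypothetical helper: reads lines such as "A 1.0" into a map keyed by the
// bin's upper edge. The map orders entries by key, so input order is irrelevant.
std::map<double, std::string> readCategories(std::istream& in) {
    std::map<double, std::string> lookup;
    std::string name;
    double upper;
    while (in >> name >> upper) {
        lookup[upper] = name;  // a duplicate edge overwrites the earlier name
    }
    return lookup;
}
```

Any std::istream works here, so the same function can read from a std::ifstream opened on the real file or from a std::istringstream holding test data.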

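You can then use the std::map::lower_bound function, given some point, to get back the key/value pair whose key (upper endpoint) is the first key in the map that is at least as large as the point in question. For example, with the map above, lookup.lower_bound(1.37) returns an iterator whose value is "B", and lookup.lower_bound(2.56) returns an iterator whose value is "C". These lookups are fast: for a map with n elements, they take O(log n) time.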

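In the above, I've assumed that the values you look up are all positive, since your first bin is 0 < x <= 1.0. If zero or negative values are possible, add a quick test for x <= 0 before doing any lookup, so that you don't get spurious results.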
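Putting the two paragraphs above together, here is a minimal sketch of findCategory on top of std::map. It assumes the question's convention that the first bin starts at 0, and it returns an empty string to mean "no bin" (our choice, not something the answer specifies). It also handles a case the answer leaves implicit: lower_bound returns end() when x is above the largest upper edge, and that iterator must not be dereferenced.

```cpp
#include <map>
#include <string>

// One possible choice for categories_t: upper edge -> category name.
typedef std::map<double, std::string> categories_t;

// Returns the name of the bin (previous edge, edge] containing x,
// or an empty string if x lies outside every bin.
std::string findCategory(const categories_t& categories, double x) {
    // The first bin is 0 < x <= first edge, so x <= 0 matches nothing.
    if (x <= 0.0) {
        return std::string();
    }
    // First key that is not less than x, i.e. the smallest upper edge >= x.
    // A value exactly on an edge therefore stays in the bin that edge closes.
    categories_t::const_iterator it = categories.lower_bound(x);
    if (it == categories.end()) {
        return std::string();  // x is above the largest upper edge
    }
    return it->second;
}
```

With the map from the answer (1.0 to "A", 2.5 to "B", 7.0 to "C"), findCategory(categories, 0.5), 1.9 and 6.0 give "A", "B" and "C" as the question expects, while 7.5 and -1.0 give an empty string.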

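For what it's worth, if you happen to know the distribution of the lookups (for example, that they are uniformly distributed), you can build a special data structure called an optimal binary search tree, which will give better access times than std::map. Also, depending on your application, faster options may be available. For example, if you're doing this because you want to randomly pick one of several outcomes with different probabilities, then I'd suggest looking at this article on the alias method, which lets you do that in O(1) time.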
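The linked article on the alias method is not reproduced here, so as a rough illustration only, the sketch below follows Vose's variant of the alias method (our choice of variant). The names AliasTable, buildAliasTable and sampleAlias are hypothetical. It spends O(n) building two tables so that each draw costs O(1), compared with O(log n) for a lower_bound over cumulative weights.

```cpp
#include <cstddef>
#include <vector>

struct AliasTable {
    std::vector<double> prob;        // chance of keeping column i itself
    std::vector<std::size_t> alias;  // outcome used otherwise
};

// weights must be non-empty and non-negative, with a positive sum.
AliasTable buildAliasTable(const std::vector<double>& weights) {
    const std::size_t n = weights.size();
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) total += weights[i];

    AliasTable t;
    t.prob.assign(n, 1.0);  // leftover columns keep probability 1
    t.alias.assign(n, 0);
    std::vector<double> scaled(n);
    std::vector<std::size_t> under, over;  // scaled weight < 1 and >= 1
    for (std::size_t i = 0; i < n; ++i) {
        scaled[i] = weights[i] * n / total;  // average scaled weight is 1
        if (scaled[i] < 1.0) under.push_back(i); else over.push_back(i);
    }
    while (!under.empty() && !over.empty()) {
        std::size_t s = under.back(); under.pop_back();
        std::size_t l = over.back(); over.pop_back();
        t.prob[s] = scaled[s];
        t.alias[s] = l;
        // l donates (1 - scaled[s]) to fill column s up to 1.
        scaled[l] = (scaled[l] + scaled[s]) - 1.0;
        if (scaled[l] < 1.0) under.push_back(l); else over.push_back(l);
    }
    return t;
}

// Draws one outcome from two uniform inputs: column in [0, n), u in [0, 1).
std::size_t sampleAlias(const AliasTable& t, std::size_t column, double u) {
    return u < t.prob[column] ? column : t.alias[column];
}
```

Passing the random inputs in keeps sampleAlias deterministic and easy to test. In pre-C++11 code, std::rand() % n for the column and std::rand() / (RAND_MAX + 1.0) for u are a crude but workable source (the modulo adds a slight bias).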

Hope this helps!

【Discussion】:

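I was about to say that a hash table would be perfect for this, then I remembered that std::map isn't a hash table. However, given the categories listed in the question, I think upper_bound would be better.
Wait, no, lower_bound is right, but I think you have an off-by-one error in your example. Your description of lower_bound is wrong.
Actually, lower_bound returns an iterator to the first key that is not less than the test point. So it would return an iterator pointing at "C" for 1.37 and at "Undefined" for 2.5. You therefore need to put "A" at 1.0, "B" at 2.5, and so on.
@Grizzly - Ah, yes! Thanks! Fixed.
@templatetypedef: Not to be pedantic, but your description still isn't quite right. The key found is not the smallest one that is still larger, but the smallest one that is not smaller, so it compares greater than or equal to the search key. That is the difference from upper_bound, which returns the position of the first key greater than (and not equal to) the test point.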
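The distinction discussed in these comments only shows up when the searched value lands exactly on an edge. A minimal sketch comparing the two calls on the answer's map; viaLowerBound and viaUpperBound are hypothetical names used only for this illustration.

```cpp
#include <map>
#include <string>

// lower_bound: first key >= x. A value equal to an edge stays in the bin
// that edge closes, matching the question's (a, b] intervals.
std::string viaLowerBound(const std::map<double, std::string>& m, double x) {
    std::map<double, std::string>::const_iterator it = m.lower_bound(x);
    return it == m.end() ? std::string() : it->second;
}

// upper_bound: first key > x. A value equal to an edge moves to the next
// bin, which would describe [a, b) intervals instead.
std::string viaUpperBound(const std::map<double, std::string>& m, double x) {
    std::map<double, std::string>::const_iterator it = m.upper_bound(x);
    return it == m.end() ? std::string() : it->second;
}
```

With keys 1.0, 2.5 and 7.0, both functions agree on 1.37 ("B"), but for exactly 2.5 viaLowerBound returns "B" while viaUpperBound returns "C", and for exactly 7.0 viaUpperBound runs off the end of the map.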

【Solution 2】:

You can use the pair type and lower_bound from <algorithm>: http://www.cplusplus.com/reference/algorithm/lower_bound/

Let's define your categories by their upper edges: typedef pair<double,string> category_t;

Then just make a vector of these edges and search it with a binary search. See the complete example below.

#include <string>
#include <vector>
#include <algorithm>
#include <iostream>

using namespace std;
typedef pair<double,string> category_t;

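std::string findCategory(const vector<category_t> &categories, double x) {
   // categories must be sorted by upper edge. lower_bound finds the first
   // (edge, name) pair not less than (x, ""); since "" compares <= every name,
   // that is the first edge >= x, i.e. the bin (previous edge, edge] holding x.
   vector<category_t>::const_iterator it=std::lower_bound(categories.begin(), categories.end(),category_t(x,""));
   if(it==categories.end()){
      return "";   // x is above the last upper edge (overflow)
   }
   return it->second;
}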

int main (){

   vector< category_t > edges;
   edges.push_back(category_t(0,"bin n with upper edge at 0 (underflow)"));
   edges.push_back(category_t(1,"bin A with upper edge at 1"));
   edges.push_back(category_t(2.5,"bin B with upper edge at 2.5"));
   edges.push_back(category_t(7,"bin C with upper edge at 7"));
   edges.push_back(category_t(8,"bin D with upper edge at 8"));
   edges.push_back(category_t(9,"bin E with upper edge at 9"));
   edges.push_back(category_t(10,"bin F with upper edge at 10"));

   vector< double > examples ;
   examples.push_back(1);
   examples.push_back(3.3);
   examples.push_back(7.4);
   examples.push_back(-5);
   examples.push_back(15);

   for( vector< double >::const_iterator eit =examples.begin();eit!=examples.end();++eit)
      cout << "value "<< *eit << " : " << findCategory(edges,*eit) << endl;   
}

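The comparison works the way we want because the double is the first member of the pair, and pairs are compared by their first members first and then by their second members. Otherwise, we would have to define a comparison predicate, as described on the page I linked above.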
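For the alternative mentioned above, here is a minimal sketch of an explicit comparison predicate, assuming the same category_t typedef. ByUpperEdge, sortCategories and findCategoryByEdge are hypothetical names. It also covers the question's note that the input may need sorting: std::lower_bound requires the range to be sorted by the same predicate it searches with.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<double, std::string> category_t;

// Orders categories by upper edge only, ignoring the name.
struct ByUpperEdge {
    bool operator()(const category_t& a, const category_t& b) const {
        return a.first < b.first;
    }
};

// Call once after loading the edges, before any lookups.
void sortCategories(std::vector<category_t>& categories) {
    std::sort(categories.begin(), categories.end(), ByUpperEdge());
}

// Same lookup as the answer's findCategory, using the explicit predicate.
std::string findCategoryByEdge(const std::vector<category_t>& categories, double x) {
    std::vector<category_t>::const_iterator it = std::lower_bound(
        categories.begin(), categories.end(),
        category_t(x, std::string()), ByUpperEdge());
    return it == categories.end() ? std::string() : it->second;
}
```

Because the predicate looks only at .first, the string carried by the probe pair no longer matters, and no string comparisons happen during the search.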

【Discussion】:

This is exactly what I had in mind.