C++ sorting and keeping track of indexes: answers

【Question title】: C++ sorting and keeping track of indexes
【Posted】: 2010-12-07 08:12:10
【Question】:

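Using C++, and preferably the standard library, I want to sort a sequence of samples in ascending order, but I also want to remember the original indexes of the newly sorted samples.

For example, I have a set, vector, or matrix of samples A : [5, 2, 1, 4, 3]. I want to sort these into B : [1, 2, 3, 4, 5], but I also want to remember the original indexes of the values, so that I can get another set, C : [2, 1, 4, 3, 0], which holds the index in the original A of each element in B.

For example, in Matlab you can do: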

 [a,b]=sort([5, 8, 7])
 a = 5 7 8
 b = 1 3 2

Can anyone see a good way to do this?

【Comments】:

Tags: c++ sorting stl indexing

【Solution 1】:

Using C++11 lambdas:

#include <iostream>
#include <vector>
#include <numeric>      // std::iota
#include <algorithm>    // std::sort, std::stable_sort

using namespace std;

template <typename T>
vector<size_t> sort_indexes(const vector<T> &v) {

  // initialize original index locations
  vector<size_t> idx(v.size());
  iota(idx.begin(), idx.end(), 0);

  // sort indexes based on comparing values in v
  // using std::stable_sort instead of std::sort
  // to avoid unnecessary index re-orderings
  // when v contains elements of equal values 
  stable_sort(idx.begin(), idx.end(),
       [&v](size_t i1, size_t i2) {return v[i1] < v[i2];});

  return idx;
}

Now you can use the returned index vector in iterations such as

for (auto i: sort_indexes(v)) {
  cout << v[i] << endl;
}

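You can also choose to supply your own original index vector, sort function, or comparator, or to reorder v automatically inside sort_indexes using an extra vector.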
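As a rough sketch of the last option, the helper below returns both the sorted copy and the index vector, similar to Matlab's [a,b]=sort(...). The name sorted_with_indexes is made up for this illustration and is not part of the answer above.

```cpp
#include <algorithm>  // std::stable_sort
#include <cstddef>    // std::size_t
#include <numeric>    // std::iota
#include <utility>    // std::pair, std::make_pair, std::move
#include <vector>

// Hypothetical helper: returns (sorted copy of v, original index of each
// sorted element). v itself is left untouched.
template <typename T>
std::pair<std::vector<T>, std::vector<std::size_t> >
sorted_with_indexes(const std::vector<T>& v) {
  std::vector<std::size_t> idx(v.size());
  std::iota(idx.begin(), idx.end(), 0);
  std::stable_sort(idx.begin(), idx.end(),
                   [&v](std::size_t i1, std::size_t i2) { return v[i1] < v[i2]; });

  // Gather the values in sorted order using the extra vector.
  std::vector<T> sorted;
  sorted.reserve(v.size());
  for (std::size_t i : idx) sorted.push_back(v[i]);

  return std::make_pair(std::move(sorted), std::move(idx));
}
```

For the question's A = [5, 2, 1, 4, 3], the first member of the result corresponds to B and the second to C.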

【Comments】:

Love this answer. If your compiler does not support lambdas, you can use a class: template <typename T> class CompareIndicesByAnotherVectorValues { std::vector<T>* _values; public: CompareIndicesByAnotherVectorValues(std::vector<T>* values) : _values(values) {} public: bool operator() (const int& a, const int& b) const { return (*_values)[a] > (*_values)[b]; } };
I love this answer too; there is no need to copy the original vector to create a vector of pairs.
Rather than the handcrafted for (size_t i = 0; i != idx.size(); ++i) idx[i] = i; I prefer the standard std::iota( idx.begin(), idx.end(), 0 );
Use #include <numeric> for iota()
iota is the least obviously named algorithm in the entire C++ standard library.

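【Solution 2】:

You could sort std::pair instead of just ints: the first int is the original data, the second int is the original index. Then supply a comparator that sorts on the first int only. Example:

Your problem instance: v = [5 8 7]
New problem instance: v_prime = [<5,0>, <8,1>, <7,2>]

Sort the new problem instance using a comparator like: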

typedef std::pair<int,int> mypair;
bool comparator ( const mypair& l, const mypair& r)
   { return l.first < r.first; }
// forgetting the syntax here but intent is clear enough

The result of std::sort on v_prime, using that comparator, should be:

v_prime = [<5,0>, <7,2>, <8,1>]

You can peel out the indices by walking the vector and grabbing .second from each std::pair.
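Put together, the steps might look like the sketch below. The names ValueIndex, by_value and indexes_via_pairs are illustrative only; the index is stored as std::size_t here rather than int.

```cpp
#include <algorithm>  // std::sort
#include <cstddef>    // std::size_t
#include <utility>    // std::pair
#include <vector>

typedef std::pair<int, std::size_t> ValueIndex;  // (value, original index)

// Compare on the value only, ignoring the stored index.
inline bool by_value(const ValueIndex& l, const ValueIndex& r) {
  return l.first < r.first;
}

// Hypothetical helper: original indices in ascending order of value.
inline std::vector<std::size_t> indexes_via_pairs(const std::vector<int>& v) {
  // Build v_prime: every value paired with its position.
  std::vector<ValueIndex> v_prime;
  v_prime.reserve(v.size());
  for (std::size_t i = 0; i < v.size(); ++i)
    v_prime.push_back(ValueIndex(v[i], i));

  std::sort(v_prime.begin(), v_prime.end(), by_value);

  // Peel the index out of each pair.
  std::vector<std::size_t> indexes;
  indexes.reserve(v_prime.size());
  for (std::size_t i = 0; i < v_prime.size(); ++i)
    indexes.push_back(v_prime[i].second);
  return indexes;
}
```

Because the comparator looks at .first only and std::sort is not stable, the relative order of equal values is unspecified in this sketch.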

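【Comments】:

This is exactly how I would do it too. The basic sort function does not track old versus new positions because that would add unnecessary overhead.
The drawback of this approach is that it requires you to reallocate memory for all the values.
This is obviously a workable approach, but its drawback is that you have to change your original container from a "container of numbers" to a "container of pairs".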

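【Solution 3】:

Suppose the given vector is

A=[2,4,3]

Create a new vector

V=[0,1,2] // indicating positions

Sort V, but instead of comparing the elements of V while sorting, compare the corresponding elements of A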

 //Assume A is a given vector with N elements
 vector<int> V(N);
 std::iota(V.begin(),V.end(),0); //Initializing
 sort( V.begin(),V.end(), [&](int i,int j){return A[i]<A[j];} );
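One thing to keep in mind is that std::sort is not stable, so when A contains equal values the order of their positions in V is unspecified. A possible workaround, sketched here with a hypothetical helper name, is to break ties on the position itself:

```cpp
#include <algorithm>  // std::sort
#include <numeric>    // std::iota
#include <vector>

// Hypothetical helper: positions of A in ascending order of value;
// equal values are ordered by their position.
inline std::vector<int> sorted_positions(const std::vector<int>& A) {
  std::vector<int> V(A.size());
  std::iota(V.begin(), V.end(), 0);
  std::sort(V.begin(), V.end(), [&A](int i, int j) -> bool {
    if (A[i] != A[j]) return A[i] < A[j];
    return i < j;  // tie-break on position so the result is deterministic
  });
  return V;
}
```

Using std::stable_sort with the original comparator gives the same ordering for ties.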

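【Comments】:

Love your answer. You can even use std::iota() for a more elegant initialization of the map
Yes, we can use it! Thanks for the suggestion
std::iota(V.begin(),V.end(),x++); can be std::iota(V.begin(),V.end(),0);. There is no need to create and use x.

【Solution 4】: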

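// assumes <iostream>, <vector>, <algorithm>, <utility>, using namespace std,
// and an int n that already holds the number of elements
vector<pair<int,int> > a;
int k;

for (int i = 0; i < n; i++) {
    // filling the original array
    cin >> k;
    a.push_back (make_pair (k,i)); // k = value, i = original index
}

sort (a.begin(),a.end()); // pair's operator< orders by value first, then by index

for (int i = 0; i < n; i++) {
    cout << a[i].first << " " << a[i].second << "\n";
}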

Now a holds the values together with their original indices, sorted by value.

a[i].first = the i'th value in sorted order.

a[i].second = its index in the initial array.
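Since no comparator is passed, std::sort uses std::pair's built-in operator<, which compares .first and falls back to .second on ties, so equal values come out in order of their original index. A small sketch that reads from a vector instead of std::cin (the helper name is invented for this example):

```cpp
#include <algorithm>  // std::sort
#include <utility>    // std::pair, std::make_pair
#include <vector>

// Hypothetical helper: (value, original index) pairs sorted by value,
// with ties resolved by original index.
inline std::vector<std::pair<int, int> >
sorted_value_index(const std::vector<int>& values) {
  std::vector<std::pair<int, int> > a;
  a.reserve(values.size());
  for (int i = 0; i < static_cast<int>(values.size()); ++i)
    a.push_back(std::make_pair(values[i], i));
  std::sort(a.begin(), a.end());  // lexicographic: first, then second
  return a;
}
```

For the input 4 1 4 2 this should yield (1,1) (2,3) (4,0) (4,2).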

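【Comments】:

Consider adding a description of your code so that users visiting this post can understand how it works.
I actually like this solution best: my vector is of size 4 or so, and I am stuck before C++11 and cannot use lambdas. Thanks Aditya Aswal.

【Solution 5】:

I wrote a generic version of index sort.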

template <class RAIter, class Compare>
void argsort(RAIter iterBegin, RAIter iterEnd, Compare comp, 
    std::vector<size_t>& indexes) {

    std::vector< std::pair<size_t,RAIter> > pv ;
    pv.reserve(iterEnd - iterBegin) ;

    RAIter iter ;
    size_t k ;
    for (iter = iterBegin, k = 0 ; iter != iterEnd ; iter++, k++) {
        pv.push_back( std::pair<size_t,RAIter>(k,iter) ) ;
    }

    std::sort(pv.begin(), pv.end(), 
        [&comp](const std::pair<size_t,RAIter>& a, const std::pair<size_t,RAIter>& b) -> bool 
        { return comp(*a.second, *b.second) ; }) ;

    indexes.resize(pv.size()) ;
    std::transform(pv.begin(), pv.end(), indexes.begin(), 
        [](const std::pair<size_t,RAIter>& a) -> size_t { return a.first ; }) ;
}

Usage is the same as that of std::sort, except that it takes an index container to receive the sorted indexes. Testing:

int a[] = { 3, 1, 0, 4 } ;
std::vector<size_t> indexes ;
argsort(a, a + sizeof(a) / sizeof(a[0]), std::less<int>(), indexes) ;
for (size_t i : indexes) printf("%d\n", int(i)) ;

You should get 2 1 0 3. For compilers without C++0x support, replace the lambda expression with a class template:

template <class RAIter, class Compare> 
class PairComp {
public:
  Compare comp ;
  PairComp(Compare comp_) : comp(comp_) {}
  bool operator() (const std::pair<size_t,RAIter>& a, 
    const std::pair<size_t,RAIter>& b) const { return comp(*a.second, *b.second) ; }        
} ;

and rewrite std::sort as

std::sort(pv.begin(), pv.end(), PairComp<RAIter, Compare>(comp)) ;
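For reference, here is one way the lambda-free pieces could fit together. This is only a sketch; argsort03 is a hypothetical name chosen to keep it apart from the lambda version. The function's template arguments are deduced from the call, whereas the class template's arguments have to be written out.

```cpp
#include <algorithm>   // std::sort
#include <cstddef>     // std::size_t
#include <functional>  // std::less
#include <utility>     // std::pair
#include <vector>

// Functor standing in for the lambda: compares the pointed-to values.
template <class RAIter, class Compare>
class PairComp {
public:
  explicit PairComp(Compare comp_) : comp(comp_) {}
  bool operator()(const std::pair<std::size_t, RAIter>& a,
                  const std::pair<std::size_t, RAIter>& b) const {
    return comp(*a.second, *b.second);
  }
private:
  Compare comp;
};

// Hypothetical lambda-free variant of argsort.
template <class RAIter, class Compare>
void argsort03(RAIter first, RAIter last, Compare comp,
               std::vector<std::size_t>& indexes) {
  std::vector<std::pair<std::size_t, RAIter> > pv;
  pv.reserve(last - first);
  std::size_t k = 0;
  for (RAIter it = first; it != last; ++it, ++k)
    pv.push_back(std::pair<std::size_t, RAIter>(k, it));

  // Class template arguments are spelled out explicitly here.
  std::sort(pv.begin(), pv.end(), PairComp<RAIter, Compare>(comp));

  indexes.resize(pv.size());
  for (std::size_t i = 0; i < pv.size(); ++i) indexes[i] = pv[i].first;
}
```

A call such as argsort03(a, a + 4, std::less<int>(), indexes) on an int array deduces RAIter as int* and Compare as std::less<int>, so nothing needs to be instantiated by hand.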

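【Comments】:

Hi hkyi! How do we instantiate this template function? It has two template typenames, and one of them is an iterator, which makes this situation very unusual. Can you help?

【Solution 6】:

I ran into this question and found that sorting the iterators directly is a way to sort the values and keep track of the indices. There is no need to define an extra container of (value, index) pairs, which helps when the values are large objects; the iterators provide access to both the value and the index: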

/*
 * a function object that allows to compare
 * the iterators by the value they point to
 */
template < class RAIter, class Compare >
class IterSortComp
{
    public:
        IterSortComp ( Compare comp ): m_comp ( comp ) { }
        inline bool operator( ) ( const RAIter & i, const RAIter & j ) const
        {
            return m_comp ( * i, * j );
        }
    private:
        const Compare m_comp;
};

template <class INIter, class RAIter, class Compare>
void itersort ( INIter first, INIter last, std::vector < RAIter > & idx, Compare comp )
{ 
    idx.resize ( std::distance ( first, last ) );
    for ( typename std::vector < RAIter >::iterator j = idx.begin( ); first != last; ++ j, ++ first )
        * j = first;

    std::sort ( idx.begin( ), idx.end( ), IterSortComp< RAIter, Compare > ( comp ) );
}

As for a usage example:

std::vector < int > A ( n );

// populate A with some random values
std::generate ( A.begin( ), A.end( ), rand );

std::vector < std::vector < int >::const_iterator > idx;
itersort ( A.begin( ), A.end( ), idx, std::less < int > ( ) );

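Now, for example, the element at position 5 of the sorted order (counting from 0) has the value *idx[ 5 ], because each entry of idx is an iterator into A, and its index in the original vector is idx[ 5 ] - A.begin( ), which is the same as the std::distance from the start of A to idx[ 5 ].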
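With C++11 the same idea can be written more compactly. The following is a sketch under that assumption, restricted to std::vector for brevity; sorted_iterators is a made-up name.

```cpp
#include <algorithm>  // std::sort
#include <vector>

// Hypothetical helper: iterators into v, ordered by the values they point to.
// The result is only valid while v is alive and unmodified.
template <class T>
std::vector<typename std::vector<T>::const_iterator>
sorted_iterators(const std::vector<T>& v) {
  typedef typename std::vector<T>::const_iterator Iter;
  std::vector<Iter> its;
  its.reserve(v.size());
  for (Iter it = v.begin(); it != v.end(); ++it) its.push_back(it);
  std::sort(its.begin(), its.end(), [](Iter a, Iter b) { return *a < *b; });
  return its;
}
```

Given auto its = sorted_iterators(A), the k-th smallest value is *its[k] and its original index is its[k] - A.cbegin().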

【Comments】:

【Solution 7】:

Consider using std::multimap, as suggested by @Ulrich Eckhardt. The code can simply be made shorter.

Given

std::vector<int> a = {5, 2, 1, 4, 3};  // a: 5 2 1 4 3

Sort while inserting

std::multimap<int, std::size_t> mm;
for (std::size_t i = 0; i != a.size(); ++i)
    mm.insert({a[i], i});

Retrieve the values and the original indices

std::vector<int> b;
std::vector<std::size_t> c;
for (const auto & kv : mm) {
    b.push_back(kv.first);             // b: 1 2 3 4 5
    c.push_back(kv.second);            // c: 2 1 4 3 0
}

The reason to prefer std::multimap to std::map is to allow equal values in the original vector. Also note that, unlike std::map, operator[] is not defined for std::multimap.
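To illustrate the point about equal values, here is a small sketch with a hypothetical helper name. Since C++11, inserting a key that is already present places the new element after the existing equal ones, so duplicates come out in their original order.

```cpp
#include <cstddef>  // std::size_t
#include <map>      // std::multimap
#include <utility>  // std::make_pair
#include <vector>

// Hypothetical helper: original indices in ascending order of value.
inline std::vector<std::size_t> indexes_via_multimap(const std::vector<int>& a) {
  std::multimap<int, std::size_t> mm;
  for (std::size_t i = 0; i != a.size(); ++i)
    mm.insert(std::make_pair(a[i], i));  // equal keys are all kept

  std::vector<std::size_t> c;
  c.reserve(mm.size());
  for (std::multimap<int, std::size_t>::const_iterator it = mm.begin();
       it != mm.end(); ++it)
    c.push_back(it->second);
  return c;
}
```

For a = {3, 1, 3, 1} this should give the indices 1 3 0 2.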

【Comments】:

【Solution 8】:

There is another way to solve this, using a map:

vector<double> v = {...}; // input data
map<double, unsigned> m; // mapping from value to its index
for (auto it = v.begin(); it != v.end(); ++it)
    m[*it] = it - v.begin();

This will eradicate non-unique elements. If that is not acceptable, use a multimap:

vector<double> v = {...}; // input data
multimap<double, unsigned> m; // mapping from value to its index
for (auto it = v.begin(); it != v.end(); ++it)
    m.insert(make_pair(*it, it - v.begin()));

To output the indices, iterate over the map or multimap:

for (auto it = m.begin(); it != m.end(); ++it)
    cout << it->second << endl;
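To make the effect of duplicates on the plain map concrete, the sketch below (helper name invented) shows that assigning through operator[] keeps one entry per distinct value and that the index stored is the last one seen.

```cpp
#include <cstddef>  // std::size_t
#include <map>      // std::map
#include <vector>

// Hypothetical helper: maps each distinct value to the index of its
// last occurrence in v.
inline std::map<double, std::size_t>
value_to_last_index(const std::vector<double>& v) {
  std::map<double, std::size_t> m;
  for (std::size_t i = 0; i < v.size(); ++i)
    m[v[i]] = i;  // a repeated value overwrites the index stored earlier
  return m;
}
```

With v = {2.0, 1.0, 2.0} the map ends up with two entries, and 2.0 maps to index 2 rather than 0.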

【Comments】:

【Solution 9】:

Beautiful solution by @Lukasz Wiklent! In my case, though, I needed something more generic, so I modified it a bit:

template <class RAIter, class Compare>
vector<size_t> argSort(RAIter first, RAIter last, Compare comp) {

  vector<size_t> idx(last-first);
  iota(idx.begin(), idx.end(), 0);

  auto idxComp = [&first,comp](size_t i1, size_t i2) {
      return comp(first[i1], first[i2]);
  };

  sort(idx.begin(), idx.end(), idxComp);

  return idx;
}

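Example: find the indices that sort a vector of strings by length (longest first), except for the first element, which is a dummy.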

vector<string> test = {"dummy", "a", "abc", "ab"};

auto comp = [](const string &a, const string& b) {
    return a.length() > b.length();
};

const auto& beginIt = test.begin() + 1;
vector<size_t> ind = argSort(beginIt, test.end(), comp);

for(auto i : ind)
    cout << beginIt[i] << endl;

Prints:

abc
ab
a

【Comments】:

【Solution 10】:

Create a std::pair inside the function, then sort the pairs:

Generic version:

template< class RandomAccessIterator,class Compare >
auto sort2(RandomAccessIterator begin,RandomAccessIterator end,Compare cmp) ->
   std::vector<std::pair<std::uint32_t,RandomAccessIterator>>
{
    using valueType=typename std::iterator_traits<RandomAccessIterator>::value_type;
    using Pair=std::pair<std::uint32_t,RandomAccessIterator>;

    std::vector<Pair> index_pair;
    index_pair.reserve(std::distance(begin,end));

    for(uint32_t idx=0;begin!=end;++begin,++idx){
        index_pair.push_back(Pair(idx,begin));
    }

    std::sort( index_pair.begin(),index_pair.end(),[&](const Pair& lhs,const Pair& rhs){
          return cmp(*lhs.second,*rhs.second);
    });

    return index_pair;
}
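The returned pairs carry the original index in .first and an iterator to the element in .second. As a usage sketch, the block below uses a compact stand-in for the function (hypothetical name sort_positions, with std::size_t instead of std::uint32_t) and splits its result into sorted values and original indices.

```cpp
#include <algorithm>   // std::sort
#include <cstddef>     // std::size_t
#include <functional>  // std::less
#include <iterator>    // std::distance
#include <utility>     // std::pair
#include <vector>

// Compact stand-in for the generic version (hypothetical name).
template <class RandomIt, class Compare>
std::vector<std::pair<std::size_t, RandomIt> >
sort_positions(RandomIt begin, RandomIt end, Compare cmp) {
  typedef std::pair<std::size_t, RandomIt> Pair;
  std::vector<Pair> out;
  out.reserve(std::distance(begin, end));
  for (std::size_t idx = 0; begin != end; ++begin, ++idx)
    out.push_back(Pair(idx, begin));
  std::sort(out.begin(), out.end(), [&cmp](const Pair& l, const Pair& r) {
    return cmp(*l.second, *r.second);
  });
  return out;
}

// Usage sketch: split the result into sorted values and original indices.
inline void split_sorted(const std::vector<int>& a, std::vector<int>& values,
                         std::vector<std::size_t>& indices) {
  auto pairs = sort_positions(a.begin(), a.end(), std::less<int>());
  values.clear();
  indices.clear();
  for (std::size_t i = 0; i < pairs.size(); ++i) {
    indices.push_back(pairs[i].first);   // where the element was
    values.push_back(*pairs[i].second);  // the element itself
  }
}
```

The iterators stay valid only as long as the source container is not modified or destroyed.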

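【Comments】:

【Solution 11】:

Well, my solution uses a remainder trick. We can place the values being sorted in the upper 2 bytes and the indices of the elements in the lower 2 bytes: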

int myints[] = {32,71,12,45,26,80,53,33};

for (int i = 0; i < 8; i++)
   myints[i] = myints[i]*(1 << 16) + i;

Then sort the array myints as usual:

std::vector<int> myvector(myints, myints+8);
sort(myvector.begin(), myvector.begin()+8, std::less<int>());

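After that you can access the elements' indices via the remainder. The following code prints the indices of the values sorted in ascending order: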

for (std::vector<int>::iterator it = myvector.begin(); it != myvector.end(); ++it)
   std::cout << ' ' << (*it)%(1 << 16);

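Of course, this technique works only for relatively small values in the original array myints (i.e. those that fit into the upper 2 bytes of an int). But it has the additional benefit of distinguishing identical values of myints: their indices will be printed in the right order.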
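The same packing idea can be written with fixed-width unsigned types, which makes the limits explicit. This is a sketch, not the author's code: it assumes every value fits in 16 bits and that there are at most 65536 elements, and the helper name is invented.

```cpp
#include <algorithm>  // std::sort
#include <cassert>    // assert
#include <cstddef>    // std::size_t
#include <cstdint>    // std::uint16_t, std::uint32_t
#include <vector>

// Hypothetical helper: packs (value, index) into one 32-bit key, sorts the
// keys, and returns the original indices in ascending order of value.
inline std::vector<std::size_t>
indexes_via_packing(const std::vector<std::uint16_t>& values) {
  assert(values.size() <= 65536u);  // the index must fit in the low 16 bits
  std::vector<std::uint32_t> keys(values.size());
  for (std::size_t i = 0; i < values.size(); ++i)
    keys[i] = (static_cast<std::uint32_t>(values[i]) << 16) |
              static_cast<std::uint32_t>(i);

  std::sort(keys.begin(), keys.end());

  std::vector<std::size_t> indexes(keys.size());
  for (std::size_t i = 0; i < keys.size(); ++i)
    indexes[i] = keys[i] & 0xFFFFu;  // low 16 bits hold the original index
  return indexes;
}
```

For the values 32 71 12 45 26 80 53 33 this should return 2 4 0 7 3 6 1 5.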

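【Comments】:

【Solution 12】:

If it is possible, you can build the position array using the find function, and then sort the array.

Or maybe you can use a map where the key is the element and the value is a list of its positions in the upcoming arrays (A, B and C)

It depends on the later uses of those arrays.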

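【Comments】:

【Solution 13】:

Are the items in the vector unique? If so, copy the vector, sort one of the copies with STL sort, and then you can find which index each item had in the original vector.

If the vector is supposed to handle duplicate items, I think you are better off implementing your own sort routine.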
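For the unique-items case, the copy-sort-lookup idea might be sketched as follows. The helper name is made up, and the linear std::find per element makes this O(n^2), so it is only reasonable for small inputs.

```cpp
#include <algorithm>  // std::sort, std::find
#include <cstddef>    // std::size_t
#include <vector>

// Hypothetical helper. Assumes all elements of v are distinct.
inline std::vector<std::size_t> indexes_via_find(const std::vector<int>& v) {
  std::vector<int> sorted(v);  // sort a copy, keep the original intact
  std::sort(sorted.begin(), sorted.end());

  std::vector<std::size_t> indexes;
  indexes.reserve(v.size());
  for (std::size_t i = 0; i < sorted.size(); ++i) {
    // Look up where the i-th smallest item sat in the original vector.
    std::vector<int>::const_iterator pos =
        std::find(v.begin(), v.end(), sorted[i]);
    indexes.push_back(static_cast<std::size_t>(pos - v.begin()));
  }
  return indexes;
}
```

With duplicates, std::find would return the first occurrence every time, which is why the uniqueness assumption matters.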

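【Comments】:

【Solution 14】:

For this type of question, store the original array data in a new (duplicate) array, then binary search for each element in the sorted duplicate array, and store the index that is found in a vector or array.

input array=>a
duplicate array=>b
vector=>c (stores the indices (positions) of the original array)
Syntax:
for(i=0;i<n;i++)
c.push_back(binarysearch(b,n,a[i]));

Here binarysearch is a function that takes the array, the size of the array, and the item to search for, and returns the position of that item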
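One possible reading of this, using std::lower_bound in place of the unspecified binarysearch function, is sketched below. It assumes that b is the sorted copy and that the elements are distinct; the helper name is invented.

```cpp
#include <algorithm>  // std::sort, std::lower_bound
#include <cstddef>    // std::size_t
#include <vector>

// Hypothetical helper. Assumes distinct elements.
// c[i] is the position of a[i] in the sorted copy b, i.e. the rank of a[i].
inline std::vector<std::size_t>
ranks_via_binary_search(const std::vector<int>& a) {
  std::vector<int> b(a);
  std::sort(b.begin(), b.end());

  std::vector<std::size_t> c;
  c.reserve(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    c.push_back(static_cast<std::size_t>(
        std::lower_bound(b.begin(), b.end(), a[i]) - b.begin()));
  return c;
}
```

Note that under this reading c is the inverse of the C from the question: for a = [5, 2, 1, 4, 3] it gives [4, 1, 0, 3, 2]. Writing C[c[i]] = i for every i recovers [2, 1, 4, 3, 0].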

【Comments】:

【Solution 15】:

I recently came across the elegant projection feature of C++20 <ranges>, which allows writing shorter/clearer code:

std::vector<std::size_t> B(std::size(A));
std::iota(begin(B), end(B), 0);
std::ranges::sort(B, {}, [&](std::size_t i){ return A[i]; });

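{} refers to the default comparator, std::ranges::less. As you can see, we define a function to call on each element before any comparison. This projection feature is actually quite powerful, since the function can be a lambda, as here, or even a method or a member value. For example: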

struct Item {
    float price;
    float weight;
    float efficiency() const { return price / weight; }
};

int main() {
    std::vector<Item> items{{7, 9}, {3, 4}, {5, 3}, {9, 7}};
    std::ranges::sort(items, std::greater<>(), &Item::efficiency);
    // now items are sorted by their efficiency in decreasing order:
    // items = {{5, 3}, {9, 7}, {7, 9}, {3, 4}}
}

If we want to sort by increasing price:

std::ranges::sort(items, {}, &Item::price);

Don't define operator< or use lambdas, use a projection!

【Comments】:

【Solution 16】:

One solution is to use a 2D vector.

#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

int main() {
 vector<vector<double>> val_and_id;
 val_and_id.resize(5);
 for (int i = 0; i < 5; i++) {
   val_and_id[i].resize(2); // one to store value, the other for index.
 }
 // Store value in dimension 1, and index in the other:
 // say values are 5,4,7,1,3.
 val_and_id[0][0] = 5.0;
 val_and_id[1][0] = 4.0;
 val_and_id[2][0] = 7.0;
 val_and_id[3][0] = 1.0;
 val_and_id[4][0] = 3.0;

 val_and_id[0][1] = 0.0;
 val_and_id[1][1] = 1.0;
 val_and_id[2][1] = 2.0;
 val_and_id[3][1] = 3.0;
 val_and_id[4][1] = 4.0;

 sort(val_and_id.begin(), val_and_id.end());
 // display them:
 cout << "Index \t" << "Value \n";
 for (int i = 0; i < 5; i++) {
  cout << val_and_id[i][1] << "\t" << val_and_id[i][0] << "\n";
 }
 return 0;
}

Here is the output:

   Index   Value
   3       1
   4       3
   1       4
   0       5
   2       7

【Comments】: