Finding and erasing duplicates in one vector and erasing the corresponding values in another vector: answers

【Question title】: Finding and erasing duplicates in one vector and erasing values in another vector
【Posted】: 2017-11-03 02:16:36
【Question】:

For example, I have two vectors, std::vector<int> a and std::vector<double> b, of the form

a= 1,2,3,3,4,5,6;
b=0.1, 0.3, 0.2, 0.5, 0.6, 0.1, -0.2;

The two vectors have the same size; in effect they are XY pairs ((1, 0.1), (2, 0.3), ... etc.). Fortunately, a is always sorted in ascending order.

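I want to find the duplicates in the first vector and erase the first of them, together with the corresponding entry in the second vector. In my example the output should be: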

a= 1,2,3,4,5,6;
b=0.1, 0.3, 0.5, 0.6, 0.1, -0.2;

In MATLAB I would do it like this:

b(find(diff(a) == 0)) = []; 
a(find(diff(a) == 0)) = [];

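I know I can do this the old-fashioned way with a for loop and if statements, but I'm sure there is a more elegant way to do it in C++ using containers and iterators. Searching the internet turns up plenty of examples that erase duplicates in the first vector, but none that use the same indices to erase elements in the second vector.

Any help is appreciated.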

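【Question comments】:

Why not use a single vector that stores both pieces of data in one element, instead of parallel vectors? Then doing what you want becomes trivial.
Or populate a std::map<key, value> with your data in the first place; you would no longer need to remove duplicates, because a map does not support duplicate keys.
Are you really storing 0.3 in your int vector?
My bad, vector b is double, not int. They come from another function that I can't modify.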
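
The std::map suggestion in the comments above still applies when the vectors come from a function that cannot be changed, since they can be copied into a map afterwards. A minimal sketch, assuming both vectors have the same size; the helper name `to_map` is hypothetical. Because assignment through `operator[]` overwrites, the later of two pairs with the same key wins, which matches the expected output in the question.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper: fold two parallel vectors into a map keyed by a's values.
// Assumes a.size() == b.size().
std::map<int, double> to_map(const std::vector<int>& a,
                             const std::vector<double>& b)
{
    std::map<int, double> m;
    for (std::size_t i = 0; i < a.size(); ++i)
        m[a[i]] = b[i];   // assignment overwrites, so the last duplicate wins
    return m;
}
```

Iterating over the returned map visits the keys in ascending order, so the sorted order of a is preserved.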

Tags: c++ c++11 unique stdvector

【Solution 1】:

I don't think there is any way around using a for loop and if statements.

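    auto behind = a.begin();    // first element of the adjacent pair in a
    auto ahead = a.begin();     // second element of the adjacent pair in a
    auto j = b.begin();         // entry in b that corresponds to behind
    if(ahead != a.end())
        ++ahead;
    while(ahead != a.end()) {   // Once we reach the end of the vectors, end the loop
        if(*ahead == *behind) { // If we have a duplicate
            // erase() invalidates iterators at and after the erased element,
            // so carry on from the iterators it returns.
            behind = a.erase(behind);   // erase the first of the two entries in a
            ahead = behind + 1;
            j = b.erase(j);             // and the matching entry in b
        }
        else {                  // Otherwise, just move on
            ++j;
            ++ahead;
            ++behind;
        }
    }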

This might work. I don't know exactly how erase() works, but I think the logic should hold.

【Comments】:
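
The same loop-and-if idea can also be written without erase() at all, which sidesteps the iterator-invalidation question and avoids shifting the tail of the vector on every removal. The following is a sketch, not the answer author's code: it assumes a is sorted and both vectors have the same size, and the function name `keep_last_of_each_run` is made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper. Assumes a is sorted and a.size() == b.size().
// Keeps the LAST element of every run of equal values in a, as in the question.
void keep_last_of_each_run(std::vector<int>& a, std::vector<double>& b)
{
    std::size_t out = 0;                       // next slot to write to
    for (std::size_t i = 0; i < a.size(); ++i) {
        // Skip element i when the next element has the same key.
        if (i + 1 < a.size() && a[i] == a[i + 1])
            continue;
        a[out] = a[i];
        b[out] = b[i];
        ++out;
    }
    a.resize(out);                             // drop the unused tail
    b.resize(out);
}
```

With the vectors from the question, this leaves a = 1,2,3,4,5,6 and b = 0.1, 0.3, 0.5, 0.6, 0.1, -0.2 in a single pass.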

【Solution 2】:

The reason you'll find few (if any) well-written examples of this is that most people prefer to start by defining something like this:

struct coord {
    int x;
    double y;

    // Since we want X values unique, that's what we compare by:    
    bool operator==(coord const &other) const {
        return x == other.x;
    }
};

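With that, we can easily get the unique X values and their corresponding Y values without any explicit loop, because the standard library already provides an algorithm for exactly this purpose: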

std::vector<coord> ab;
// populate ab here ...

// ensure only unique X values, removing the corresponding Y when we remove an X
ab.erase(std::unique(ab.begin(), ab.end()), ab.end());

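If you really do need to keep a and b as separate arrays, I would probably still do something fairly similar, but use a zip iterator to create something that looks and acts much the same, so that you can still use unique and erase to do the job.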
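
When no zip iterator is at hand, one alternative is to copy the two vectors into a vector of coord, deduplicate, and copy the result back. This is a sketch under that assumption, with a hypothetical helper name `unique_pairs`. Note that std::unique over `ab.begin(), ab.end()` keeps the first element of each run; the sketch runs it over reverse iterators so that the last element is kept instead, which is what the expected output in the question shows.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct coord {
    int x;
    double y;

    // Compare by X only, since X is what must be unique.
    bool operator==(coord const& other) const { return x == other.x; }
};

// Hypothetical helper. Assumes a is sorted and a.size() == b.size().
void unique_pairs(std::vector<int>& a, std::vector<double>& b)
{
    // Pack the parallel vectors into one vector of pairs.
    std::vector<coord> ab;
    ab.reserve(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        ab.push_back(coord{a[i], b[i]});

    // Over reverse iterators, the survivors end up at the back of ab,
    // so the leftover elements to erase are at the front.
    auto rit = std::unique(ab.rbegin(), ab.rend());
    ab.erase(ab.begin(), rit.base());

    // Unpack into the original vectors.
    a.clear();
    b.clear();
    for (coord const& c : ab) {
        a.push_back(c.x);
        b.push_back(c.y);
    }
}
```

The extra copy costs O(n) memory; a zip iterator, as the answer suggests, would avoid it.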

【Comments】:

【Solution 3】:

Surely there must be a simpler way than this?

// compare the index vector by using the
// values of another vector
struct compare_by_other
{
    std::vector<int>& v;

    compare_by_other(std::vector<int>& v): v(v) {}

    bool operator()(std::size_t idx1, std::size_t idx2) const
        { return v[idx1] == v[idx2]; }
};

std::vector<int>    a = {1  , 2  , 3  , 3  , 3  , 4  , 4  , 5  };
std::vector<double> b = {0.2, 0.5, 0.1, 0.9, 2.5, 9.6, 0.3, 2.4};

// create an index to track which indexes need to be removed
std::vector<std::size_t> indexes(a.size());
std::iota(std::begin(indexes), std::end(indexes), 0);

// remove all the indexes that the corresponding vector finds duplicated
auto end = std::unique(std::begin(indexes), std::end(indexes), compare_by_other(a));

// erase all those elements whose indexes do not appear in the unique
// portions of the indexes vector

a.erase(std::remove_if(std::begin(a), std::end(a), [&](auto& n){
    return std::find(std::begin(indexes), end, std::distance(a.data(), &n)) == end;
}), std::end(a));

// same for b

b.erase(std::remove_if(std::begin(b), std::end(b), [&](auto& n){
    return std::find(std::begin(indexes), end, std::distance(b.data(), &n)) == end;
}), std::end(b));
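
Once std::unique has collapsed the index vector, the surviving indexes can also be used directly to gather the kept elements, which avoids the std::find call for every element. A sketch of that variation, assuming a is sorted and both vectors have the same size; `unique_by_index` is a hypothetical name, and a lambda stands in for the compare_by_other functor. Like the code above, it keeps the first element of each run.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper. Assumes a is sorted and a.size() == b.size().
// Keeps the first element of every run of equal values in a.
void unique_by_index(std::vector<int>& a, std::vector<double>& b)
{
    std::vector<std::size_t> indexes(a.size());
    std::iota(indexes.begin(), indexes.end(), std::size_t{0});

    // Collapse runs of indexes whose values in a compare equal.
    auto end = std::unique(indexes.begin(), indexes.end(),
        [&a](std::size_t i, std::size_t j) { return a[i] == a[j]; });

    // The surviving indexes are still ascending, so every element moves to a
    // position at or before its current one: compacting in place is safe.
    std::size_t out = 0;
    for (auto it = indexes.begin(); it != end; ++it, ++out) {
        a[out] = a[*it];
        b[out] = b[*it];
    }
    a.resize(out);
    b.resize(out);
}
```

For the sample data in this answer, the result is a = {1, 2, 3, 4, 5} and b = {0.2, 0.5, 0.1, 9.6, 2.4}.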

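【Comments】:

【Solution 4】:

Unfortunately, I don't know of an elegant way to do this in vanilla C++.

If you are willing to use a library, Eric Niebler's Range-V3 (which is currently on its way into the standard) lets you do it in a semi-pleasant way: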

#include <range/v3/all.hpp>
#include <iostream>

namespace rng = ranges::v3;

int main()
{ 
    std::vector<int> a{1, 2, 3, 3, 4, 5, 6};
    std::vector<double> b{0.1, 0.3, 0.2, 0.5, 0.6, 0.1, -0.2};

    auto view = rng::view::zip(a, b);

    auto result = rng::unique(view, [](auto&& x, auto&& y) {
         return x.first == y.first;
    });

    // This is a bit of a hack...
    const auto new_end_idx = rng::distance(rng::begin(view), result);

    a.erase(a.begin() + new_end_idx, a.end());
    b.erase(b.begin() + new_end_idx, b.end());

    std::cout << rng::view::all(a) << '\n';
    std::cout << rng::view::all(b) << '\n';
}

Output:

[1,2,3,4,5,6]
[0.1,0.3,0.2,0.6,0.1,-0.2]

Wandbox link

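It's still not ideal (because, as far as I know, there is no way to get the original iterators back out of a view::zip iterator), but it's not too bad.

【Comments】:

【Solution 5】: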

Suggestions without code:

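The simple but less efficient approach:

Use a zip iterator to treat the two vectors as a single range of 2-tuples/pairs. (It doesn't have to be Boost's, but the standard library doesn't have one, AFAICR.) You have now reduced the problem to filtering out duplicates with a custom comparison criterion (assuming you don't mind the output not being two separate arrays).

Build a set of 2-tuples using this constructor: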

template< class InputIt >
set( InputIt first, InputIt last,
     const Compare& comp = Compare(),
     const Allocator& alloc = Allocator() );

In your case the default allocator is fine, but you will want to set the comparator to something like

 [](const std::tuple<int, double>& lhs,
    const std::tuple<int, double>& rhs) -> bool
 { 
      return std::get<0>(lhs) < std::get<0>(rhs); 
 }

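Or you could write a proper function that does the same thing. This of course depends on whether your zip iterator exposes tuples or std::pair.

That's it!

It would be more efficient to build a vector of tuples, but populate it using std::copy_if over the zip iterator range.

【Comments】: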
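
A sketch of the std::set approach described above. Since the standard library of that era has no zip iterator, this version fills the set with a plain loop instead of the range constructor; that substitution, the `by_first` functor (used in place of the lambda) and the helper name `unique_by_key` are all assumptions made for illustration. A set ignores an insert whose key is already present, so this keeps the first pair for each key; iterating from the back instead would keep the last, as the question's expected output does.

```cpp
#include <cstddef>
#include <set>
#include <tuple>
#include <vector>

using pair_t = std::tuple<int, double>;

// Orders tuples by their first member only, so two tuples with the same
// key are treated as equivalent by the set.
struct by_first {
    bool operator()(const pair_t& lhs, const pair_t& rhs) const
    {
        return std::get<0>(lhs) < std::get<0>(rhs);
    }
};

// Hypothetical helper. Assumes a.size() == b.size().
std::vector<pair_t> unique_by_key(const std::vector<int>& a,
                                  const std::vector<double>& b)
{
    std::set<pair_t, by_first> seen;
    for (std::size_t i = 0; i < a.size(); ++i)
        seen.insert(pair_t(a[i], b[i]));   // no-op if the key already exists
    return std::vector<pair_t>(seen.begin(), seen.end());
}
```

The result is a single vector of tuples rather than two separate arrays, which is the trade-off the answer mentions.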