Fastest way to create a copy of a std::vector minus one element: answers

【Question title】: Fastest way to create a copy of a std::vector minus one element
【Posted】: 2016-12-03 18:43:06
【Question】:

I have a std::vector [1, 2, 3, 4, 5] and I want to get another vector containing all of its elements except the second one: [1, 3, 4, 5]. One way to do it (vec1 is my input vector) is:

std::vector<int> vec2;
vec2 = vec1;                   // full copy of the input
vec2.erase(vec2.begin() + 1);  // then remove the second element

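What I don't like here is the O(n) complexity of erase: together with the copy of the array, that gives me about 2n operations. I was thinking the old dumb way would be better: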

std::vector<int> vec2;
for (std::size_t i = 0; i < vec1.size(); ++i) {
    if (i != 1) 
        vec2.push_back(vec1[i]);
}

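This is amortized O(n) time. The asymptotic behavior is the same, but the number of operations may be smaller.

I have to do this on rather small vectors (about 100 elements), but I have billions of them. Will I notice a significant difference?

How would you do it?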

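【Comments】:

Call reserve first. As for whether you will notice a difference, profile it carefully; that is the only way to get a reliable answer.
Maybe you should rethink your algorithm and avoid the copy if it is not necessary. Also have a look at insert().
You could iterate the vector backwards and erase the last element, so you don't have to copy. Or you could keep track of the index that should be the first element and iterate from it to the end. There are many ways to avoid the copy.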
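
For anyone who wants to follow the "profile it" advice, here is a minimal timing sketch. It assumes the input has at least two elements and that the skip index is valid. The helper name time_ms and the checksum parameter sink are made up for illustration, and a real measurement would need an optimized build, many more rounds, and realistic data.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Approach 1 from the question: full copy, then erase one element.
std::vector<int> copy_then_erase(const std::vector<int>& src, std::size_t skip) {
    std::vector<int> out = src;
    out.erase(out.begin() + static_cast<std::ptrdiff_t>(skip));
    return out;
}

// Approach 2 from the question, with the reserve call suggested in the comments.
std::vector<int> reserve_then_push(const std::vector<int>& src, std::size_t skip) {
    std::vector<int> out;
    out.reserve(src.size() - 1);
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (i != skip) out.push_back(src[i]);
    }
    return out;
}

// Hypothetical helper: calls fn(src, skip) `rounds` times and returns the
// elapsed time in milliseconds. The last element of each result is added to
// `sink` so the compiler cannot discard the work entirely.
template <typename Fn>
double time_ms(Fn fn, const std::vector<int>& src, std::size_t skip,
               int rounds, long long& sink) {
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < rounds; ++r) {
        sink += fn(src, skip).back();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling time_ms(copy_then_erase, vec1, 1, rounds, sink) and time_ms(reserve_then_push, vec1, 1, rounds, sink) with the same inputs gives two numbers to compare; which one wins depends on the compiler, flags, and hardware.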

Tags: c++ algorithm vector data-structures time-complexity

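【Solution 1】:

Regarding complexity, you cannot do better than O(n), because you have to copy n elements anyway. But for the sake of some micro-optimization, you can:

reserve the size a priori, as suggested in the comments

avoid checking the condition inside the loop by doing two consecutive copies without any check: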

std::vector<int> vec1 = { 1,2,3,4,5 };
std::vector<int> vec2(vec1.size()-1);

constexpr int i = 1; // let i be the position you want to skip

std::copy(vec1.cbegin(), vec1.cbegin() + i, vec2.begin());
std::copy(vec1.cbegin() + i+1, vec1.cend(), vec2.begin()+i);

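Since there is no if statement and no std::copy_if, the copies are straightforward and leave plenty of room for compiler optimization.

EDIT: as noted in the comments below, sizing the vector at construction causes some extra initialization. See this question for ways to avoid the initialization: Using vector<char> as a buffer without initializing it on resize()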
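
One way to avoid that initialization is an allocator adaptor that default-initializes elements instead of value-initializing them. The sketch below is my own illustration, not code from this answer: the names default_init_allocator, raw_int_vector and skip_copy_uninit are hypothetical, and the result is a different type from std::vector<int> because the allocator is part of the vector's type. It assumes src is non-empty and skip is a valid index.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Allocator adaptor (hypothetical name): when the vector constructs an element
// without arguments, it is default-initialized, so an int is not zero-filled.
template <typename T, typename A = std::allocator<T>>
class default_init_allocator : public A {
    using traits = std::allocator_traits<A>;

public:
    template <typename U>
    struct rebind {
        using other =
            default_init_allocator<U, typename traits::template rebind_alloc<U>>;
    };

    using A::A;  // inherit the base allocator's constructors

    // Chosen when no constructor arguments are given (e.g. vector(n)).
    template <typename U>
    void construct(U* ptr) {
        ::new (static_cast<void*>(ptr)) U;  // default-init, no zeroing for int
    }

    // Every other construction is forwarded to the base allocator.
    template <typename U, typename... Args>
    void construct(U* ptr, Args&&... args) {
        traits::construct(static_cast<A&>(*this), ptr, std::forward<Args>(args)...);
    }
};

using raw_int_vector = std::vector<int, default_init_allocator<int>>;

// Same two-copy idea as above, writing into storage that was never zeroed.
raw_int_vector skip_copy_uninit(const std::vector<int>& src, std::size_t skip) {
    raw_int_vector out(src.size() - 1);  // sized, elements left indeterminate
    const auto mid = src.cbegin() + static_cast<std::ptrdiff_t>(skip);
    std::copy(src.cbegin(), mid, out.begin());
    std::copy(mid + 1, src.cend(), out.begin() + static_cast<std::ptrdiff_t>(skip));
    return out;  // every element has been overwritten before anyone reads it
}
```

Whether this beats reserve followed by insert is something only a measurement can tell; for around 100 ints the zero-fill is likely cheap either way.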

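【Discussion】:

"Reservation at creation" does not reserve; it initializes the values. That is an O(n) process. If you want to reserve, you should call reserve.
@NicolBolas I said the process would be O(n) anyway. You are right, though, that the initializers will be called. Reserving or resizing at creation without initialization is notoriously hard with std::vector (though still doable); see stackoverflow.com/questions/15219984/….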

【Solution 2】:

This can be done with the existing routines of std::vector. Given i, the position to skip (which must be within the range of vec):

template<typename T>
std::vector<T> skip_copy(const std::vector<T> &vec, std::size_t i)
{
  std::vector<T> ret;
  ret.reserve(vec.size() - 1);                            // one allocation, no initialization
  ret.insert(ret.end(), vec.begin(), vec.begin() + i);    // elements before position i
  ret.insert(ret.end(), vec.begin() + i + 1, vec.end());  // elements after position i
  return ret;
}

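By reserving space, we avoid any unnecessary memory allocations in ret. And because vector::insert is done at the end, it does not need to shift any elements.

Granted, insert may evaluate some extra conditions beyond what is strictly necessary (checking whether the iterator range fits into the existing storage), but a series of push_back calls is unlikely to be faster. And if the vector is large, not having to pre-initialize the array pays off.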
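
The "within the range of vec" precondition matters: for an empty vector, vec.size() - 1 wraps around to a huge value. A minimal sketch of a variant that spells the precondition out follows; the name skip_copy_checked is hypothetical, and using assert (which is compiled out when NDEBUG is defined) is just one possible choice.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical variant of skip_copy that states its precondition explicitly.
template <typename T>
std::vector<T> skip_copy_checked(const std::vector<T>& vec, std::size_t i) {
    // Precondition: i is a valid index. This also rules out an empty vec,
    // for which vec.size() - 1 would wrap around.
    assert(i < vec.size());

    std::vector<T> ret;
    ret.reserve(vec.size() - 1);
    const auto skip = vec.begin() + static_cast<std::ptrdiff_t>(i);
    ret.insert(ret.end(), vec.begin(), skip);    // elements before position i
    ret.insert(ret.end(), skip + 1, vec.end());  // elements after position i
    return ret;
}
```

For example, skip_copy_checked(std::vector<int>{1, 2, 3, 4, 5}, 1) yields [1, 3, 4, 5], and a one-element input with i = 0 yields an empty vector.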

【Discussion】:

【Solution 3】:

This is the fastest way I can think of:

std::vector<int> vec1{1, 2, 3, 4, 5};

// Do not call the constructor with the known size here. That would allocate
// AND initialize a lot of data that will be overwritten later anyway.
std::vector<int> vec2;

// Instead, use it on 'reserve' since it only allocates the memory and
// leaves it uninitialized. The vector is still considered empty, but its
// underlying capacity will be big enough to be filled with new data
// without any unnecessary initializations and reallocations occurring (as long
// as you don't exceed the capacity)
vec2.reserve(vec1.size() - 1);

// Nothing expensive should be going on behind the scenes now, just fill it up
for(auto it = vec1.begin(), skip = vec1.begin() + 1 ; it != vec1.end() ; ++it) {
    if(it != skip) {
        vec2.push_back(*it);
    }
}
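
The claims in the comments above can be checked with a small sketch. The function name fills_without_reallocation is made up for illustration, and it assumes n >= 1.

```cpp
#include <cstddef>
#include <vector>

// Returns true if a vector reserved for n elements (n >= 1 assumed) is still
// empty after reserve, and if filling it with n elements neither changes its
// capacity nor moves its storage.
bool fills_without_reallocation(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);  // allocates, but size() stays 0
    const bool empty_after_reserve = v.empty();
    const std::size_t cap = v.capacity();

    v.push_back(0);
    const int* first = v.data();  // address of the first stored element
    for (std::size_t k = 1; k < n; ++k) {
        v.push_back(static_cast<int>(k));
    }
    return empty_after_reserve && v.capacity() == cap && v.data() == first;
}
```

The standard guarantees that no reallocation happens after reserve until the size would exceed the capacity, so this should return true for any n >= 1.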

【Discussion】: