Why does this quicksort appear to be faster than std::sort? Answers

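【Question title】: Why does this quicksort appear to be faster than std::sort?
【Posted】: 2021-05-05 21:02:57
【Question】:

Why does this quicksort algorithm appear to be faster than std::sort? I have checked that it really does sort the array. I also replaced both sort calls with empty for loops of the same iteration count to sanity-check the timing code, and everything checked out.

I would also like to know what adjustments I could make to the quicksort so that it can recurse more times. Maybe some kind of variable memory management?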

#include <iostream>     
#include <vector>       
#include <algorithm>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <chrono>
using namespace std;
void quickSort(int*, int);
void fillRandom(int*, int,int b2);
int main() {
    //setup arrays
    int size = 100000;
    auto myints = new int[size];
    auto myints2 = new int[size];
    fillRandom(myints, size,10000);
    std::copy(myints, myints + size, myints2);

    //measurement 1
    auto t1 = std::chrono::high_resolution_clock::now();
    quickSort(myints, size);
    auto t2 = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count();
    std::cout << endl << "Execution 1 took: "<< duration << endl;

    //measurement 2
    t1 = std::chrono::high_resolution_clock::now();
    std::sort(myints2,myints2+size);
    t2 = std::chrono::high_resolution_clock::now();
    duration = std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count();
    std::cout << endl << "Execution 2 took: " << duration << endl;


    cout << "finished!";
    return 0;
}
void fillRandom(int* p, int size,int upTo) {
    srand(time(0));
    for (int i = 0;i < size;i++) {
        p[i] = rand() % upTo + 1;
    }
}
void quickSortSwap(int *p1, int*p2) {
    int temp = *p1;
    *p1 = *p2;
    *p2 = temp;

}
void quickSort(int* original, int len) {
    int split = *original;
    int greaterIndex = len - 1;
    int lesserIndex = 1;
    int* currentP;
    //rearrange stuff so smaller is left, bigger is right
    for (int i = 1;i < len;i++) {
        currentP = original + lesserIndex;
        //cout << *currentP << " compared to " << split << endl;
        if (*currentP <= split) {
            lesserIndex++;
        }
        else {
            //cout << "greater: " << *currentP <<endl;
            quickSortSwap(currentP, original + greaterIndex);
            greaterIndex--;
        }
    }

    //uhh, now we switch pivot element with the right most left side element. Adjust our left side length measurement accordingly.
    lesserIndex--;
    quickSortSwap(original, original + lesserIndex);
    greaterIndex++;
    //this point
    if (lesserIndex > 1) {
        quickSort(original, lesserIndex);
    }
    int greater_range = len - greaterIndex;
    if (greater_range > 1) {
        quickSort(original + greaterIndex, greater_range);
    }
}

https://rextester.com/AOPBP48224

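【Comments】:

Are you compiling with optimizations enabled?
@alter Of course not, who would ask an optimization question on SO after doing that?
Note that many elements in the array will be equal. That may bias the comparison somewhat. Also, I assume you checked that your quicksort gives the correct result;
I wish there were an FAQ or SO guideline explaining the proper requirements for posting a performance question. Too many questions end up "closed" by the original poster when all they needed to do was turn on optimizations, and/or end with "oops, I didn't know that" once the same tests are rerun with optimizations on.
@PaulMcKenzie Create a corresponding topic on Meta Stack Overflow.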

Tags: c++ algorithm optimization quicksort

【Solution 1】:

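Visual Studio's std::sort has some overhead and some optimizations that your program does not. Your program is based on the Lomuto partition scheme, while std::sort is a single-pivot, 3-partition, Hoare-like quicksort plus insertion sort for small partitions. The 3 partitions are elements < pivot, elements == pivot, and elements > pivot. If there are no duplicate values, the 3-partition sort is just some overhead. If there are duplicate values, then as the number of duplicates increases, Lomuto gets worse, while Hoare or std::sort gets better. Try using fillRandom(myints, size,10); you should see a large performance hit with the Lomuto method and a performance improvement with std::sort().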
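To make the three-partition idea above concrete, here is a minimal sketch of a quicksort that splits each range into "less than", "equal to", and "greater than" the pivot. The name quickSort3Way is hypothetical, the middle-element pivot is an arbitrary choice, and this is not MSVC's actual implementation (it has no insertion-sort cutoff, for instance). It only illustrates why many duplicates help: every element equal to the pivot is placed in a single pass and never visited again.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical illustration, not the library's code: three-way partition quicksort.
// After one partitioning pass the range is laid out as
//   [ < pivot | == pivot | > pivot ]
// and only the two outer parts still need sorting.
void quickSort3Way(int* a, std::ptrdiff_t len) {
    while (len > 1) {
        const int pivot = a[len / 2];   // assumption: middle element as pivot
        std::ptrdiff_t lt = 0;          // a[0, lt)    holds elements <  pivot
        std::ptrdiff_t i = 0;           // a[lt, i)    holds elements == pivot
        std::ptrdiff_t gt = len;        // a[gt, len)  holds elements >  pivot
        while (i < gt) {
            if (a[i] < pivot) {
                std::swap(a[lt], a[i]);
                ++lt;
                ++i;
            } else if (a[i] > pivot) {
                --gt;
                std::swap(a[i], a[gt]);  // a[i] is now unexamined, so do not advance i
            } else {
                ++i;
            }
        }
        // Recurse into the smaller side and loop on the larger one,
        // which keeps the recursion depth logarithmic in len.
        const std::ptrdiff_t leftLen = lt;
        const std::ptrdiff_t rightLen = len - gt;
        if (leftLen < rightLen) {
            quickSort3Way(a, leftLen);
            a += gt;
            len = rightLen;
        } else {
            quickSort3Way(a + gt, rightLen);
            len = leftLen;
        }
    }
}
```

With the question's data, this could be called as quickSort3Way(myints, size). With only 10 distinct values, each value is chosen as pivot at most once, so the recursion stays shallow and each level costs a linear pass, which is the behavior the answer attributes to a 3-partition sort.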

Visual Studio's std::sort uses median of 9 if there are >= 40 elements and median of 3 for 33 to 39 elements, which reduces the probability of worst-case behavior, and switches to
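The pivot selection mentioned above can be sketched roughly as follows. This is an assumption-laden illustration rather than the library's code: medianOf3 and choosePivot are hypothetical names, the "median of 9" is approximated as the median of three medians of three, and the sample spacing (len / 8) is a guess. Only the 40-element threshold comes from the answer. What happens for ranges shorter than 33 is not covered by this sketch; it simply falls back to median of 3 for anything under 40.

```cpp
#include <cstddef>

// Hypothetical helper: index of the median of a[i], a[j], a[k].
std::ptrdiff_t medianOf3(const int* a, std::ptrdiff_t i, std::ptrdiff_t j, std::ptrdiff_t k) {
    if (a[i] < a[j]) {
        if (a[j] < a[k]) return j;       // a[i] < a[j] < a[k]
        return (a[i] < a[k]) ? k : i;    // a[j] is the largest of the three
    }
    // Here a[j] <= a[i].
    if (a[i] < a[k]) return i;           // a[j] <= a[i] < a[k]
    return (a[j] < a[k]) ? k : j;        // a[i] is the largest of the three
}

// Hypothetical pivot chooser for a range of len >= 1 elements.
// Returns an index into a[0, len).
std::ptrdiff_t choosePivot(const int* a, std::ptrdiff_t len) {
    const std::ptrdiff_t mid = len / 2;
    const std::ptrdiff_t last = len - 1;
    if (len >= 40) {
        // Pseudo-median of 9: median of three medians of three,
        // sampled near the start, the middle, and the end of the range.
        const std::ptrdiff_t step = len / 8;  // assumption: sample spacing
        const std::ptrdiff_t lo = medianOf3(a, 0, step, 2 * step);
        const std::ptrdiff_t md = medianOf3(a, mid - step, mid, mid + step);
        const std::ptrdiff_t hi = medianOf3(a, last - 2 * step, last - step, last);
        return medianOf3(a, lo, md, hi);
    }
    return medianOf3(a, 0, mid, last);
}
```

Sampling several positions makes it much less likely that the pivot is near the minimum or maximum, which is what drives a first-element pivot (as in the question's code) to quadratic time on already sorted input.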

【Discussion】:

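I ran the code with fillRandom(myints, size,10) (without optimizations), and std::sort was almost 10 times faster
@JimCastro "...without optimizations..." is a waste of your time. Unoptimized builds are meant for ease of debugging and contain a lot of conditionally compiled code for bounds checking, iterator validity, and so on.
Beautifully detailed answer that tacitly explains why not to roll your own. Technically, it's not Visual Studio, it's MSVC
@AluanHaddad - Depending on what is known about the data being sorted, rolling your own can be faster. std::sort is set up to be very generic, allowing any container of objects with random access iterators to be sorted, optionally with a user-specified compare function. If you are just sorting integers, for example, it's not hard to write faster code. On my system, sorting 16 million pseudo-random 64-bit unsigned integers takes 1.25 seconds with my own introsort program and 1.50 seconds with std::sort, which is not much of a loss for such a generic implementation.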
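For readers unfamiliar with the generality mentioned in the last comment, here is a small sketch of std::sort applied to a container of objects with a user-specified compare function. The Record type and sortByScoreDescending are hypothetical names invented for this illustration; the point is only that the same std::sort call used on int pointers in the question also handles arbitrary element types and orderings.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record type, used only for this illustration.
struct Record {
    std::string name;
    int score;
};

// Sorts records from highest score to lowest using std::sort
// with a user-specified comparison (a lambda).
void sortByScoreDescending(std::vector<Record>& records) {
    std::sort(records.begin(), records.end(),
              [](const Record& lhs, const Record& rhs) {
                  return lhs.score > rhs.score;  // strict ordering, as std::sort requires
              });
}
```

A hand-written sort for plain integers can skip this flexibility, which is one plausible reason a specialized routine can come out somewhat ahead, as the comment reports.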