【Question title】: Can anyone sort faster than this? [closed]
【Posted】: 2022-08-14 10:47:56
【Question】:

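I was able to write an even faster sort for integers! It sorts faster than the array can be generated. It works by declaring a counting array whose length equals the maximum value of the integer array to be sorted (plus one, so that zeros can be counted) and initializing it to zero. Then the array to be sorted is looped through, using its values as indices into the counting array, which is incremented each time a value is encountered. Subsequently, the counting array is looped through, and each index is written back to the input array, in order, as many times as it was counted. Code below: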

SUBROUTINE icountSORT(arrA, nA)
  ! This is a count sort.  It counts the frequency of
  ! each element in the integer array to be sorted using
  ! an array with a length of MAXVAL(arrA)+1 such that
  ! 0's are counted at index 1, 1's are counted at index 2,
  ! etc.
  !
  ! ~ Derrel Walters
  IMPLICIT NONE

  INTEGER(KIND=8),INTENT(IN) :: nA
  INTEGER(KIND=8),DIMENSION(nA),INTENT(INOUT) :: arrA

  INTEGER(KIND=8),ALLOCATABLE,DIMENSION(:) :: arrB
  INTEGER(KIND=8) :: i, j, k, maxA
  INTEGER ::  iStat

  maxA = MAXVAL(arrA)
  ALLOCATE(arrB(maxA+1),STAT=iStat)

  arrB = 0

  DO i = 1, nA
    arrB(arrA(i)+1) = arrB(arrA(i)+1) + 1
  END DO

  k = 1
  DO i = 1, SIZE(arrB)
    DO j = 1, arrB(i)
      arrA(k) = i - 1
      k = k + 1
    END DO
  END DO

END SUBROUTINE icountSORT
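
For readers without a Fortran compiler, the counting idea above can be sketched in Python. This is an illustration only, not the author's code: the name `count_sort` is made up here, and the sketch assumes non-negative integers, as the Fortran routine implicitly does. Its cost is O(n + max value), so it is linear in n only while the largest value stays bounded. It is also not a comparison sort, so the O(n log n) lower bound for comparison sorts does not apply to it.

```python
def count_sort(values):
    """Counting sort for non-negative integers (hypothetical helper)."""
    if not values:
        return []
    if min(values) < 0:
        raise ValueError("count_sort assumes non-negative integers")
    counts = [0] * (max(values) + 1)    # one slot per possible value 0..max
    for v in values:                    # tally how often each value occurs
        counts[v] += 1
    out = []
    for value, c in enumerate(counts):  # emit each value c times, in order
        out.extend([value] * c)
    return out


print(count_sort([3, 0, 2, 3, 1, 0]))
```

Memory use grows with the largest value rather than with n, which is the usual reason this approach is reserved for small key ranges.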

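Posting more evidence. nlogn predicts too high execution times at large array sizes. In addition, the Fortran program posted near the end of this question writes the array (unsorted and sorted) to files and reports the write and sort times. File writing is a known O(n) process. The sort runs faster than the file writing all the way up to the largest arrays. If the sort were running at O(nlogn), then at some point the sort time would cross the write time and become longer at large array sizes. Therefore, it has been shown that this sort routine executes with O(n) time complexity.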

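I have added a complete Fortran program, ready for compilation, at the bottom of this post so that the output can be reproduced. The execution times are linear.

More timing data, in a clearer format, produced with the following command in a Debian environment on Win 10: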

dwalters@Lapper3:~/PROGRAMMING/DATA-WATER$ for (( i=100000; i<=50000000; i=2*i )); do ./derrelSORT-example.py $i; done | awk  'BEGIN {print "N      Time(s)"}; {if ($1=="Creating") {printf $4" "} else if ($1=="Sorting" && $NF=="seconds") {print $3}}'
N      Time(s)
100000 0.01
200000 0.02
400000 0.04
800000 0.08
1600000 0.17
3200000 0.35
6400000 0.76
12800000 1.59
25600000 3.02

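This code executes linearly with respect to the number of elements (an integer example is given here). It achieves this by exponentially increasing the size of the sorted chunks as the (merge) sort proceeds. To facilitate the exponentially growing chunks:

The number of iterations needs to be calculated before the sort begins
Index transformations need to be derived for the chunks (language specific, depending on the indexing protocol) for passing to merge()
The remainder at the tail of the list needs to be handled gracefully when the chunk size is not evenly divisible by a power of 2

With these things in mind, and starting, as is traditional, by merging pairs of single-valued arrays, the merged chunks can be grown from 2 to 4 to 8 to 16 to --- to 2^n. This single case is the exception that breaks the speed limit of O(nlogn) time complexity for comparison sorts. The routine sorts linearly with respect to the number of elements to be sorted.

Can anyone sort faster? ;)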

Fortran code (derrelSort.f90):

! Derrel Walters © 2019
! These sort routines were written by Derrel Walters ~ 2019-01-23


SUBROUTINE iSORT(arrA, nA)
  ! This implementation of derrelSORT is for integers,
  ! but the same principles apply for other datatypes.
  !
  ! ~ Derrel Walters
  IMPLICIT NONE

  INTEGER(KIND=8),INTENT(IN) :: nA
  INTEGER,DIMENSION(nA),INTENT(INOUT) :: arrA

  INTEGER,DIMENSION(nA) :: arrB
  INTEGER(KIND=8) :: lowIDX, highIDX, midIDX
  INTEGER ::  iStat
  INTEGER(KIND=8) :: i, j, A, B, C, thisHigh, mergeSize, nLoops
  INTEGER,DIMENSION(:),ALLOCATABLE :: iterMark
  LOGICAL,DIMENSION(:),ALLOCATABLE :: moreToGo

  arrB = arrA
  mergeSize = 2
  lowIDX = 1 - mergeSize
  highIDX = 0

  nLoops = INT(LOG(REAL(nA))/LOG(2.0))
  ALLOCATE(iterMark(nLoops), moreToGo(nLoops), STAT=iStat)
  moreToGo = .FALSE.
  iterMark = 0

  DO i = 1, nLoops
    iterMark(i) = FLOOR(REAL(nA)/2**i)
    IF (MOD(nA, 2**i) > 0) THEN
      moreToGo(i) = .TRUE.
      iterMark(i) = iterMark(i) + 1
    END IF
  END DO

  DO i = 1, nLoops
      DO j = 1, iterMark(i)
        A = 0
        B = 1
        C = 0
        lowIDX = lowIDX + mergeSize
        highIDX = highIDX + mergeSize
        midIDX = (lowIDX + highIDX + 1) / 2
        thisHigh = highIDX
        IF (j == iterMark(i).AND.moreToGo(i)) THEN
          lowIDX = lowIDX - mergeSize
          highIDX = highIDX - mergeSize
          midIDX = (lowIDX + highIDX + 1) / 2
          A = midIDX - lowIDX
          B = 2
          C = nA - 2*highIDX + midIDX - 1
          thisHigh = nA
        END IF
        CALL imerge(arrA(lowIDX:midIDX-1+A), B*(midIDX-lowIDX),    &
                    arrA(midIDX+A:thisHigh), highIDX-midIDX+1+C,   &
                    arrB(lowIDX:thisHigh), thisHigh-lowIDX+1)
        arrA(lowIDX:thisHigh) = arrB(lowIDX:thisHigh)
      END DO
      mergeSize = 2*mergeSize
      lowIDX = 1 - mergeSize
      highIDX = 0
  END DO

END SUBROUTINE iSORT

SUBROUTINE imerge(arrA, nA, arrB, nB, arrC, nC)
  ! This merge is a faster merge.  Array A arrives
  ! just to the left of Array B, and Array C is
  ! filled from both ends simultaneously - while
  ! still preserving the stability of the sort.
  ! The derrelSORT routine is so fast, that
  ! the merge does not affect the O(n) time
  ! complexity of the sort in practice
  !
  ! ~ Derrel Walters
  IMPLICIT NONE

  INTEGER(KIND=8),INTENT(IN) :: nA, nB , nC

  INTEGER,DIMENSION(nA),INTENT(IN) :: arrA
  INTEGER,DIMENSION(nB),INTENT(IN) :: arrB
  INTEGER,DIMENSION(nC),INTENT(INOUT) :: arrC

  INTEGER(KIND=8) :: i, j, k, x, y, z

  arrC = 0
  i = 1
  j = 1
  k = 1
  x = nA
  y = nB
  z = nC

  DO
    IF (i > x .OR. j > y) EXIT
    IF (arrB(j) < arrA(i)) THEN
      arrC(k) = arrB(j)
      j = j + 1
    ELSE
      arrC(k) = arrA(i)
      i = i + 1
    END IF
    IF (arrA(x) > arrB(y)) THEN
      arrC(z) = arrA(x)
      x = x - 1
    ELSE
      arrC(z) = arrB(y)
      y = y - 1
    END IF
    k = k + 1
    z = z - 1
  END DO

  IF (i <= x) THEN
    DO
      IF (i > x) EXIT
        arrC(k) = arrA(i)
        i = i + 1
        k = k + 1
    END DO
  ELSEIF (j <= y) THEN
    DO
      IF (j > y) EXIT
        arrC(k) = arrB(j)
        j = j + 1
        k = k + 1
    END DO
  END IF
END SUBROUTINE imerge

Timings obtained after using f2py3 to convert the above Fortran file (derrelSORT.f90) into something callable from Python. Here is the Python code that produced them (derrelSORT-example.py):

#!/bin/python3

import numpy as np
import derrelSORT as dS
import time as t
import random as rdm
import sys

try:
  array_len = int(sys.argv[1])
except IndexError:
  array_len = 100000000

# Create an array with array_len elements
print(50*'-')
print("Creating array of", array_len, "random integers.")
t0 = t.time()
x = np.asfortranarray(np.array([round(100000*rdm.random(),0)
                      for i in range(array_len)]).astype(np.int32))
t1 = t.time()
print('Creation time:', round(t1-t0, 2), 'seconds')


# Sort the array using derrelSORT
print("Sorting the array with derrelSORT.")
t0 = t.time()
dS.isort(x, len(x))
t1 = t.time()
print('Sorting time:', round(t1-t0, 2), 'seconds')
print(50*'-')

Output from the command line. Note the times.

dwalters@Lapper3:~/PROGRAMMING/DATA-WATER$ ./derrelSORT-example.py 1000000
--------------------------------------------------
Creating array of 1000000 random integers.
Creation time: 0.78 seconds
Sorting the array with derrelSORT.
Sorting time: 0.1 seconds
--------------------------------------------------
dwalters@Lapper3:~/PROGRAMMING/DATA-WATER$ ./derrelSORT-example.py 10000000
--------------------------------------------------
Creating array of 10000000 random integers.
Creation time: 8.1 seconds
Sorting the array with derrelSORT.
Sorting time: 1.07 seconds
--------------------------------------------------
dwalters@Lapper3:~/PROGRAMMING/DATA-WATER$ ./derrelSORT-example.py 20000000
--------------------------------------------------
Creating array of 20000000 random integers.
Creation time: 15.73 seconds
Sorting the array with derrelSORT.
Sorting time: 2.21 seconds
--------------------------------------------------
dwalters@Lapper3:~/PROGRAMMING/DATA-WATER$ ./derrelSORT-example.py 40000000
--------------------------------------------------
Creating array of 40000000 random integers.
Creation time: 31.64 seconds
Sorting the array with derrelSORT.
Sorting time: 4.39 seconds
--------------------------------------------------
dwalters@Lapper3:~/PROGRAMMING/DATA-WATER$ ./derrelSORT-example.py 80000000
--------------------------------------------------
Creating array of 80000000 random integers.
Creation time: 64.03 seconds
Sorting the array with derrelSORT.
Sorting time: 8.92 seconds
--------------------------------------------------
dwalters@Lapper3:~/PROGRAMMING/DATA-WATER$ ./derrelSORT-example.py 160000000
--------------------------------------------------
Creating array of 160000000 random integers.
Creation time: 129.56 seconds
Sorting the array with derrelSORT.
Sorting time: 18.04 seconds
--------------------------------------------------

More output:

dwalters@Lapper3:~/PROGRAMMING/DATA-WATER$ for (( i=100000; i<=500000000; i=2*i )); do
> ./derrelSORT-example.py $i
> done
--------------------------------------------------
Creating array of 100000 random integers.
Creation time: 0.08 seconds
Sorting the array with derrelSORT.
Sorting time: 0.01 seconds
--------------------------------------------------
--------------------------------------------------
Creating array of 200000 random integers.
Creation time: 0.16 seconds
Sorting the array with derrelSORT.
Sorting time: 0.02 seconds
--------------------------------------------------
--------------------------------------------------
Creating array of 400000 random integers.
Creation time: 0.32 seconds
Sorting the array with derrelSORT.
Sorting time: 0.04 seconds
--------------------------------------------------
--------------------------------------------------
Creating array of 800000 random integers.
Creation time: 0.68 seconds
Sorting the array with derrelSORT.
Sorting time: 0.08 seconds
--------------------------------------------------
--------------------------------------------------
Creating array of 1600000 random integers.
Creation time: 1.25 seconds
Sorting the array with derrelSORT.
Sorting time: 0.15 seconds
--------------------------------------------------
--------------------------------------------------
Creating array of 3200000 random integers.
Creation time: 2.57 seconds
Sorting the array with derrelSORT.
Sorting time: 0.32 seconds
--------------------------------------------------
--------------------------------------------------
Creating array of 6400000 random integers.
Creation time: 5.23 seconds
Sorting the array with derrelSORT.
Sorting time: 0.66 seconds
--------------------------------------------------
--------------------------------------------------
Creating array of 12800000 random integers.
Creation time: 10.09 seconds
Sorting the array with derrelSORT.
Sorting time: 1.35 seconds
--------------------------------------------------
--------------------------------------------------
Creating array of 25600000 random integers.
Creation time: 20.25 seconds
Sorting the array with derrelSORT.
Sorting time: 2.74 seconds
--------------------------------------------------
--------------------------------------------------
Creating array of 51200000 random integers.
Creation time: 41.84 seconds
Sorting the array with derrelSORT.
Sorting time: 5.62 seconds
--------------------------------------------------
--------------------------------------------------
Creating array of 102400000 random integers.
Creation time: 93.19 seconds
Sorting the array with derrelSORT.
Sorting time: 11.49 seconds
--------------------------------------------------
--------------------------------------------------
Creating array of 204800000 random integers.
Creation time: 167.55 seconds
Sorting the array with derrelSORT.
Sorting time: 24.13 seconds
--------------------------------------------------
--------------------------------------------------
Creating array of 409600000 random integers.
Creation time: 340.84 seconds
Sorting the array with derrelSORT.
Sorting time: 47.21 seconds
--------------------------------------------------

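As the array size doubles, the time doubles - as shown. Hence, Mr. Mischel's initial assessment was incorrect. The reason is that, while the outer loop determines the number of loops for each chunk size (which is log2(n)), the inner loop counter decreases exponentially as the sort proceeds. The proverbial proof is in the pudding, however. The times clearly demonstrate linearity.

If anyone needs any help reproducing the results, please let me know. I'm happy to help.

The Fortran program found at the end of this post is an as-is copy of what I wrote in 2019. It is meant to be used on the command line. To compile it:

Copy the Fortran code to a file with a .f90 extension

Compile the code with a command such as:

gfortran -o derrelSORT-ex.x derrelSORT.f90

Give yourself permission to run the executable:

chmod u+x derrelSORT-ex.x

Execute the program from the command line, with or without an integer argument:

./derrelSORT-ex.x

or

./derrelSORT-ex.x 10000000

The output should look something like this (here, I've used a bash C-style loop to call the command repeatedly). Notice that as the array size doubles with each iteration, the execution time also doubles.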

SORT-RESEARCH$ for (( i=100000; i<500000000; i=2*i )); do
> ./derrelSORT-2022.x $i
> done

Derrel Walters © 2019

Demonstrating derrelSORT©
WARNING: This program can produce LARGE files!

Generating random array of length:           100000
Time =    0.0000 seconds
Writing Array to rand-in.txt:
Time =    0.0312 seconds
Sorting the Array
Time =    0.0156 seconds
Writing Array to rand-sorted-out.txt:
Time =    0.0469 seconds


Derrel Walters © 2019

Demonstrating derrelSORT©
WARNING: This program can produce LARGE files!

Generating random array of length:           200000
Time =    0.0000 seconds
Writing Array to rand-in.txt:
Time =    0.0625 seconds
Sorting the Array
Time =    0.0312 seconds
Writing Array to rand-sorted-out.txt:
Time =    0.0312 seconds


Derrel Walters © 2019

Demonstrating derrelSORT©
WARNING: This program can produce LARGE files!

Generating random array of length:           400000
Time =    0.0156 seconds
Writing Array to rand-in.txt:
Time =    0.1250 seconds
Sorting the Array
Time =    0.0625 seconds
Writing Array to rand-sorted-out.txt:
Time =    0.0938 seconds


Derrel Walters © 2019

Demonstrating derrelSORT©
WARNING: This program can produce LARGE files!

Generating random array of length:           800000
Time =    0.0156 seconds
Writing Array to rand-in.txt:
Time =    0.2344 seconds
Sorting the Array
Time =    0.1406 seconds
Writing Array to rand-sorted-out.txt:
Time =    0.2031 seconds


Derrel Walters © 2019

Demonstrating derrelSORT©
WARNING: This program can produce LARGE files!

Generating random array of length:          1600000
Time =    0.0312 seconds
Writing Array to rand-in.txt:
Time =    0.4219 seconds
Sorting the Array
Time =    0.2969 seconds
Writing Array to rand-sorted-out.txt:
Time =    0.3906 seconds


Derrel Walters © 2019

Demonstrating derrelSORT©
WARNING: This program can produce LARGE files!

Generating random array of length:          3200000
Time =    0.0625 seconds
Writing Array to rand-in.txt:
Time =    0.8281 seconds
Sorting the Array
Time =    0.6562 seconds
Writing Array to rand-sorted-out.txt:
Time =    0.7969 seconds


Derrel Walters © 2019

Demonstrating derrelSORT©
WARNING: This program can produce LARGE files!

Generating random array of length:          6400000
Time =    0.0938 seconds
Writing Array to rand-in.txt:
Time =    1.5938 seconds
Sorting the Array
Time =    1.3281 seconds
Writing Array to rand-sorted-out.txt:
Time =    1.6406 seconds


Derrel Walters © 2019

Demonstrating derrelSORT©
WARNING: This program can produce LARGE files!

Generating random array of length:         12800000
Time =    0.2500 seconds
Writing Array to rand-in.txt:
Time =    3.3906 seconds
Sorting the Array
Time =    2.7031 seconds
Writing Array to rand-sorted-out.txt:
Time =    3.2656 seconds


Derrel Walters © 2019

Demonstrating derrelSORT©
WARNING: This program can produce LARGE files!

Generating random array of length:         25600000
Time =    0.4062 seconds
Writing Array to rand-in.txt:
Time =    6.6250 seconds
Sorting the Array
Time =    5.6094 seconds
Writing Array to rand-sorted-out.txt:
Time =    6.5312 seconds


Derrel Walters © 2019

Demonstrating derrelSORT©
WARNING: This program can produce LARGE files!

Generating random array of length:         51200000
Time =    0.8281 seconds
Writing Array to rand-in.txt:
Time =   13.2656 seconds
Sorting the Array
Time =   11.5000 seconds
Writing Array to rand-sorted-out.txt:
Time =   13.1719 seconds


Derrel Walters © 2019

Demonstrating derrelSORT©
WARNING: This program can produce LARGE files!

Generating random array of length:        102400000
Time =    1.6406 seconds
Writing Array to rand-in.txt:
Time =   26.3750 seconds
Sorting the Array
Time =   23.3438 seconds
Writing Array to rand-sorted-out.txt:
Time =   27.0625 seconds


Derrel Walters © 2019

Demonstrating derrelSORT©
WARNING: This program can produce LARGE files!

Generating random array of length:        204800000
Time =    3.3438 seconds
Writing Array to rand-in.txt:
Time =   53.1094 seconds
Sorting the Array
Time =   47.3750 seconds
Writing Array to rand-sorted-out.txt:
Time =   52.8906 seconds


Derrel Walters © 2019

Demonstrating derrelSORT©
WARNING: This program can produce LARGE files!

Generating random array of length:        409600000
Time =    6.6562 seconds
Writing Array to rand-in.txt:
Time =  105.1875 seconds
Sorting the Array
Time =   99.5938 seconds
Writing Array to rand-sorted-out.txt:
Time =  109.9062 seconds

Here is the as-is program from 2019, unmodified:

SORT-RESEARCH$ cat derrelSORT.f90
! Derrel Walters © 2019
! These sort routines were written by Derrel Walters ~ 2019-01-23

PROGRAM sort_test
  ! This program demonstrates a linear sort routine
  ! by generating a random array (here integer), writing it
  ! to a file 'rand-in.txt', sorting it with an
  ! implementation of derrelSORT (here for integers -
  ! where the same principles apply for other applicable
  ! datatypes), and finally, printing the sorted array
  ! to a file 'rand-sorted-out.txt'.
  !
  ! To the best understanding of the author, the expert
  ! concensus is that a comparative sort can, at best,
  ! be done with O(nlogn) time complexity. Here a sort
  ! is demonstrated which experimentally runs O(n).
  !
  ! Such time complexity is currently considered impossible
  ! for a sort. Using this sort, extremely large amounts of data can be
  ! sorted on any modern computer using a single processor core -
  ! provided the computer has enough memory to hold the array! For example,
  ! the sorting time for a given array will be on par (perhaps less than)
  ! what it takes the same computer to write the array to a file.
  !
  ! ~ Derrel Walters

  IMPLICIT NONE

  INTEGER,PARAMETER :: in_unit = 21
  INTEGER,PARAMETER :: out_unit = 23

  INTEGER,DIMENSION(:),ALLOCATABLE :: iArrA
  REAL,DIMENSION(:),ALLOCATABLE :: rArrA
  CHARACTER(LEN=15) :: cDims
  CHARACTER(LEN=80) :: ioMsgStr
  INTEGER(KIND=8) :: nDims, i
  INTEGER :: iStat
  REAL :: start, finish

  WRITE(*,*) ''
  WRITE(*,'(A)') 'Derrel Walters © 2019'
  WRITE(*,*) ''
  WRITE(*,'(A)') 'Demonstrating derrelSORT©'
  WRITE(*,'(A)') 'WARNING: This program can produce LARGE files!'
  WRITE(*,*) ''

  CALL GET_COMMAND_ARGUMENT(1, cDims)
  IF (cDims == '') THEN
    nDims = 1000000
  ELSE
    READ(cDims,'(1I15)') nDims
  END IF
  ALLOCATE(iArrA(nDims),rArrA(nDims),STAT=iStat)

  WRITE(*,'(A,1X,1I16)') 'Generating random array of length:', nDims
  CALL CPU_TIME(start)
  CALL RANDOM_NUMBER(rArrA)
  iArrA = INT(rArrA*1000000)
  CALL CPU_TIME(finish)
  WRITE(*,'(A,1X,f9.4,1X,A)') 'Time =',finish-start,'seconds'
  DEALLOCATE(rArrA,STAT=iStat)

  WRITE(*,'(A)') 'Writing Array to rand-in.txt: '
  OPEN(UNIT=in_unit,FILE='rand-in.txt',STATUS='REPLACE',ACTION='WRITE',IOSTAT=iStat,IOMSG=ioMsgStr)
  IF (iStat /= 0) THEN
    WRITE(*,'(A)') ioMsgStr
  ELSE
    CALL CPU_TIME(start)
    DO i=1, nDims
      WRITE(in_unit,*) iArrA(i)
    END DO
    CLOSE(in_unit)
    CALL CPU_TIME(finish)
    WRITE(*,'(A,1X,f9.4,1X,A)') 'Time =',finish-start,'seconds'
  END IF
  WRITE(*,'(A)') 'Sorting the Array'

  CALL CPU_TIME(start)
  CALL iderrelSORT(iArrA, nDims) !! SIZE(iArrA))
  CALL CPU_TIME(finish)
  WRITE(*,'(A,1X,f9.4,1X,A)') 'Time =',finish-start,'seconds'

  WRITE(*,'(A)') 'Writing Array to rand-sorted-out.txt: '
  OPEN(UNIT=out_unit,FILE='rand-sorted-out.txt',STATUS='REPLACE',ACTION='WRITE',IOSTAT=iStat,IOMSG=ioMsgStr)
  IF (iStat /= 0) THEN
    WRITE(*,'(A)') ioMsgStr
  ELSE
    CALL CPU_TIME(start)
    DO i=1, nDims
      WRITE(out_unit,*) iArrA(i)
    END DO
    CLOSE(out_unit)
    CALL CPU_TIME(finish)
    WRITE(*,'(A,1X,f9.4,1X,A)') 'Time =',finish-start,'seconds'
  END IF
  WRITE(*,*) ''

END PROGRAM sort_test

SUBROUTINE iderrelSORT(arrA, nA)
  ! This implementation of derrelSORT is for integers,
  ! but the same principles apply for other datatypes.
  !
  ! ~ Derrel Walters
  IMPLICIT NONE

  INTEGER(KIND=8),INTENT(IN) :: nA
  INTEGER,DIMENSION(nA),INTENT(INOUT) :: arrA

  INTEGER,DIMENSION(nA) :: arrB
  INTEGER(KIND=8) :: lowIDX, highIDX, midIDX
  INTEGER ::  iStat
  INTEGER(KIND=8) :: i, j, A, B, C, thisHigh, mergeSize, nLoops
  INTEGER,DIMENSION(:),ALLOCATABLE :: iterMark
  LOGICAL,DIMENSION(:),ALLOCATABLE :: moreToGo

  arrB = arrA
  mergeSize = 2
  lowIDX = 1 - mergeSize
  highIDX = 0

  nLoops = INT(LOG(REAL(nA))/LOG(2.0))
  ALLOCATE(iterMark(nLoops), moreToGo(nLoops), STAT=iStat)
  moreToGo = .FALSE.
  iterMark = 0

  DO i = 1, nLoops
    iterMark(i) = FLOOR(REAL(nA)/2**i)
    IF (MOD(nA, 2**i) > 0) THEN
      moreToGo(i) = .TRUE.
      iterMark(i) = iterMark(i) + 1
    END IF
  END DO

  DO i = 1, nLoops
      DO j = 1, iterMark(i)
        A = 0
        B = 1
        C = 0
        lowIDX = lowIDX + mergeSize
        highIDX = highIDX + mergeSize
        midIDX = (lowIDX + highIDX + 1) / 2
        thisHigh = highIDX
        IF (j == iterMark(i).AND.moreToGo(i)) THEN
          lowIDX = lowIDX - mergeSize
          highIDX = highIDX - mergeSize
          midIDX = (lowIDX + highIDX + 1) / 2
          A = midIDX - lowIDX
          B = 2
          C = nA - 2*highIDX + midIDX - 1
          thisHigh = nA
        END IF
!! The traditional merge can also be used (see subroutine for comment). !!
!                                                                        !
!        CALL imerge(arrA(lowIDX:midIDX-1+A), B*(midIDX-lowIDX),   &     !
!                    arrA(midIDX+A:thisHigh), highIDX-midIDX+1+C, &      !
!                    arrB(lowIDX:thisHigh), thisHigh-lowIDX+1)           !
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
        CALL imerge2(arrA(lowIDX:midIDX-1+A), B*(midIDX-lowIDX),   &
                    arrA(midIDX+A:thisHigh), highIDX-midIDX+1+C,   &
                    arrB(lowIDX:thisHigh), thisHigh-lowIDX+1)
        arrA(lowIDX:thisHigh) = arrB(lowIDX:thisHigh)
      END DO
      mergeSize = 2*mergeSize
      lowIDX = 1 - mergeSize
      highIDX = 0
  END DO

END SUBROUTINE iderrelSORT

SUBROUTINE imerge(arrA, nA, arrB, nB, arrC, nC)
  ! This merge is a traditional merge that places
  ! the lowest element first. The form that the
  ! time complexity takes, O(n), is not affected
  ! by the merge routine - yet this routine
  ! does not run as fast as the merge used in
  ! imerge2.
  !
  ! ~Derrel Walters
  IMPLICIT NONE

  INTEGER(KIND=8),INTENT(IN) :: nA, nB , nC

  INTEGER,DIMENSION(nA),INTENT(IN) :: arrA
  INTEGER,DIMENSION(nB),INTENT(IN) :: arrB
  INTEGER,DIMENSION(nC),INTENT(INOUT) :: arrC

  INTEGER(KIND=8) :: i, j, k

  arrC = 0
  i = 1
  j = 1
  k = 1

  DO
    IF (i > nA .OR. j > NB) EXIT
    IF (arrB(j) < arrA(i)) THEN
      arrC(k) = arrB(j)
      j = j + 1
    ELSE
      arrC(k) = arrA(i)
      i = i + 1
    END IF
    k = k + 1
  END DO

  IF (i <= nA) THEN
    DO
      IF (i > nA) EXIT
        arrC(k) = arrA(i)
        i = i + 1
        k = k + 1
    END DO
  ELSEIF (j <= nB) THEN
    DO
      IF (j > nB) EXIT
        arrC(k) = arrB(j)
        j = j + 1
        k = k + 1
    END DO
  END IF

END SUBROUTINE imerge

SUBROUTINE imerge2(arrA, nA, arrB, nB, arrC, nC)
  ! This merge is a faster merge.  Array A arrives
  ! just to the left of Array B, and Array C is
  ! filled from both ends simultaneously - while
  ! still preserving the stability of the sort.
  ! The derrelSORT routine is so fast, that
  ! the merge does not affect the O(n) time
  ! complexity of the sort in practice
  ! (perhaps, making its execution more linear
  ! at small numbers of elements).
  !
  ! ~ Derrel Walters
  IMPLICIT NONE

  INTEGER(KIND=8),INTENT(IN) :: nA, nB , nC

  INTEGER,DIMENSION(nA),INTENT(IN) :: arrA
  INTEGER,DIMENSION(nB),INTENT(IN) :: arrB
  INTEGER,DIMENSION(nC),INTENT(INOUT) :: arrC

  INTEGER(KIND=8) :: i, j, k, x, y, z

  arrC = 0
  i = 1
  j = 1
  k = 1
  x = nA
  y = nB
  z = nC

  DO
    IF (i > x .OR. j > y) EXIT
    IF (arrB(j) < arrA(i)) THEN
      arrC(k) = arrB(j)
      j = j + 1
    ELSE
      arrC(k) = arrA(i)
      i = i + 1
    END IF
    IF (arrA(x) > arrB(y)) THEN
      arrC(z) = arrA(x)
      x = x - 1
    ELSE
      arrC(z) = arrB(y)
      y = y - 1
    END IF
    k = k + 1
    z = z - 1
  END DO

  IF (i <= x) THEN
    DO
      IF (i > x) EXIT
        arrC(k) = arrA(i)
        i = i + 1
        k = k + 1
    END DO
  ELSEIF (j <= y) THEN
    DO
      IF (j > y) EXIT
        arrC(k) = arrB(j)
        j = j + 1
        k = k + 1
    END DO
  END IF
END SUBROUTINE imerge2

MOAR data, using the Fortran version. Anyone like straight lines?

SORT-RESEARCH$ for (( i=100000; i<500000000; i=2*i )); do ./derrelSORT-2022.x $i; done | awk 'BEGIN {old_1="Derrel"; print "N      Time(s)"};{if ($1 == "Generating") {printf $NF" "; old_1=$1} else if (old_1 == "Sorting") {print $3; old_1=$1} else {old_1=$1}}'
N      Time(s)
100000 0.0000
200000 0.0312
400000 0.0625
800000 0.1562
1600000 0.2969
3200000 0.6250
6400000 1.3594
12800000 2.7500
25600000 5.5625
51200000 11.8906
102400000 23.3750
204800000 47.3750
409600000 96.4531

Looks linear, doesn't it? ;) [Figure: Fortran sorting times from above, plotted.]

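Is the Riemann hypothesis next? ...
I see no reason to think that your double-ended merge would be faster than a standard merge. On the contrary: although both should perform very nearly the same number of steps, a single-ended (and forward-only) merge tends to be more cache-friendly.
@DJWalters Not all operations execute in the same amount of time. For practical values of n, n log n operations on an in-memory array may well be faster than n write operations to an SSD.
I took the Fortran program presented in the question and compiled it, unmodified, with gfortran -O3 (version 8.5.0, from the GCC suite). Running it on sample sizes of 100,000; 1,000,000; 10,000,000; and 100,000,000 showed clearly superlinear scaling, with execution-time ratios for the sort phase (as reported by the program) of 1.00, 11.6, 144, and 1500 relative to N = 100,000. That looks pretty bad for your linear-scaling hypothesis, but it is plausible for N log N.
Also, yes, I can sort faster than this. At least, I can modify your code to reduce its execution time by about 20% on an input of size 100,000,000. The savings come mostly from eliminating a lot of unnecessary writes, such as zero-initializing storage that is going to be overwritten anyway, and copying arrB back to arrA after every merge pass instead of merging it back in the other direction. Using array slice assignment instead of loops for copying helps a bit too, plus a few other odds and ends.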

Tags: algorithm sorting

【Answer 1】:

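Your algorithm is not O(n). The number of loops you compute (nLoops) is log2(n). The inner loop counts (the values in iterMark) are essentially n/2, n/4, n/8, and so on. But the segment size really doesn't matter, because every time through the outer loop you look at every item in the list.

No matter how you obfuscate it, you're making log2(n) passes over n items: O(n log n).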
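
One way to check the "log2(n) passes over n items" claim empirically is to count the work rather than time it. The sketch below is a plain bottom-up merge sort written for illustration; it is not a port of the Fortran in the question, and the name `bottom_up_merge_sort` and its counters are assumptions made here. It records how many passes are made and how many elements are written in total.

```python
import random


def bottom_up_merge_sort(items):
    """Iterative (bottom-up) merge sort.

    Returns (sorted_list, passes, moves), where `moves` counts every
    element written into the scratch list across all passes.
    """
    src = list(items)
    n = len(src)
    passes = 0
    moves = 0
    width = 1                       # current sorted-run length: 1, 2, 4, ...
    while width < n:
        dst = []
        for lo in range(0, n, 2 * width):
            left = src[lo:lo + width]
            right = src[lo + width:lo + 2 * width]
            i = j = 0
            while i < len(left) and j < len(right):
                if right[j] < left[i]:      # strict '<' keeps the sort stable
                    dst.append(right[j])
                    j += 1
                else:
                    dst.append(left[i])
                    i += 1
            dst.extend(left[i:])            # leftovers from whichever run remains
            dst.extend(right[j:])
        moves += len(dst)                   # every element is written once per pass
        passes += 1
        src = dst
        width *= 2
    return src, passes, moves


rng = random.Random(0)
for n in (1_000, 10_000, 100_000):
    data = [rng.randrange(100_000) for _ in range(n)]
    result, passes, moves = bottom_up_merge_sort(data)
    assert result == sorted(data)
    print(n, passes, moves, round(moves / n, 1))
```

The last printed column, moves per element, equals the number of passes, which grows with log2(n) instead of staying constant as it would for a linear algorithm.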

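Your code is a fairly standard merge sort, which is proven to be O(n log n). The general case for comparison sorts is also proven to be O(n log n). Certainly, some algorithms can sort some specific cases faster. Conversely, those same algorithms have pathological cases that take O(n^2). Other comparison sorts (e.g. heap sort, merge sort) are less affected by the order of the items. But in the general case, comparison sorts make on the order of n log n comparisons. See https://www.cs.cmu.edu/~avrim/451f11/lectures/lect0913.pdf for a detailed explanation.

But don't take my word for it. You can easily test this yourself by doing some simple timings. Time how long it takes to sort, say, 100K items. If your algorithm really is O(n), then sorting 200K items will take about twice as long, and sorting a million items will take about ten times as long. But if it's O(n log n), as I suspect, then the times will be a little bit longer.

Consider: log2 of 100K is 16.61. log2 of 200K is 17.61. So sorting 100K items (if the algorithm is O(n log n)) will take time proportional to 100K * 16.61. Sorting 200K items will take time proportional to 200K * 17.61. Doing the arithmetic: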

100K * 16.61 = 1,661,000
200K * 17.61 = 3,522,000

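So 200K items will take about 2.12 times as long (3,522,000/1,661,000). Or, about 6% longer than a linear algorithm would.

If you're still unsure, run it out to a million items. If the algorithm is linear, then a million items will take 10 times as long as 100K items. If it's O(n log n), it will take 12 times as long.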

1M * 19.93 = 19,930,000
(19,930,000 / 1,661,000) = 11.9987 (call it 12)
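
The arithmetic above can be reproduced for any pair of sizes with a few lines of Python. This is only a convenience sketch: the helper name `predicted_ratio` and its `model` argument are invented for this example, and the result is the ratio predicted by each growth model, not a measured time.

```python
import math


def predicted_ratio(n_small, n_large, model):
    """Predicted time(n_large) / time(n_small) under a given growth model."""
    if model == "linear":
        return n_large / n_small
    if model == "nlogn":
        return (n_large * math.log2(n_large)) / (n_small * math.log2(n_small))
    raise ValueError("model must be 'linear' or 'nlogn'")


for n_large in (200_000, 1_000_000, 100_000_000):
    print(n_large,
          round(predicted_ratio(100_000, n_large, "linear"), 2),
          round(predicted_ratio(100_000, n_large, "nlogn"), 2))
```

Because the two models differ only by the slowly growing factor log2(n_large)/log2(n_small), the gap is small when the sizes differ by a factor of two and widens only gradually over several orders of magnitude.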

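【Discussion】:

@DJWalters In your data set, the time you take to sort each element rises from 1.56e-07 seconds to 2.4314892578125e-07 seconds. That is an increase of about 55.9%. It is somewhat below the theoretical 72.2% for O(n log(n)) because you spend some of the time on things, such as copying data, that grow linearly. But you haven't broken the laws of mathematics by producing a comparison sort that scales linearly. You really, really, really haven't.
@DJWalters And no, I did not run the code before commenting. As I said, what you have is an obfuscated implementation of a fairly standard iterative merge sort. I've seen it hundreds of times and written it myself dozens of times. I don't need to run the code when a brief analysis tells me everything I need to know.
@DJWalters Your own numbers show a per-element slowdown within the range expected for O(n log(n)). If you instrumented your code to count comparisons, that would show an even better match to theory. And, whether you understand it or not, what you claim, a comparison sort that runs in linear time, is impossible.
You may not understand why it is impossible, but that's your problem, not mine. You were given a link to cs.cmu.edu/~avrim/451f11/lectures/lect0913.pdf, which contains the proof of impossibility.
@DJWalters Nobody is saying that your algorithm isn't fast. We are saying that it isn't linear. That it runs faster in practice than n write operations does show that you have come up with an efficient algorithm. It does not show linear time complexity. The mathematical convergence only has to happen as you approach infinity; it does not have to happen within whatever fixed number of steps you manage to run.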

【Answer 2】:

My f2py skills aren't strong, so I wrote a pure Fortran wrapper for your code (posted below, if you want to check it), and the times I got were:

 n                     time (s)          0.1*n/1e6       0.1*n*log(n)/1e6*log(1e6)
              1000000  0.109375000      0.100000001      0.100000001
              2000000  0.203125000      0.200000003      0.210034326
              4000000  0.453125000      0.400000006      0.440137327
              8000000  0.937500000      0.800000012      0.920411944
             16000000   1.92187500       1.60000002       1.92109859
             32000000   4.01562500       3.20000005       4.00274658
             64000000   8.26562500       6.40000010       8.32659149
            128000000   17.0468750       12.8000002       17.2953815
            256000000   35.1406250       25.6000004       35.8751564

This... doesn't fit your O(n) theory, I'm afraid.
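
A quick way to read the table is to divide each time by n and by n*log2(n) and see which quotient stays flatter. The sketch below does that with the timings copied from the table above; the helper `spread` (largest quotient divided by smallest) is an ad hoc measure chosen for this example, not a formal goodness-of-fit test, and the smallest timings are coarse because they are only a few clock ticks long.

```python
import math

# (n, seconds) pairs copied from the timing table above.
timings = [
    (1_000_000, 0.109375),
    (2_000_000, 0.203125),
    (4_000_000, 0.453125),
    (8_000_000, 0.937500),
    (16_000_000, 1.921875),
    (32_000_000, 4.015625),
    (64_000_000, 8.265625),
    (128_000_000, 17.046875),
    (256_000_000, 35.140625),
]


def spread(values):
    """Ratio of the largest to the smallest value; 1.0 means perfectly flat."""
    return max(values) / min(values)


per_n = [t / n for n, t in timings]                        # flat if O(n)
per_nlogn = [t / (n * math.log2(n)) for n, t in timings]   # flat if O(n log n)

print("spread of t/n:          ", round(spread(per_n), 3))
print("spread of t/(n log2 n): ", round(spread(per_nlogn), 3))
```

On these numbers the time per element keeps climbing as n grows, while the time per n*log2(n) varies much less, which is what an O(n log n) algorithm with some linear overhead would be expected to show.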

My wrapper:

module m
contains
! Your code goes here
end module

program p
  use m
  implicit none

  integer(8) :: i,n
  real, allocatable :: real_array(:)
  integer, allocatable :: int_array(:)
  real :: start
  real :: stop

  real_array = [0]
  int_array = [0]

  write(*,*) "n                     time (s)          0.1*n/1e6       0.1*n*log(n)/1e6*log(1e6)"

  do i=0,30
    n = 2**i*1e6
    deallocate(real_array, int_array)
    allocate(real_array(n), int_array(n))
    call random_number(real_array)
    int_array = -huge(0)*real_array + 2.0*huge(0)

    call cpu_time(start)
    call isort(int_array, n)
    call cpu_time(stop)

    write(*,*) n, stop-start, 0.1*n/1.0e6, 0.1*n*log(1.0*n)/(1.0e6*log(1.0e6))
  enddo
end program

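【Discussion】:

Fortran, unlike Python code, is case-sensitive. Unfortunately, the code you wrote won't run with Fortran. Would you like me to demonstrate? I've already written that program, too.
I can assure you that Fortran is not case sensitive. I can also assure you that my code runs.
I don't see the routine you are calling with it hidden in a module. Post the file, with my routine callable in it, from which the output can be reproduced. By the way, after looking over my Fortran code, I do now realize that it is not case-sensitive. However, that doesn't change the fact that I can't see the sort routine you are calling here.

【Answer 3】: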

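The other answers have already explained why you don't have a linear comparison sort.

I'll try to explain why execution times will never prove time complexity.

Many times, you can come up with some specific case and an algorithm that uses various CPU-specific optimizations to do its job (whether that job is sorting or something else) better than O(n) according to a plot: if the time for x items is y, then according to the plot the time for 2x items is less than 2y. This can happen for x as large as you can fit in memory.

Still, that doesn't prove the time complexity. It could be an algorithm whose time complexity is O(n) or O(n log n), or maybe even O(log n) or O(n*n).

Big-Oh notation hides the various constants that describe the number of operations the algorithm performs, so such an algorithm might simply be O(n log n) with a very small constant (like constant < 1), or O(log n) with a huge constant.

Big-Oh also doesn't care about some real-life aspects, such as system memory or disk space, or how much faster some CPU executes one instruction than another. Maybe the operations you use happen to execute very fast on that CPU. Regardless, if you have an O(n log n) algorithm, then for n large enough you will eventually see the plot look like an n log n plot.

A real-world example is the disjoint-set data structure, which uses something called the iterated logarithm and whose complexity is O(m log* n). In practice, log* n will be <= 5 for all practical values, so if you plot it for practical values you might think it is O(m) with a big constant, but it isn't.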
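
To make the iterated logarithm concrete, here is a small Python sketch. The function name `log_star` is an assumption for this example; it uses the common base-2 definition, the number of times log2 must be applied before the value drops to 1 or below.

```python
import math


def log_star(n):
    """Iterated logarithm (base 2): count log2 applications until value <= 1."""
    count = 0
    x = n
    while x > 1:
        x = math.log2(x)
        count += 1
    return count


for n in (2, 16, 65_536, 10**9, 2**65_536):
    print(len(str(n)), "digits ->", log_star(n))
```

Even for 2**65536, a number with nearly twenty thousand decimal digits, the result is only 5, which is why a plot over any realistic input range cannot distinguish O(m log* n) from O(m).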

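You could change your algorithm to read each number from a different file at every step and write it back to that file, and remove your input array completely. That would not affect its time complexity, but it would certainly affect the execution-time measurements you are seeing, since storage is obviously slower than memory. Well, to Big-Oh they are all the same.

【Discussion】:

The Fortran code I posted at the end of the question writes the array (sorted and unsorted) to files. Writing is a known O(n) process. The execution times for writing and sorting are of the same order and are therefore comparable. If the sort executed at O(nlogn), then the sort's execution time would converge with the write time as the array length grows. However, there is no convergence. Thank you.
Never say never, bro. Quantum dot
@DJWalters That is not what I meant by writing to files. I did not mean writing the array to a file at the end. I meant that whenever you need the number at index i, you read it from i.txt instead, and whenever you need to write to position i, you write it to i.txt instead. That would make everything much slower, but the time complexity would stay the same.
By definition, never in this case. I'm curious: what kind of argument would convince you that what you have is not linear?

【Answer 4】: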

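I don't doubt that your sort is fast, and I'm prepared to believe that it rivals the sort command-line utility. But it is an O(N log(N)) iterative merge sort, not an O(N) sort (nor a new algorithm).

Observe,

Your outer loop iterates O(log(N)) times.
In each of those iterations, the inner loop iterates O(N / 2^k) times for some k.
And the main work of each inner-loop iteration is to split O(2^k) items into halves and merge them together, which involves examining and moving every item. (Then all of them are moved back to the original array.) That takes O(2^k) operations per inner-loop iteration.

Those factors combine:

O(log(N)) * O(N / 2^k) * O(2^k)

The factors of 2^k cancel each other, and what is left is O(N log(N)). (The k's are functions of N, so they cannot simply be ignored as constants.)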
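
The cancellation can be seen directly by plugging numbers into the loop bounds the question computes (nLoops and iterMark). The sketch below mirrors those formulas in Python; `passes_and_work` is a name invented here, and it simplifies one detail: a ragged tail is counted once per pass, whereas the Fortran re-merges the preceding chunk together with the tail.

```python
import math


def passes_and_work(n):
    """Mirror nLoops / iterMark from the question for an array of n items.

    Returns one (chunk_size, merges, items_touched) tuple per outer-loop
    pass. Simplification: the tail chunk is counted once per pass.
    """
    n_loops = int(math.log2(n))                 # nLoops in the Fortran
    rows = []
    for i in range(1, n_loops + 1):
        chunk = 2 ** i                          # mergeSize in pass i
        full, tail = divmod(n, chunk)
        merges = full + (1 if tail else 0)      # iterMark(i)
        touched = full * chunk + tail           # always n: the 2^k factors cancel
        rows.append((chunk, merges, touched))
    return rows


for n in (1_000_000, 16_000_000):
    rows = passes_and_work(n)
    total = sum(touched for _, _, touched in rows)
    print(n, len(rows), total, total // n)
```

The number of merges per pass does shrink exponentially, but each merge grows by the same factor, so every pass still touches all n items and the total is n times the number of passes.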

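Logarithms grow very slowly, so if you don't look closely, it is easy to be fooled into thinking you are seeing linear growth when it is really N log(N). You need to look at a wide range of values to see the superlinearity, some of which is indeed visible in your data.

As for your plot, there is a problem with the result of the curve fit: the y intercept is significantly negative (for the scale of the data, and especially given the concentration of points with small y). Your data may fit a linear model fairly well (if not entirely plausibly), but they do seem to fit an N log(N) model better.

【Discussion】:

O(nlogn) predicts execution times that are too high at large array sizes. The evidence has been posted. The program posted at the end of the post is fully functional. Please plot it and post some actual data. Thanks.
@DJWalters, O(n log n) does not predict any particular execution time; it predicts how execution time scales with input size, in the asymptotic limit. Algorithmic analysis, such as is presented here, is the usual and most conclusive technique for making such a determination.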