【发布时间】:2022-08-14 10:47:56
【问题描述】:
我能够为整数编写更快的排序!它的排序速度比生成数组的速度要快。它通过声明一个数组的长度等于要排序并初始化为零的整数数组的最大值来工作。然后,将要排序的数组循环使用它作为计数数组的索引 - 每次遇到值时都会递增。随后,循环计数数组并将其索引按顺序分配给输入数组的计数次数。下面的代码:
SUBROUTINE icountSORT(arrA, nA)
! This is a count sort. It counts the frequency of
! each element in the integer array to be sorted using
! an array with a length of MAXVAL(arrA)+1 such that
! 0\'s are counted at index 1, 1\'s are counted at index 2,
! etc.
!
! ~ Derrel Walters
IMPLICIT NONE
INTEGER(KIND=8),INTENT(IN) :: nA
INTEGER(KIND=8),DIMENSION(nA),INTENT(INOUT) :: arrA
INTEGER(KIND=8),ALLOCATABLE,DIMENSION(:) :: arrB
INTEGER(KIND=8) :: i, j, k, maxA
INTEGER :: iStat
maxA = MAXVAL(arrA)
ALLOCATE(arrB(maxA+1),STAT=iStat)
arrB = 0
DO i = 1, nA
arrB(arrA(i)+1) = arrB(arrA(i)+1) + 1
END DO
k = 1
DO i = 1, SIZE(arrB)
DO j = 1, arrB(i)
arrA(k) = i - 1
k = k + 1
END DO
END DO
END SUBROUTINE icountSORT
发布更多证据。 nlogn predicts too high execution times at large array sizes. 此外,在此问题末尾附近发布的 Fortran 程序将数组(未排序和排序)写入文件并发布写入和排序时间。文件写入是一个已知的 O(n) 过程。排序的运行速度比一直写入最大数组的文件要快。如果排序以 O(nlogn) 运行,那么在某些时候,排序时间将超过写入时间,并且在大数组大小时变得更长。因此,已经表明该排序例程以 O(n) 时间复杂度执行。
我在这篇文章的底部添加了一个完整的 Fortran 编译程序,以便可以重现输出。执行时间是线性的。
使用 Win 10 中 Debian 环境中的以下代码,以更清晰的格式提供更多计时数据:
dwalters@Lapper3:~/PROGRAMMING/DATA-WATER$ for (( i=100000; i<=50000000; i=2*i )); do ./derrelSORT-example.py $i; done | awk \'BEGIN {print \"N Time(s)\"}; {if ($1==\"Creating\") {printf $4\" \"} else if ($1==\"Sorting\" && $NF==\"seconds\") {print $3}}\'
N Time(s)
100000 0.01
200000 0.02
400000 0.04
800000 0.08
1600000 0.17
3200000 0.35
6400000 0.76
12800000 1.59
25600000 3.02
此代码相对于元素的数量线性执行(此处给出的整数示例)。它通过随着(合并)排序的进行以指数方式增加排序块的大小来实现这一点。为了促进呈指数增长的块:
- 在排序开始前需要计算迭代次数
- 需要为块(特定于语言,取决于索引协议)派生索引转换,以便通过 merge()
- 当块大小不能被 2 的幂整除时,优雅地处理列表尾部的余数
考虑到这些事情并开始,传统上,通过合并单值数组对,合并的块可以从 2 到 4 到 8 到 16 到 --- 到 2^n。这种单一情况是打破比较排序的 O(nlogn) 时间复杂度的速度限制的例外。该例程相对于要排序的元素数量进行线性排序。
任何人都可以更快地排序吗? ;)
Fortran 代码 (derrelSort.f90):
! Derrel Walters © 2019 ! These sort routines were written by Derrel Walters ~ 2019-01-23 SUBROUTINE iSORT(arrA, nA) ! This implementation of derrelSORT is for integers, ! but the same principles apply for other datatypes. ! ! ~ Derrel Walters IMPLICIT NONE INTEGER(KIND=8),INTENT(IN) :: nA INTEGER,DIMENSION(nA),INTENT(INOUT) :: arrA INTEGER,DIMENSION(nA) :: arrB INTEGER(KIND=8) :: lowIDX, highIDX, midIDX INTEGER :: iStat INTEGER(KIND=8) :: i, j, A, B, C, thisHigh, mergeSize, nLoops INTEGER,DIMENSION(:),ALLOCATABLE :: iterMark LOGICAL,DIMENSION(:),ALLOCATABLE :: moreToGo arrB = arrA mergeSize = 2 lowIDX = 1 - mergeSize highIDX = 0 nLoops = INT(LOG(REAL(nA))/LOG(2.0)) ALLOCATE(iterMark(nLoops), moreToGo(nLoops), STAT=iStat) moreToGo = .FALSE. iterMark = 0 DO i = 1, nLoops iterMark(i) = FLOOR(REAL(nA)/2**i) IF (MOD(nA, 2**i) > 0) THEN moreToGo(i) = .TRUE. iterMark(i) = iterMark(i) + 1 END IF END DO DO i = 1, nLoops DO j = 1, iterMark(i) A = 0 B = 1 C = 0 lowIDX = lowIDX + mergeSize highIDX = highIDX + mergeSize midIDX = (lowIDX + highIDX + 1) / 2 thisHigh = highIDX IF (j == iterMark(i).AND.moreToGo(i)) THEN lowIDX = lowIDX - mergeSize highIDX = highIDX - mergeSize midIDX = (lowIDX + highIDX + 1) / 2 A = midIDX - lowIDX B = 2 C = nA - 2*highIDX + midIDX - 1 thisHigh = nA END IF CALL imerge(arrA(lowIDX:midIDX-1+A), B*(midIDX-lowIDX), & arrA(midIDX+A:thisHigh), highIDX-midIDX+1+C, & arrB(lowIDX:thisHigh), thisHigh-lowIDX+1) arrA(lowIDX:thisHigh) = arrB(lowIDX:thisHigh) END DO mergeSize = 2*mergeSize lowIDX = 1 - mergeSize highIDX = 0 END DO END SUBROUTINE iSORT SUBROUTINE imerge(arrA, nA, arrB, nB, arrC, nC) ! This merge is a faster merge. Array A arrives ! just to the left of Array B, and Array C is ! filled from both ends simultaneously - while ! still preserving the stability of the sort. ! The derrelSORT routine is so fast, that ! the merge does not affect the O(n) time ! complexity of the sort in practice ! ! ~ Derrel Walters IMPLICIT NONE INTEGER(KIND=8),INTENT(IN) :: nA, nB , nC INTEGER,DIMENSION(nA),INTENT(IN) :: arrA INTEGER,DIMENSION(nB),INTENT(IN) :: arrB INTEGER,DIMENSION(nC),INTENT(INOUT) :: arrC INTEGER(KIND=8) :: i, j, k, x, y, z arrC = 0 i = 1 j = 1 k = 1 x = nA y = nB z = nC DO IF (i > x .OR. j > y) EXIT IF (arrB(j) < arrA(i)) THEN arrC(k) = arrB(j) j = j + 1 ELSE arrC(k) = arrA(i) i = i + 1 END IF IF (arrA(x) > arrB(y)) THEN arrC(z) = arrA(x) x = x - 1 ELSE arrC(z) = arrB(y) y = y - 1 END IF k = k + 1 z = z - 1 END DO IF (i <= x) THEN DO IF (i > x) EXIT arrC(k) = arrA(i) i = i + 1 k = k + 1 END DO ELSEIF (j <= y) THEN DO IF (j > y) EXIT arrC(k) = arrB(j) j = j + 1 k = k + 1 END DO END IF END SUBROUTINE imerge使用 f2py3 将上述 fortran 文件 (derrelSORT.f90) 转换为可在 python 中调用的内容的时间。这是它产生的python代码和时间(derrelSORT-example.py):
#!/bin/python3 import numpy as np import derrelSORT as dS import time as t import random as rdm import sys try: array_len = int(sys.argv[1]) except IndexError: array_len = 100000000 # Create an array with array_len elements print(50*\'-\') print(\"Creating array of\", array_len, \"random integers.\") t0 = t.time() x = np.asfortranarray(np.array([round(100000*rdm.random(),0) for i in range(array_len)]).astype(np.int32)) t1 = t.time() print(\'Creation time:\', round(t1-t0, 2), \'seconds\') # Sort the array using derrelSORT print(\"Sorting the array with derrelSORT.\") t0 = t.time() dS.isort(x, len(x)) t1 = t.time() print(\'Sorting time:\', round(t1-t0, 2), \'seconds\') print(50*\'-\')从命令行输出。请注意时间。
dwalters@Lapper3:~/PROGRAMMING/DATA-WATER$ ./derrelSORT-example.py 1000000 -------------------------------------------------- Creating array of 1000000 random integers. Creation time: 0.78 seconds Sorting the array with derrelSORT. Sorting time: 0.1 seconds -------------------------------------------------- dwalters@Lapper3:~/PROGRAMMING/DATA-WATER$ ./derrelSORT-example.py 10000000 -------------------------------------------------- Creating array of 10000000 random integers. Creation time: 8.1 seconds Sorting the array with derrelSORT. Sorting time: 1.07 seconds -------------------------------------------------- dwalters@Lapper3:~/PROGRAMMING/DATA-WATER$ ./derrelSORT-example.py 20000000 -------------------------------------------------- Creating array of 20000000 random integers. Creation time: 15.73 seconds Sorting the array with derrelSORT. Sorting time: 2.21 seconds -------------------------------------------------- dwalters@Lapper3:~/PROGRAMMING/DATA-WATER$ ./derrelSORT-example.py 40000000 -------------------------------------------------- Creating array of 40000000 random integers. Creation time: 31.64 seconds Sorting the array with derrelSORT. Sorting time: 4.39 seconds -------------------------------------------------- dwalters@Lapper3:~/PROGRAMMING/DATA-WATER$ ./derrelSORT-example.py 80000000 -------------------------------------------------- Creating array of 80000000 random integers. Creation time: 64.03 seconds Sorting the array with derrelSORT. Sorting time: 8.92 seconds -------------------------------------------------- dwalters@Lapper3:~/PROGRAMMING/DATA-WATER$ ./derrelSORT-example.py 160000000 -------------------------------------------------- Creating array of 160000000 random integers. Creation time: 129.56 seconds Sorting the array with derrelSORT. Sorting time: 18.04 seconds --------------------------------------------------更多输出:
dwalters@Lapper3:~/PROGRAMMING/DATA-WATER$ for (( i=100000; i<=500000000; i=2*i )); do > ./derrelSORT-example.py $i > done -------------------------------------------------- Creating array of 100000 random integers. Creation time: 0.08 seconds Sorting the array with derrelSORT. Sorting time: 0.01 seconds -------------------------------------------------- -------------------------------------------------- Creating array of 200000 random integers. Creation time: 0.16 seconds Sorting the array with derrelSORT. Sorting time: 0.02 seconds -------------------------------------------------- -------------------------------------------------- Creating array of 400000 random integers. Creation time: 0.32 seconds Sorting the array with derrelSORT. Sorting time: 0.04 seconds -------------------------------------------------- -------------------------------------------------- Creating array of 800000 random integers. Creation time: 0.68 seconds Sorting the array with derrelSORT. Sorting time: 0.08 seconds -------------------------------------------------- -------------------------------------------------- Creating array of 1600000 random integers. Creation time: 1.25 seconds Sorting the array with derrelSORT. Sorting time: 0.15 seconds -------------------------------------------------- -------------------------------------------------- Creating array of 3200000 random integers. Creation time: 2.57 seconds Sorting the array with derrelSORT. Sorting time: 0.32 seconds -------------------------------------------------- -------------------------------------------------- Creating array of 6400000 random integers. Creation time: 5.23 seconds Sorting the array with derrelSORT. Sorting time: 0.66 seconds -------------------------------------------------- -------------------------------------------------- Creating array of 12800000 random integers. Creation time: 10.09 seconds Sorting the array with derrelSORT. Sorting time: 1.35 seconds -------------------------------------------------- -------------------------------------------------- Creating array of 25600000 random integers. Creation time: 20.25 seconds Sorting the array with derrelSORT. Sorting time: 2.74 seconds -------------------------------------------------- -------------------------------------------------- Creating array of 51200000 random integers. Creation time: 41.84 seconds Sorting the array with derrelSORT. Sorting time: 5.62 seconds -------------------------------------------------- -------------------------------------------------- Creating array of 102400000 random integers. Creation time: 93.19 seconds Sorting the array with derrelSORT. Sorting time: 11.49 seconds -------------------------------------------------- -------------------------------------------------- Creating array of 204800000 random integers. Creation time: 167.55 seconds Sorting the array with derrelSORT. Sorting time: 24.13 seconds -------------------------------------------------- -------------------------------------------------- Creating array of 409600000 random integers. Creation time: 340.84 seconds Sorting the array with derrelSORT. Sorting time: 47.21 seconds --------------------------------------------------当数组大小加倍时,时间加倍 - 如所示。因此,米歇尔先生的初步评估是不正确的。原因是因为,虽然外循环确定每个块大小(即 log2(n))的循环数,但内循环计数器呈指数下降随着排序的进行。然而,众所周知的证据是布丁。时间清楚地证明了线性。
如果有人需要任何帮助来复制结果,请告诉我。我很乐意提供帮助。
在本文末尾找到的 Fortran 程序是我在 2019 年编写的原样副本。它旨在用于命令行。编译它:
- 将 fortran 代码复制到扩展名为 .f90 的文件中
- 使用命令编译代码,例如:
gfortran -o derrelSORT-ex.x derrelSORT.f90- 授予自己运行可执行文件的权限:
chmod u+x derrelSORT-ex.x- 使用或不使用整数参数从命令行执行程序:
./derrelSORT-ex.x或者
./derrelSORT-ex.x 10000000输出应该看起来像这样(在这里,我使用了一个 bash c 风格的循环来重复调用该命令)。请注意,随着每次迭代的数组大小加倍,执行时间也加倍。
SORT-RESEARCH$ for (( i=100000; i<500000000; i=2*i )); do > ./derrelSORT-2022.x $i > done Derrel Walters © 2019 Demonstrating derrelSORT© WARNING: This program can produce LARGE files! Generating random array of length: 100000 Time = 0.0000 seconds Writing Array to rand-in.txt: Time = 0.0312 seconds Sorting the Array Time = 0.0156 seconds Writing Array to rand-sorted-out.txt: Time = 0.0469 seconds Derrel Walters © 2019 Demonstrating derrelSORT© WARNING: This program can produce LARGE files! Generating random array of length: 200000 Time = 0.0000 seconds Writing Array to rand-in.txt: Time = 0.0625 seconds Sorting the Array Time = 0.0312 seconds Writing Array to rand-sorted-out.txt: Time = 0.0312 seconds Derrel Walters © 2019 Demonstrating derrelSORT© WARNING: This program can produce LARGE files! Generating random array of length: 400000 Time = 0.0156 seconds Writing Array to rand-in.txt: Time = 0.1250 seconds Sorting the Array Time = 0.0625 seconds Writing Array to rand-sorted-out.txt: Time = 0.0938 seconds Derrel Walters © 2019 Demonstrating derrelSORT© WARNING: This program can produce LARGE files! Generating random array of length: 800000 Time = 0.0156 seconds Writing Array to rand-in.txt: Time = 0.2344 seconds Sorting the Array Time = 0.1406 seconds Writing Array to rand-sorted-out.txt: Time = 0.2031 seconds Derrel Walters © 2019 Demonstrating derrelSORT© WARNING: This program can produce LARGE files! Generating random array of length: 1600000 Time = 0.0312 seconds Writing Array to rand-in.txt: Time = 0.4219 seconds Sorting the Array Time = 0.2969 seconds Writing Array to rand-sorted-out.txt: Time = 0.3906 seconds Derrel Walters © 2019 Demonstrating derrelSORT© WARNING: This program can produce LARGE files! Generating random array of length: 3200000 Time = 0.0625 seconds Writing Array to rand-in.txt: Time = 0.8281 seconds Sorting the Array Time = 0.6562 seconds Writing Array to rand-sorted-out.txt: Time = 0.7969 seconds Derrel Walters © 2019 Demonstrating derrelSORT© WARNING: This program can produce LARGE files! Generating random array of length: 6400000 Time = 0.0938 seconds Writing Array to rand-in.txt: Time = 1.5938 seconds Sorting the Array Time = 1.3281 seconds Writing Array to rand-sorted-out.txt: Time = 1.6406 seconds Derrel Walters © 2019 Demonstrating derrelSORT© WARNING: This program can produce LARGE files! Generating random array of length: 12800000 Time = 0.2500 seconds Writing Array to rand-in.txt: Time = 3.3906 seconds Sorting the Array Time = 2.7031 seconds Writing Array to rand-sorted-out.txt: Time = 3.2656 seconds Derrel Walters © 2019 Demonstrating derrelSORT© WARNING: This program can produce LARGE files! Generating random array of length: 25600000 Time = 0.4062 seconds Writing Array to rand-in.txt: Time = 6.6250 seconds Sorting the Array Time = 5.6094 seconds Writing Array to rand-sorted-out.txt: Time = 6.5312 seconds Derrel Walters © 2019 Demonstrating derrelSORT© WARNING: This program can produce LARGE files! Generating random array of length: 51200000 Time = 0.8281 seconds Writing Array to rand-in.txt: Time = 13.2656 seconds Sorting the Array Time = 11.5000 seconds Writing Array to rand-sorted-out.txt: Time = 13.1719 seconds Derrel Walters © 2019 Demonstrating derrelSORT© WARNING: This program can produce LARGE files! Generating random array of length: 102400000 Time = 1.6406 seconds Writing Array to rand-in.txt: Time = 26.3750 seconds Sorting the Array Time = 23.3438 seconds Writing Array to rand-sorted-out.txt: Time = 27.0625 seconds Derrel Walters © 2019 Demonstrating derrelSORT© WARNING: This program can produce LARGE files! Generating random array of length: 204800000 Time = 3.3438 seconds Writing Array to rand-in.txt: Time = 53.1094 seconds Sorting the Array Time = 47.3750 seconds Writing Array to rand-sorted-out.txt: Time = 52.8906 seconds Derrel Walters © 2019 Demonstrating derrelSORT© WARNING: This program can produce LARGE files! Generating random array of length: 409600000 Time = 6.6562 seconds Writing Array to rand-in.txt: Time = 105.1875 seconds Sorting the Array Time = 99.5938 seconds Writing Array to rand-sorted-out.txt: Time = 109.9062 seconds这是 2019 年的原样程序,未经修改:
SORT-RESEARCH$ cat derrelSORT.f90 ! Derrel Walters © 2019 ! These sort routines were written by Derrel Walters ~ 2019-01-23 PROGRAM sort_test ! This program demonstrates a linear sort routine ! by generating a random array (here integer), writing it ! to a file \'rand-in.txt\', sorting it with an ! implementation of derrelSORT (here for integers - ! where the same principles apply for other applicable ! datatypes), and finally, printing the sorted array ! to a file \'rand-sorted-out.txt\'. ! ! To the best understanding of the author, the expert ! concensus is that a comparative sort can, at best, ! be done with O(nlogn) time complexity. Here a sort ! is demonstrated which experimentally runs O(n). ! ! Such time complexity is currently considered impossible ! for a sort. Using this sort, extremely large amounts of data can be ! sorted on any modern computer using a single processor core - ! provided the computer has enough memory to hold the array! For example, ! the sorting time for a given array will be on par (perhaps less than) ! what it takes the same computer to write the array to a file. ! ! ~ Derrel Walters IMPLICIT NONE INTEGER,PARAMETER :: in_unit = 21 INTEGER,PARAMETER :: out_unit = 23 INTEGER,DIMENSION(:),ALLOCATABLE :: iArrA REAL,DIMENSION(:),ALLOCATABLE :: rArrA CHARACTER(LEN=15) :: cDims CHARACTER(LEN=80) :: ioMsgStr INTEGER(KIND=8) :: nDims, i INTEGER :: iStat REAL :: start, finish WRITE(*,*) \'\' WRITE(*,\'(A)\') \'Derrel Walters © 2019\' WRITE(*,*) \'\' WRITE(*,\'(A)\') \'Demonstrating derrelSORT©\' WRITE(*,\'(A)\') \'WARNING: This program can produce LARGE files!\' WRITE(*,*) \'\' CALL GET_COMMAND_ARGUMENT(1, cDims) IF (cDims == \'\') THEN nDims = 1000000 ELSE READ(cDims,\'(1I15)\') nDims END IF ALLOCATE(iArrA(nDims),rArrA(nDims),STAT=iStat) WRITE(*,\'(A,1X,1I16)\') \'Generating random array of length:\', nDims CALL CPU_TIME(start) CALL RANDOM_NUMBER(rArrA) iArrA = INT(rArrA*1000000) CALL CPU_TIME(finish) WRITE(*,\'(A,1X,f9.4,1X,A)\') \'Time =\',finish-start,\'seconds\' DEALLOCATE(rArrA,STAT=iStat) WRITE(*,\'(A)\') \'Writing Array to rand-in.txt: \' OPEN(UNIT=in_unit,FILE=\'rand-in.txt\',STATUS=\'REPLACE\',ACTION=\'WRITE\',IOSTAT=iStat,IOMSG=ioMsgStr) IF (iStat /= 0) THEN WRITE(*,\'(A)\') ioMsgStr ELSE CALL CPU_TIME(start) DO i=1, nDims WRITE(in_unit,*) iArrA(i) END DO CLOSE(in_unit) CALL CPU_TIME(finish) WRITE(*,\'(A,1X,f9.4,1X,A)\') \'Time =\',finish-start,\'seconds\' END IF WRITE(*,\'(A)\') \'Sorting the Array\' CALL CPU_TIME(start) CALL iderrelSORT(iArrA, nDims) !! SIZE(iArrA)) CALL CPU_TIME(finish) WRITE(*,\'(A,1X,f9.4,1X,A)\') \'Time =\',finish-start,\'seconds\' WRITE(*,\'(A)\') \'Writing Array to rand-sorted-out.txt: \' OPEN(UNIT=out_unit,FILE=\'rand-sorted-out.txt\',STATUS=\'REPLACE\',ACTION=\'WRITE\',IOSTAT=iStat,IOMSG=ioMsgStr) IF (iStat /= 0) THEN WRITE(*,\'(A)\') ioMsgStr ELSE CALL CPU_TIME(start) DO i=1, nDims WRITE(out_unit,*) iArrA(i) END DO CLOSE(out_unit) CALL CPU_TIME(finish) WRITE(*,\'(A,1X,f9.4,1X,A)\') \'Time =\',finish-start,\'seconds\' END IF WRITE(*,*) \'\' END PROGRAM sort_test SUBROUTINE iderrelSORT(arrA, nA) ! This implementation of derrelSORT is for integers, ! but the same principles apply for other datatypes. ! ! ~ Derrel Walters IMPLICIT NONE INTEGER(KIND=8),INTENT(IN) :: nA INTEGER,DIMENSION(nA),INTENT(INOUT) :: arrA INTEGER,DIMENSION(nA) :: arrB INTEGER(KIND=8) :: lowIDX, highIDX, midIDX INTEGER :: iStat INTEGER(KIND=8) :: i, j, A, B, C, thisHigh, mergeSize, nLoops INTEGER,DIMENSION(:),ALLOCATABLE :: iterMark LOGICAL,DIMENSION(:),ALLOCATABLE :: moreToGo arrB = arrA mergeSize = 2 lowIDX = 1 - mergeSize highIDX = 0 nLoops = INT(LOG(REAL(nA))/LOG(2.0)) ALLOCATE(iterMark(nLoops), moreToGo(nLoops), STAT=iStat) moreToGo = .FALSE. iterMark = 0 DO i = 1, nLoops iterMark(i) = FLOOR(REAL(nA)/2**i) IF (MOD(nA, 2**i) > 0) THEN moreToGo(i) = .TRUE. iterMark(i) = iterMark(i) + 1 END IF END DO DO i = 1, nLoops DO j = 1, iterMark(i) A = 0 B = 1 C = 0 lowIDX = lowIDX + mergeSize highIDX = highIDX + mergeSize midIDX = (lowIDX + highIDX + 1) / 2 thisHigh = highIDX IF (j == iterMark(i).AND.moreToGo(i)) THEN lowIDX = lowIDX - mergeSize highIDX = highIDX - mergeSize midIDX = (lowIDX + highIDX + 1) / 2 A = midIDX - lowIDX B = 2 C = nA - 2*highIDX + midIDX - 1 thisHigh = nA END IF !! The traditional merge can also be used (see subroutine for comment). !! ! ! ! CALL imerge(arrA(lowIDX:midIDX-1+A), B*(midIDX-lowIDX), & ! ! arrA(midIDX+A:thisHigh), highIDX-midIDX+1+C, & ! ! arrB(lowIDX:thisHigh), thisHigh-lowIDX+1) ! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! CALL imerge2(arrA(lowIDX:midIDX-1+A), B*(midIDX-lowIDX), & arrA(midIDX+A:thisHigh), highIDX-midIDX+1+C, & arrB(lowIDX:thisHigh), thisHigh-lowIDX+1) arrA(lowIDX:thisHigh) = arrB(lowIDX:thisHigh) END DO mergeSize = 2*mergeSize lowIDX = 1 - mergeSize highIDX = 0 END DO END SUBROUTINE iderrelSORT SUBROUTINE imerge(arrA, nA, arrB, nB, arrC, nC) ! This merge is a traditional merge that places ! the lowest element first. The form that the ! time complexity takes, O(n), is not affected ! by the merge routine - yet this routine ! does not run as fast as the merge used in ! imerge2. ! ! ~Derrel Walters IMPLICIT NONE INTEGER(KIND=8),INTENT(IN) :: nA, nB , nC INTEGER,DIMENSION(nA),INTENT(IN) :: arrA INTEGER,DIMENSION(nB),INTENT(IN) :: arrB INTEGER,DIMENSION(nC),INTENT(INOUT) :: arrC INTEGER(KIND=8) :: i, j, k arrC = 0 i = 1 j = 1 k = 1 DO IF (i > nA .OR. j > NB) EXIT IF (arrB(j) < arrA(i)) THEN arrC(k) = arrB(j) j = j + 1 ELSE arrC(k) = arrA(i) i = i + 1 END IF k = k + 1 END DO IF (i <= nA) THEN DO IF (i > nA) EXIT arrC(k) = arrA(i) i = i + 1 k = k + 1 END DO ELSEIF (j <= nB) THEN DO IF (j > nB) EXIT arrC(k) = arrB(j) j = j + 1 k = k + 1 END DO END IF END SUBROUTINE imerge SUBROUTINE imerge2(arrA, nA, arrB, nB, arrC, nC) ! This merge is a faster merge. Array A arrives ! just to the left of Array B, and Array C is ! filled from both ends simultaneously - while ! still preserving the stability of the sort. ! The derrelSORT routine is so fast, that ! the merge does not affect the O(n) time ! complexity of the sort in practice ! (perhaps, making its execution more linear ! at small numbers of elements). ! ! ~ Derrel Walters IMPLICIT NONE INTEGER(KIND=8),INTENT(IN) :: nA, nB , nC INTEGER,DIMENSION(nA),INTENT(IN) :: arrA INTEGER,DIMENSION(nB),INTENT(IN) :: arrB INTEGER,DIMENSION(nC),INTENT(INOUT) :: arrC INTEGER(KIND=8) :: i, j, k, x, y, z arrC = 0 i = 1 j = 1 k = 1 x = nA y = nB z = nC DO IF (i > x .OR. j > y) EXIT IF (arrB(j) < arrA(i)) THEN arrC(k) = arrB(j) j = j + 1 ELSE arrC(k) = arrA(i) i = i + 1 END IF IF (arrA(x) > arrB(y)) THEN arrC(z) = arrA(x) x = x - 1 ELSE arrC(z) = arrB(y) y = y - 1 END IF k = k + 1 z = z - 1 END DO IF (i <= x) THEN DO IF (i > x) EXIT arrC(k) = arrA(i) i = i + 1 k = k + 1 END DO ELSEIF (j <= y) THEN DO IF (j > y) EXIT arrC(k) = arrB(j) j = j + 1 k = k + 1 END DO END IF END SUBROUTINE imerge2MOAR 数据使用 Fortran 版本。有人喜欢直线吗?
SORT-RESEARCH$ for (( i=100000; i<500000000; i=2*i )); do ./derrelSORT-2022.x $i; done | awk \'BEGIN {old_1=\"Derrel\"; print \"N Time(s)\"};{if ($1 == \"Generating\") {printf $NF\" \"; old_1=$1} else if (old_1 == \"Sorting\") {print $3; old_1=$1} else {old_1=$1}}\' N Time(s) 100000 0.0000 200000 0.0312 400000 0.0625 800000 0.1562 1600000 0.2969 3200000 0.6250 6400000 1.3594 12800000 2.7500 25600000 5.5625 51200000 11.8906 102400000 23.3750 204800000 47.3750 409600000 96.4531看起来是线性的,不是吗? ;) Fortran sorting times from above plotted.
-
接下来是黎曼猜想?......
-
我看不出有任何理由认为您的双端合并会比标准合并更快。恰恰相反。尽管它们都应该执行非常接近相同数量的步骤,但单端(和仅向前)合并往往对缓存更友好。
-
@DJWalters 并非所有操作都在相同的时间内执行。对于
n的实际值,内存阵列上的n log n操作很可能比SSD 上的n写入操作快。 -
我采用了问题中提供的 Fortran 程序,并使用
gfortran -O3(来自 GCC 套件的 8.5.0 版)未经修改地对其进行了编译。在样本大小 100,000 上运行它; 1,000,000; 10,000,000;并且 100,000,000 表现出明显的超线性缩放,排序阶段的执行时间比率(由程序报告)与 N = 100,000 的 1.00、11.6、144、1500 相比。这对于您的线性缩放假设来说看起来很糟糕,但对于 N 来说是合理的日志 N。 -
另外,是的,我可以比这更快地排序。至少,我可以修改您的代码,以将其在大小为 100,000,000 的输入上的执行时间减少约 20%。节省时间主要来自消除大量不必要的写入,例如将无论如何都会被覆盖的存储的零初始化,以及在每次合并通过后将 arrB 复制回 arrA 而不是合并它回到另一个方向。使用数组切片分配而不是循环进行复制也有一些帮助,另外还有一些其他的零碎。