【Question title】: Can anyone sort faster than this? [closed]
【Posted】: 2022-08-14 10:47:56
【Question description】:

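I was able to write a faster sort for integers! It sorts faster than the array can be generated. It works by declaring a counting array whose length is the maximum value of the integer array to be sorted plus one (so that zeros can be counted) and initializing it to zero. The array to be sorted is then looped over, using each value as an index into the counting array, which is incremented every time that value is encountered. Afterwards, the counting array is looped over and each index is written back to the input array, in order, as many times as it was counted. The code is below: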

SUBROUTINE icountSORT(arrA, nA)
  ! This is a count sort.  It counts the frequency of
  ! each element in the integer array to be sorted using
  ! an array with a length of MAXVAL(arrA)+1 such that
  ! 0's are counted at index 1, 1's are counted at index 2,
  ! etc.
  !
  ! ~ Derrel Walters
  IMPLICIT NONE

  INTEGER(KIND=8),INTENT(IN) :: nA
  INTEGER(KIND=8),DIMENSION(nA),INTENT(INOUT) :: arrA

  INTEGER(KIND=8),ALLOCATABLE,DIMENSION(:) :: arrB
  INTEGER(KIND=8) :: i, j, k, maxA
  INTEGER ::  iStat

  maxA = MAXVAL(arrA)
  ALLOCATE(arrB(maxA+1),STAT=iStat)

  arrB = 0

  DO i = 1, nA
    arrB(arrA(i)+1) = arrB(arrA(i)+1) + 1
  END DO

  k = 1
  DO i = 1, SIZE(arrB)
    DO j = 1, arrB(i)
      arrA(k) = i - 1
      k = k + 1
    END DO
  END DO

END SUBROUTINE icountSORT
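
For readers who do not use Fortran, the counting scheme described above can be sketched in Python. This is only an illustrative sketch, not the routine from the post: the name `counting_sort` is hypothetical, it uses 0-based indexing instead of the `+1` offset, and it rejects negative values, which the description above does not address. Note that the work is proportional to the number of elements plus the maximum value, and that no two elements are ever compared with each other.

```python
def counting_sort(values):
    """Return a sorted copy of a list of non-negative integers."""
    if not values:
        return []
    if min(values) < 0:
        # Assumption of this sketch: values are used directly as indices.
        raise ValueError("counting_sort only handles non-negative integers")

    # One counter per possible value, 0 .. max(values).
    counts = [0] * (max(values) + 1)
    for v in values:
        counts[v] += 1

    # Emit each value as many times as it was counted, in index order.
    result = []
    for value, count in enumerate(counts):
        result.extend([value] * count)
    return result


if __name__ == "__main__":
    print(counting_sort([3, 0, 2, 3, 1]))
```

The memory cost grows with the largest value rather than with the number of elements, so an input containing a single very large integer forces a very large counting array.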

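Posting more evidence. n log n predicts execution times that are too high at large array sizes. In addition, the Fortran program posted near the end of this question writes the arrays (unsorted and sorted) to files and reports the write and sort times. File writing is a known O(n) process. The sort runs faster than the file write all the way up to the largest arrays. If the sort ran in O(nlogn), then at some point the sort time would cross the write time and become longer at large array sizes. Therefore, it has been shown that this sort routine executes with O(n) time complexity.

I have added a complete, compilable Fortran program at the bottom of this post so that the output can be reproduced. The execution times are linear.

More timing data in a cleaner format, using the following code in a Debian environment on Win 10: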

dwalters@Lapper3:~/PROGRAMMING/DATA-WATER$ for (( i=100000; i<=50000000; i=2*i )); do ./derrelSORT-example.py $i; done | awk  'BEGIN {print "N      Time(s)"}; {if ($1=="Creating") {printf $4" "} else if ($1=="Sorting" && $NF=="seconds") {print $3}}'
N      Time(s)
100000 0.01
200000 0.02
400000 0.04
800000 0.08
1600000 0.17
3200000 0.35
6400000 0.76
12800000 1.59
25600000 3.02

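This code executes linearly with respect to the number of elements (an integer example is given here). It achieves this by exponentially increasing the size of the sorted chunks as the (merge) sort proceeds. To facilitate the exponentially growing chunks:

  1. The number of iterations needs to be calculated before the sort begins.
  2. Index transformations need to be derived for the chunks (language specific, depending on the indexing protocol) so that they can be passed to merge().
  3. The remainder at the tail of the list needs to be handled gracefully when the list length is not evenly divisible by a power of 2.

    With these things in mind, and starting, as is traditional, by merging pairs of single-value arrays, the merged chunks can grow from 2 to 4 to 8 to 16 to --- to 2^n. This single case is the exception that breaks the speed limit of O(nlogn) time complexity for comparison sorts. The routine sorts linearly with respect to the number of elements to be sorted.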
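
The chunk bookkeeping in the list above can be illustrated with a short Python sketch. It is an assumption-laden illustration and not a transcription of the Fortran index arithmetic that follows: `chunk_passes` is a hypothetical helper, it uses 0-based half-open ranges, and it handles the tail by clamping boundaries to the list length rather than by folding the remainder into the last full chunk.

```python
def chunk_passes(n):
    """Yield (width, triples) for each pass of a bottom-up merge over n items.

    Each triple (low, mid, high) describes one merge of the sorted runs
    [low, mid) and [mid, high). Boundaries are clamped to n, so a short
    tail at the end of the list is handled without special cases.
    """
    width = 1
    while width < n:
        triples = []
        for low in range(0, n, 2 * width):
            mid = min(low + width, n)
            high = min(low + 2 * width, n)
            triples.append((low, mid, high))
        yield width, triples
        width *= 2  # sorted runs double in length after every pass


if __name__ == "__main__":
    for width, triples in chunk_passes(5):
        print(width, triples)
```

Each pass covers every index from 0 to n exactly once, and the number of passes is the number of doublings needed for the run width to reach n.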

    Can anyone sort faster? ;)

    Fortran code (derrelSort.f90):

    ! Derrel Walters © 2019
    ! These sort routines were written by Derrel Walters ~ 2019-01-23
    
    
    SUBROUTINE iSORT(arrA, nA)
      ! This implementation of derrelSORT is for integers,
      ! but the same principles apply for other datatypes.
      !
      ! ~ Derrel Walters
      IMPLICIT NONE
    
      INTEGER(KIND=8),INTENT(IN) :: nA
      INTEGER,DIMENSION(nA),INTENT(INOUT) :: arrA
    
      INTEGER,DIMENSION(nA) :: arrB
      INTEGER(KIND=8) :: lowIDX, highIDX, midIDX
      INTEGER ::  iStat
      INTEGER(KIND=8) :: i, j, A, B, C, thisHigh, mergeSize, nLoops
      INTEGER,DIMENSION(:),ALLOCATABLE :: iterMark
      LOGICAL,DIMENSION(:),ALLOCATABLE :: moreToGo
    
      arrB = arrA
      mergeSize = 2
      lowIDX = 1 - mergeSize
      highIDX = 0
    
      nLoops = INT(LOG(REAL(nA))/LOG(2.0))
      ALLOCATE(iterMark(nLoops), moreToGo(nLoops), STAT=iStat)
      moreToGo = .FALSE.
      iterMark = 0
    
      DO i = 1, nLoops
        iterMark(i) = FLOOR(REAL(nA)/2**i)
        IF (MOD(nA, 2**i) > 0) THEN
          moreToGo(i) = .TRUE.
          iterMark(i) = iterMark(i) + 1
        END IF
      END DO
    
      DO i = 1, nLoops
          DO j = 1, iterMark(i)
            A = 0
            B = 1
            C = 0
            lowIDX = lowIDX + mergeSize
            highIDX = highIDX + mergeSize
            midIDX = (lowIDX + highIDX + 1) / 2
            thisHigh = highIDX
            IF (j == iterMark(i).AND.moreToGo(i)) THEN
              lowIDX = lowIDX - mergeSize
              highIDX = highIDX - mergeSize
              midIDX = (lowIDX + highIDX + 1) / 2
              A = midIDX - lowIDX
              B = 2
              C = nA - 2*highIDX + midIDX - 1
              thisHigh = nA
            END IF
            CALL imerge(arrA(lowIDX:midIDX-1+A), B*(midIDX-lowIDX),    &
                        arrA(midIDX+A:thisHigh), highIDX-midIDX+1+C,   &
                        arrB(lowIDX:thisHigh), thisHigh-lowIDX+1)
            arrA(lowIDX:thisHigh) = arrB(lowIDX:thisHigh)
          END DO
          mergeSize = 2*mergeSize
          lowIDX = 1 - mergeSize
          highIDX = 0
      END DO
    
    END SUBROUTINE iSORT
    
    SUBROUTINE imerge(arrA, nA, arrB, nB, arrC, nC)
      ! This merge is a faster merge.  Array A arrives
      ! just to the left of Array B, and Array C is
      ! filled from both ends simultaneously - while
      ! still preserving the stability of the sort.
      ! The derrelSORT routine is so fast, that
      ! the merge does not affect the O(n) time
      ! complexity of the sort in practice
      !
      ! ~ Derrel Walters
      IMPLICIT NONE
    
      INTEGER(KIND=8),INTENT(IN) :: nA, nB , nC
    
      INTEGER,DIMENSION(nA),INTENT(IN) :: arrA
      INTEGER,DIMENSION(nB),INTENT(IN) :: arrB
      INTEGER,DIMENSION(nC),INTENT(INOUT) :: arrC
    
      INTEGER(KIND=8) :: i, j, k, x, y, z
    
      arrC = 0
      i = 1
      j = 1
      k = 1
      x = nA
      y = nB
      z = nC
    
      DO
        IF (i > x .OR. j > y) EXIT
        IF (arrB(j) < arrA(i)) THEN
          arrC(k) = arrB(j)
          j = j + 1
        ELSE
          arrC(k) = arrA(i)
          i = i + 1
        END IF
        IF (arrA(x) > arrB(y)) THEN
          arrC(z) = arrA(x)
          x = x - 1
        ELSE
          arrC(z) = arrB(y)
          y = y - 1
        END IF
        k = k + 1
        z = z - 1
      END DO
    
      IF (i <= x) THEN
        DO
          IF (i > x) EXIT
            arrC(k) = arrA(i)
            i = i + 1
            k = k + 1
        END DO
      ELSEIF (j <= y) THEN
        DO
          IF (j > y) EXIT
            arrC(k) = arrB(j)
            j = j + 1
            k = k + 1
        END DO
      END IF
    END SUBROUTINE imerge
    

    Timings obtained after using f2py3 to convert the Fortran file above (derrelSORT.f90) into something callable from Python. Here is the Python code that produced the timings (derrelSORT-example.py):

    #!/bin/python3
    
    import numpy as np
    import derrelSORT as dS
    import time as t
    import random as rdm
    import sys
    
    try:
      array_len = int(sys.argv[1])
    except IndexError:
      array_len = 100000000
    
    # Create an array with array_len elements
    print(50*'-')
    print("Creating array of", array_len, "random integers.")
    t0 = t.time()
    x = np.asfortranarray(np.array([round(100000*rdm.random(),0)
                          for i in range(array_len)]).astype(np.int32))
    t1 = t.time()
    print('Creation time:', round(t1-t0, 2), 'seconds')
    
    
    # Sort the array using derrelSORT
    print("Sorting the array with derrelSORT.")
    t0 = t.time()
    dS.isort(x, len(x))
    t1 = t.time()
    print('Sorting time:', round(t1-t0, 2), 'seconds')
    print(50*'-')
    

    Output from the command line. Note the timings.

    dwalters@Lapper3:~/PROGRAMMING/DATA-WATER$ ./derrelSORT-example.py 1000000
    --------------------------------------------------
    Creating array of 1000000 random integers.
    Creation time: 0.78 seconds
    Sorting the array with derrelSORT.
    Sorting time: 0.1 seconds
    --------------------------------------------------
    dwalters@Lapper3:~/PROGRAMMING/DATA-WATER$ ./derrelSORT-example.py 10000000
    --------------------------------------------------
    Creating array of 10000000 random integers.
    Creation time: 8.1 seconds
    Sorting the array with derrelSORT.
    Sorting time: 1.07 seconds
    --------------------------------------------------
    dwalters@Lapper3:~/PROGRAMMING/DATA-WATER$ ./derrelSORT-example.py 20000000
    --------------------------------------------------
    Creating array of 20000000 random integers.
    Creation time: 15.73 seconds
    Sorting the array with derrelSORT.
    Sorting time: 2.21 seconds
    --------------------------------------------------
    dwalters@Lapper3:~/PROGRAMMING/DATA-WATER$ ./derrelSORT-example.py 40000000
    --------------------------------------------------
    Creating array of 40000000 random integers.
    Creation time: 31.64 seconds
    Sorting the array with derrelSORT.
    Sorting time: 4.39 seconds
    --------------------------------------------------
    dwalters@Lapper3:~/PROGRAMMING/DATA-WATER$ ./derrelSORT-example.py 80000000
    --------------------------------------------------
    Creating array of 80000000 random integers.
    Creation time: 64.03 seconds
    Sorting the array with derrelSORT.
    Sorting time: 8.92 seconds
    --------------------------------------------------
    dwalters@Lapper3:~/PROGRAMMING/DATA-WATER$ ./derrelSORT-example.py 160000000
    --------------------------------------------------
    Creating array of 160000000 random integers.
    Creation time: 129.56 seconds
    Sorting the array with derrelSORT.
    Sorting time: 18.04 seconds
    --------------------------------------------------
    

    More output:

    dwalters@Lapper3:~/PROGRAMMING/DATA-WATER$ for (( i=100000; i<=500000000; i=2*i )); do
    > ./derrelSORT-example.py $i
    > done
    --------------------------------------------------
    Creating array of 100000 random integers.
    Creation time: 0.08 seconds
    Sorting the array with derrelSORT.
    Sorting time: 0.01 seconds
    --------------------------------------------------
    --------------------------------------------------
    Creating array of 200000 random integers.
    Creation time: 0.16 seconds
    Sorting the array with derrelSORT.
    Sorting time: 0.02 seconds
    --------------------------------------------------
    --------------------------------------------------
    Creating array of 400000 random integers.
    Creation time: 0.32 seconds
    Sorting the array with derrelSORT.
    Sorting time: 0.04 seconds
    --------------------------------------------------
    --------------------------------------------------
    Creating array of 800000 random integers.
    Creation time: 0.68 seconds
    Sorting the array with derrelSORT.
    Sorting time: 0.08 seconds
    --------------------------------------------------
    --------------------------------------------------
    Creating array of 1600000 random integers.
    Creation time: 1.25 seconds
    Sorting the array with derrelSORT.
    Sorting time: 0.15 seconds
    --------------------------------------------------
    --------------------------------------------------
    Creating array of 3200000 random integers.
    Creation time: 2.57 seconds
    Sorting the array with derrelSORT.
    Sorting time: 0.32 seconds
    --------------------------------------------------
    --------------------------------------------------
    Creating array of 6400000 random integers.
    Creation time: 5.23 seconds
    Sorting the array with derrelSORT.
    Sorting time: 0.66 seconds
    --------------------------------------------------
    --------------------------------------------------
    Creating array of 12800000 random integers.
    Creation time: 10.09 seconds
    Sorting the array with derrelSORT.
    Sorting time: 1.35 seconds
    --------------------------------------------------
    --------------------------------------------------
    Creating array of 25600000 random integers.
    Creation time: 20.25 seconds
    Sorting the array with derrelSORT.
    Sorting time: 2.74 seconds
    --------------------------------------------------
    --------------------------------------------------
    Creating array of 51200000 random integers.
    Creation time: 41.84 seconds
    Sorting the array with derrelSORT.
    Sorting time: 5.62 seconds
    --------------------------------------------------
    --------------------------------------------------
    Creating array of 102400000 random integers.
    Creation time: 93.19 seconds
    Sorting the array with derrelSORT.
    Sorting time: 11.49 seconds
    --------------------------------------------------
    --------------------------------------------------
    Creating array of 204800000 random integers.
    Creation time: 167.55 seconds
    Sorting the array with derrelSORT.
    Sorting time: 24.13 seconds
    --------------------------------------------------
    --------------------------------------------------
    Creating array of 409600000 random integers.
    Creation time: 340.84 seconds
    Sorting the array with derrelSORT.
    Sorting time: 47.21 seconds
    --------------------------------------------------
    

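    When the array size doubles, the time doubles, as shown. Therefore, Mr. Mischel's initial assessment is incorrect. The reason is that, while the outer loop determines the number of loops for each chunk size (i.e., log2(n)), the inner loop counter decreases exponentially as the sort proceeds. However, the proverbial proof is in the pudding. The timings clearly demonstrate linearity.

    If anyone needs any help reproducing the results, please let me know. I would be happy to help.

    The Fortran program found at the end of this post is an as-is copy of what I wrote in 2019. It is intended to be used from the command line. To compile it:

    1. Copy the Fortran code into a file with the .f90 extension.
    2. Compile the code with a command such as:
      gfortran -o derrelSORT-ex.x derrelSORT.f90
      
    3. Give yourself permission to run the executable:
      chmod u+x derrelSORT-ex.x
      
    4. Execute the program from the command line, with or without an integer argument:
      ./derrelSORT-ex.x
      

      or

      ./derrelSORT-ex.x 10000000
      

      The output should look something like this (here, I used a bash C-style loop to call the command repeatedly). Note that as the array size doubles with each iteration, the execution time doubles as well.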

      SORT-RESEARCH$ for (( i=100000; i<500000000; i=2*i )); do
      > ./derrelSORT-2022.x $i
      > done
      
      Derrel Walters © 2019
      
      Demonstrating derrelSORT©
      WARNING: This program can produce LARGE files!
      
      Generating random array of length:           100000
      Time =    0.0000 seconds
      Writing Array to rand-in.txt:
      Time =    0.0312 seconds
      Sorting the Array
      Time =    0.0156 seconds
      Writing Array to rand-sorted-out.txt:
      Time =    0.0469 seconds
      
      
      Derrel Walters © 2019
      
      Demonstrating derrelSORT©
      WARNING: This program can produce LARGE files!
      
      Generating random array of length:           200000
      Time =    0.0000 seconds
      Writing Array to rand-in.txt:
      Time =    0.0625 seconds
      Sorting the Array
      Time =    0.0312 seconds
      Writing Array to rand-sorted-out.txt:
      Time =    0.0312 seconds
      
      
      Derrel Walters © 2019
      
      Demonstrating derrelSORT©
      WARNING: This program can produce LARGE files!
      
      Generating random array of length:           400000
      Time =    0.0156 seconds
      Writing Array to rand-in.txt:
      Time =    0.1250 seconds
      Sorting the Array
      Time =    0.0625 seconds
      Writing Array to rand-sorted-out.txt:
      Time =    0.0938 seconds
      
      
      Derrel Walters © 2019
      
      Demonstrating derrelSORT©
      WARNING: This program can produce LARGE files!
      
      Generating random array of length:           800000
      Time =    0.0156 seconds
      Writing Array to rand-in.txt:
      Time =    0.2344 seconds
      Sorting the Array
      Time =    0.1406 seconds
      Writing Array to rand-sorted-out.txt:
      Time =    0.2031 seconds
      
      
      Derrel Walters © 2019
      
      Demonstrating derrelSORT©
      WARNING: This program can produce LARGE files!
      
      Generating random array of length:          1600000
      Time =    0.0312 seconds
      Writing Array to rand-in.txt:
      Time =    0.4219 seconds
      Sorting the Array
      Time =    0.2969 seconds
      Writing Array to rand-sorted-out.txt:
      Time =    0.3906 seconds
      
      
      Derrel Walters © 2019
      
      Demonstrating derrelSORT©
      WARNING: This program can produce LARGE files!
      
      Generating random array of length:          3200000
      Time =    0.0625 seconds
      Writing Array to rand-in.txt:
      Time =    0.8281 seconds
      Sorting the Array
      Time =    0.6562 seconds
      Writing Array to rand-sorted-out.txt:
      Time =    0.7969 seconds
      
      
      Derrel Walters © 2019
      
      Demonstrating derrelSORT©
      WARNING: This program can produce LARGE files!
      
      Generating random array of length:          6400000
      Time =    0.0938 seconds
      Writing Array to rand-in.txt:
      Time =    1.5938 seconds
      Sorting the Array
      Time =    1.3281 seconds
      Writing Array to rand-sorted-out.txt:
      Time =    1.6406 seconds
      
      
      Derrel Walters © 2019
      
      Demonstrating derrelSORT©
      WARNING: This program can produce LARGE files!
      
      Generating random array of length:         12800000
      Time =    0.2500 seconds
      Writing Array to rand-in.txt:
      Time =    3.3906 seconds
      Sorting the Array
      Time =    2.7031 seconds
      Writing Array to rand-sorted-out.txt:
      Time =    3.2656 seconds
      
      
      Derrel Walters © 2019
      
      Demonstrating derrelSORT©
      WARNING: This program can produce LARGE files!
      
      Generating random array of length:         25600000
      Time =    0.4062 seconds
      Writing Array to rand-in.txt:
      Time =    6.6250 seconds
      Sorting the Array
      Time =    5.6094 seconds
      Writing Array to rand-sorted-out.txt:
      Time =    6.5312 seconds
      
      
      Derrel Walters © 2019
      
      Demonstrating derrelSORT©
      WARNING: This program can produce LARGE files!
      
      Generating random array of length:         51200000
      Time =    0.8281 seconds
      Writing Array to rand-in.txt:
      Time =   13.2656 seconds
      Sorting the Array
      Time =   11.5000 seconds
      Writing Array to rand-sorted-out.txt:
      Time =   13.1719 seconds
      
      
      Derrel Walters © 2019
      
      Demonstrating derrelSORT©
      WARNING: This program can produce LARGE files!
      
      Generating random array of length:        102400000
      Time =    1.6406 seconds
      Writing Array to rand-in.txt:
      Time =   26.3750 seconds
      Sorting the Array
      Time =   23.3438 seconds
      Writing Array to rand-sorted-out.txt:
      Time =   27.0625 seconds
      
      
      Derrel Walters © 2019
      
      Demonstrating derrelSORT©
      WARNING: This program can produce LARGE files!
      
      Generating random array of length:        204800000
      Time =    3.3438 seconds
      Writing Array to rand-in.txt:
      Time =   53.1094 seconds
      Sorting the Array
      Time =   47.3750 seconds
      Writing Array to rand-sorted-out.txt:
      Time =   52.8906 seconds
      
      
      Derrel Walters © 2019
      
      Demonstrating derrelSORT©
      WARNING: This program can produce LARGE files!
      
      Generating random array of length:        409600000
      Time =    6.6562 seconds
      Writing Array to rand-in.txt:
      Time =  105.1875 seconds
      Sorting the Array
      Time =   99.5938 seconds
      Writing Array to rand-sorted-out.txt:
      Time =  109.9062 seconds
      

      Here is the program as-is from 2019, unmodified:

      SORT-RESEARCH$ cat derrelSORT.f90
      ! Derrel Walters © 2019
      ! These sort routines were written by Derrel Walters ~ 2019-01-23
      
      PROGRAM sort_test
        ! This program demonstrates a linear sort routine
        ! by generating a random array (here integer), writing it
        ! to a file 'rand-in.txt', sorting it with an
        ! implementation of derrelSORT (here for integers -
        ! where the same principles apply for other applicable
        ! datatypes), and finally, printing the sorted array
        ! to a file 'rand-sorted-out.txt'.
        !
        ! To the best understanding of the author, the expert
        ! concensus is that a comparative sort can, at best,
        ! be done with O(nlogn) time complexity. Here a sort
        ! is demonstrated which experimentally runs O(n).
        !
        ! Such time complexity is currently considered impossible
        ! for a sort. Using this sort, extremely large amounts of data can be
        ! sorted on any modern computer using a single processor core -
        ! provided the computer has enough memory to hold the array! For example,
        ! the sorting time for a given array will be on par (perhaps less than)
        ! what it takes the same computer to write the array to a file.
        !
        ! ~ Derrel Walters
      
        IMPLICIT NONE
      
        INTEGER,PARAMETER :: in_unit = 21
        INTEGER,PARAMETER :: out_unit = 23
      
        INTEGER,DIMENSION(:),ALLOCATABLE :: iArrA
        REAL,DIMENSION(:),ALLOCATABLE :: rArrA
        CHARACTER(LEN=15) :: cDims
        CHARACTER(LEN=80) :: ioMsgStr
        INTEGER(KIND=8) :: nDims, i
        INTEGER :: iStat
        REAL :: start, finish
      
        WRITE(*,*) ''
        WRITE(*,'(A)') 'Derrel Walters © 2019'
        WRITE(*,*) ''
        WRITE(*,'(A)') 'Demonstrating derrelSORT©'
        WRITE(*,'(A)') 'WARNING: This program can produce LARGE files!'
        WRITE(*,*) ''
      
        CALL GET_COMMAND_ARGUMENT(1, cDims)
        IF (cDims == '') THEN
          nDims = 1000000
        ELSE
          READ(cDims,'(1I15)') nDims
        END IF
        ALLOCATE(iArrA(nDims),rArrA(nDims),STAT=iStat)
      
        WRITE(*,'(A,1X,1I16)') 'Generating random array of length:', nDims
        CALL CPU_TIME(start)
        CALL RANDOM_NUMBER(rArrA)
        iArrA = INT(rArrA*1000000)
        CALL CPU_TIME(finish)
        WRITE(*,'(A,1X,f9.4,1X,A)') 'Time =',finish-start,'seconds'
        DEALLOCATE(rArrA,STAT=iStat)
      
        WRITE(*,'(A)') 'Writing Array to rand-in.txt: '
        OPEN(UNIT=in_unit,FILE='rand-in.txt',STATUS='REPLACE',ACTION='WRITE',IOSTAT=iStat,IOMSG=ioMsgStr)
        IF (iStat /= 0) THEN
          WRITE(*,'(A)') ioMsgStr
        ELSE
          CALL CPU_TIME(start)
          DO i=1, nDims
            WRITE(in_unit,*) iArrA(i)
          END DO
          CLOSE(in_unit)
          CALL CPU_TIME(finish)
          WRITE(*,'(A,1X,f9.4,1X,A)') 'Time =',finish-start,'seconds'
        END IF
        WRITE(*,'(A)') 'Sorting the Array'
      
        CALL CPU_TIME(start)
        CALL iderrelSORT(iArrA, nDims) !! SIZE(iArrA))
        CALL CPU_TIME(finish)
        WRITE(*,'(A,1X,f9.4,1X,A)') 'Time =',finish-start,'seconds'
      
        WRITE(*,'(A)') 'Writing Array to rand-sorted-out.txt: '
        OPEN(UNIT=out_unit,FILE='rand-sorted-out.txt',STATUS='REPLACE',ACTION='WRITE',IOSTAT=iStat,IOMSG=ioMsgStr)
        IF (iStat /= 0) THEN
          WRITE(*,'(A)') ioMsgStr
        ELSE
          CALL CPU_TIME(start)
          DO i=1, nDims
            WRITE(out_unit,*) iArrA(i)
          END DO
          CLOSE(out_unit)
          CALL CPU_TIME(finish)
          WRITE(*,'(A,1X,f9.4,1X,A)') 'Time =',finish-start,'seconds'
        END IF
        WRITE(*,*) ''
      
      END PROGRAM sort_test
      
      SUBROUTINE iderrelSORT(arrA, nA)
        ! This implementation of derrelSORT is for integers,
        ! but the same principles apply for other datatypes.
        !
        ! ~ Derrel Walters
        IMPLICIT NONE
      
        INTEGER(KIND=8),INTENT(IN) :: nA
        INTEGER,DIMENSION(nA),INTENT(INOUT) :: arrA
      
        INTEGER,DIMENSION(nA) :: arrB
        INTEGER(KIND=8) :: lowIDX, highIDX, midIDX
        INTEGER ::  iStat
        INTEGER(KIND=8) :: i, j, A, B, C, thisHigh, mergeSize, nLoops
        INTEGER,DIMENSION(:),ALLOCATABLE :: iterMark
        LOGICAL,DIMENSION(:),ALLOCATABLE :: moreToGo
      
        arrB = arrA
        mergeSize = 2
        lowIDX = 1 - mergeSize
        highIDX = 0
      
        nLoops = INT(LOG(REAL(nA))/LOG(2.0))
        ALLOCATE(iterMark(nLoops), moreToGo(nLoops), STAT=iStat)
        moreToGo = .FALSE.
        iterMark = 0
      
        DO i = 1, nLoops
          iterMark(i) = FLOOR(REAL(nA)/2**i)
          IF (MOD(nA, 2**i) > 0) THEN
            moreToGo(i) = .TRUE.
            iterMark(i) = iterMark(i) + 1
          END IF
        END DO
      
        DO i = 1, nLoops
            DO j = 1, iterMark(i)
              A = 0
              B = 1
              C = 0
              lowIDX = lowIDX + mergeSize
              highIDX = highIDX + mergeSize
              midIDX = (lowIDX + highIDX + 1) / 2
              thisHigh = highIDX
              IF (j == iterMark(i).AND.moreToGo(i)) THEN
                lowIDX = lowIDX - mergeSize
                highIDX = highIDX - mergeSize
                midIDX = (lowIDX + highIDX + 1) / 2
                A = midIDX - lowIDX
                B = 2
                C = nA - 2*highIDX + midIDX - 1
                thisHigh = nA
              END IF
      !! The traditional merge can also be used (see subroutine for comment). !!
      !                                                                        !
      !        CALL imerge(arrA(lowIDX:midIDX-1+A), B*(midIDX-lowIDX),   &     !
      !                    arrA(midIDX+A:thisHigh), highIDX-midIDX+1+C, &      !
      !                    arrB(lowIDX:thisHigh), thisHigh-lowIDX+1)           !
      !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
              CALL imerge2(arrA(lowIDX:midIDX-1+A), B*(midIDX-lowIDX),   &
                          arrA(midIDX+A:thisHigh), highIDX-midIDX+1+C,   &
                          arrB(lowIDX:thisHigh), thisHigh-lowIDX+1)
              arrA(lowIDX:thisHigh) = arrB(lowIDX:thisHigh)
            END DO
            mergeSize = 2*mergeSize
            lowIDX = 1 - mergeSize
            highIDX = 0
        END DO
      
      END SUBROUTINE iderrelSORT
      
      SUBROUTINE imerge(arrA, nA, arrB, nB, arrC, nC)
        ! This merge is a traditional merge that places
        ! the lowest element first. The form that the
        ! time complexity takes, O(n), is not affected
        ! by the merge routine - yet this routine
        ! does not run as fast as the merge used in
        ! imerge2.
        !
        ! ~Derrel Walters
        IMPLICIT NONE
      
        INTEGER(KIND=8),INTENT(IN) :: nA, nB , nC
      
        INTEGER,DIMENSION(nA),INTENT(IN) :: arrA
        INTEGER,DIMENSION(nB),INTENT(IN) :: arrB
        INTEGER,DIMENSION(nC),INTENT(INOUT) :: arrC
      
        INTEGER(KIND=8) :: i, j, k
      
        arrC = 0
        i = 1
        j = 1
        k = 1
      
        DO
          IF (i > nA .OR. j > NB) EXIT
          IF (arrB(j) < arrA(i)) THEN
            arrC(k) = arrB(j)
            j = j + 1
          ELSE
            arrC(k) = arrA(i)
            i = i + 1
          END IF
          k = k + 1
        END DO
      
        IF (i <= nA) THEN
          DO
            IF (i > nA) EXIT
              arrC(k) = arrA(i)
              i = i + 1
              k = k + 1
          END DO
        ELSEIF (j <= nB) THEN
          DO
            IF (j > nB) EXIT
              arrC(k) = arrB(j)
              j = j + 1
              k = k + 1
          END DO
        END IF
      
      END SUBROUTINE imerge
      
      SUBROUTINE imerge2(arrA, nA, arrB, nB, arrC, nC)
        ! This merge is a faster merge.  Array A arrives
        ! just to the left of Array B, and Array C is
        ! filled from both ends simultaneously - while
        ! still preserving the stability of the sort.
        ! The derrelSORT routine is so fast, that
        ! the merge does not affect the O(n) time
        ! complexity of the sort in practice
        ! (perhaps, making its execution more linear
        ! at small numbers of elements).
        !
        ! ~ Derrel Walters
        IMPLICIT NONE
      
        INTEGER(KIND=8),INTENT(IN) :: nA, nB , nC
      
        INTEGER,DIMENSION(nA),INTENT(IN) :: arrA
        INTEGER,DIMENSION(nB),INTENT(IN) :: arrB
        INTEGER,DIMENSION(nC),INTENT(INOUT) :: arrC
      
        INTEGER(KIND=8) :: i, j, k, x, y, z
      
        arrC = 0
        i = 1
        j = 1
        k = 1
        x = nA
        y = nB
        z = nC
      
        DO
          IF (i > x .OR. j > y) EXIT
          IF (arrB(j) < arrA(i)) THEN
            arrC(k) = arrB(j)
            j = j + 1
          ELSE
            arrC(k) = arrA(i)
            i = i + 1
          END IF
          IF (arrA(x) > arrB(y)) THEN
            arrC(z) = arrA(x)
            x = x - 1
          ELSE
            arrC(z) = arrB(y)
            y = y - 1
          END IF
          k = k + 1
          z = z - 1
        END DO
      
        IF (i <= x) THEN
          DO
            IF (i > x) EXIT
              arrC(k) = arrA(i)
              i = i + 1
              k = k + 1
          END DO
        ELSEIF (j <= y) THEN
          DO
            IF (j > y) EXIT
              arrC(k) = arrB(j)
              j = j + 1
              k = k + 1
          END DO
        END IF
      END SUBROUTINE imerge2
      

      MOAR data using the Fortran version. Anyone like straight lines?

      SORT-RESEARCH$ for (( i=100000; i<500000000; i=2*i )); do ./derrelSORT-2022.x $i; done | awk 'BEGIN {old_1="Derrel"; print "N      Time(s)"};{if ($1 == "Generating") {printf $NF" "; old_1=$1} else if (old_1 == "Sorting") {print $3; old_1=$1} else {old_1=$1}}'
      N      Time(s)
      100000 0.0000
      200000 0.0312
      400000 0.0625
      800000 0.1562
      1600000 0.2969
      3200000 0.6250
      6400000 1.3594
      12800000 2.7500
      25600000 5.5625
      51200000 11.8906
      102400000 23.3750
      204800000 47.3750
      409600000 96.4531
      

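      Looks linear, doesn't it? ;) [Figure: Fortran sorting times from above, plotted.]

  • Next up, the Riemann Hypothesis?......
  • I don't see any reason to think that your double-ended merge would be faster than a standard merge. Quite the contrary. Although both should perform very nearly the same number of steps, a single-ended (and forward-only) merge will tend to be more cache-friendly.
  • @DJWalters Not all operations execute in the same amount of time. For practical values of n, n log n operations on an in-memory array may well be faster than n write operations to an SSD.
  • I took the Fortran program presented in the question and compiled it, unmodified, with gfortran -O3 (version 8.5.0, from the GCC suite). Running it on sample sizes 100,000; 1,000,000; 10,000,000; and 100,000,000 shows distinctly superlinear scaling, with sort-phase execution time ratios (as reported by the program) of 1.00, 11.6, 144, and 1500 relative to N = 100,000. That looks pretty bad for your linear-scaling hypothesis, but it is plausible for N log N.
  • Also, yes, I can sort faster than this. At least, I can modify your code to reduce its execution time on an input of size 100,000,000 by about 20%. The savings come mostly from eliminating a lot of unnecessary writes, such as zero-initializing storage that will be overwritten anyway, and copying arrB back to arrA after each merge pass instead of merging it back in the other direction. Using array-slice assignment instead of loops for copying helps a bit too, plus a few other odds and ends.

Tags: algorithm sorting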


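【Solution 1】:

Your algorithm is not O(n). The number of loops you compute (nLoops) is log2(n). The number of inner loops (the values in iterMark) is basically n/2, n/4, n/8, etc. But the segment size really doesn't matter, because on every pass through the outer loop you look at every item in the list.

No matter how you obfuscate it, you are making log2(n) passes over n items: O(n log n).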
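
The "log2(n) passes over n items" structure can be checked by instrumenting a plain bottom-up merge sort. The sketch below is a generic textbook version written for illustration, not the questioner's routine; the function name and the `passes` and `writes` counters are hypothetical additions used only to count work.

```python
def bottom_up_merge_sort(items):
    """Sort a list with an iterative merge sort.

    Returns (sorted_list, passes, writes), where `writes` counts every
    element written into the scratch buffer during merging.
    """
    a = list(items)
    n = len(a)
    buf = [None] * n
    passes = 0
    writes = 0
    width = 1
    while width < n:
        for low in range(0, n, 2 * width):
            mid = min(low + width, n)
            high = min(low + 2 * width, n)
            i, j = low, mid
            for k in range(low, high):
                # Take from the left run on ties to keep the sort stable.
                if i < mid and (j >= high or a[i] <= a[j]):
                    buf[k] = a[i]
                    i += 1
                else:
                    buf[k] = a[j]
                    j += 1
                writes += 1
        a, buf = buf, a  # the merged pass becomes the input of the next one
        passes += 1
        width *= 2
    return a, passes, writes


if __name__ == "__main__":
    import random
    data = [random.randrange(1000) for _ in range(1000)]
    result, passes, writes = bottom_up_merge_sort(data)
    print(passes, writes)
```

Every pass writes all n elements once, so the total number of writes is n times the number of passes, regardless of how the chunk sizes grow inside each pass.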

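Your code is a fairly standard merge sort, which is provably O(n log n). It has been proven that the general case for comparison sorts is O(n log n). Sure, some algorithms can sort some specific cases faster. Conversely, those same algorithms have pathological cases that take O(n^2). Other comparison sorts (e.g. heap sort, merge sort) are less affected by the order of the items. But in the general case, a comparison sort makes on the order of n log n comparisons. See https://www.cs.cmu.edu/~avrim/451f11/lectures/lect0913.pdf for a detailed explanation.

But don't take my word for it. You can easily test this yourself by doing some simple timings. Time how long it takes to sort, say, 100K items. If your algorithm really is O(n), then sorting 200K items should take roughly twice as long, and sorting 1 million items should take roughly ten times as long. But if it is O(n log n), as I suspect, then the timings will be somewhat longer.

Consider: log2 of 100K is 16.61. log2 of 200K is 17.61. So sorting 100K items (if the algorithm is O(n log n)) will take time proportional to 100K * 16.61. Sorting 200K items will take time proportional to 200K * 17.61. Doing the arithmetic: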

100K * 16.61 = 1,661,000
200K * 17.61 = 3,522,000

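So 200K items will take about 2.12 times (3,522,000/1,661,000) as long. Or, about 6% longer than a linear algorithm, which would take exactly twice as long.

If you're still not sure, go up to a million items. If the algorithm is linear, a million items will take 10 times as long as 100K items. If it is O(n log n), it will take 12 times as long.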

1M * 19.93 = 19,930,000
(19,930,000 / 1,661,000) = 11.9987 (call it 12)
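
The arithmetic above is easy to reproduce for any pair of sizes. The following Python sketch only evaluates the two growth models; the function names are hypothetical, and real timings will also include linear-time overhead such as copying, so measured ratios can fall between the two predictions.

```python
import math


def linear_ratio(n_small, n_large):
    """Predicted time ratio if the running time is proportional to n."""
    return n_large / n_small


def nlogn_ratio(n_small, n_large):
    """Predicted time ratio if the running time is proportional to n * log2(n)."""
    return (n_large * math.log2(n_large)) / (n_small * math.log2(n_small))


if __name__ == "__main__":
    base = 100_000
    for n in (200_000, 1_000_000, 100_000_000):
        print(n, round(linear_ratio(base, n), 2), round(nlogn_ratio(base, n), 2))
```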

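【Discussion】:

  • @DJWalters In your dataset, the time to sort per element goes from 1.56e-07 seconds up to 2.4314892578125e-07 seconds. That is an increase of about 55.9%. It is somewhat less than the theoretical 72.2% for O(n log(n)) because you spend some time doing things, like copying data, that scale linearly. But you have not broken the laws of mathematics by making a comparison sort that scales linearly. You really, really, really haven't.
  • @DJWalters And, no, I did not run the code before commenting. As I said, what you have is an obfuscated implementation of a fairly standard iterative merge sort. I have seen it hundreds of times and written it myself dozens of times. I don't need to run the code when a brief analysis tells me everything I need to know.
  • @DJWalters Your own numbers show a per-element slowdown in the range expected for O(n log(n)). If you instrumented your code to count comparisons, that would show a better match to the theory. And, whether you understand it or not, your claim of a comparison sort that runs in linear time is impossible.
  • You may not understand why it is impossible, but that's your problem, not mine. You were given a link to cs.cmu.edu/~avrim/451f11/lectures/lect0913.pdf, which contains the proof of impossibility.
  • @DJWalters Nobody is saying your algorithm isn't fast. We are saying it isn't linear. That it runs faster in practice than n write operations does show that you have come up with an efficient algorithm. It does not show linear time complexity. The convergence of the math needs to happen as you approach infinity; it does not have to happen within however many fixed steps you manage to run.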
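
The point about comparing against write times can be made concrete with a rough estimate. The Python sketch below takes the largest run reported in the question (N = 409,600,000: 99.5938 s to sort, 105.1875 s to write) and asks where an n log2(n) sort would overtake an n-proportional write. It rests on strong assumptions that are not in the original thread: each cost is modeled by a single constant fitted to that one data point, and `crossover_size` is a hypothetical helper.

```python
import math

# Largest run reported in the question's Fortran output.
N_MAX = 409_600_000
SORT_SECONDS = 99.5938    # "Sorting the Array"
WRITE_SECONDS = 105.1875  # "Writing Array to rand-in.txt"


def crossover_size(n, sort_seconds, write_seconds):
    """Estimate the n at which c_sort*n*log2(n) equals c_write*n.

    Both constants are fitted to the single measurement at size n, which
    is a crude assumption made only for illustration.
    """
    c_sort = sort_seconds / (n * math.log2(n))
    c_write = write_seconds / n
    # c_sort * n * log2(n) == c_write * n  =>  log2(n) == c_write / c_sort
    return 2.0 ** (c_write / c_sort)


if __name__ == "__main__":
    n_cross = crossover_size(N_MAX, SORT_SECONDS, WRITE_SECONDS)
    print(f"estimated crossover near n = {n_cross:.3g}")
```

Under these assumptions the estimated crossover lies beyond the largest array that was measured, so the absence of an observed crossing in the posted data is consistent with an O(n log n) sort.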
【Solution 2】:

My f2py skills are not strong, so I wrote a pure Fortran wrapper for your code (posted below in case you want to check it), and the timings I get are:

 n                     time (s)          0.1*n/1e6       0.1*n*log(n)/1e6*log(1e6)
              1000000  0.109375000      0.100000001      0.100000001
              2000000  0.203125000      0.200000003      0.210034326
              4000000  0.453125000      0.400000006      0.440137327
              8000000  0.937500000      0.800000012      0.920411944
             16000000   1.92187500       1.60000002       1.92109859
             32000000   4.01562500       3.20000005       4.00274658
             64000000   8.26562500       6.40000010       8.32659149
            128000000   17.0468750       12.8000002       17.2953815
            256000000   35.1406250       25.6000004       35.8751564

This... does not fit your O(n) theory, I'm afraid.
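
One way to read the table is to divide each measured time by n and by n log(n) and see which quotient stays closer to constant. The Python sketch below does that with the times from the table; the helper names and the max/min "spread" measure are choices made for this illustration and are not part of the original answer.

```python
import math

# (n, measured sort time in seconds), copied from the table above.
MEASUREMENTS = [
    (1_000_000, 0.109375),
    (2_000_000, 0.203125),
    (4_000_000, 0.453125),
    (8_000_000, 0.9375),
    (16_000_000, 1.921875),
    (32_000_000, 4.015625),
    (64_000_000, 8.265625),
    (128_000_000, 17.046875),
    (256_000_000, 35.140625),
]


def spread(values):
    """Ratio of the largest to the smallest value; 1.0 means perfectly constant."""
    return max(values) / min(values)


def model_spreads(measurements):
    """Return (spread of t/n, spread of t/(n*log n)) for the measurements."""
    per_n = [t / n for n, t in measurements]
    per_nlogn = [t / (n * math.log(n)) for n, t in measurements]
    return spread(per_n), spread(per_nlogn)


if __name__ == "__main__":
    linear_spread, nlogn_spread = model_spreads(MEASUREMENTS)
    print(f"t/n varies by a factor of {linear_spread:.2f}")
    print(f"t/(n log n) varies by a factor of {nlogn_spread:.2f}")
```

The quotient that varies less across the range of n corresponds to the model that describes the measurements better.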

My wrapper:

module m
contains
! Your code goes here
end module

program p
  use m
  implicit none

  integer(8) :: i,n
  real, allocatable :: real_array(:)
  integer, allocatable :: int_array(:)
  real :: start
  real :: stop

  real_array = [0]
  int_array = [0]

  write(*,*) "n                     time (s)          0.1*n/1e6       0.1*n*log(n)/1e6*log(1e6)"

  do i=0,30
    n = 2**i*1e6
    deallocate(real_array, int_array)
    allocate(real_array(n), int_array(n))
    call random_number(real_array)
    int_array = -huge(0)*real_array + 2.0*huge(0)

    call cpu_time(start)
    call isort(int_array, n)
    call cpu_time(stop)

    write(*,*) n, stop-start, 0.1*n/1.0e6, 0.1*n*log(1.0*n)/(1.0e6*log(1.0e6))
  enddo
end program

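【Discussion】:

  • Fortran, unlike Python code, is case sensitive. Unfortunately, the code you wrote will not run with Fortran. Want me to demonstrate? I have already written that program, too.
  • I can assure you that Fortran is not case sensitive. I can also assure you that my code runs.
  • I don't see the routine you call with it, hidden in a module. Post the file with my callable routine in it, so that the output can be reproduced. By the way, after looking over my Fortran code, I do now realize that it is not case sensitive. However, that doesn't change the fact that I can't see the sort routine you called here.
【Solution 3】:

The other answers have already explained why you don't have a linear comparison sort.

I will try to explain why execution timings in no way prove time complexity.

Many times, you can come up with some specific case and an algorithm that uses all kinds of CPU-specific optimizations to do its job (whether that job is sorting or something else) better than O(n) according to a plot: if the time for x items is y, then according to the plot the time for 2x items is less than 2y. This can happen for x as large as you can fit in memory.

Still, this does not prove time complexity. It might be an algorithm with a time complexity of O(n), O(n log n), or maybe even O(log n) or O(n*n).

Big-Oh notation hides the various constants that describe the number of operations an algorithm performs, so such an algorithm might simply be O(n log n) with a very small constant (as in constant < 1) or O(log n) with a huge constant.

Big-Oh also doesn't care about some real-life aspects, such as system memory or disk space, or how fast a given CPU executes one instruction versus another. Maybe the operations you use execute really fast on that CPU. Either way, if you have an O(n log n) algorithm, for a large enough n you will eventually see the plot look like an n log n plot.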

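A real-world example is the disjoint-set data structure, which uses something called the iterated logarithm and whose complexity is O(m log* n). In practice, log* n will be <= 5 for all practical values of n, so if you plotted it for practical values you might think it is O(m) with a large constant, but it is not.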
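
To get a feel for how slowly the iterated logarithm grows, it can be computed directly. The sketch below uses the base-2 definition (the number of times log2 must be applied before the value drops to 1 or below); the function name `log_star` is chosen for this illustration.

```python
import math


def log_star(n):
    """Iterated base-2 logarithm: how many times log2 must be applied
    to n before the result is less than or equal to 1."""
    count = 0
    x = n
    while x > 1:
        x = math.log2(x)
        count += 1
    return count


if __name__ == "__main__":
    for n in (2, 16, 65536, 2 ** 65536):
        # Only the bit length is printed, because 2**65536 has thousands of digits.
        print(n.bit_length(), log_star(n))
```

Any input size that fits in a real machine's memory is far below 2^65536, which is why the function looks like a constant on every practical plot even though it is unbounded.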

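You could change your algorithm to read each number from a different file at every step and write it back to that file, and remove your input array altogether. That would not affect its time complexity, but it would certainly affect the execution time measurements you are seeing, since storage is obviously slower than memory. Well, they are all the same to Big-Oh.

【Discussion】:

  • The Fortran code I posted at the end of the post writes the arrays (sorted and unsorted) to files. Writing is a known O(n) process. The write and the sort have execution times of the same order and are therefore comparable. If the sort executed in O(nlogn), then as the array length grew, the execution time of the sort would converge with the write time. However, no convergence is present. Thank you.
  • Never say never, bro. Quantum dots.
  • @DJWalters That is not what I meant by writing to files. I did not mean writing the array to a file at the end. I meant that whenever you need the number at index i, you read it from i.txt instead, and whenever you need to write to position i, you write it to i.txt instead. That would make everything much slower, but the time complexity would stay the same.
  • By definition, never in this case. I'm curious, what kind of argument would convince you that what you have is not linear?
【Solution 4】:

I don't doubt that your sort is fast, and I believe that it rivals the sort command-line utility. But it is an O(N log(N)) iterative merge sort, not an O(N) sort (nor a new algorithm).

Observe that

  • Your outer loop iterates O(log(N)) times.
  • On each of those iterations, the inner loop iterates O(N / 2^k) times for some k.
  • And the main work of each inner-loop iteration is to split O(2^k) items into two halves and merge them together, which involves examining and moving every item. (And then moving them all back to the original array.) That takes O(2^k) operations per inner-loop iteration.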

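Putting these factors together:

O(log(N)) * O(N / 2^k) * O(2^k)

The factors of 2^k cancel each other, and what remains is O(N log(N)). (The k's are functions of N, so they cannot simply be ignored as constants.)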
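
Written out as a sum over the passes, the same cancellation looks like this. The constant c (the cost of handling one item in one merge) is a symbol introduced here for illustration, and the ceilings and the short tail chunk are ignored for simplicity.

```latex
\[
T(N) \;\approx\; \sum_{k=1}^{\log_2 N}
      \underbrace{\frac{N}{2^k}}_{\text{merges in pass } k}
      \cdot
      \underbrace{c \, 2^k}_{\text{work per merge}}
 \;=\; \sum_{k=1}^{\log_2 N} c\,N
 \;=\; c\,N \log_2 N
 \;=\; O(N \log N)
\]
```

Each pass costs about cN no matter what k is, and there are about log2(N) passes, so the exponentially shrinking inner loop count does not reduce the total.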

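Logarithms grow very slowly, so if you don't look carefully, it is easy to be fooled into thinking you are seeing linear growth when it is really N log(N). You need to look at a wide range of values to see the superlinearity, some of which is indeed visible in your data.

As for your plot, there is a problem with the result of the curve fit: the y-intercept is significantly negative (for the scale of the data, and especially given the concentration of points with small y). Your data may fit a linear model fairly well (if not entirely plausibly), but they do appear to fit an N log(N) model better.

【Discussion】:

  • O(nlogn) predicts execution times that are too long at large array sizes. The evidence has been posted. The program posted at the end of the post is fully functional. Please plot it and post some actual data. Thank you.
  • @DJWalters, O(n log n) does not predict any particular execution time; it predicts how execution time scales with input size, in the asymptotic limit. Algorithmic analysis, such as is presented here, is the usual and most conclusive technique for making such determinations.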