【问题标题】:Given the runtime data, how to know if sorting program uses bubble sort or insertion sort?给定运行时数据,如何知道排序程序是使用冒泡排序还是插入排序?
【发布时间】:2020-11-10 15:42:12
【问题描述】:

我测量了一个排序程序/算法,并根据运行时数据,我将其缩小为两种排序算法——冒泡排序和插入排序。

有没有办法确定它是哪一个?当然不知道代码。

它们都具有相同的时间复杂度,我没有想法。

时间复杂度数据:

  • 排序:O(n) 1000 个数字所用的时间 = 0.0047 秒
  • 随机未排序:O(n^2) 1000 个数字所用的时间 = 0.021 秒
  • 降序:O(n^2) 1000 个数字所用的时间 = 0.04 秒

提前致谢!

【问题讨论】:

  • 您遗漏了重要信息:在不知道您的机器有多快的情况下,这些数字并不是很有帮助,因为它们太“绝对”了。如果我告诉你我在 15 分钟内从 A 点走到 B 点,从 A 到 B 的距离是 1.5 公里,你能猜出我是不是在面包店停下来买面包?不,因为你没有足够的信息。至少我需要告诉你我是步行、开车还是坐公共汽车,以及以什么速度。
  • 尝试使用不同大小(100、1000、10000、50000 个元素)的列表,并将结果与​​相同列表上的插入排序和冒泡排序的结果进行比较会很有帮助。
  • 我不知道你对运行时环境有什么样的控制,但如果这是一个 C 函数,也许你可以安排在一个无效的数组上调用它,其后面的值没有映射,然后检查核心转储以查看数组的映射前缀是否完全排序(插入排序)或未排序(冒泡排序)。
  • 这个想法的一个不太激进的版本是在一个巨大的数组上运行算法并随着时间的推移绘制缓存未命中。冒泡排序会比插入排序更快。
  • 所以除了 some 冒泡排序变体或 some 插入排序变体之外,您对代码一无所知

标签: algorithm sorting


【解决方案1】:
  1. 您的 1000 个排序元素太少了

    测量的时间太短而不能代表有效的测量(因为大部分时间可能不会被排序本身使用,而是初始化窗口、打开文件等......)。

    您需要至少 100 毫秒或更长的时间(1 秒是理想的)。

  2. 如果您有权访问正在排序的数据

    您可以引入对每种类型的排序都具有挑战性的数据集(并且从时间推断使用的算法)...例如,冒泡排序对于以相反顺序排序的数组来说是最慢的...所以通过升序和降序和随机并比较时间。让我们调用时间tasc,tdes,trnd 并假设升序排序,那么如果涉及冒泡排序,它应该是:

    tasc O(n) < trnd  < tdes O(n^2)
    

    所以:

    tasc*n == tdes + margin_of error
    

    所以只需测试 tdes/tasc 是否接近 n ... 有一定的误差余量 ...

    因此您只需要创建一个样本数据,该样本数据对于特定类型的排序而不是其他类型的排序是困难的......并且从时间开始检测是否是这种情况,直到您发现使用了算法。

    这里有一些数据(所有时间都在[ms])我测试了我的冒泡排序和升序数据:

       n     tasc    tdesc    tasc*n
    1000  0.00321  2.96147  3.205750
    2000  0.00609 11.76799 12.181855
    4000  0.01186 45.58834 47.445111
    

    如果我们有复杂性的运行时更清楚O(n)

    t(O(n)) = c*n
    

    转换为复杂的运行时O(n^2)(假设相同的恒定时间c):

    t(O(n^2)) = c*n*n = t(O(n)) * n
    

    通过这种方式,您可以比较不同复杂度的时间,您只需将所有测量时间转换为单个常见复杂度...

  3. 如果您可以选择排序数据大小

    然后正如 cmets 中提到的那样,您可以推断出随着 n 增加(加倍)的时代增长率,您可以估计复杂性并从中判断使用了哪种算法。

    假设从 #2 开始测量时间,那么对于 O(n),恒定时间 c 应该与 tasc (O(n)) 相同:

       n     tasc    c=tasc/n
    1000  0.00321 0.000003210
    2000  0.00609 0.000003045 
    4000  0.01186 0.000002965 
    

    对于 tdesc (O(n^2)):

       n     tdesc        tdesc/n^2
    1000   2.96147 0.00000296147000
    2000  11.76799 0.00000294199750
    4000  45.58834 0.00000284927125
    

    如您所见,c 两次tasc,tdesc 或多或少相同,这意味着它们符合估计的复杂性O(n),O(n^2)

但是,如果不知道被测试的应用程序做了什么,则很难确定,因为排序之前可能会进行处理……例如,可能会扫描数据以检测可行的表单(已排序、随机、几乎已排序……)在O(n) 中,结果以及数据大小可能会选择使用哪种排序算法...因此您的测量可能会测量不同的例程,从而使结果无效...

[edit1] 我有一个疯狂的想法,即自动检测复杂性

只需测试所有测量时间与它们对应的n之间的常数时间常数是否大致相同...这里是简单的C++/VCL代码:

//$$---- Form CPP ----
//---------------------------------------------------------------------------
#include <vcl.h>
#include <math.h>
#pragma hdrstop
#include "Unit1.h"
//---------------------------------------------------------------------------
#pragma package(smart_init)
#pragma resource "*.dfm"
TForm1 *Form1;
//---------------------------------------------------------------------------
double factorial[]= // n[-],t[ms]
    {
     11,0.008,
     12,0.012,
     13,0.013,
     14,0.014,
     15,0.016,
     16,0.014,
     17,0.015,
     18,0.017,
     19,0.019,
     20,0.016,
     21,0.017,
     22,0.019,
     23,0.021,
     24,0.023,
     25,0.025,
     26,0.027,
     27,0.029,
     28,0.032,
     29,0.034,
     30,0.037,
     31,0.039,
     32,0.034,
     33,0.037,
     34,0.039,
     35,0.041,
     36,0.039,
     37,0.041,
     38,0.044,
     39,0.046,
     40,0.041,
     41,0.044,
     42,0.046,
     43,0.049,
     44,0.048,
     45,0.050,
     46,0.054,
     47,0.056,
     48,0.056,
     49,0.060,
     50,0.063,
     51,0.066,
     52,0.065,
     53,0.069,
     54,0.072,
     55,0.076,
     56,0.077,
     57,0.162,
     58,0.095,
     59,0.093,
     60,0.089,
     61,0.093,
     62,0.098,
     63,0.096,
     64,0.090,
     65,0.100,
     66,0.104,
     67,0.111,
     68,0.100,
     69,0.121,
     70,0.109,
     71,0.119,
     72,0.104,
     73,0.124,
     74,0.113,
     75,0.118,
     76,0.118,
     77,0.123,
     78,0.129,
     79,0.133,
     80,0.121,
     81,0.119,
     82,0.131,
     83,0.150,
     84,0.141,
     85,0.148,
     86,0.154,
     87,0.163,
     88,0.211,
     89,0.151,
     90,0.157,
     91,0.166,
     92,0.161,
     93,0.169,
     94,0.173,
     95,0.188,
     96,0.181,
     97,0.187,
     98,0.194,
     99,0.201,
    100,0.185,
    101,0.191,
    102,0.202,
    103,0.207,
    104,0.242,
    105,0.210,
    106,0.215,
    107,0.221,
    108,0.217,
    109,0.226,
    110,0.232,
    111,0.240,
    112,0.213,
    113,0.231,
    114,0.240,
    115,0.252,
    116,0.248,
    117,0.598,
    118,0.259,
    119,0.261,
    120,0.254,
    121,0.263,
    122,0.270,
    123,0.281,
    124,0.290,
    125,0.322,
    126,0.303,
    127,0.313,
    128,0.307,
      0,0.000
    };
//---------------------------------------------------------------------------
double sort_asc[]=
    {
    1000,0.00321,
    2000,0.00609,
    4000,0.01186,
       0,0.000
    };
//---------------------------------------------------------------------------
double sort_desc[]=
    {
    1000, 2.96147,
    2000,11.76799,
    4000,45.58834,
       0,0.000
    };
//---------------------------------------------------------------------------
double sort_rand[]=
    {
    1000, 3.205750,
    2000,12.181855,
    4000,47.445111,
       0,0.000
    };
//---------------------------------------------------------------------------
double div(double a,double b){ return (fabs(b)>1e-10)?a/b:0.0; }
//---------------------------------------------------------------------------
AnsiString get_complexity(double *dat)  // expect dat[] = { n0,t(n0), n1,t(n1), ... , 0,0 }
    {
    AnsiString O="O(?)";
    int i,e;
    double t,n,c,c0,c1,a,dc=1e+10;
    #define testbeg for (e=1,i=0;dat[i]>0.5;){ n=dat[i]; i++; t=dat[i]; i++;
    #define testend(s) if ((c<=0.0)||(n<2.0)) continue; if (e){ e=0; c0=c; c1=c; } if (c0>c) c0=c; if (c1<c) c1=c; } a=fabs(1.0-div(c0,c1)); if (dc>=a){ dc=a; O=s; }


    testbeg;            c=div(t,n);                 testend("O(n)");
    testbeg;            c=div(t,n*n);               testend("O(n^2)");
    testbeg;            c=div(t,n*n*n);             testend("O(n^3)");
    testbeg;            c=div(t,n*n*n*n);           testend("O(n^4)");
    testbeg; a=log(n);  c=div(t,a);                 testend("O(log(n))");
    testbeg; a=log(n);  c=div(t,a*a);               testend("O(log^2(n))");
    testbeg; a=log(n);  c=div(t,a*a*a);             testend("O(log^3(n))");
    testbeg; a=log(n);  c=div(t,a*a*a*a);           testend("O(log^4(n))");
    testbeg; a=log(n);  c=div(t,n*a);               testend("O(n.log(n))");
    testbeg; a=log(n);  c=div(t,n*n*a);             testend("O(n^2.log(n))");
    testbeg; a=log(n);  c=div(t,n*n*n*a);           testend("O(n^3.log(n))");
    testbeg; a=log(n);  c=div(t,n*n*n*n*a);         testend("O(n^4.log(n))");
    testbeg; a=log(n);  c=div(t,n*a*a);             testend("O(n.log^2(n))");
    testbeg; a=log(n);  c=div(t,n*n*a*a);           testend("O(n^2.log^2(n))");
    testbeg; a=log(n);  c=div(t,n*n*n*a*a);         testend("O(n^3.log^2(n))");
    testbeg; a=log(n);  c=div(t,n*n*n*n*a*a);       testend("O(n^4.log^2(n))");
    testbeg; a=log(n);  c=div(t,n*a*a*a);           testend("O(n.log^3(n))");
    testbeg; a=log(n);  c=div(t,n*n*a*a*a);         testend("O(n^2.log^3(n))");
    testbeg; a=log(n);  c=div(t,n*n*n*a*a*a);       testend("O(n^3.log^3(n))");
    testbeg; a=log(n);  c=div(t,n*n*n*n*a*a*a);     testend("O(n^4.log^3(n))");
    testbeg; a=log(n);  c=div(t,n*a*a*a*a);         testend("O(n.log^4(n))");
    testbeg; a=log(n);  c=div(t,n*n*a*a*a*a);       testend("O(n^2.log^4(n))");
    testbeg; a=log(n);  c=div(t,n*n*n*a*a*a*a);     testend("O(n^3.log^4(n))");
    testbeg; a=log(n);  c=div(t,n*n*n*n*a*a*a*a);   testend("O(n^4.log^4(n))");

    #undef testend
    #undef testbeg
    return O+AnsiString().sprintf(" error = %.6lf",dc);
    }
//---------------------------------------------------------------------------
__fastcall TForm1::TForm1(TComponent* Owner):TForm(Owner)
    {
    mm_log->Lines->Clear();
    mm_log->Lines->Add("factorial "+get_complexity(factorial));
    mm_log->Lines->Add("sort asc  "+get_complexity(sort_asc));
    mm_log->Lines->Add("sort desc "+get_complexity(sort_desc));
    mm_log->Lines->Add("sort rand "+get_complexity(sort_rand));
    }
//-------------------------------------------------------------------------

与我的相关时间测量 fast exact bigint factorial 我只使用了 8 毫秒以上的较大时间,以及上面输出的排序测量:

factorial O(n.log^2(n)) error = 0.665782
sort asc  O(n) error = 0.076324
sort desc O(n^2) error = 0.037886
sort rand O(n^2) error = 0.075000

该代码仅测试少数支持的复杂性并输出具有最低错误的一个(c 不同n 之间的恒定时间的变化)...

只需忽略 VCL 内容并将 AnsiString 转换为您想要的任何字符串或输出...

【讨论】:

    猜你喜欢
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 2015-04-25
    • 2012-05-12
    • 2013-06-20
    • 2019-04-21
    • 1970-01-01
    相关资源
    最近更新 更多