【Question title】: How to get log-likelihood for each iteration in sklearn GMM?
【Posted】: 2021-03-10 15:05:11
【Question】:

I am trying to fit a GMM in sklearn, and I can see the model converge around epoch 3, but I can't seem to access the log-likelihood score computed at each epoch.

from sklearn.mixture import GaussianMixture
gmm = GaussianMixture(n_components=4, tol=1e-8).fit(data)

Is there some way to access the log-likelihood score at each epoch?

【Comments】:

    Tags: python scikit-learn cluster-analysis data-analysis gmm


    【Solution 1】:

    If you only want to see the log-likelihood, set verbose=2 to print the change in log-likelihood, and verbose_interval=1 to report it at every iteration:

    from sklearn.mixture import GaussianMixture
    gmm = GaussianMixture(n_components=3, tol=1e-8, verbose=2, verbose_interval=1)
    gmm.fit(data)
    
    Initialization 0
      Iteration 1    time lapse 0.00560s     ll change inf
      Iteration 2    time lapse 0.00134s     ll change 0.03655
      Iteration 3    time lapse 0.00119s     ll change 0.00867
      Iteration 4    time lapse 0.00118s     ll change 0.00619
      Iteration 5    time lapse 0.00116s     ll change 0.00612
      Iteration 6    time lapse 0.00125s     ll change 0.00647
      Iteration 7    time lapse 0.00128s     ll change 0.00700
      Iteration 8    time lapse 0.00127s     ll change 0.00727
      Iteration 9    time lapse 0.00126s     ll change 0.00673
      Iteration 10   time lapse 0.00117s     ll change 0.00604
      Iteration 11   time lapse 0.00109s     ll change 0.00530
      Iteration 12   time lapse 0.00125s     ll change 0.00431
      Iteration 13   time lapse 0.00121s     ll change 0.00366
      Iteration 14   time lapse 0.00123s     ll change 0.00404
      Iteration 15   time lapse 0.00130s     ll change 0.00361
      Iteration 16   time lapse 0.00118s     ll change 0.00157
      Iteration 17   time lapse 0.00124s     ll change 0.00048
      Iteration 18   time lapse 0.00126s     ll change 0.00015
      Iteration 19   time lapse 0.00115s     ll change 0.00005
      Iteration 20   time lapse 0.00116s     ll change 0.00001
      Iteration 21   time lapse 0.00124s     ll change 0.00000
      Iteration 22   time lapse 0.00122s     ll change 0.00000
      Iteration 23   time lapse 0.00142s     ll change 0.00000
      Iteration 24   time lapse 0.00126s     ll change 0.00000
      Iteration 25   time lapse 0.00124s     ll change 0.00000
      Iteration 26   time lapse 0.00122s     ll change 0.00000
      Iteration 27   time lapse 0.00120s     ll change 0.00000
    Initialization converged: True   time lapse 0.03765s     ll -1.20124
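The verbose output above reports changes rather than the values themselves. One way to record the value at every step without parsing printed text is to run EM one iteration at a time. The following is a minimal sketch, not the answer author's method: it assumes that `warm_start=True` combined with `max_iter=1` makes each `fit()` call perform a single EM iteration from the previous parameters, and it uses synthetic data as a stand-in for `data`. The 50-iteration cap and the 1e-8 stopping threshold are arbitrary choices for illustration.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for `data` (assumption: three well-separated 2-D blobs).
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 2)) for c in (-3, 0, 3)])

# warm_start=True continues from the previous parameters on each fit() call;
# max_iter=1 limits each call to a single EM iteration.
gmm = GaussianMixture(n_components=3, max_iter=1, warm_start=True, random_state=0)

lower_bounds = []
with warnings.catch_warnings():
    # Each single-iteration fit warns about non-convergence; silence that here.
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    for _ in range(50):
        gmm.fit(data)
        lower_bounds.append(float(gmm.lower_bound_))
        # Stop once the improvement drops below a hand-picked threshold.
        if len(lower_bounds) > 1 and abs(lower_bounds[-1] - lower_bounds[-2]) < 1e-8:
            break

for i, lb in enumerate(lower_bounds, start=1):
    print(f"iteration {i}: lower bound = {lb:.5f}")
```

Note that `lower_bound_` is the per-sample average log-likelihood bound that the "ll" figures in the verbose output refer to, not the total over the data set.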
    

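    To actually capture these values, depending on your environment, you can write them to a log with logging, or, for example, in a Jupyter notebook the following may work: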

    %%capture cap --no-stderr
    gmm.fit(data)
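Outside a notebook, the `%%capture` magic is not available. A possible substitute is the standard library's `contextlib.redirect_stdout`, sketched below. The helper name `capture_stdout` and the stand-in function `fake_verbose_fit` are hypothetical and exist only so the example runs without a fitted model. The sketch also assumes the verbose messages are written with `print`, which is what `redirect_stdout` intercepts.

```python
import contextlib
import io


def capture_stdout(func, *args, **kwargs):
    """Call func and return (result, text printed to stdout during the call)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = func(*args, **kwargs)
    return result, buffer.getvalue()


def fake_verbose_fit():
    # Hypothetical stand-in that prints lines shaped like the verbose output.
    print("  Iteration 1    time lapse 0.00560s     ll change inf")
    print("  Iteration 2    time lapse 0.00134s     ll change 0.03655")
    return "fitted"


result, log_text = capture_stdout(fake_verbose_fit)
print(log_text.splitlines())
```

With the model from the answer, the call would be `_, log_text = capture_stdout(gmm.fit, data)`, and `log_text` would play the role of `cap.stdout`.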
    

    We then load the captured output into a DataFrame and try to back-calculate the log-likelihood:

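    import numpy as np
    import pandas as pd
    # Split each captured line on whitespace: column 1 is the iteration, column 7 the value.
    res = pd.DataFrame([i.split() for i in cap.stdout.split("\n")]).iloc[:, [1, 7]].copy()
    res.columns = ['iteration', 'change']
    res['change'] = res['change'].astype('float64')
    # Drop the header line, iteration 1 (its change is inf) and the trailing empty line.
    res = res[np.isfinite(res['change'])].copy()
    # The last row is the "converged" line; its value is the final log-likelihood.
    res['logLik'] = res['change'].values[-1]
    # Final value minus the reverse cumulative sum of changes. Each row therefore
    # holds the log-likelihood before that row's change was applied.
    res.loc[:len(res), ['logLik']] = -res.change[:-1][::-1].cumsum()[::-1] + res.change.values[-1]
    res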
    
    
        iteration   change  logLik
    2   2   0.03655 -1.31546
    3   3   0.00867 -1.27891
    4   4   0.00619 -1.27024
    5   5   0.00612 -1.26405
    6   6   0.00647 -1.25793
    7   7   0.00700 -1.25146
    8   8   0.00727 -1.24446
    9   9   0.00673 -1.23719
    10  10  0.00604 -1.23046
    11  11  0.00530 -1.22442
    12  12  0.00431 -1.21912
    13  13  0.00366 -1.21481
    14  14  0.00404 -1.21115
    15  15  0.00361 -1.20711
    16  16  0.00157 -1.20350
    17  17  0.00048 -1.20193
    18  18  0.00015 -1.20145
    19  19  0.00005 -1.20130
    20  20  0.00001 -1.20125
    21  21  0.00000 -1.20124
    22  22  0.00000 -1.20124
    23  23  0.00000 -1.20124
    24  24  0.00000 -1.20124
    25  25  0.00000 -1.20124
    26  26  0.00000 -1.20124
    27  27  0.00000 -1.20124
    28  converged:  -1.20124    -1.20124
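The same back-calculation can be done without pandas. This is a rough sketch under two assumptions: the log comes from a single initialization, and the lines have exactly the format shown in the verbose output above, which may differ between scikit-learn versions. The function name `reconstruct_loglik` is hypothetical, and `sample` is a shortened, made-up log used only to exercise the parser. Unlike the table above, each value here is the one reached after the given iteration.

```python
import re

ITER_RE = re.compile(r"Iteration\s+(\d+)\s+time lapse\s+\S+\s+ll change\s+(\S+)")
FINAL_RE = re.compile(r"Initialization converged:\s+\S+\s+time lapse\s+\S+\s+ll\s+(\S+)")


def reconstruct_loglik(log_text):
    """Return [(iteration, value)], where value is the "ll" after that iteration."""
    changes = [(int(m.group(1)), float(m.group(2))) for m in ITER_RE.finditer(log_text)]
    final_match = FINAL_RE.search(log_text)
    if final_match is None or not changes:
        raise ValueError("text does not match the expected verbose format")
    value = float(final_match.group(1))
    values = []
    # Walk backwards: value after iteration i-1 = value after iteration i - change at i.
    for iteration, change in reversed(changes):
        values.append((iteration, value))
        value -= change  # after iteration 1 this becomes -inf and is never used
    return values[::-1]


# Shortened, illustrative log (not real output from a fit).
sample = """Initialization 0
  Iteration 1    time lapse 0.00560s     ll change inf
  Iteration 2    time lapse 0.00134s     ll change 0.03655
  Iteration 3    time lapse 0.00119s     ll change 0.00867
Initialization converged: True   time lapse 0.03765s     ll -1.20124
"""

values = reconstruct_loglik(sample)
for iteration, value in values:
    print(f"{iteration:>3d}  {value:.5f}")
```

Because the printed changes are rounded to five decimals, values reconstructed this way accumulate rounding error the further back they go.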
    

    【Comments】:
