【Question title】: ValueError: array must not contain infs or NaNs during Biclustering
【Posted】: 2016-06-16 21:23:41
【Question】:

I am trying to fit a biclustering model, but it fails with an error saying the array contains infs or NaNs, even though I scanned the array with pd.isnull(DataFile).sum().

 import pandas as pd
 import numpy as np
 from matplotlib import pyplot as plt
 from sklearn.datasets import samples_generator as sg
 from sklearn.cluster.bicluster import SpectralCoclustering
 from sklearn.metrics import consensus_score
 DataFile=pd.read_csv("DatafilledProp.csv",sep='\t')


 DataFile.drop(DataFile.columns[[0, 1]], axis=1, inplace=True)
 plt.matshow(DataFile.as_matrix(), cmap=plt.cm.Blues)
 plt.title("Original TransMapping")
 data, row_idx, col_idx = sg._shuffle(DataFile.as_matrix(), random_state=0)
 plt.matshow(data, cmap=plt.cm.Blues)
 plt.title("Shuffled dataset")
 plt.show()
 Features=DataFile.values
 model = SpectralCoclustering(n_clusters=10, random_state=0)
 model.fit(Features)

This is the error I get:

   File "C:\Program Files (x86)\Microsoft Visual Studio 11.0\Common7\IDE\Extensions\Microsoft\Python Tools for Visual Studio\2.1\visualstudio_py_util.py", line 106, in exec_file
     exec_code(code, file, global_variables)
   File "C:\Program Files (x86)\Microsoft Visual Studio 11.0\Common7\IDE\Extensions\Microsoft\Python Tools for Visual Studio\2.1\visualstudio_py_util.py", line 82, in exec_code
     exec(code_obj, global_variables)
   File "D:\ClusteringDemo\DataPreparation.py\DataPreparation.py\Model.py", line 19, in <module>
     model.fit(Features)
   File "C:\Users\vinay.sawant\AppData\Local\Continuum\Anaconda\lib\site-packages\sklearn\cluster\bicluster\spectral.py", line 126, in fit
     self._fit(X)
   File "C:\Users\vinay.sawant\AppData\Local\Continuum\Anaconda\lib\site-packages\sklearn\cluster\bicluster\spectral.py", line 275, in _fit
     u, v = self._svd(normalized_data, n_sv, n_discard=1)
   File "C:\Users\vinay.sawant\AppData\Local\Continuum\Anaconda\lib\site-packages\sklearn\cluster\bicluster\spectral.py", line 139, in _svd
     **kwargs)
   File "C:\Users\vinay.sawant\AppData\Local\Continuum\Anaconda\lib\site-packages\sklearn\utils\extmath.py", line 299, in randomized_svd
     Q = randomized_range_finder(M, n_random, n_iter, random_state)
   File "C:\Users\vinay.sawant\AppData\Local\Continuum\Anaconda\lib\site-packages\sklearn\utils\extmath.py", line 226, in randomized_range_finder
     Q, R = linalg.qr(Y, mode='economic')
   File "C:\Users\vinay.sawant\AppData\Local\Continuum\Anaconda\lib\site-packages\scipy\linalg\decomp_qr.py", line 127, in qr
     a1 = numpy.asarray_chkfinite(a)
   File "C:\Users\vinay.sawant\AppData\Local\Continuum\Anaconda\lib\site-packages\numpy\lib\function_base.py", line 613, in asarray_chkfinite
     "array must not contain infs or NaNs")
 ValueError: array must not contain infs or NaNs
 Press any key to continue .

【Comments】:

    Tags: python arrays machine-learning scikit-learn


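    【Solution 1】:

    This has already been answered here: https://stackoverflow.com/a/42764378/2649309

    It may be a problem with the PCA implementation in scikit-learn 0.18.1.

    See the bug report: https://github.com/scikit-learn/scikit-learn/issues/7568

    The workaround described there is to use PCA with svd_solver='full'. So try this code: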

    pipe = [('pca',PCA(whiten=True,svd_solver='full')),
       ('clf' ,lm)]
    

    With this change I was able to resolve the problem.

    【Comments】:

      【Solution 2】:

      pd.isnull(DataFile).sum() only checks for NaN values, for example:
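
The snippet above leaves `lm` undefined and omits its imports. A minimal self-contained sketch of the same workaround might look like the following. The `LogisticRegression` estimator and the random toy data are assumptions made purely for illustration; the original answer does not say what `lm` was.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy data (an assumption, standing in for the real dataset).
rng = np.random.RandomState(0)
X = rng.rand(60, 5)
y = (X[:, 0] > 0.5).astype(int)

# svd_solver='full' forces the exact LAPACK-based SVD instead of the
# randomized solver, which is the workaround named in the answer.
pipe = Pipeline([
    ("pca", PCA(whiten=True, svd_solver="full")),
    ("clf", LogisticRegression()),  # hypothetical stand-in for `lm`
])
pipe.fit(X, y)
print(pipe.score(X, y))
```

Note that this workaround concerns PCA; whether it carries over to SpectralCoclustering, as used in the question, is not established by the answer.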

      import numpy as np
      import pandas as pd
      
      df = pd.DataFrame([[1,2],[3,4],[np.nan,6]])
      
      df
      Out[12]: 
          0  1
      0   1  2
      1   3  4
      2 NaN  6
      
      pd.isnull(df).sum()
      Out[13]: 
      0    1
      1    0
      dtype: int64
      

      But it does not check for inf, which the error message says is also a possibility.

      df3 = pd.DataFrame([[1,2],[3,4],[np.inf,6]])
      
      pd.isnull(df3).sum()
      Out[23]: 
      0    0
      1    0
      dtype: int64
      

      So I suspect the error is caused by inf rather than NaN:

      import numpy as np
      
      np.isinf(df3).sum()
      Out[25]: 
      0    1
      1    0
      dtype: int64
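
Building on the checks above, here is a hedged sketch of how one might report and filter non-finite values before calling `model.fit`. The helper names (`report_non_finite`, `drop_non_finite_rows`, `zero_sum_rows_and_cols`) are hypothetical and not part of pandas or scikit-learn. The zero-sum check is an additional assumption, not something confirmed in this thread: the traceback shows the failure happening on `normalized_data`, so rows or columns that sum to zero may be worth inspecting even when the input itself is finite.

```python
import numpy as np
import pandas as pd


def report_non_finite(frame):
    """Count NaN and +/-inf entries per column of a numeric DataFrame."""
    values = frame.to_numpy(dtype=float)
    return pd.DataFrame(
        {
            "nan": np.isnan(values).sum(axis=0),
            "inf": np.isinf(values).sum(axis=0),
        },
        index=frame.columns,
    )


def drop_non_finite_rows(frame):
    """Keep only the rows in which every value is finite."""
    mask = np.isfinite(frame.to_numpy(dtype=float)).all(axis=1)
    return frame.loc[mask]


def zero_sum_rows_and_cols(frame):
    """Return the labels of rows and columns whose values sum to zero."""
    values = frame.to_numpy(dtype=float)
    rows = frame.index[values.sum(axis=1) == 0]
    cols = frame.columns[values.sum(axis=0) == 0]
    return list(rows), list(cols)


df = pd.DataFrame([[1, 2], [3, 4], [np.inf, 6], [np.nan, 8]])
report = report_non_finite(df)
clean = drop_non_finite_rows(df)
print(report)
print(clean)
print(zero_sum_rows_and_cols(pd.DataFrame([[0, 0], [1, 2]])))
```

Dropping rows is only one option; replacing the offending values or fixing the upstream computation that produced them may suit the data better.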
      

      【Comments】:
