statsmodels raises TypeError: ufunc 'isfinite' not supported for the input types

【Question title】: statsmodels raises TypeError: ufunc 'isfinite' not supported for the input types
【Posted】: 2020-02-16 06:06:04
【Question】:

I am applying backward elimination with statsmodels.api, and the code raises this error: `TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''`

I don't know how to fix it.

Here is the code:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import  train_test_split
from sklearn.preprocessing import  LabelEncoder, OneHotEncoder
from sklearn.compose import  ColumnTransformer
import statsmodels.api as smf

data = pd.read_csv('F:/Py Projects/ML_Dataset/50_Startups.csv')
dataSlice = data.head(10)

#get data column
readX = data.iloc[:,:4].values
readY = data.iloc[:,4].values

#encoding c3
transformer = ColumnTransformer(
    transformers=[("OneHot",OneHotEncoder(),[3])],
    remainder='passthrough' )
readX = transformer.fit_transform(readX.tolist())
readX = readX[:,1:]

trainX, testX, trainY, testY = train_test_split(readX,readY,test_size=0.2,random_state=0)

lreg = LinearRegression()
lreg.fit(trainX, trainY)
predY = lreg.predict(testX)

readX = np.append(arr=np.ones((50,1),dtype=np.int),values=readX,axis=1)

optimisedX = readX[:,[0,1,2,3,4,5]]
ols = smf.OLS(endog=readX, exog=optimisedX).fit()
print(ols.summary())

Here is the error message:

Traceback (most recent call last):
  File "F:/Py Projects/ml/BackwardElimination.py", line 33, in <module>
    ols = smf.OLS(endog=readX, exog=optimisedX).fit()
  File "C:\Users\udit\AppData\Local\Programs\Python\Python37\lib\site-packages\statsmodels\regression\linear_model.py", line 838, in __init__
    hasconst=hasconst, **kwargs)
  File "C:\Users\udit\AppData\Local\Programs\Python\Python37\lib\site-packages\statsmodels\regression\linear_model.py", line 684, in __init__
    weights=weights, hasconst=hasconst, **kwargs)
  File "C:\Users\udit\AppData\Local\Programs\Python\Python37\lib\site-packages\statsmodels\regression\linear_model.py", line 196, in __init__
    super(RegressionModel, self).__init__(endog, exog, **kwargs)
  File "C:\Users\udit\AppData\Local\Programs\Python\Python37\lib\site-packages\statsmodels\base\model.py", line 216, in __init__
    super(LikelihoodModel, self).__init__(endog, exog, **kwargs)
  File "C:\Users\udit\AppData\Local\Programs\Python\Python37\lib\site-packages\statsmodels\base\model.py", line 68, in __init__
    **kwargs)
  File "C:\Users\udit\AppData\Local\Programs\Python\Python37\lib\site-packages\statsmodels\base\model.py", line 91, in _handle_data
    data = handle_data(endog, exog, missing, hasconst, **kwargs)
  File "C:\Users\udit\AppData\Local\Programs\Python\Python37\lib\site-packages\statsmodels\base\data.py", line 635, in handle_data
    **kwargs)
  File "C:\Users\udit\AppData\Local\Programs\Python\Python37\lib\site-packages\statsmodels\base\data.py", line 80, in __init__
    self._handle_constant(hasconst)
  File "C:\Users\udit\AppData\Local\Programs\Python\Python37\lib\site-packages\statsmodels\base\data.py", line 125, in _handle_constant
    if not np.isfinite(ptp_).all():
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

【Comments】:

Check whether your optimisedX has a numeric dtype such as float64.
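A minimal sketch of that check, assuming the encoded matrix ended up with dtype object (as the answers in this thread suggest). The array values and the helper name `is_numeric` are made up for illustration and do not come from the question's dataset.

```python
import numpy as np

# Hypothetical stand-in for the encoded feature matrix: an object-dtype
# array holding a mix of Python ints and floats.
X_obj = np.array([[1, 0.0, 162597.7], [1, 1.0, 153441.51]], dtype=object)

def is_numeric(arr):
    """Return True if the array has an int/float/complex dtype."""
    return np.issubdtype(arr.dtype, np.number)

# np.isfinite has no implementation for object arrays, so it raises the
# same TypeError that appears at the bottom of the traceback.
try:
    np.isfinite(X_obj)
    raised = False
except TypeError:
    raised = True

# After converting to float64 the same call works.
X_num = X_obj.astype(np.float64)
print(X_obj.dtype, is_numeric(X_obj), raised)
print(X_num.dtype, is_numeric(X_num), bool(np.isfinite(X_num).all()))
```

If the dtype check fails, converting the array before passing it to OLS is the likely fix.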

Tags: python machine-learning statsmodels sklearn-pandas

【Solution 1】:

You need to convert readX to a numeric dtype (int or float64) with NumPy's astype() before initializing optimisedX. Also change endog to readY.

readX = readX.astype('float64')  # astype returns a new array, so assign it back
optimisedX = readX[:,[0,1,2,3,4,5]]
ols = smf.OLS(endog=readY, exog=optimisedX).fit()
print(ols.summary())
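One detail worth checking in this approach: astype returns a converted copy rather than changing the array in place, so the result has to be assigned. The short sketch below illustrates the difference with made-up values; only `readX` is a name taken from the question.

```python
import numpy as np

# Made-up object-dtype matrix standing in for readX.
readX = np.array([[1, 0.0, 2.5], [1, 1.0, 3.5]], dtype=object)

# Calling astype without assignment discards the converted copy.
readX.astype("float64")
dtype_without_assign = readX.dtype

# Rebinding the name keeps the converted copy.
readX = readX.astype("float64")
dtype_with_assign = readX.dtype

print(dtype_without_assign, dtype_with_assign)
```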


【Solution 2】:

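I ran into the same error today.
The root cause was a NumPy array with dtype object. I converted it to float64, assigned the result to a new variable, and used that variable in the function.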

X[1:3]
#array([[1, 0.0, 0.0, 162597.7, 151377.59, 443898.53],
#        [1, 1.0, 0.0, 153441.51, 101145.55, 407934.54]], dtype=object)
X.dtype
#dtype('O')

X1= X.astype(np.float64)
X1[1:2]
#array([[1.0000000e+00, 0.0000000e+00, 0.0000000e+00, 1.625977e+05, 1.5137759e+05, 4.4389853e+05]])
X1.dtype
#dtype('float64')


【Solution 3】:

Just add this line:

X_opt = X[:, [0, 1, 2, 3, 4, 5]] 
X_opt = np.array(X_opt, dtype=float) # <-- this line

This converts it to an array and changes the dtype to float.
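The conversion only succeeds when every entry can be parsed as a number. As a hedged sketch, the hypothetical helper `to_float` below wraps the same `np.array(..., dtype=float)` call and reports a clearer error if a non-numeric value (for example, an unencoded category string) is still present. The sample arrays are made up.

```python
import numpy as np

def to_float(arr):
    """Convert to a float array, or raise ValueError naming the problem."""
    try:
        return np.array(arr, dtype=float)
    except (TypeError, ValueError) as exc:
        raise ValueError(f"non-numeric values remain in the matrix: {exc}") from exc

# All entries numeric: conversion succeeds.
X_ok = np.array([[1, 0.0, 162597.7]], dtype=object)
print(to_float(X_ok).dtype)

# A leftover string column: conversion fails with a descriptive message.
X_bad = np.array([[1, "New York", 162597.7]], dtype=object)
try:
    to_float(X_bad)
    failed = False
except ValueError as exc:
    failed = True
    print(exc)
```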
