NotFittedError: Estimator not fitted, call `fit` before exploiting the model (answers)

【Question title】: NotFittedError: Estimator not fitted, call `fit` before exploiting the model
【Posted】: 2017-04-17 16:17:57
【Question】:

I am running Python 3.5.2 on a MacBook with OSX 10.2.1 (Sierra).

While trying to run some code for the Titanic dataset from Kaggle, I keep getting the following error:

NotFittedError                            Traceback (most recent call last)
in ()
      6
      7 # Make your prediction using the test set and print them.
----> 8 my_prediction = my_tree_one.predict(test_features)
      9 print(my_prediction)
     10

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/sklearn/tree/tree.py in predict(self, X, check_input)
    429
    430
--> 431         X = self._validate_X_predict(X, check_input)
    432
    433         n_samples = X.shape[0]

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/sklearn/tree/tree.py in _validate_X_predict(self, X, check_input)
    386
    387         if self.tree_ is None:
--> 388             raise NotFittedError("Estimator not fitted, "
    389                                  "call `fit` before exploiting the model.")
    390

NotFittedError: Estimator not fitted, call `fit` before exploiting the model.

The offending code seems to be this:

# Impute the missing value with the median
test.Fare[152] = test.Fare.median()

# Extract the features from the test set: Pclass, Sex, Age, and Fare.
test_features = test[["Pclass", "Sex", "Age", "Fare"]].values

# Make your prediction using the test set and print them.
my_prediction = my_tree_one.predict(test_features)
print(my_prediction)

# Create a data frame with two columns: PassengerId & Survived. Survived contains your predictions
PassengerId = np.array(test["PassengerId"]).astype(int)
my_solution = pd.DataFrame(my_prediction, PassengerId, columns = ["Survived"])
print(my_solution)

# Check that your data frame has 418 entries
print(my_solution.shape)

# Write your solution to a csv file with the name my_solution.csv
my_solution.to_csv("my_solution_one.csv", index_label = ["PassengerId"])

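Here is a link to the rest of the code.

Since I have already called the `fit` function, I cannot make sense of this error message. Where am I going wrong? Thanks for your time.

Edit: It turns out the problem is inherited from the previous code block.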

# Fit your first decision tree: my_tree_one
my_tree_one = tree.DecisionTreeClassifier()
my_tree_one = my_tree_one.fit(features_one, target)

# Look at the importance and score of the included features
print(my_tree_one.feature_importances_)
print(my_tree_one.score(features_one, target))

The line my_tree_one = my_tree_one.fit(features_one, target)

produces the error:

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

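【Comments】:

Are you running the whole file gist.github.com/jarasandh/8df831a4d5c908888e9eb8a2e3851546 directly? This looks like an error you would get when using the interactive interpreter.
Indeed I am. It turns out the error was caused by the preceding code block. I have just updated the original post with the new findings.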

Tags: python machine-learning scikit-learn

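【Solution 1】:

The error is self-explanatory: either the features_one or the target array contains NaNs or infinite values, so the call to fit fails with the ValueError and the estimator is never fitted. Because my_tree_one still refers to the unfitted DecisionTreeClassifier, calling predict on it later raises NotFittedError.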
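
The chain of events can be reproduced with a small sketch. The toy arrays and the names `X_bad`, `y`, and `clf` are made up for illustration and are not the asker's data. An infinite value is used rather than NaN because some recent scikit-learn releases appear to accept NaN in decision tree inputs, while infinity is still rejected. The exact error messages vary between versions, so the sketch only checks the exception types.

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: one feature value is infinite.
X_bad = np.array([[1.0, 2.0], [np.inf, 3.0], [0.5, 1.5], [2.0, 0.1]])
y = np.array([0, 1, 0, 1])

clf = DecisionTreeClassifier()

# Step 1: fit raises, so the estimator stays unfitted.
fit_error = None
try:
    clf = clf.fit(X_bad, y)
except ValueError as exc:
    fit_error = exc

# Step 2: in an interactive session the name still exists,
# so a later cell can call predict on the unfitted estimator.
predict_error = None
try:
    clf.predict(np.array([[1.0, 2.0]]))
except NotFittedError as exc:
    predict_error = exc

print(type(fit_error).__name__)
print(type(predict_error).__name__)
```

When the whole file is run as a script, execution stops at the first ValueError, which is why the NotFittedError typically shows up only in a notebook or interactive interpreter where later cells still run.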

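Check these arrays for NaN and infinite values before fitting, and handle them accordingly (for example, by imputing or dropping the affected rows).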
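
A minimal sketch of such a check follows. The toy DataFrame `train` and the column list `feature_cols` are stand-ins invented for illustration; the real Titanic columns and values will differ. Median imputation is only one possible choice, picked here because the question's own code already uses the median for Fare.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data standing in for the Titanic training set.
train = pd.DataFrame({
    "Pclass": [3, 1, 3, 2, 1, 3],
    "Age": [22.0, 38.0, np.nan, 35.0, np.nan, 2.0],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05, 21.07],
    "Survived": [0, 1, 1, 1, 0, 0],
})
feature_cols = ["Pclass", "Age", "Fare"]

# Inspect: how many missing values does each feature column have?
nan_counts = train[feature_cols].isna().sum()
print(nan_counts)

# Handle: fill missing ages with the median of the observed ages.
train["Age"] = train["Age"].fillna(train["Age"].median())

features_one = train[feature_cols].to_numpy(dtype=float)
target = train["Survived"].to_numpy()

# Verify: every feature value must be finite before fitting.
assert np.isfinite(features_one).all()

my_tree_one = DecisionTreeClassifier(random_state=0)
my_tree_one = my_tree_one.fit(features_one, target)
predictions = my_tree_one.predict(features_one)
print(predictions)
```

The same inspection applies to the test set before calling predict, since missing values there can cause a similar failure.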
