Posted: 2016-07-12 23:01:18
Question:
I created the following function in Python:
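```python
import os
import re
from shutil import move

import joblib  # older scikit-learn: from sklearn.externals import joblib
from sklearn.model_selection import cross_val_score  # older: sklearn.cross_validation

filenameL = []  # module-level list that collects the saved file names


def cross_validate(algorithms, data, labels, cv=4, n_jobs=-1):
    print("Cross validation using: ")
    for alg, predictors in algorithms:
        print(alg)
        print()
        # Compute the accuracy score for all the cross validation folds.
        scores = cross_val_score(alg, data, labels, cv=cv, n_jobs=n_jobs)
        # Take the mean of the scores (because we have one for each fold)
        print(scores)
        print("Cross validation mean score = " + str(scores.mean()))
        name = re.split(r'\(', str(alg))
        filename = '%0.5f' % scores.mean() + "_" + name[0] + ".pkl"
        # We might use this another time
        joblib.dump(alg, filename, compress=1, cache_size=1e9)
        filenameL.append(filename)
        try:
            # Move the pickle into the "pkl" directory; drop it if that fails.
            move(filename, "pkl")
        except Exception:
            os.remove(filename)
        print()
    return
```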
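My understanding was that, in order to perform cross validation, sklearn has to fit the estimator I pass in.
However, when I later try to use it (`f` is the pkl file I saved above with `joblib.dump(alg, filename, compress=1, cache_size=1e9)`):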
```python
alg = joblib.load(f)
predictions = alg.predict_proba(train_data[predictors]).astype(float)
```
I get no error on the first line (so loading seems to work), but the next line fails with `NotFittedError: Estimator not fitted, call fit before exploiting the model.`
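The same error can be reproduced without joblib at all. The sketch below is a minimal reproduction, assuming scikit-learn 0.18 or later (`sklearn.model_selection`) and using the bundled iris dataset and `LogisticRegression` purely as stand-ins for the original data and algorithms. It relies on the fact that `cross_val_score` fits a fresh clone of the estimator for each fold, so the object passed in is never fitted itself.

```python
from sklearn.datasets import load_iris
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Stand-in data and estimator (assumptions, not the original setup).
X, y = load_iris(return_X_y=True)
alg = LogisticRegression(max_iter=1000)

# Each fold is fitted on a clone of `alg`; `alg` itself stays untouched.
scores = cross_val_score(alg, X, y, cv=4)
print("mean CV score:", scores.mean())

try:
    alg.predict_proba(X)
    raised = False
except NotFittedError as exc:
    # Same error as in the question: the original object was never fitted.
    raised = True
    print("NotFittedError:", exc)
```

Dumping `alg` after this point therefore pickles an unfitted estimator, which is what `joblib.load` later returns.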
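What am I doing wrong? Can't I reuse the model that was fitted while computing the cross validation? I looked at "Keep the fitted parameters when using a cross_val_score in scikits learn", but either I don't understand the answer or it is not what I'm looking for. What I want is to save the whole model with joblib so that I can use it later without refitting.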
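One way to get a model that can be reloaded and used directly is to keep `cross_val_score` only for scoring and then call `fit` on the full data before dumping. This is a sketch under stated assumptions: it uses the standalone `joblib` package (older scikit-learn releases exposed it as `sklearn.externals.joblib`), iris and `LogisticRegression` as placeholders, and a temporary directory instead of the original "pkl" folder.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Placeholder data and estimator (assumptions).
X, y = load_iris(return_X_y=True)
alg = LogisticRegression(max_iter=1000)

# 1) Estimate performance; this only fits throwaway clones.
scores = cross_val_score(alg, X, y, cv=4)

# 2) Fit the estimator itself on all the data before persisting it.
alg.fit(X, y)

# File name pattern mirrors the question: "<mean score>_<class name>.pkl".
path = os.path.join(tempfile.mkdtemp(),
                    "%0.5f_%s.pkl" % (scores.mean(), type(alg).__name__))
joblib.dump(alg, path, compress=1)

# 3) Later: load and predict without refitting.
loaded = joblib.load(path)
proba = loaded.predict_proba(X).astype(float)
print(proba.shape)  # (150, 3)
```

The per-fold models trained inside `cross_val_score` are discarded; if those specific models are what you want, scikit-learn 0.20 and later can return them via `cross_validate(..., return_estimator=True)`.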
Tags: python scikit-learn joblib