【发布时间】:2022-06-21 07:58:54
【问题描述】:
在 scikit-learn 0.24.0 或更高版本中,当您使用 GridSearchCV 或 RandomizedSearchCV 并设置 n_jobs=-1,设置任何详细数字(1、2、3 或 100) > 不会打印任何进度消息。但是,如果您使用 scikit-learn 0.23.2 或更低版本,一切都会按预期运行,并且 joblib 会打印进度消息。
这是一个示例代码,您可以使用它在 Google Colab 或 Jupyter Notebook 中重复我的实验:
from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV
iris = datasets.load_iris()
parameters = {'kernel':('linear', 'rbf'), 'C':[0.1, 1, 10]}
svc = svm.SVC()
clf = GridSearchCV(svc, parameters, scoring='accuracy', refit=True, n_jobs=-1, verbose=60)
clf.fit(iris.data, iris.target)
print('Best accuracy score: %.2f' %clf.best_score_)
使用 scikit-learn 0.23.2 的结果:
Fitting 5 folds for each of 6 candidates, totalling 30 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 40 concurrent workers.
[Parallel(n_jobs=-1)]: Done 1 tasks | elapsed: 0.0s
[Parallel(n_jobs=-1)]: Batch computation too fast (0.0295s.) Setting batch_size=2.
[Parallel(n_jobs=-1)]: Done 2 out of 30 | elapsed: 0.0s remaining: 0.5s
[Parallel(n_jobs=-1)]: Done 3 out of 30 | elapsed: 0.0s remaining: 0.3s
[Parallel(n_jobs=-1)]: Done 4 out of 30 | elapsed: 0.0s remaining: 0.3s
[Parallel(n_jobs=-1)]: Done 5 out of 30 | elapsed: 0.0s remaining: 0.2s
[Parallel(n_jobs=-1)]: Done 6 out of 30 | elapsed: 0.0s remaining: 0.2s
[Parallel(n_jobs=-1)]: Done 7 out of 30 | elapsed: 0.0s remaining: 0.1s
[Parallel(n_jobs=-1)]: Done 8 out of 30 | elapsed: 0.0s remaining: 0.1s
[Parallel(n_jobs=-1)]: Done 9 out of 30 | elapsed: 0.0s remaining: 0.1s
[Parallel(n_jobs=-1)]: Done 10 out of 30 | elapsed: 0.0s remaining: 0.1s
[Parallel(n_jobs=-1)]: Done 11 out of 30 | elapsed: 0.0s remaining: 0.1s
[Parallel(n_jobs=-1)]: Done 12 out of 30 | elapsed: 0.0s remaining: 0.1s
[Parallel(n_jobs=-1)]: Done 13 out of 30 | elapsed: 0.0s remaining: 0.1s
[Parallel(n_jobs=-1)]: Done 14 out of 30 | elapsed: 0.0s remaining: 0.1s
[Parallel(n_jobs=-1)]: Done 15 out of 30 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=-1)]: Done 16 out of 30 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=-1)]: Done 17 out of 30 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=-1)]: Done 18 out of 30 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=-1)]: Done 19 out of 30 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=-1)]: Done 20 out of 30 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=-1)]: Done 21 out of 30 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=-1)]: Done 22 out of 30 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=-1)]: Done 23 out of 30 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=-1)]: Done 24 out of 30 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=-1)]: Done 25 out of 30 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=-1)]: Done 26 out of 30 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=-1)]: Done 27 out of 30 | elapsed: 0.1s remaining: 0.0s
[Parallel(n_jobs=-1)]: Done 28 out of 30 | elapsed: 0.1s remaining: 0.0s
[Parallel(n_jobs=-1)]: Done 30 out of 30 | elapsed: 0.1s remaining: 0.0s
[Parallel(n_jobs=-1)]: Done 30 out of 30 | elapsed: 0.1s finished
Best accuracy score: 0.98
使用 scikit-learn 0.24.0 的结果(测试到 v1.0.2):
Fitting 5 folds for each of 6 candidates, totaling 30 fits
Best accuracy score: 0.98
在我看来,scikit-learn 0.24.0 或更高版本没有向joblib 发送“verbose”值,因此,当在 GridSearch 或 RandomizedSearchCV 中使用多处理器时,不会打印进度带有“loky”后端。
知道如何在 Google Colab 或 Jupyter Notebook 中解决此问题并为 sklearn 0.24.0 或更高版本打印进度日志吗?
【问题讨论】:
-
还有一个问题,你有没有找到解决方案@Ashtad? :\
-
是的。不幸的是,我仍然没有找到解决这个问题的方法。 @jtlz2
-
我发现其他人也面临同样的问题:stackoverflow.com/questions/67120754/…
-
另外,人们在这里指的是同样的问题:github.com/scikit-learn/scikit-learn/issues/22849
标签: scikit-learn jupyter-notebook google-colaboratory joblib gridsearchcv