Error with super() when calling set_params() on sklearn estimators

【Question title】: Error with super() when calling set_params() on sklearn estimators
【Posted】: 2017-12-18 20:22:29
【Question】:

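I am trying to load and configure scikit-learn estimators from a configuration file. The file holds each estimator's class path and name, plus a dictionary of parameters. My plan is to use pydoc.locate() to load the estimator with its default parameters and then call set_params() on it with the parameter dictionary. However, I get the following error: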

import pydoc
sgd = pydoc.locate('sklearn.linear_model.SGDClassifier')
print('{} {}'.format(type(sgd), sgd))
p_sgd = {'alpha':.1234}
sgd.set_params(p_sgd)
<class 'abc.ABCMeta'> <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'>
Traceback (most recent call last):
  File "<input>", line 5, in <module>
  File "/Users/doug/.pyenv/versions/learning-3.4.3/lib/python3.4/site-packages/sklearn/linear_model/stochastic_gradient.py", line 83, in set_params
    super(BaseSGD, self).set_params(*args, **kwargs)
TypeError: super(type, obj): obj must be an instance or subtype of type

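I tried this same "load and set" approach twice. The first time, I loaded a text vectorizer by name and set its parameters. The vectorizer is a subclass I created based on HashingVectorizer. It did not raise this error, but calling set_params() did not seem to change it either (the parameter values stayed at their defaults). The second time was with the classifier, which shows the behavior described above.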

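I have previously used pydoc.locate() to load estimators that run in a pipeline passed to GridSearchCV, and that worked fine. In that case I build the pipeline with the estimators' default constructors, and GridSearchCV then has the pipeline call set_params() on each estimator as it walks the parameter grid. Looking at the Pipeline and GridSearchCV source code, they appear to call it as set_params(**param_dict). If I try that, I get a different error.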

import pydoc
sgd = pydoc.locate('sklearn.linear_model.SGDClassifier')
p_sgd = {'alpha':.1234}
sgd.set_params(**p_sgd)
Traceback (most recent call last):
  File "<input>", line 4, in <module>
TypeError: set_params() missing 1 required positional argument: 'self'

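One last point: I have read that the original error (TypeError: super(type, obj)...) has been traced to modules being loaded more than once. In fact, I do call pydoc.locate() before these attempts (to trace the classes' ancestry and work out which are vectorizers and which are classifiers). I might be able to work around that, but because I run in a loop to train multiple models from the configuration file, there would still be earlier attempts to load these modules.

I am using Python 3.4.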

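【Comments】:

pydoc.locate gives you the SGDClassifier class, not an actual classifier. You are trying to use it as a classifier.
Also, pydoc.locate is not a public, documented API, and you shouldn't use it. You can do from sklearn.linear_model import SGDClassifier.
Wow! I get it. I fixed it by calling the returned class as a constructor. I had looked at that before, but I tried it while passing the parameter dict to the constructor. The estimators have no kwargs option in their initializer, so that failed. The following works: import pydoc; sgd = pydoc.locate('sklearn.linear_model.SGDClassifier')(); p_sgd = {'alpha':.1234}; sgd.set_params(**p_sgd); sgd now displays as SGDClassifier(alpha=0.1234, average=False, class_weight=None, epsilon=0.1, eta0=0.0, fit_intercept=True, l1_ratio=0.15,...
I realize this is not the best way to load classes, but I can't do it explicitly in code because I don't know which classifiers will be used. They are specified in an external, dynamic configuration file.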
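
To illustrate why the two calls in the question fail differently, here is a minimal sketch that uses toy classes rather than scikit-learn itself. ToyBase and ToyClassifier are hypothetical stand-ins; only the super(...) call shape is taken from the traceback above. When set_params is looked up on the class, nothing is bound to self, so a positional dict is treated as self, while an unpacked dict leaves self missing.

```python
class ToyBase:
    """Hypothetical stand-in for a base estimator (not sklearn code)."""

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self


class ToyClassifier(ToyBase):
    """Hypothetical stand-in for SGDClassifier (not sklearn code)."""

    def __init__(self, alpha=0.0001):
        self.alpha = alpha

    def set_params(self, *args, **kwargs):
        # Same call shape as the line shown in the question's traceback.
        super(ToyClassifier, self).set_params(*args, **kwargs)
        return self


params = {"alpha": 0.1234}

# Case 1: dict passed positionally on the CLASS, so the dict becomes `self`
# and super() rejects it because a dict is not a ToyClassifier.
try:
    ToyClassifier.set_params(params)
except TypeError as exc:
    first_error = str(exc)

# Case 2: dict unpacked into keywords on the CLASS, so `self` is never supplied.
try:
    ToyClassifier.set_params(**params)
except TypeError as exc:
    second_error = str(exc)

# Case 3: construct an instance first, then set the parameters.
clf = ToyClassifier()
clf.set_params(**params)

print(first_error)
print(second_error)
print(clf.alpha)
```

The exact wording of the second message varies between Python versions, but both failures come from calling an instance method on the class object.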

Tags: python scikit-learn python-3.4

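【Solution 1】:

As user2357112 pointed out, I was mistakenly loading only the class rather than constructing an instance. I changed the code to call the returned class's constructor with no arguments, and then called set_params(**p_sgd) using the ** argument syntax I had expected to use.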

import pydoc
sgd = pydoc.locate('sklearn.linear_model.SGDClassifier')()
p_sgd = {'alpha':.1234}
sgd.set_params(**p_sgd)
sgd
SGDClassifier(alpha=0.1234, average=False, class_weight=None, epsilon=0.1, eta0=0.0, fit_intercept=True, l1_ratio=0.15, learning_rate='optimal', loss='hinge', n_iter=5, n_jobs=1, penalty='l2', power_t=0.5, random_state=None, shuffle=True, verbose=0, warm_start=False)
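
Since one commenter noted that pydoc.locate is not a documented API, the same "load, construct, then set" flow could be sketched with the documented importlib.import_module instead. This is only a sketch under assumptions: load_class and build_estimator are hypothetical helper names, and the dotted path and parameter dict are assumed to come from the configuration file described in the question.

```python
import importlib


def load_class(dotted_path):
    """Return the class named by a path such as 'package.module.ClassName'."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError("expected 'module.ClassName', got %r" % dotted_path)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def build_estimator(dotted_path, params=None):
    """Instantiate the class with defaults, then apply params via set_params."""
    cls = load_class(dotted_path)
    estimator = cls()  # construct an instance; do not use the class itself
    if params:
        estimator.set_params(**params)  # unpack the dict into keyword arguments
    return estimator


if __name__ == "__main__":
    try:
        sgd = build_estimator("sklearn.linear_model.SGDClassifier",
                              {"alpha": 0.1234})
        print(sgd.alpha)
    except ImportError:
        print("scikit-learn is not installed")
```

Because scikit-learn estimators declare their hyperparameters as explicit keyword arguments, cls(**params) should also work; what fails is passing the dict itself as a single positional argument.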

【Comments】: