array-like shape (n_samples,) vs [n_samples] in sklearn documents

【Question title】: array-like shape (n_samples,) vs [n_samples] in sklearn documents
【Posted】: 2018-05-05 09:54:33
【Question】:

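For sample_weight, the required shape is sometimes documented as array-like, shape (n_samples,) and sometimes as array-like, shape [n_samples]. Does (n_samples,) mean a 1-D array, and does [n_samples] mean a list? Or are the two equivalent? Both forms can be seen here: http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html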

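【Comments】:

Try calling a method that is documented as returning C : array, shape = [n_samples], such as predict(x). Check the type and the .shape (if it has one) of the result. What do you get?
The two are equivalent. The sklearn documentation says the object should be an array or array-like. In practice, this usually means a numpy array.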
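The check suggested in the first comment can be sketched as follows. This is a minimal illustration, not part of the original thread; it reuses the toy data from the answer below, and the variable name pred is only illustrative.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy data: six samples with two features each
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([1, 1, 1, 2, 2, 2])

clf = GaussianNB().fit(X, Y)

# predict is documented as returning an array of shape [n_samples]
pred = clf.predict(X)

print(type(pred))   # the Python type of the returned object
print(pred.shape)   # a one-element tuple: one entry per sample
print(pred.ndim)    # number of dimensions
```

The shape printed here is a one-element tuple, which is exactly what the (n_samples,) notation describes, even though the docstring for predict writes it as [n_samples].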

Tags: scikit-learn notation

【Solution 1】:

You can test this with a simple example:

import numpy as np
from sklearn.naive_bayes import GaussianNB

#create some data
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([1, 1, 1, 2, 2, 2])

#create the model and fit it
clf = GaussianNB()
clf.fit(X, Y)

#check the type of some attributes
type(clf.class_prior_)
type(clf.class_count_)

#check the shape of class_prior_ and the value of class_count_
clf.class_prior_.shape
clf.class_count_

Or, for a more explicit check:

#verify that it is a numpy nd array and NOT a list
isinstance(clf.class_prior_, np.ndarray)
isinstance(clf.class_prior_, list)

You can check all the other attributes in the same way.

Results

numpy.ndarray

numpy.ndarray

(2,)

array([ 3.,  3.])

True

False

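The results show that these attributes are numpy ndarrays, not lists, and that their shape is reported as a tuple such as (2,).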
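The original question was about the sample_weight input rather than the fitted attributes, so the same kind of test can be applied there. The sketch below is an addition, not part of the original answer: it passes the weights once as a plain Python list and once as a 1-D numpy array and compares the fitted models. The weight values and the names weights_list, weights_array, clf_list and clf_array are made up for illustration.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([1, 1, 1, 2, 2, 2])

# The same weights, once as a plain list and once as a 1-D ndarray
weights_list = [1.0, 1.0, 1.0, 2.0, 2.0, 2.0]
weights_array = np.asarray(weights_list)

clf_list = GaussianNB().fit(X, Y, sample_weight=weights_list)
clf_array = GaussianNB().fit(X, Y, sample_weight=weights_array)

# Compare the fitted parameters of the two models
same_prior = np.allclose(clf_list.class_prior_, clf_array.class_prior_)
same_theta = np.allclose(clf_list.theta_, clf_array.theta_)

print(weights_array.shape)
print(same_prior, same_theta)
```

If both comparisons come out equal, that supports reading (n_samples,) and [n_samples] as two spellings of the same requirement: any one-dimensional array-like with one entry per sample.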

【Discussion】: