Python Scikit random forest pred_proba outputs rounded-off values

【Question title】: Python Scikit Random forest pred_proba outputs rounded off values
【Posted】: 2015-09-17 10:03:06
【Question】:

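I am using a random forest in scikit-learn for classification, and to get the class probabilities I used the predict_proba function. However, the probabilities it outputs look rounded to one decimal place.

I tried it with the sample iris dataset: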

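import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
# Randomly assign roughly 75% of the rows to the training set
df['is_train'] = np.random.uniform(0, 1, len(df)) <= .75
# iris.target holds integer codes, so map them to the species names
df['species'] = pd.Categorical.from_codes(iris.target, categories=iris.target_names)
df.head()

train, test = df[df['is_train']], df[~df['is_train']]

features = df.columns[:4]
# n_estimators is left at its default here
clf = RandomForestClassifier(n_jobs=2)
y, _ = pd.factorize(train['species'])
clf.fit(train[features], y)
clf.predict_proba(train[features])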

Output probabilities:

   [ 1. ,  0. ,  0. ],
   [ 1. ,  0. ,  0. ],
   [ 1. ,  0. ,  0. ],
   [ 1. ,  0. ,  0. ],
   [ 0. ,  1. ,  0. ],
   [ 0. ,  1. ,  0. ],
   [ 0. ,  1. ,  0. ],
   [ 0. ,  1. ,  0. ],
   [ 0. ,  1. ,  0. ],
   [ 0. ,  1. ,  0. ],
   [ 0. ,  0.8,  0.2],
   [ 0. ,  1. ,  0. ],
   [ 0. ,  1. ,  0. ],
   [ 0. ,  1. ,  0. ],

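Is this the default output? Can the number of decimal places be increased?

Note: I found the solution. The default number of trees is 10. After increasing the number of trees to one hundred, the precision of the probabilities improved.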

【Question comments】:

Tags: python machine-learning scikit-learn random-forest

【Answer 1】:

The default is apparently ten trees, and your code uses that default:

Parameters: 
n_estimators : integer, optional (default=10)
The number of trees in the forest.

Try something like this, increasing the number of trees to 25 or some other number greater than 10:

RandomForestClassifier(n_estimators=25, n_jobs=2)

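If you are simply getting the proportion of votes among the 10 default trees, that would very likely produce the probabilities you are seeing (multiples of 0.1, such as 0.8 and 0.2).

You may also run into issues because the iris dataset is very small: if I remember correctly, it has fewer than 200 observations.

The documentation for predict_proba() reads: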
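The vote-proportion explanation can be checked with a small sketch. It assumes the full iris data is used for both fitting and prediction, and it passes n_estimators explicitly because newer scikit-learn releases default to 100 trees rather than 10. The helper name distinct_probabilities and the random_state value are illustrative choices, not part of the original question.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)


def distinct_probabilities(n_trees):
    """Fit a forest with n_trees trees and return the distinct probabilities it outputs."""
    clf = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    clf.fit(X, y)
    # Round only to merge values that differ by floating-point noise.
    return np.unique(np.round(clf.predict_proba(X), 6))


for n_trees in (10, 25, 100):
    values = distinct_probabilities(n_trees)
    # The fully grown trees end in pure leaves on this data, so each tree
    # votes 0 or 1 and every output is k / n_trees for some integer k.
    print(n_trees, values)
```

With 10 trees the only possible outputs are multiples of 0.1, which matches what the question shows. More trees give a finer grid of possible values rather than more "decimal places" as such.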

The predicted class probabilities of an input sample is computed as the
mean predicted class probabilities of the trees in the forest. The class
probability of a single tree is the fraction of samples of the same 
class in a leaf.
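
As a rough check of the quoted description, the sketch below averages the per-tree probabilities by hand and compares the result with the forest's own output. The min_samples_leaf=5 setting is an assumption made only for illustration: it lets leaves stay impure, so a single tree can return fractional probabilities instead of just 0 or 1.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# min_samples_leaf=5 is an illustrative choice, not taken from the question.
clf = RandomForestClassifier(n_estimators=10, min_samples_leaf=5, random_state=0)
clf.fit(X, y)

# Per-tree class probabilities, shape (n_trees, n_samples, n_classes).
per_tree = np.array([tree.predict_proba(X) for tree in clf.estimators_])

# Averaging over the trees should reproduce the forest's predict_proba.
manual = per_tree.mean(axis=0)
forest = clf.predict_proba(X)
print(np.allclose(manual, forest))
```

The fitted trees are available through the estimators_ attribute, so the same comparison can be run on any fitted forest.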

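I could not find any parameter in the documentation for adjusting the decimal precision of the predicted probabilities.

【Comments】: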
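
A related point, sketched here as an aside: the array returned by predict_proba holds ordinary floating-point numbers, and how many digits are shown is decided by NumPy when the array is printed. The example assumes 100 trees and uses np.array2string to format the first few rows; the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
proba = clf.predict_proba(X)

# The values are plain floats; scikit-learn does not round them.
print(proba.dtype)

# Display precision is a NumPy formatting concern, e.g. four fixed decimals.
formatted = np.array2string(proba[:5], precision=4, floatmode="fixed")
print(formatted)
```

Changing the print format only affects the display. The granularity of the values themselves still depends on the number of trees and on how pure the leaves are.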