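【Posted】: 2014-05-31 20:23:04
【Question】:
I'm trying to run the scikit-learn random forest algorithm on the MNIST handwritten digit dataset. During training, the process fails with a MemoryError. What can I do to fix this?
System specs: Intel Core 2 Duo with 4 GB RAM
The dataset has shape (60000, 784). The full error from the Linux terminal is: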
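For scale, here is a rough back-of-the-envelope sketch of how much memory the (60000, 784) training matrix alone takes in a few common dtypes. It assumes a dense, in-memory array; the dtypes listed are illustrative choices, not taken from the question.

```python
# Rough memory footprint of the MNIST training matrix described above:
# 60000 samples x 784 features (28x28 pixels), stored as a dense array.
N_SAMPLES = 60000
N_FEATURES = 784

# Bytes per element for dtypes that commonly show up with this data.
ITEMSIZE = {"float64": 8, "float32": 4, "uint8": 1}


def footprint_mib(n_samples, n_features, itemsize):
    """Size in MiB of a dense n_samples x n_features array."""
    return n_samples * n_features * itemsize / (1024.0 ** 2)


for name, size in ITEMSIZE.items():
    print("%-8s %7.1f MiB" % (name, footprint_mib(N_SAMPLES, N_FEATURES, size)))
```

That works out to roughly 359 MiB as float64, 179 MiB as float32 and 45 MiB as uint8, before the classifier allocates anything of its own.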
>   File "./reducer.py", line 53, in <module>
>     main()
>   File "./reducer.py", line 38, in main
>     clf = clf.fit(data,labels) #training the algorithm
>   File "/usr/lib/pymodules/python2.7/sklearn/ensemble/forest.py", line 202, in fit
>     for i in xrange(n_jobs))
>   File "/usr/lib/pymodules/python2.7/joblib/parallel.py", line 409, in __call__
>     self.dispatch(function, args, kwargs)
>   File "/usr/lib/pymodules/python2.7/joblib/parallel.py", line 295, in dispatch
>     job = ImmediateApply(func, args, kwargs)
>   File "/usr/lib/pymodules/python2.7/joblib/parallel.py", line 101, in __init__
>     self.results = func(*args, **kwargs)
>   File "/usr/lib/pymodules/python2.7/sklearn/ensemble/forest.py", line 73, in _parallel_build_trees
>     sample_mask=sample_mask, X_argsorted=X_argsorted)
>   File "/usr/lib/pymodules/python2.7/sklearn/tree/tree.py", line 476, in fit
>     X_argsorted=X_argsorted)
>   File "/usr/lib/pymodules/python2.7/sklearn/tree/tree.py", line 357, in _build_tree
>     np.argsort(X.T, axis=1).astype(np.int32).T)
>   File "/usr/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 680, in argsort
>     return argsort(axis, kind, order)
> MemoryError
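The last frames show the failure inside `np.argsort(X.T, axis=1).astype(np.int32)`. The sketch below estimates the transient memory of that one expression, under the assumption that only the two index arrays count: `argsort` returns `np.intp` indices (8 bytes on a 64-bit build, 4 on a 32-bit one), and `.astype(np.int32)` makes a second copy while the first is still alive. The helper `argsort_peak_bytes` is hypothetical, written only for illustration; it ignores `X` itself and any scratch space `argsort` uses internally.

```python
import numpy as np


def argsort_peak_bytes(n_samples, n_features):
    """Hypothetical helper: estimate the memory held at once by
    np.argsort(X.T, axis=1).astype(np.int32).

    The intp result of argsort and its int32 copy coexist during astype.
    X itself and argsort's internal temporaries are NOT counted.
    """
    n = n_samples * n_features
    return n * np.dtype(np.intp).itemsize + n * np.dtype(np.int32).itemsize


# Sanity-check the estimate against a small real array.
X_small = np.random.rand(100, 784)
idx = np.argsort(X_small.T, axis=1)   # dtype np.intp
idx32 = idx.astype(np.int32)          # a separate int32 copy
assert idx.nbytes + idx32.nbytes == argsort_peak_bytes(100, 784)

# Scale up to the MNIST shape from the question.
print("argsort peak: %.1f MiB" % (argsort_peak_bytes(60000, 784) / 1024.0 ** 2))
```

On a 64-bit build this prints about 538 MiB, on top of the roughly 359 MiB for a float64 `X`, which gives a sense of why 4 GB can run short once several such arrays are alive at the same time.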
【Discussion】:
-
What parameters are you using to create the RandomForest?
-
I'm only using n_estimators=10.
-
Try setting n_jobs=1.
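A minimal sketch of the suggestion, assuming a reasonably recent scikit-learn. The random `data`/`labels` arrays are a hypothetical stand-in for MNIST so the snippet runs on its own. Casting to C-contiguous float32 up front is an extra assumption on my part: in the scikit-learn versions I know of, the tree code works in float32, so passing float32 means `fit` does not need to make its own converted copy.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in for MNIST: random pixel values and labels 0-9.
# Replace with the real (60000, 784) data and labels.
rng = np.random.RandomState(0)
data = rng.randint(0, 256, size=(2000, 784)).astype(np.uint8)
labels = rng.randint(0, 10, size=2000)

# Convert once to C-contiguous float32. If the original array is float64
# and no longer needed, `del` it afterwards so only this copy remains.
X = np.ascontiguousarray(data, dtype=np.float32)

# n_jobs=1 builds the 10 trees sequentially in a single process
# instead of spreading the work (and the memory) over several workers.
clf = RandomForestClassifier(n_estimators=10, n_jobs=1, random_state=0)
clf = clf.fit(X, labels)

pred = clf.predict(X[:5])
print(pred)
```

Whether this alone is enough depends on the scikit-learn version and on whether the Python build is 32- or 64-bit; the sketch only shows where the parameter goes.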
Tags: python python-2.7 scikit-learn