【Posted】: 2014-06-10 22:31:16
【Question】:
I'm starting to run some tests with Apache Spark MLlib:
import numpy

def mapper(line):
    feats = line.strip().split(',')
    label = feats[len(feats) - 1]
    feats = feats[:len(feats) - 1]
    feats.insert(0, label)
    return numpy.array([float(feature) for feature in feats])
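The mapper above moves the trailing label to the front of the feature array; it can be checked on a single sample line outside Spark. This is a minimal sketch using only NumPy, with one hand-picked row in the banknote dataset's format (4 features followed by a 0/1 label):

```python
import numpy

def mapper(line):
    # split a CSV line, move the trailing label to the front,
    # and return everything as one float array
    feats = line.strip().split(',')
    label = feats[len(feats) - 1]
    feats = feats[:len(feats) - 1]
    feats.insert(0, label)
    return numpy.array([float(feature) for feature in feats])

# a row in the dataset's shape: four features, label last
row = mapper('3.6216,8.6661,-2.8073,-0.44699,0')
print(row)  # label 0.0 first, then the four feature values
```

Note that the label ends up stored as a float inside the array, which is why the training loop below casts it back with `int(points[0])`.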
from pyspark.mllib.classification import LogisticRegressionWithSGD

def test3():
    data = sc.textFile('/home/helxsz/Dropbox/exercise/spark/data_banknote_authentication.txt')
    parsed = data.map(mapper)
    logistic = LogisticRegressionWithSGD()
    logistic.optimizer.setNumIterations(200).setMiniBatchFraction(0.1)
    model = logistic.run(parsed)
    labelsAndPreds = parsed.map(lambda points: (int(points[0]), model.predict(points[1:len(points)])))
    trainErr = labelsAndPreds.filter(lambda (v, p): v != p).count() / float(parsed.count())
    print 'training error = ' + str(trainErr)
But when I call LogisticRegressionWithSGD like this:
logistic = LogisticRegressionWithSGD()
logistic.optimizer.setNumIterations(200).setMiniBatchFraction(0.1)
it raises an error: AttributeError: 'LogisticRegressionWithSGD' object has no attribute 'optimizer'
Here is the API documentation for LogisticRegressionWithSGD and GradientDescent.
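The `optimizer` attribute with `setNumIterations`/`setMiniBatchFraction` belongs to the Scala API; the Python `LogisticRegressionWithSGD` class instead takes those settings as keyword arguments of its `train()` classmethod, which expects an RDD of `LabeledPoint`. A sketch of how the training step could look under that API (the file path is the one from the question; `to_labeled_point` is a hypothetical helper replacing `mapper`, since `LabeledPoint` keeps label and features separate):

```python
from pyspark import SparkContext
from pyspark.mllib.classification import LogisticRegressionWithSGD
from pyspark.mllib.regression import LabeledPoint

sc = SparkContext('local', 'banknote')

def to_labeled_point(line):
    # last comma-separated field is the label, the rest are features
    parts = [float(x) for x in line.strip().split(',')]
    return LabeledPoint(parts[-1], parts[:-1])

data = sc.textFile('/home/helxsz/Dropbox/exercise/spark/data_banknote_authentication.txt')
parsed = data.map(to_labeled_point)

# in the Python API the SGD settings are keyword arguments of train(),
# not methods chained on an optimizer attribute
model = LogisticRegressionWithSGD.train(parsed, iterations=200, miniBatchFraction=0.1)

labelsAndPreds = parsed.map(lambda p: (p.label, model.predict(p.features)))
trainErr = labelsAndPreds.filter(lambda (v, p): v != p).count() / float(parsed.count())
print 'training error = ' + str(trainErr)
```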
【Comments】:
Tags: python bigdata apache-spark