【Question Title】: Cross validation using PySpark
【Posted】: 2021-01-19 04:47:27
【Question】:

I'm trying to use cross-validation with Spark, but it throws an error:

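from pyspark.ml import Pipeline
from pyspark.ml.classification import GBTClassifier, LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

# stringIndexers, encoders, featureAssembler and df_tr come from earlier code (not shown)
gbtClassifier = GBTClassifier(featuresCol="features", labelCol="is_goal")
lr = LogisticRegression(featuresCol="features", labelCol="is_goal")
pipelineStages = stringIndexers + encoders + [featureAssembler]
pipeline = Pipeline(stages=pipelineStages)

# 2 regParam values x 3 elasticNetParam values
param_grid_lr = (ParamGridBuilder()
                 .addGrid(lr.regParam, [0.1, 0.01])
                 .addGrid(lr.elasticNetParam, [0, 0.5, 1])
                 .build())

crossval = CrossValidator(estimator=lr, estimatorParamMaps=param_grid_lr,
                          evaluator=BinaryClassificationEvaluator(), numFolds=3)

cross_model = crossval.fit(df_tr)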

IllegalArgumentException: label does not exist. Available: event_type_str, event_team, shot_place_str, location_str, assist_method_str, situation_str, country_code, is_goal, event_type_str_idx, event_team_idx, shot_place_str_idx, location_str_idx, assist_method_str_idx, situation_str_idx, country_code_idx, event_type_str_vec, event_team_vec, shot_place_str_vec, location_str_vec, assist_method_str_vec, situation_str_vec, country_code_vec, features, CrossValidator_2fc516202d9d_rand, rawPrediction, probability, prediction

[Image: this is what my features look like]

【Comments】:

    Tags: apache-spark pyspark


    【Solution 1】:

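    By default, BinaryClassificationEvaluator expects the label column to be called label, as you can see in the documentation: https://spark.apache.org/docs/latest/api/python/pyspark.ml.html#pyspark.ml.evaluation.BinaryClassificationEvaluator. You need to set rawPredictionCol and labelCol to match the columns in your DataFrame. Here the rawPrediction column already matches the default rawPredictionCol, so setting labelCol="is_goal" is what fixes the error.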

    【Comments】:

    • Thanks! I renamed the "is_goal" column to label and it worked.