【Question Title】: Scala: how to modify the default metric for cross validation
【Posted】: 2019-05-15 14:31:31
【Question】:

I found the code below on this site: https://spark.apache.org/docs/2.3.1/ml-tuning.html

// Note that the evaluator here is a BinaryClassificationEvaluator and its default metric
// is areaUnderROC.
val cv = new CrossValidator()
  .setEstimator(pipeline)
  .setEvaluator(new BinaryClassificationEvaluator)
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(2)  // Use 3+ in practice
  .setParallelism(2)  // Evaluate up to 2 parameter settings in parallel

As they say, the default metric of BinaryClassificationEvaluator is areaUnderROC. How can I change this default metric to the F1 score?

I tried:

// Note that the evaluator here is a BinaryClassificationEvaluator and its default metric
// is areaUnderROC.
val cv = new CrossValidator()
  .setEstimator(pipeline)
  .setEvaluator(new BinaryClassificationEvaluator.setMetricName("f1"))
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(2)  // Use 3+ in practice
  .setParallelism(2)  // Evaluate up to 2 parameter settings in parallel

But I got some errors... I have searched many sites, but I could not find a solution...
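For reference, part of the error is purely syntactic: `.setMetricName` is called on the class name instead of an instance. A minimal corrected sketch (assuming `pipeline` and `paramGrid` are defined as in the linked guide); note that even with the syntax fixed, `BinaryClassificationEvaluator` only accepts "areaUnderROC" and "areaUnderPR", not "f1":

```scala
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.tuning.CrossValidator

// Parentheses instantiate the evaluator before chaining setMetricName.
// Passing "f1" would throw here: only "areaUnderROC" and "areaUnderPR" are valid.
val cv = new CrossValidator()
  .setEstimator(pipeline)
  .setEvaluator(new BinaryClassificationEvaluator().setMetricName("areaUnderPR"))
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(2)  // Use 3+ in practice
  .setParallelism(2)
```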

【Comments】:

    Tags: scala apache-spark cross-validation metrics evaluator


    【Solution 1】:

    setMetricName only accepts "areaUnderPR" or "areaUnderROC". You need to write your own Evaluator, like this:

    import org.apache.spark.ml.evaluation.Evaluator
    import org.apache.spark.ml.param.ParamMap
    import org.apache.spark.ml.param.shared.{HasLabelCol, HasPredictionCol}
    import org.apache.spark.ml.util.Identifiable
    import org.apache.spark.sql.types.IntegerType
    import org.apache.spark.sql.{Dataset, functions => F}
    
    class FScoreEvaluator(override val uid: String) extends Evaluator with HasPredictionCol with HasLabelCol{
    
      def this() = this(Identifiable.randomUID("FScoreEvaluator"))
    
      // Computes the F1 score, assuming the label and prediction columns contain 0/1 values.
      def evaluate(dataset: Dataset[_]): Double = {
        // Aggregate expressions for the confusion-matrix counts
        val truePositive = F.sum(((F.col(getLabelCol) === 1) && (F.col(getPredictionCol) === 1)).cast(IntegerType))
        val predictedPositive = F.sum((F.col(getPredictionCol) === 1).cast(IntegerType))
        val actualPositive = F.sum((F.col(getLabelCol) === 1).cast(IntegerType))
    
        // Precision, recall and F1 built as column expressions, computed in a single pass
        val precision = truePositive / predictedPositive
        val recall = truePositive / actualPositive
        val fScore = F.lit(2) * (precision * recall) / (precision + recall)
    
        dataset.select(fScore).collect()(0)(0).asInstanceOf[Double]
      }
    
      override def copy(extra: ParamMap): Evaluator = defaultCopy(extra)
    }
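    A sketch of how this evaluator might be plugged into the cross-validation setup from the question (assuming `pipeline` and `paramGrid` are defined as there). The HasLabelCol and HasPredictionCol traits default the column names to "label" and "prediction", and Evaluator's default `isLargerBetter` is `true`, which is what you want for F1:

```scala
import org.apache.spark.ml.tuning.CrossValidator

val cv = new CrossValidator()
  .setEstimator(pipeline)
  .setEvaluator(new FScoreEvaluator())  // reads "label" and "prediction" columns by default
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(3)
```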
    

    【Comments】:

    • Thanks for your answer. However, it does not work. Is this Scala or PySpark...? I get some errors when running your code: "error: not found: type Evaluator", "error: not found: type HasPredictionCol", "error: not found: value F"...
    • @Anneso This is Scala. Did you run the import statements? Also, what version of Spark are you using? Sounds like
    • I am using Spark version 2.1.1.2.6.1.0-129 and Scala 2.11.8. Yes, I did run the import statements, and got no errors when doing so...
    • Yes, Spark
    • I am not sure I understand your last comment. Any thoughts on my Spark version?
    【Solution 2】:

    Based on @gmds' answer. Make sure your Spark version is >= 2.3.

    You can also follow the implementation of RegressionEvaluator in Spark to implement other custom evaluators.

    I also added isLargerBetter so that the instantiated evaluator can be used for model selection (e.g. CV).

    import org.apache.spark.ml.evaluation.Evaluator
    import org.apache.spark.ml.param.ParamMap
    import org.apache.spark.ml.param.shared.{HasLabelCol, HasPredictionCol, HasWeightCol}
    import org.apache.spark.ml.util.Identifiable
    import org.apache.spark.sql.{Dataset, functions => F}
    
    class WRmseEvaluator(override val uid: String) extends Evaluator with HasPredictionCol with HasLabelCol with HasWeightCol {
    
        def this() = this(Identifiable.randomUID("wrmseEval"))
    
        def setPredictionCol(value: String): this.type = set(predictionCol, value)
        
        def setLabelCol(value: String): this.type = set(labelCol, value)
        
        def setWeightCol(value: String): this.type = set(weightCol, value)
        
        // Weighted RMSE: sqrt( sum(w * residual^2) / sum(w) )
        def evaluate(dataset: Dataset[_]): Double = {
            dataset
                .withColumn("residual", F.col(getLabelCol) - F.col(getPredictionCol))
                .select(
                    F.sqrt(F.sum(F.col(getWeightCol) * F.pow(F.col("residual"), 2)) / F.sum(F.col(getWeightCol)))
                )
                .collect()(0)(0).asInstanceOf[Double]
        }
    
        override def copy(extra: ParamMap): Evaluator = defaultCopy(extra)
    
        override def isLargerBetter: Boolean = false
    }
    
    

    Here is how to use it:

    val wrmseEvaluator = new WRmseEvaluator()
        .setLabelCol(labelColName)
        .setPredictionCol(predColName)
        .setWeightCol(weightColName)
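    As a hypothetical sanity check (assuming a SparkSession `spark` is in scope; the column names are just for this example), evaluating on a tiny hand-built DataFrame gives sqrt((2·0.2² + 1·0.2²)/3) = 0.2:

```scala
// Columns: label, prediction, weight (names are assumptions for this example)
val df = spark.createDataFrame(Seq(
  (1.0, 1.2, 2.0),
  (2.0, 1.8, 1.0)
)).toDF("label", "prediction", "weight")

val wrmse = new WRmseEvaluator()
  .setLabelCol("label")
  .setPredictionCol("prediction")
  .setWeightCol("weight")
  .evaluate(df)  // ≈ 0.2
```

    Since isLargerBetter is false, CrossValidator would select the parameter map that minimizes this value.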
    
    

    【Comments】:
