[Title]: java.lang.IllegalArgumentException: requirement failed: Column features must be of type org.apache.spark.ml.linalg.VectorUDT
[Posted]: 2017-06-15 00:25:30
[Question]:

I am very new to Spark machine learning (2 days). I am running the following code in the Spark shell, trying to predict some values. I have seen posts about this error on Stack Overflow, but I could not fix my code from those answers, so I am posting the question again; apologies for that.

Input data:

1.00,1.00,9.00
1.00,2.00,10.00
1.00,3.00,9.00
1.00,4.00,9.00
1.00,5.00,9.00
1.00,6.00,9.45
1.00,7.00,9.45
1.00,8.00,9.45
1.00,9.00,9.45

Code:

val df = spark.read.csv("/root/Predictiondata.csv").toDF("Userid", "Date", "Intime")
import org.apache.spark.sql.types.DoubleType
val featureDf = df.select( df("Userid").cast(DoubleType).as("Userid"),df("Date").cast(DoubleType).as("Date"),df("Intime").cast(DoubleType).as("Intime")).toDF()
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
val data = featureDf.select("Userid","Date","Intime").map(r => LabeledPoint(r(0).toString.toDouble,Vectors.dense(r(1).toString.toDouble,r(2).toString.toDouble))).toDF()
import org.apache.spark.ml.regression.LinearRegression
val lr = new LinearRegression()
val lrModel = lr.fit(data)

Error:

 scala> val lrModel = lr.fit(data)
 java.lang.IllegalArgumentException: requirement failed: Column features must be of type org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7 but was actually org.apache.spark.mllib.linalg.VectorUDT@f71b0bce.
 at scala.Predef$.require(Predef.scala:224)
 at org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:42)
 at org.apache.spark.ml.PredictorParams$class.validateAndTransformSchema(Predictor.scala:51)
 at org.apache.spark.ml.Predictor.validateAndTransformSchema(Predictor.scala:72)
 at org.apache.spark.ml.Predictor.transformSchema(Predictor.scala:122)
 at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:74)
 at org.apache.spark.ml.Predictor.fit(Predictor.scala:90)
 ... 48 elided

Any help or suggestions would be much appreciated.

Thanks in advance.

[Discussion]:

    Tags: scala apache-spark apache-spark-mllib apache-spark-ml


    [Solution 1]:

    If your Spark version is 2.x or later, import

    org.apache.spark.ml.linalg.VectorUDT
    

    instead of

    org.apache.spark.mllib.linalg.VectorUDT
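
    Applied to the question's code, the import fix might look like this (an untested sketch; it reuses `featureDf` and the column names from the question, and uses `org.apache.spark.ml.feature.LabeledPoint`, the new-style counterpart of the `mllib` class):

```scala
// New-style (spark.ml) imports instead of the old org.apache.spark.mllib ones.
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.feature.LabeledPoint
import org.apache.spark.ml.regression.LinearRegression

val data = featureDf.select("Userid", "Date", "Intime")
  .map(r => LabeledPoint(
    r(0).toString.toDouble,                                             // label: Userid
    Vectors.dense(r(1).toString.toDouble, r(2).toString.toDouble)))     // features: Date, Intime
  .toDF()

// fit() now receives ml.linalg vectors, so the VectorUDT mismatch goes away.
val lrModel = new LinearRegression().fit(data)
```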
    

    [Discussion]:

      [Solution 2]:

      Please use Spark 2+ with the DataFrame API and VectorAssembler.

      Something like this (not tested):

      import org.apache.spark.ml.feature.VectorAssembler
      import org.apache.spark.ml.regression.LinearRegression
      import spark.implicits._
      
      val data = spark.read
          .option("inferSchema", true)
          .csv("/root/Predictiondata.csv")
          .toDF("Userid", "Date", "Intime")
      
      val dataWithFeatures = new VectorAssembler()
          .setInputCols(Array("Date", "Intime"))
          .setOutputCol("features")   // LinearRegression expects a "features" column by default
          .transform(data)
      
      val dataWithLabelFeatures = dataWithFeatures
          .withColumn("label", $"Userid")
      
      val lrModel = new LinearRegression().fit(dataWithLabelFeatures)
      

      Also, take a look at Pipeline.
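
      A minimal Pipeline version of the same flow might look like this (an untested sketch; variable names are illustrative, and `data` is the DataFrame loaded above):

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression

// Stage 1: assemble the two input columns into a single "features" vector.
val assembler = new VectorAssembler()
  .setInputCols(Array("Date", "Intime"))
  .setOutputCol("features")

// Stage 2: regression; point the label column at Userid directly
// instead of copying it into a "label" column first.
val lr = new LinearRegression()
  .setLabelCol("Userid")

val pipeline = new Pipeline().setStages(Array(assembler, lr))
val model = pipeline.fit(data)   // runs both stages in order
```

      A Pipeline keeps the feature-assembly and model steps together, so the same transformations are applied consistently at training and prediction time.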

      [Discussion]:

      • Thank you so much for your help!! ... it worked after a few modifications ... thanks again for your help!!!