【Question title】:Could not connect Apache Zeppelin to Highcharts: error: value seriesCol is not a member of org.apache.spark.sql.DataFrame
【Posted】:2016-09-12 15:41:16
【Question description】:

I'm trying to connect Zeppelin with Highcharts.

%spark 
import com.knockdata.zeppelin.highcharts._ 
import com.knockdata.zeppelin.highcharts.model._ 
import sqlContext.implicits._

val Tokyo = Seq(7.0, 6.9, 9.5, 14.5, 18.2, 21.5, 25.2, 26.5, 23.3,
18.3, 13.9, 9.6).map(("Tokyo", _))

val df = (Tokyo).toDF("city", "temperature")

df.show()

highcharts(df.seriesCol("city").series("y" -> col("temperature"))).plot()

This gives:

import com.knockdata.zeppelin.highcharts._
import com.knockdata.zeppelin.highcharts.model._
import sqlContext.implicits._
Tokyo: Seq[(String, Double)] = List((Tokyo,7.0), (Tokyo,6.9), (Tokyo,9.5), (Tokyo,14.5), (Tokyo,18.2), (Tokyo,21.5), (Tokyo,25.2), (Tokyo,26.5), (Tokyo,23.3), (Tokyo,18.3), (Tokyo,13.9), (Tokyo,9.6))
df: org.apache.spark.sql.DataFrame = [city: string, temperature: double]
+-----+-----------+
| city|temperature|
+-----+-----------+
|Tokyo|        7.0|
|Tokyo|        6.9|
|Tokyo|        9.5|
|Tokyo|       14.5|
|Tokyo|       18.2|
|Tokyo|       21.5|
|Tokyo|       25.2|
|Tokyo|       26.5|
|Tokyo|       23.3|
|Tokyo|       18.3|
|Tokyo|       13.9|
|Tokyo|        9.6|
+-----+-----------+
<console>:201: error: value seriesCol is not a member of org.apache.spark.sql.DataFrame
              highcharts(df.seriesCol("city").series("y" -> col("temperature"))).plot()

I added the dependency artifact com.knockdata:zeppelin-highcharts:0.2 to the Spark interpreter.

I followed https://github.com/knockdata/zeppelin-highcharts/blob/master/docs/DemoLineChart.md and also tried the bank data example from "Are there better interface to add Highcharts support to Zeppelin", but got:

<console>:224: error: value series is not a member of org.apache.spark.rdd.RDD[Bank]
possible cause: maybe a semicolon is missing before `value series'?
                .series("x" -> "age", "y" -> avg(col("income")))

Please help: where am I going wrong, and what could the problem be? Thanks in advance.

【Question discussion】:

    Tags: apache-spark highcharts apache-zeppelin


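    【Solution 1】:

    I changed the dependency artifact in the Spark interpreter from com.knockdata:zeppelin-highcharts:0.2 to com.knockdata:zeppelin-highcharts:0.6.0, which solved the problem. However, the bank data problem still persists. Any help with this?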

    %spark
    import com.knockdata.zeppelin.highcharts._
    import com.knockdata.zeppelin.highcharts.model._
    import sqlContext.implicits._
    
    val bankText = sc.textFile("/home/priyanka/Downloads/bank-data.csv")
    
    case class Bank(age: Integer, region: String, income: Float, married: String, children: Integer, car: String, save_act: String, current_act: String, mortgage: String, pep: String)
    
    // split each line, filter out header (starts with "age"), and map it into Bank case class  
    val bank = bankText.map(s=>s.split(",")).filter(s=>s(0)!="age").map(
        s=>Bank(s(0).toInt, 
                s(1).replaceAll("\"", ""),
                s(2).replaceAll("\"", "").toFloat,
                s(3).replaceAll("\"", ""),
                s(4).replaceAll("\"", "").toInt,
                s(5).replaceAll("\"", ""),
                s(6).replaceAll("\"", ""),
                s(7).replaceAll("\"", ""),      
                s(8).replaceAll("\"", ""),
                s(9).replaceAll("\"", "")
            )
    )
    
    // convert to DataFrame and register a temporary table
    bank.toDF().registerTempTable("bank")
    
    highcharts(bank.series("x" -> "age", "y" -> avg(col("income"))).orderBy(col("age"))).plot()
    

    This gives:

    import com.knockdata.zeppelin.highcharts._
    import com.knockdata.zeppelin.highcharts.model._
    import sqlContext.implicits._
    bankText: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[49] at textFile at <console>:62
    defined class Bank
    bank: org.apache.spark.rdd.RDD[Bank] = MapPartitionsRDD[52] at map at <console>:66
    <console>:70: error: value series is not a member of org.apache.spark.rdd.RDD[Bank]
    possible cause: maybe a semicolon is missing before `value series'?
                    .series("x" -> "age", "y" -> avg(col("income")))
                     ^
    

    Thanks

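    【Discussion】:

    • Thank you very much for using it. I'm the author. This is a version issue: use zeppelin-highcharts:0.6.0 with Zeppelin 0.6. (The documentation previously said zeppelin-highcharts:0.6.0-SNAPSHOT; the interface has since changed, and I've corrected the documentation.)
    • Thanks a lot, and thanks for providing this :). Any idea what's wrong with reading the bank data from a file? Is there some conflict between org.apache.spark.rdd.RDD and org.apache.spark.sql.DataFrame?
    • bank needs to be a DataFrame. Move toDF() to the end of the bank definition.
    【Solution 2】: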
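The last comment is about static types: the chart extension methods such as `series` are attached only to `DataFrame`, so `bank` must already be a `DataFrame` (not an `RDD[Bank]`) when `highcharts(...)` is called. Below is a Spark-free sketch of that shape using plain Scala collections. `Frame`, `toFrame`, `BankSeqOps`, and `SeriesOps` are hypothetical stand-ins for `DataFrame`, `toDF()`, and the library's conversion, not the zeppelin-highcharts API, and the sample CSV lines are made up for illustration.

```scala
// Same idea as the question's Bank class, trimmed to the two fields the chart uses.
case class Bank(age: Int, income: Float)

// Hypothetical stand-in for a DataFrame: rows of named values.
case class Frame(rows: Seq[Map[String, Any]])

// Hypothetical stand-in for toDF(): turns parsed records into a Frame.
implicit class BankSeqOps(banks: Seq[Bank]) {
  def toFrame: Frame =
    Frame(banks.map(b => Map[String, Any]("age" -> b.age, "income" -> b.income)))
}

// Hypothetical extension that, like the library's, exists only for Frame.
implicit class SeriesOps(frame: Frame) {
  def series(column: String): Seq[Any] = frame.rows.map(row => row(column))
}

// Made-up sample input standing in for the lines of bank-data.csv.
val lines = Seq("age,income", "48,17546.0", "40,30085.1")

// Split, drop the header row, parse, and convert at the END of the definition,
// mirroring "move toDF() to the end of the bank definition".
val bank: Frame = lines
  .map(_.split(","))
  .filter(s => s(0) != "age")
  .map(s => Bank(s(0).toInt, s(1).toFloat))
  .toFrame

// This compiles because bank is a Frame. Calling .series on the Seq[Bank]
// before .toFrame would not compile ("value series is not a member of ...").
val ages = bank.series("age")
```

In the Zeppelin paragraph the equivalent change is ending the `bank` definition with `.toDF()`, after which the temporary table is registered with `bank.registerTempTable("bank")` instead of `bank.toDF().registerTempTable("bank")`.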

    A DataFrame can be implicitly converted to a SeriesHolder, which has the function seriesCol. This was added in version 0.6.0.

    df.seriesCol("city") 
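The message "value seriesCol is not a member of org.apache.spark.sql.DataFrame" is what the Scala compiler reports when no implicit conversion that provides the method is in scope: `seriesCol` lives on `SeriesHolder`, and the conversion only becomes reachable through the wildcard import, and only in a library version that defines it (0.6.0 per this answer). A minimal Spark-free sketch of that mechanism follows; `Table`, `Holder`, and `chartsV06` are hypothetical stand-ins, not the library's real definitions.

```scala
import scala.language.implicitConversions

// Hypothetical stand-in for org.apache.spark.sql.DataFrame: it has no seriesCol.
case class Table(columns: Seq[String])

// Hypothetical stand-in for SeriesHolder, which does define seriesCol.
case class Holder(table: Table, seriesColumn: Option[String] = None) {
  def seriesCol(name: String): Holder = {
    require(table.columns.contains(name), s"unknown column: $name")
    copy(seriesColumn = Some(name))
  }
}

// Hypothetical library object: the implicit conversion lives here, so it is
// visible only after `import chartsV06._`, mirroring the wildcard import.
object chartsV06 {
  implicit def tableToHolder(t: Table): Holder = Holder(t)
}

val df = Table(Seq("city", "temperature"))

// Before this import, df.seriesCol("city") would not compile
// ("value seriesCol is not a member of Table").
import chartsV06._

// The compiler rewrites this to tableToHolder(df).seriesCol("city").
val holder = df.seriesCol("city")
```

If the artifact on the interpreter classpath is an older version whose package lacks such a conversion, the same import compiles but the method lookup still fails, which matches Solution 1's report that switching to 0.6.0 made the seriesCol error go away.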
    

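    The error is most likely caused by using the wrong version of spark-highcharts. The example code (doc) corresponds to version 0.6.0, which maps directly to the Zeppelin version.

    Using Docker is probably the easiest way. Alternatively, follow what the Dockerfile does: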

    docker run -p 8080:8080 -d knockdata/zeppelin-highcharts
    

    【Discussion】:

    • I'm only using Zeppelin 0.6.0. The docs say "If you wanna run on your existing zeppelin, follow Use In Zeppelin.", so do I still need to use Docker? And why doesn't the bank data work?