【Question Title】: Empty Dataset Returned while Performing a Join
【Posted】: 2018-03-24 07:36:58
【Question】:

I am reading two DataFrames from CSV files. However, when I join the two DataFrames, the result of the join is an empty Dataset.

Here are the two DataFrames.

val dfAverage = amount.join(client, "clientCode")
  .groupBy(client("clientName"))
  .agg(avg(amount("opAmount")).as("average"))
  .select("clientName", "average")

This is the code snippet for the join. I get an empty DataFrame as the result, although the schema is correct.

Since I am new to Scala and Spark, I would appreciate help with this simple problem.

Thanks in advance.

【Question Comments】:

  • Please don't use images; post the text of the question itself.

Tags: scala apache-spark join dataframe


【Solution 1】:
import org.apache.spark.sql.functions._


val client = sc.parallelize(Seq(
  ("Abhishek", "C1"),
  ("XUELAN", "C2"),
  ("Xahir", "C3")
)).toDF("ClientName", "ClientCode")

client.show()


+----------+----------+
|ClientName|ClientCode|
+----------+----------+
|  Abhishek|        C1|
|    XUELAN|        C2|
|     Xahir|        C3|
+----------+----------+



val amount = sc.parallelize(Seq(
  ("C1", "C11", 3122L),
  ("C1", "C12", 4312L),
  ("C2", "C21", 21431L),
  ("C2", "C31", 87588L),
  ("C3", "C32", 98769L),
  ("C3", "C33", 86567L),
  ("C3", "C34", 23112L)
)).toDF("ClientCode", "OperationCode", "opAmount")

amount.show()

+----------+-------------+--------+
|ClientCode|OperationCode|opAmount|
+----------+-------------+--------+
|        C1|          C11|    3122|
|        C1|          C12|    4312|
|        C2|          C21|   21431|
|        C2|          C31|   87588|
|        C3|          C32|   98769|
|        C3|          C33|   86567|
|        C3|          C34|   23112|
+----------+-------------+--------+

val dfAverage = amount.join(client, "clientCode")
  .groupBy(client("clientName"))
  .agg(avg(amount("opAmount")).as("average"))
  .select("clientName", "average")

dfAverage.show()


+----------+-----------------+
|clientName|          average|
+----------+-----------------+
|  Abhishek|           3717.0|
|     Xahir|69482.66666666667|
|    XUELAN|          54509.5|
+----------+-----------------+
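The join above works because the synthetic key values match exactly. If the same code returns an empty DataFrame when the data is read from CSV files, the usual cause is that the key *values* differ between the two files: stray whitespace or different letter case (column-name resolution in Spark is case-insensitive by default, but value comparison is not). A diagnostic sketch, not part of the original answer, that normalizes the key on both sides before joining:

```scala
import org.apache.spark.sql.functions.{avg, lower, trim}

// Normalize the join key on both sides: trim whitespace that CSV values
// often carry and fold case, so "c1 " and "C1" compare equal.
val clientClean = client.withColumn("ClientCode", trim(lower(client("ClientCode"))))
val amountClean = amount.withColumn("ClientCode", trim(lower(amount("ClientCode"))))

val dfAverageClean = amountClean.join(clientClean, "ClientCode")
  .groupBy("ClientName")
  .agg(avg("opAmount").as("average"))
```

If this version produces rows while the original join was empty, the raw key values in the CSV files were the problem.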

import sqlContext.implicits._
import org.apache.spark.sql._
import org.apache.spark.sql.functions._

client.createOrReplaceTempView("client")
amount.createOrReplaceTempView("amount")

val result = spark.sqlContext.sql(
  """SELECT client.ClientName, avg(amount.opAmount) AS average
    |FROM amount
    |JOIN client ON amount.ClientCode = client.ClientCode
    |GROUP BY client.ClientName""".stripMargin)

result.show()


+----------+-----------------+
|ClientName|          average|
+----------+-----------------+
|  Abhishek|           3717.0|
|     Xahir|69482.66666666667|
|    XUELAN|          54509.5|
+----------+-----------------+
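To see which keys in the real CSV data fail to match, an anti-join lists the rows on one side that have no partner on the other. A sketch of the idea (an addition for diagnosis, assuming the same `amount` and `client` DataFrames):

```scala
// Rows in `amount` whose ClientCode has no counterpart in `client`.
// "left_anti" keeps only the left-side rows without a join partner.
val unmatched = amount.join(client, Seq("ClientCode"), "left_anti")
unmatched.show()
```

An empty result here means every key matches; otherwise the listed codes show exactly which values differ between the two files.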

【Discussion】:
