【Question Title】: Get all non-null columns of a Spark DataFrame in one column
【Posted】: 2020-06-16 11:26:22
【Question Description】:

I need to select all non-null columns from a Hive table and insert them into HBase. For example, consider the table below:

Name      Place         Department  Experience
==============================================
Ram      | Ramgarh      |  Sales      |  14
Lakshman | Lakshmanpur  |Operations   | 
Sita     | Sitapur      |             |  14
Ravan    |              |             |  25

I have to write all the non-null columns from the table above to HBase. So I wrote logic to collect the non-null column names into a single column of the DataFrame, as shown below. The Name column is mandatory.

Name        Place       Department  Experience      Not_null_columns
================================================================================
Ram         Ramgarh     Sales        14            Name, Place, Department, Experience
Lakshman    Lakshmanpur Operations                 Name, Place, Department
Sita        Sitapur                  14            Name, Place, Experience
Ravan                                25            Name, Experience

Now my requirement is to add a column to the DataFrame holding the values of all non-null columns, like this:

Name      Place        Department   Experience    Not_null_columns_values
Ram       Ramgarh      Sales        14            Name: Ram, Place: Ramgarh, Department: Sales, Experience: 14
Lakshman  Lakshmanpur  Operations                 Name: Lakshman, Place: Lakshmanpur, Department: Operations
Sita      Sitapur                   14            Name: Sita, Place: Sitapur, Experience: 14
Ravan                               25            Name: Ravan, Experience: 25

Once I have this DataFrame, I will write it to HBase with Name as the key and the last column as the value.

Please let me know if there is a better way to do this.
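For illustration, the per-row logic being asked for (keep only the non-null columns and render them as `col: value` pairs) can be sketched in plain Scala, independent of Spark. The sample rows and the `nonNullPairs` helper below are assumptions for illustration, not part of the question:

```scala
// Hypothetical helper: format only the non-null columns of one row
// as "col: value" pairs, preserving column order.
def nonNullPairs(row: Seq[(String, Option[Any])]): String =
  row.collect { case (col, Some(v)) => s"$col: $v" }.mkString(", ")

// Sample rows modelled on the table above (assumed data).
val ram = Seq(
  "Name" -> Option("Ram"),
  "Place" -> Option("Ramgarh"),
  "Department" -> Option("Sales"),
  "Experience" -> Option(14)
)
val ravan = Seq(
  "Name" -> Option("Ravan"),
  "Place" -> None,
  "Department" -> None,
  "Experience" -> Option(25)
)

println(nonNullPairs(ram))   // Name: Ram, Place: Ramgarh, Department: Sales, Experience: 14
println(nonNullPairs(ravan)) // Name: Ravan, Experience: 25
```

In Spark this per-row formatting would typically live in a UDF or be replaced by `to_json`, as the answer below does.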

【Question Comments】:

    Tags: apache-spark apache-spark-sql


    【Solution 1】:

    Try this:

    Load the provided test data

        val data =
          """
            |Name    |  Place    |     Department | Experience
            |
            |Ram      | Ramgarh      |  Sales      |  14
            |
            |Lakshman | Lakshmanpur  |Operations   |
            |
            |Sita     | Sitapur      |             |  14
            |
            |Ravan   |              |              |  25
          """.stripMargin
    
        // import spark.implicits._ is required for .toDS()
        val stringDS = data.split(System.lineSeparator())
          .map(_.split("\\|").map(_.replaceAll("""^[ \t]+|[ \t]+$""", "")).mkString(","))
          .toSeq.toDS()
        val df = spark.read
          .option("sep", ",")
          .option("inferSchema", "true")
          .option("header", "true")
    //      .option("nullValue", "null")
          .csv(stringDS)
    
        df.show(false)
        df.printSchema()
        /**
          * +--------+-----------+----------+----------+
          * |Name    |Place      |Department|Experience|
          * +--------+-----------+----------+----------+
          * |Ram     |Ramgarh    |Sales     |14        |
          * |Lakshman|Lakshmanpur|Operations|null      |
          * |Sita    |Sitapur    |null      |14        |
          * |Ravan   |null       |null      |25        |
          * +--------+-----------+----------+----------+
          *
          * root
          * |-- Name: string (nullable = true)
          * |-- Place: string (nullable = true)
          * |-- Department: string (nullable = true)
          * |-- Experience: integer (nullable = true)
          */
    

    Pack the columns into a struct first, then convert to JSON — `to_json` omits null fields automatically

        val x = df.withColumn("Not_null_columns_values",
          to_json(struct(df.columns.map(col): _*)))
        x.show(false)
        x.printSchema()
    
        /**
          * +--------+-----------+----------+----------+---------------------------------------------------------------------+
          * |Name    |Place      |Department|Experience|Not_null_columns_values                                              |
          * +--------+-----------+----------+----------+---------------------------------------------------------------------+
          * |Ram     |Ramgarh    |Sales     |14        |{"Name":"Ram","Place":"Ramgarh","Department":"Sales","Experience":14}|
          * |Lakshman|Lakshmanpur|Operations|null      |{"Name":"Lakshman","Place":"Lakshmanpur","Department":"Operations"}  |
          * |Sita    |Sitapur    |null      |14        |{"Name":"Sita","Place":"Sitapur","Experience":14}                    |
          * |Ravan   |null       |null      |25        |{"Name":"Ravan","Experience":25}                                     |
          * +--------+-----------+----------+----------+---------------------------------------------------------------------+
          */
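    The question also asks to push the result to HBase with Name as the row key and the JSON as the value. This step is not part of the original answer; the following is a hedged sketch using the standard HBase client API, where the table name `employee`, column family `cf`, and qualifier `data` are all assumptions for illustration:

        // Sketch only (assumes an HBase cluster reachable from the executors).
        import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
        import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
        import org.apache.hadoop.hbase.util.Bytes

        x.select("Name", "Not_null_columns_values").foreachPartition { rows =>
          // One connection per partition, created on the executor.
          val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
          val table = conn.getTable(TableName.valueOf("employee"))
          try {
            rows.foreach { r =>
              val put = new Put(Bytes.toBytes(r.getString(0)))            // Name as row key
              put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("data"),
                Bytes.toBytes(r.getString(1)))                            // JSON as value
              table.put(put)
            }
          } finally {
            table.close(); conn.close()
          }
        }

    In practice a connector such as `hbase-spark` or batched `Put`s would be preferable to per-row writes, but the shape of the write is the same.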
    

    【Comments】:

    • Thanks, this works like a charm. Still trying to understand the logic behind it, as I'm new to Spark.