【Title】: Spark Java - Adding new column based on date in Oracle dataset
【Posted】: 2020-08-25 23:38:21
【Question】:

I'm trying to write a Spark Java program that adds a column to a dataset based on a date. I'm using an Oracle database.

I need to add a new column (yearquarter) based on the posted date, using Spark Java. For example, if the posted date falls between January 1 and March 31, 2020, the yearquarter value should be Q12020. The current and expected datasets are shown below.

Could anyone share the Spark Java code to add this new column?

Code snippet used to read the input dataset from the table:

Dataset<Row> inputDataset = sparksession.read().jdbc(jdbcUrl, table_name, connectionProperties);
inputDataset.show();

Datasets:

  Current Dataset (inputDataset):
                   +------+--------+---------------------+
                   | ID   |location| posteddate          |
                   +------+--------+---------------------+
                   |137570|chennai |2020-06-22 13:49:... |
                   |137571| kerala |2020-02-22 14:49:... |
                   |137572|chennai |2018-10-26 13:19:... |
                   |137573|chennai |2019-09-29 14:49:... |
                   +------+--------+---------------------+
           
           
           Expected Dataset:
                   +------+--------+---------------------+--------------+
                   |   id |location| posteddate          |  yearquarter |
                   +------+--------+---------------------+--------------+
                   |137570|chennai |2020-06-22 13:49:... |        Q22020|
                   |137571| kerala |2020-02-22 14:49:... |        Q12020|
                   |137572|chennai |2018-10-26 13:19:... |        Q42018|
                   |137573|chennai |2019-09-29 14:49:... |        Q32019|
                   +------+--------+---------------------+--------------+ 

Thanks in advance.

【Comments】:

    Tags: apache-spark


    【Solution 1】:

    Try this -

    Use quarter + year:

        // requires: import static org.apache.spark.sql.functions.expr;
        dataset.show(false);
        dataset.printSchema();
            /**
             * +------+--------+-------------------+
             * |ID    |location|posteddate         |
             * +------+--------+-------------------+
             * |137570|chennai |2020-06-22 13:49:00|
             * |137571|kerala  |2020-02-22 14:49:00|
             * |137572|chennai |2018-10-26 13:19:00|
             * |137573|chennai |2019-09-29 14:49:00|
             * +------+--------+-------------------+
             *
             * root
             *  |-- ID: integer (nullable = true)
             *  |-- location: string (nullable = true)
             *  |-- posteddate: timestamp (nullable = true)
             */
    
            dataset.withColumn("yearquarter", expr("concat('Q', quarter(posteddate), year(posteddate))"))
                    .show(false);
            /**
             * +------+--------+-------------------+-----------+
             * |ID    |location|posteddate         |yearquarter|
             * +------+--------+-------------------+-----------+
             * |137570|chennai |2020-06-22 13:49:00|Q22020     |
             * |137571|kerala  |2020-02-22 14:49:00|Q12020     |
             * |137572|chennai |2018-10-26 13:19:00|Q42018     |
             * |137573|chennai |2019-09-29 14:49:00|Q32019     |
             * +------+--------+-------------------+-----------+
             */
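The expression above relies on Spark's built-in quarter() and year() SQL functions. The same "Q" + quarter + year logic can be sketched in plain Java with java.time (the class and method names here are my own, for illustration), which is handy for checking the expected values outside Spark:

```java
import java.time.LocalDateTime;
import java.time.temporal.IsoFields;

public class YearQuarter {
    // Mirrors the Spark expression concat('Q', quarter(posteddate), year(posteddate))
    static String yearQuarter(LocalDateTime ts) {
        int quarter = ts.get(IsoFields.QUARTER_OF_YEAR); // 1..4, like Spark's quarter()
        return "Q" + quarter + ts.getYear();
    }

    public static void main(String[] args) {
        // Same sample rows as the question's dataset
        System.out.println(yearQuarter(LocalDateTime.of(2020, 6, 22, 13, 49))); // Q22020
        System.out.println(yearQuarter(LocalDateTime.of(2020, 2, 22, 14, 49))); // Q12020
    }
}
```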
    

    【Discussion】:

      【Solution 2】:

      Imports:

      import static org.apache.spark.sql.functions.lit;
      import static org.apache.spark.sql.functions.col;
      import static org.apache.spark.sql.functions.when;
      

      You can use this as a starting point:

      inputDataset = inputDataset
          .withColumn("yearquarter",                   // add the column
              when(                                    // conditional operator
                  col("posteddate")
                      .gt(lit("start_range"))          // condition #1 ("start_range" is a placeholder)
                      .and(                            // and
                          col("posteddate").lt(lit("end_range"))), // condition #2
                  lit("Q12020"))                       // column value if the condition is true
              .otherwise(lit("Q22020")));              // column value otherwise
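      The start_range/end_range placeholders above stand for the boundary dates of a quarter. As a plain-Java sketch (the helper names quarterStart/quarterEnd are hypothetical, not from either answer), those boundaries can be derived for any date with java.time:

      ```java
      import java.time.LocalDate;
      import java.time.temporal.IsoFields;

      public class QuarterRange {
          // First day of the quarter containing the given date
          static LocalDate quarterStart(LocalDate d) {
              int firstMonth = (d.get(IsoFields.QUARTER_OF_YEAR) - 1) * 3 + 1; // 1, 4, 7, or 10
              return LocalDate.of(d.getYear(), firstMonth, 1);
          }

          // Last day of the quarter containing the given date
          static LocalDate quarterEnd(LocalDate d) {
              return quarterStart(d).plusMonths(3).minusDays(1);
          }

          public static void main(String[] args) {
              // For the Q12020 example in the question: Jan 1 - Mar 31, 2020
              LocalDate posted = LocalDate.of(2020, 2, 22);
              System.out.println(quarterStart(posted)); // 2020-01-01
              System.out.println(quarterEnd(posted));   // 2020-03-31
          }
      }
      ```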
      

      【Discussion】:
