【Question title】: Spark - Find the range of all year weeks between 2 weeks
【Posted】: 2020-01-20 18:14:20
【Question】:

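I need to find all the year weeks between two given weeks.

201824 is an example of a year week: it means week 24 of 2018.

Assuming a year has 52 weeks, the year weeks of 2018 start at 201801 and end at 201852, after which they continue with 201901.

If the start week and end week are in the same year, I can find the range of year weeks between them as follows: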

val range = udf((i: Int, j: Int) => (i to j).toArray)

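The code above only works when the start week and end week fall in the same year, for example 201912 - 201917.

How can I make it work when the start week and end week belong to different years?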

Example: 201849 - 201903

The above weeks should give the output as: 
201849,201850,201851,201852,201901,201902,201903
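
Under the question's own assumption of exactly 52 weeks per year, the cross-year case can be sketched with plain integer arithmetic: map each yyyyww value to a running week index, take the range over the indices, and map back. The names `toIndex`, `fromIndex` and `yearWeekRange` are hypothetical, and the sketch is wrong for calendars where some years have 53 weeks.

```scala
// Assumption (taken from the question): every year has exactly 52 weeks.
val WeeksPerYear = 52

// 201849 -> 2018 * 52 + 48: a running week index that has no gap at year boundaries
def toIndex(yw: Int): Int = (yw / 100) * WeeksPerYear + (yw % 100 - 1)

// Inverse of toIndex: running week index -> yyyyww
def fromIndex(idx: Int): Int = (idx / WeeksPerYear) * 100 + (idx % WeeksPerYear + 1)

// Inclusive range of year weeks; empty when start is after end
def yearWeekRange(start: Int, end: Int): Array[Int] =
  (toIndex(start) to toIndex(end)).map(fromIndex).toArray

println(yearWeekRange(201849, 201903).mkString(","))
// 201849,201850,201851,201852,201901,201902,201903
```

Like the `range` UDF in the question, this function could be wrapped with `udf(yearWeekRange _)` to apply it to two integer columns.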

【Comments】:

    Tags: scala date dataframe apache-spark hadoop


    【Solution 1】:

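    There is still plenty of room for optimization, but as a general direction you could use something like the following.
    I use org.joda.time.format here, but java.time should work as well.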

    import org.joda.time.Weeks
    import org.joda.time.format.DateTimeFormat

    def rangeOfYearWeeks(weeksRange: String): Array[String] = {
      try {
        // Split "201849 - 201903" into its two yyyyww endpoints
        val left  = weeksRange.split("-")(0).trim
        val right = weeksRange.split("-")(1).trim

        // Rewrite "201849" as "2018-49" so year and week can be parsed separately
        val leftPattern  = s"${left.substring(0, 4)}-${left.substring(4)}"
        val rightPattern = s"${right.substring(0, 4)}-${right.substring(4)}"

        // x = weekyear, w = week of weekyear. With "yyyy" (calendar year), 2018-12-31
        // would print as year 2018 even though it belongs to week 1 of 2019.
        val parseFmt = DateTimeFormat.forPattern("xxxx-ww")
        val printFmt = DateTimeFormat.forPattern("xxxxww")  // zero-padded week number

        val leftDate  = parseFmt.parseDateTime(leftPattern)
        val rightDate = parseFmt.parseDateTime(rightPattern)
        // If leftDate is after rightDate, weeksBetween is negative and the result is empty
        val weeksBetween = Weeks.weeksBetween(leftDate, rightDate).getWeeks

        (0 to weeksBetween).map(i => printFmt.print(leftDate.plusWeeks(i))).toArray
      } catch {
        case e: Exception => Array.empty  // malformed input yields an empty array
      }
    }
    

    Example:

    val dates = Seq("201849 - 201903", "201912 - 201917").toDF("col")
    
    val weeks = udf((d: String) => rangeOfYearWeeks(d))
    
    dates.select(weeks($"col")).show(false)
    
    +--------------------------------------------------------+
    |UDF(col)                                                |
    +--------------------------------------------------------+
    |[201849, 201850, 201851, 201852, 201901, 201902, 201903]|
    |[201912, 201913, 201914, 201915, 201916, 201917]        |
    +--------------------------------------------------------+
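
    Since the answer mentions that java.time should also work, here is a minimal sketch of the same idea without Joda, assuming ISO-8601 week numbering (weeks start on Monday, week 1 contains 4 January). The helpers `mondayOf`, `toYearWeek` and `isoYearWeeks` are hypothetical names, not part of the answer above.

```scala
import java.time.LocalDate
import java.time.temporal.{ChronoField, IsoFields}

// Hypothetical helper: Monday of the given ISO year week, e.g. 201849 -> 2018-12-03
def mondayOf(yw: Int): LocalDate =
  LocalDate.of(yw / 100, 1, 4)  // 4 January always lies in ISO week 1
    .`with`(IsoFields.WEEK_OF_WEEK_BASED_YEAR, (yw % 100).toLong)
    .`with`(ChronoField.DAY_OF_WEEK, 1L)

// Date -> yyyyww, using the week-based year rather than the calendar year
def toYearWeek(d: LocalDate): Int =
  d.get(IsoFields.WEEK_BASED_YEAR) * 100 + d.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR)

// Inclusive list of year weeks from start to end; empty when start is after end
def isoYearWeeks(start: Int, end: Int): Array[Int] = {
  val last = mondayOf(end)
  Iterator.iterate(mondayOf(start))(_.plusWeeks(1))
    .takeWhile(!_.isAfter(last))
    .map(toYearWeek)
    .toArray
}

println(isoYearWeeks(201849, 201903).mkString(", "))
// 201849, 201850, 201851, 201852, 201901, 201902, 201903
```

    An out-of-range week number such as 55 makes `mondayOf` throw a `DateTimeException`, so a UDF built on this sketch would likely want to wrap the call in `scala.util.Try`.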
    

    【Comments】:

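      【Solution 2】:

      Here is a UDF-based solution using the java.time API. As written, it returns the weeks strictly between the two endpoints (both the start week and the end week are excluded), and an unparseable week such as 201955 yields an empty list: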

      def weeksBetween = udf{ (startWk: Int, endWk: Int) =>
        import java.time.LocalDate
        import java.time.format.DateTimeFormatter
        import scala.util.{Try, Success, Failure}
      
        def formatYW(yw: Int): String = {
          val pattern = "(\\d{4})(\\d+)".r
          s"$yw" match { case pattern(y, w) => s"$y-$w-1"}
        }
      
        val formatter = DateTimeFormatter.ofPattern("YYYY-w-e")  // week-based year
      
        Try(
          Iterator.iterate(LocalDate.parse(formatYW(startWk), formatter))(_.plusWeeks(1)).
            takeWhile(_.isBefore(LocalDate.parse(formatYW(endWk), formatter))).
            map{ s =>
              val a = s.format(formatter).split("-")
              (a(0) + f"${a(1).toInt}%02d").toInt
            }.
            toList.tail
        ) match {
          case Success(ls) => ls
          case Failure(_) => List.empty[Int]  // return an empty list
        }
      }
      

      Testing the UDF:

      val df = Seq(
        (1, 201849, 201903), (2, 201908, 201916), (3, 201950, 201955)
      ).toDF("id", "start_wk", "end_wk")
      
      df.withColumn("weeks_between", weeksBetween($"start_wk", $"end_wk")).show(false)
      // +---+--------+------+--------------------------------------------------------+
      // |id |start_wk|end_wk|weeks_between                                           |
      // +---+--------+------+--------------------------------------------------------+
      // |1  |201849  |201903|[201850, 201851, 201852, 201901, 201902]                |
      // |2  |201908  |201916|[201909, 201910, 201911, 201912, 201913, 201914, 201915]|
      // |3  |201950  |201955|[]                                                      |
      // +---+--------+------+--------------------------------------------------------+
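
      Row 3 comes back empty because week 55 does not exist. If invalid input should be detected explicitly instead of silently producing an empty list, a check along these lines could run first. This is only a sketch assuming ISO-8601 week numbering, which may differ from the locale-dependent numbering that the "YYYY-w-e" pattern uses; `isoWeeksInYear` and `isValidYearWeek` are hypothetical names.

```scala
import java.time.LocalDate
import java.time.temporal.IsoFields

// 28 December always falls in the last ISO week of its year,
// so its week number is the number of ISO weeks in that year (52 or 53).
def isoWeeksInYear(year: Int): Int =
  LocalDate.of(year, 12, 28).get(IsoFields.WEEK_OF_WEEK_BASED_YEAR)

// True when the ww part of a yyyyww value is a week that exists in year yyyy
def isValidYearWeek(yw: Int): Boolean = {
  val week = yw % 100
  week >= 1 && week <= isoWeeksInYear(yw / 100)
}

println(isValidYearWeek(201903)) // true
println(isValidYearWeek(201955)) // false
println(isoWeeksInYear(2020))    // 53
```

      The last line also shows that the question's assumption of 52 weeks per year does not hold for every year under ISO numbering.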
      

      【Comments】:
