【Question title】: Spark SQL between timestamp on where clause?
【Posted】: 2018-02-01 18:49:51
【Question】:

I am trying to use the DataFrame API to return the rows that fall between two timestamps.

The sample code is:

val df = Seq(
    ("red", "2016-11-29 07:10:10.234"),
    ("green", "2016-11-29 07:10:10.234"),
    ("blue", "2016-11-29 07:10:10.234")).toDF("color", "date")

  df.where(unix_timestamp($"date", "yyyy-MM-dd HH:mm:ss.S").cast("timestamp").between(LocalDateTime.now(), LocalDateTime.now().minusHours(1))).show()

But it throws an Unsupported literal type class java.time.LocalDateTime error:

Exception in thread "main" java.lang.RuntimeException: Unsupported literal type class java.time.LocalDateTime 2016-11-29T07:32:12.084
    at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:57)
    at org.apache.spark.sql.functions$.lit(functions.scala:101)
    at org.apache.spark.sql.Column.$greater$eq(Column.scala:438)
    at org.apache.spark.sql.Column.between(Column.scala:542)
    at com.sankar.SparkSQLTimestampDifference$.delayedEndpoint$com$sankar$SparkSQLTimestampDifference$1(SparkSQLTimestampDifference.scala:23)
    at com.sankar.SparkSQLTimestampDifference$delayedInit$body.apply(SparkSQLTimestampDifference.scala:7)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
    at scala.App$class.main(App.scala:76)
    at com.sankar.SparkSQLTimestampDifference$.main(SparkSQLTimestampDifference.scala:7)
    at com.sankar.SparkSQLTimestampDifference.main(SparkSQLTimestampDifference.scala)

【Comments on the question】:

    Tags: apache-spark apache-spark-sql spark-dataframe


    【Solution 1】:

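    When you compare against a timestamp column in a where clause, the bounds must be java.sql.Timestamp values, so convert each LocalDateTime to a Timestamp first. Also note that the first argument of between is lowerBound, so in your case LocalDateTime.now().minusHours(1) must come before LocalDateTime.now(). You can then write: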

    import java.time.LocalDateTime
    import java.sql.Timestamp
    
    df.where(
         unix_timestamp($"date", "yyyy-MM-dd HH:mm:ss.S")
           .cast("timestamp")
           .between(
              Timestamp.valueOf(LocalDateTime.now().minusHours(1)),
              Timestamp.valueOf(LocalDateTime.now())
           ))
      .show()
    

    The filtered DataFrame will look like this:

    +-----+--------------------+
    |color|                date|
    +-----+--------------------+
    |  red|2016-11-29 10:58:...|
    +-----+--------------------+
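
One detail the snippet above leaves open is that LocalDateTime.now() is called twice, so the two bounds come from two slightly different clock readings. A minimal sketch of an alternative, assuming a hypothetical helper named lastHoursWindow (not part of Spark or of the original answer), reads the clock once and derives both bounds from it:

```scala
import java.sql.Timestamp
import java.time.LocalDateTime

object BoundsSketch {
  // Hypothetical helper: returns (lowerBound, upperBound) covering the last
  // `hours` hours, both derived from a single clock reading.
  def lastHoursWindow(hours: Long,
                      now: LocalDateTime = LocalDateTime.now()): (Timestamp, Timestamp) =
    (Timestamp.valueOf(now.minusHours(hours)), Timestamp.valueOf(now))

  def main(args: Array[String]): Unit = {
    val (lower, upper) = lastHoursWindow(1)
    // between(lower, upper) means (col >= lower) && (col <= upper),
    // so a lower bound that is after the upper bound can never match a row.
    require(!lower.after(upper), "lower bound must not be after upper bound")
    println(s"lower = $lower, upper = $upper")
  }
}
```

The resulting pair can be passed straight to between(lower, upper) on the timestamp column. Taking now as a parameter also makes the window easy to pin to a fixed time in tests.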
    

    【Discussion】:

    • Pro tip: use .show(truncate=false) to see the full column values.
    • Do you really need cast("timestamp")? Doesn't unix_timestamp already return a Timestamp?
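
On the last comment: unix_timestamp returns the number of seconds since the epoch as a bigint column, not a timestamp, so the cast is what turns it back into a timestamp. The sketch below is one way to check this locally; the object name, the local[1] session, and the three-digit fraction pattern "yyyy-MM-dd HH:mm:ss.SSS" are assumptions made for the illustration, not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.unix_timestamp

object UnixTimestampTypeSketch {
  // Builds three views of the same string column so their types can be compared.
  def compare(spark: SparkSession): DataFrame = {
    import spark.implicits._
    // Assumption: three fraction digits, matching the sample data.
    val pattern = "yyyy-MM-dd HH:mm:ss.SSS"
    Seq("2016-11-29 07:10:10.234").toDF("date").select(
      // bigint: whole seconds since the epoch
      unix_timestamp($"date", pattern).as("epoch_seconds"),
      // timestamp rebuilt from whole seconds, so the milliseconds are gone
      unix_timestamp($"date", pattern).cast("timestamp").as("via_unix"),
      // timestamp parsed directly from the string, milliseconds kept
      $"date".cast("timestamp").as("direct_cast"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("unix-timestamp-type")
      .getOrCreate()
    val result = compare(spark)
    result.printSchema()
    result.show(truncate = false)
    spark.stop()
  }
}
```

Because unix_timestamp works in whole seconds, the round trip through it drops the fractional part. When the strings are already in the default yyyy-MM-dd HH:mm:ss[.fraction] layout, as in the question, casting the column directly is likely the simpler choice.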