【Title】: Week number of month from date
【Posted】: 2020-12-01 17:02:22
【Question】:

I have a dataframe like this, where the date column is in yyyy-mm-dd format:

+--------+----------+---------+----------+-----------+--------------------+
|order_id|product_id|seller_id|      date|pieces_sold|       bill_raw_text|
+--------+----------+---------+----------+-----------+--------------------+
|     668|    886059|     3205|2015-01-14|         91|pbdbzvpqzqvtzxone...|
|    6608|    541277|     1917|2012-09-02|         44|cjucgejlqnmfpfcmg...|
|   12962|    613131|     2407|2016-08-26|         90|cgqhggsjmrgkrfevc...|
|   14223|    774215|     1196|2010-03-04|         46|btujmkfntccaewurg...|
|   15131|    769255|     1546|2018-11-28|         13|mrfsamfuhpgyfjgki...|
+--------+----------+---------+----------+-----------+--------------------+

I want to create and append a column containing the week number of the month. That is, treating a month as roughly four weeks, I want to compute this number for every date.

Here is what I tried:

import pyspark.sql.functions as F

sales_table.select(
    '*',
    F.date_format("date", "W").alias('week_month')
).show(5)

The error is:

An error occurred while calling o140.showString.
: org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to recognize 'W' pattern in the DateTimeFormatter. 1) You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0. 2) You can form a valid datetime pattern with the guide from https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
    at org.apache.spark.sql.catalyst.util.DateTimeFormatterHelper$$anonfun$checkLegacyFormatter$1.applyOrElse(DateTimeFormatterHelper.scala:176)
    at org.apache.spark.sql.catalyst.util.DateTimeFormatterHelper$$anonfun$checkLegacyFormatter$1.applyOrElse(DateTimeFormatterHelper.scala:165)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
    at org.apache.spark.sql.catalyst.util.Iso8601TimestampFormatter.validatePatternString(TimestampFormatter.scala:110)
    at org.apache.spark.sql.catalyst.util.TimestampFormatter$.getFormatter(TimestampFormatter.scala:279)
    at org.apache.spark.sql.catalyst.util.TimestampFormatter$.apply(TimestampFormatter.scala:313)
    at org.apache.spark.sql.catalyst.expressions.DateFormatClass.$anonfun$formatter$1(datetimeExpressions.scala:646)
    at scala.Option.map(Option.scala:230)
    at org.apache.spark.sql.catalyst.expressions.DateFormatClass.formatter$lzycompute(datetimeExpressions.scala:641)
    at org.apache.spark.sql.catalyst.expressions.DateFormatClass.formatter(datetimeExpressions.scala:639)
    at org.apache.spark.sql.catalyst.expressions.DateFormatClass.doGenCode(datetimeExpressions.scala:665)
    at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:146)
    at scala.Option.getOrElse(Option.scala:189)
..
..
..

How can I get the week number of the month from a date?

【Discussion】:

    Tags: apache-spark pyspark apache-spark-sql


    【Solution 1】:

    As the error log suggests, set the following property in your Spark session.

    Example:

    spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
    
    sales_table.show()
    #+----------+
    #|      date|
    #+----------+
    #|2015-01-14|
    #+----------+
    sales_table.select('*',F.date_format("date", "W").alias('week_month')).show(5)
    #+----------+----------+
    #|      date|week_month|
    #+----------+----------+
    #|2015-01-14|         3|
    #+----------+----------+
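
    Equivalently, if you control session creation, the property can be set when the SparkSession is built instead of via a SET statement afterwards; a minimal sketch:

    from pyspark.sql import SparkSession

    # Apply the legacy time-parser policy once, at session creation time.
    spark = (
        SparkSession.builder
        .config("spark.sql.legacy.timeParserPolicy", "LEGACY")
        .getOrCreate()
    )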
    

    【Discussion】:

      【Solution 2】:

      Add the line

      spark.sql.legacy.timeParserPolicy LEGACY

      to $SPARK_HOME/conf/spark-defaults.conf.

      Unfortunately, 'W' is no longer supported in the datetime patterns of recent Spark versions. However, you can still restore the legacy behavior with the setting above.
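
      If you would rather avoid the legacy parser entirely, the same week-of-month value can be computed arithmetically with built-in functions; a minimal sketch, assuming the Sunday-first week convention that the legacy 'W' pattern uses by default:

      from pyspark.sql import functions as F

      # Offset each day-of-month by the weekday of the 1st of its month
      # (dayofweek: 1 = Sunday .. 7 = Saturday), then count full 7-day rows.
      # For 2015-01-14 the month starts on a Thursday (dayofweek = 5),
      # so ceil((14 + 5 - 1) / 7) = 3, matching the 'W' output above.
      week_month = F.ceil(
          (F.dayofmonth("date") + F.dayofweek(F.trunc("date", "month")) - 1) / 7
      )

      sales_table.select("*", week_month.alias("week_month")).show(5)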

      【Discussion】:
