【Posted】: 2020-12-01 17:02:22
【Question】:
I have a dataframe like this, where the date column is in yyyy-mm-dd format:
+--------+----------+---------+----------+-----------+--------------------+
|order_id|product_id|seller_id| date|pieces_sold| bill_raw_text|
+--------+----------+---------+----------+-----------+--------------------+
| 668| 886059| 3205|2015-01-14| 91|pbdbzvpqzqvtzxone...|
| 6608| 541277| 1917|2012-09-02| 44|cjucgejlqnmfpfcmg...|
| 12962| 613131| 2407|2016-08-26| 90|cgqhggsjmrgkrfevc...|
| 14223| 774215| 1196|2010-03-04| 46|btujmkfntccaewurg...|
| 15131| 769255| 1546|2018-11-28| 13|mrfsamfuhpgyfjgki...|
+--------+----------+---------+----------+-----------+--------------------+
I want to create and append a column containing the week of the month, i.e. treating a month as four weeks, I want to compute this for all of my dates.
This is what I did:
sales_table.select(
    '*',
    F.date_format("date", "W").alias('week_month')
).show(5)
The error is:
An error occurred while calling o140.showString.
: org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to recognize 'W' pattern in the DateTimeFormatter. 1) You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0. 2) You can form a valid datetime pattern with the guide from https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
at org.apache.spark.sql.catalyst.util.DateTimeFormatterHelper$$anonfun$checkLegacyFormatter$1.applyOrElse(DateTimeFormatterHelper.scala:176)
at org.apache.spark.sql.catalyst.util.DateTimeFormatterHelper$$anonfun$checkLegacyFormatter$1.applyOrElse(DateTimeFormatterHelper.scala:165)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
at org.apache.spark.sql.catalyst.util.Iso8601TimestampFormatter.validatePatternString(TimestampFormatter.scala:110)
at org.apache.spark.sql.catalyst.util.TimestampFormatter$.getFormatter(TimestampFormatter.scala:279)
at org.apache.spark.sql.catalyst.util.TimestampFormatter$.apply(TimestampFormatter.scala:313)
at org.apache.spark.sql.catalyst.expressions.DateFormatClass.$anonfun$formatter$1(datetimeExpressions.scala:646)
at scala.Option.map(Option.scala:230)
at org.apache.spark.sql.catalyst.expressions.DateFormatClass.formatter$lzycompute(datetimeExpressions.scala:641)
at org.apache.spark.sql.catalyst.expressions.DateFormatClass.formatter(datetimeExpressions.scala:639)
at org.apache.spark.sql.catalyst.expressions.DateFormatClass.doGenCode(datetimeExpressions.scala:665)
at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:146)
at scala.Option.getOrElse(Option.scala:189)
...
How can I get the week of the month from a date?
【Discussion】:
Tags: apache-spark pyspark apache-spark-sql