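【Posted】: 2021-07-11 01:32:26
【Problem description】:
I have the PySpark code below. In it, I create a DataFrame from another DataFrame that has been registered as a temporary view, and then use a SQL query to create a new field in the final query. The expression for the field I am trying to create originally comes from PostgreSQL. What is the correct PySpark SQL version of this CASE statement and regular expression?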
case when a.field2::varchar ~ '^[0-9]+$' then a.field2::varchar else '0' end
Do I just use cast(field2 as string)?
And what is the correct PySpark SQL version of the regular-expression test?
Code:
import datetime
import math
import sys
import time
import traceback

from pyspark import SparkConf
from pyspark.context import SparkContext
from pyspark.sql import SparkSession, SQLContext, Window
from pyspark.sql.types import *
from pyspark.sql.functions import (
    UserDefinedFunction, coalesce, col, date_add, date_format, date_trunc,
    datediff, dayofmonth, dayofyear, first, format_number, hour, lag, length,
    lit, min, month, substring, to_date, trim, udf, unix_timestamp, upper,
    weekofyear, when, year,
)
table_df.createOrReplaceTempView("table")
query="""select
case when a.field2::varchar ~ '^[0-9]+$' then a.field2::varchar else '0' end as field1
from table a"""
df=spark.sql(query)
【Discussion】:
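One possible translation, offered as a sketch rather than a definitive answer: Spark SQL does not accept PostgreSQL's `::varchar` cast or the `~` regex operator, but `cast(... as string)` and `rlike` cover the same ground. The view name `table_view` and the helpers `add_field1` and `expected_field1` are hypothetical names introduced here; `expected_field1` is only a pure-Python approximation of the CASE logic for integer and string inputs, so the rule can be checked without a cluster.

```python
import re

# Spark SQL has no `::type` cast and no `~` operator; CAST(... AS STRING)
# and RLIKE are the usual counterparts. "table_view" is a hypothetical
# view name used only for this sketch.
QUERY = """
select
    case
        when cast(a.field2 as string) rlike '^[0-9]+$'
            then cast(a.field2 as string)
        else '0'
    end as field1
from table_view a
"""


def add_field1(spark, table_df):
    """Register table_df as a temp view and run QUERY (needs a live SparkSession)."""
    table_df.createOrReplaceTempView("table_view")
    return spark.sql(QUERY)


def expected_field1(value):
    """Pure-Python approximation of the CASE expression, for sanity checks only."""
    if value is None:  # NULL never satisfies the WHEN branch, so ELSE applies
        return "0"
    text = str(value)
    return text if re.search(r"^[0-9]+$", text) else "0"
```

With the `spark` session and `table_df` from the question, `df = add_field1(spark, table_df)` would take the place of the original `spark.sql(query)` call. The same condition can also be written with the DataFrame API, using `col("field2").cast("string").rlike("^[0-9]+$")` inside `when(...).otherwise("0")`.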
Tags: python sql apache-spark pyspark apache-spark-sql