【Question title】: SQL-query error in pyspark while using temp-table
【Posted】: 2019-02-22 16:32:07
【Question】:

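I have a SQL query that I need to run from PySpark (Databricks). Because the query is complex, PySpark fails to read it. Could someone check my query and help me rewrite it as a single "SELECT" statement without the "WITH" clause?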

Stage 1:
promotions="""
(WITH VCTE_Promotions as (SELECT v.Shortname, v.Employee_ID_ALT, v.Job_Level, 
                         v.Management_Level, CAST(sysdatetime() AS date) AS PIT_Date, v.Employee_Status_Alt as Employee_Status, 
                         v.Work_Location_Region, v.Work_Location_Country_Desc, v.HML, 
                         [DM_GlobalStaff].[dbo].[V_Worker_PIT].Is_Manager
FROM           [DM_GlobalStaff].[dbo].[V_Worker_CUR] as v 
LEFT OUTER JOIN
[DM_GlobalStaff].[dbo].[V_Worker_PIT] ON v.Management_Level = [DM_GlobalStaff].[dbo].[V_Worker_PIT].Management_Level),

VCTE_Promotion_v2_Eval as (
SELECT        Employee_ID_ALT,
                             ( SELECT max([pit_date]) AS prior_data 
                               FROM [DM_GlobalStaff].[dbo].[V_Worker_PIT] AS t
                               WHERE (employee_id_alt = a.Employee_ID_ALT) AND (PIT_Date < a.PIT_Date) AND (Is_Manager <> a.Is_Manager) OR
                                      (employee_id_alt = a.Employee_ID_ALT) AND (PIT_Date < a.PIT_Date) AND (Job_Level <> a.Job_Level)) AS prev_job_change_date, Is_Manager
FROM            VCTE_Promotions AS a)

SELECT  VCTE_Promotion_v2_Eval.Employee_ID_ALT, COALESCE (v_cur.Employee_Status_ALT, N'') AS Curr_Emp_Status, 
                         COALESCE (v_cur.Employee_Type, N'') AS Curr_Employee_Type, v_cur.Hire_Date_Alt AS Curr_Hire_Date, 
                         v_cur.Termination_Date_ALT  AS Curr_Termination_Date, COALESCE (v_cur.Termination_Action_ALT, N'') 
                         AS Curr_Termination_Action, cast (v_cur.Job_Level as int) AS Curr_Job_Level, 
                         COALESCE (v_cur.Management_Level, N'') AS Curr_Management_Level, 
                         COALESCE (VCTE_Promotion_v2_Eval.Is_Manager, N'') AS Curr_Ismanager, 
                         CASE WHEN v_m.Job_Level < v_cur.Job_Level OR
                         (VCTE_Promotion_v2_Eval.Is_Manager = 1 AND v_m.Is_Manager = 0 AND v_m.Job_Level <= v_cur.Job_Level) 
                         THEN 'Promotion' WHEN v_m.Job_Level <> v_cur.Job_Level OR
                         VCTE_Promotion_v2_Eval.Is_Manager <> v_m.Is_Manager THEN 'Other' ELSE '' END AS Promotion, v_cur.Tenure, 
                         v_cur.Review_Rating_Current
FROM            VCTE_Promotion_v2_Eval INNER JOIN
                         [DM_GlobalStaff].[dbo].[V_Worker_CUR] as v_cur ON VCTE_Promotion_v2_Eval.Employee_ID_ALT = v_cur.Employee_ID_ALT LEFT OUTER JOIN
                         [DM_GlobalStaff].[dbo].[V_Worker_PIT] as v_m ON VCTE_Promotion_v2_Eval.prev_job_change_date = v_m.PIT_Date AND 
                         VCTE_Promotion_v2_Eval.Employee_ID_ALT = v_m.employee_id_alt
) as pr """

Stage 2:
promotions = spark.read.jdbc(url=jdbcUrl, table=promotions, properties=connectionProperties)

Stage 3:
promotions.count()
promotions.show()

Stage 2 fails with the following error:

com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near the keyword 'WITH'.

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<command-2532359884208251> in <module>()
----> 1 promotions = spark.read.jdbc(url=jdbcUrl, table=promotions, properties=connectionProperties)

/databricks/spark/python/pyspark/sql/readwriter.py in jdbc(self, url, table, column, lowerBound, upperBound, numPartitions, predicates, properties)
    533             jpredicates = utils.toJArray(gateway, gateway.jvm.java.lang.String, predicates)
    534             return self._df(self._jreader.jdbc(url, table, jpredicates, jprop))
--> 535         return self._df(self._jreader.jdbc(url, table, jprop))
    536 
    537 

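My query itself is fine; it works perfectly at my SQL prompt. But as soon as I use the same query in PySpark (Databricks), I get a syntax error. Could you help me understand the syntax PySpark expects?

Your prompt assistance would be greatly appreciated.

【Comments on the question】:

    Tags: sql sql-server pyspark apache-spark-sql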


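    【Answer 1】:

    I have no way to test this, but please give it a try and compare the results to check that everything matches.

    Also, I used CROSS APPLY instead of the correlated subquery: there is no simple join to use here and a correlated subquery is not very efficient, so CROSS APPLY should do the job.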

    (
        SELECT
            VCTE_Promotion_v2_Eval.Employee_ID_ALT
           ,COALESCE(v_cur.Employee_Type, N'') AS Curr_Employee_Type
           ,v_cur.Review_Rating_Current
        FROM (
        SELECT
        Employee_ID_ALT,
        pr.prior_data AS prev_job_change_date,
        IsManager        
        From 
            ( SELECT
                v.Shortname
               ,v.Employee_ID_ALT
               ,v.Job_Level
               ,v.Management_Level
               ,CAST(SYSDATETIME() AS DATE) AS PIT_Date
               ,v.Employee_Status_Alt AS Employee_Status
               ,v.Work_Location_Region
               ,v.Work_Location_Country_Desc
               ,v.HML
               ,dbo.T_Mngmt_Level_IsManager_Mapping.IsManager
            FROM Worker_CUR AS v
            LEFT OUTER JOIN dbo.T_Mngmt_Level_IsManager_Mapping
            ON v.Management_Level = dbo.T_Mngmt_Level_IsManager_Mapping.Management_Level
            ) AS a
        Cross APPLY ( 
                     SELECT
                        MAX(PIT_Date) AS prior_data
                     FROM dbo.V_Worker_PIT_with_IsManager AS t
                     WHERE (employee_id_alt = a.Employee_ID_ALT)
                     AND (PIT_Date < a.PIT_Date)
                     AND (IsManager <> a.IsManager)
                     OR (employee_id_alt = a.Employee_ID_ALT)
                     AND (PIT_Date < a.PIT_Date)
                     AND (Job_Level <> a.Job_Level)
                     )
                    AS pr
                ) as VCTE_Promotion_v2_Eval
        INNER JOIN [DM_GlobalStaff].[dbo].[V_Worker_CUR] AS v_cur
                ON VCTE_Promotion_v2_Eval.Employee_ID_ALT = v_cur.Employee_ID_ALT
            LEFT OUTER JOIN dbo.V_Worker_PIT_with_IsManager AS v_m
                ON VCTE_Promotion_v2_Eval.prev_job_change_date = v_m.PIT_Date
                    AND VCTE_Promotion_v2_Eval.Employee_ID_ALT = v_m.employee_id_alt ) as promotions
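
    For context on why the `WITH` version fails: Spark's JDBC reader places whatever is passed as `table` directly after `FROM`, so the string has to be a table name or a parenthesised, aliased `SELECT`. SQL Server accepts neither a CTE nor a statement terminator (`;`) in that position. The sketch below is a plain-Python illustration of that shape. The helper names `as_derived_table` and `schema_probe` are hypothetical, and the probe text only approximates what Spark sends when it resolves the schema; the exact statement may differ between Spark versions.

```python
import re


def as_derived_table(sql: str, alias: str) -> str:
    """Wrap one SELECT statement as '(<select>) AS <alias>' for the JDBC `table` argument."""
    # A trailing ';' is not valid inside a parenthesised derived table.
    body = re.sub(r"[;\s]+$", "", sql.strip())
    # SQL Server does not accept a CTE inside a derived table.
    if re.match(r"with\b", body, flags=re.IGNORECASE):
        raise ValueError("inline the CTEs as nested subqueries before wrapping")
    return f"({body}) AS {alias}"


def schema_probe(table: str) -> str:
    """Approximate shape of the statement Spark issues to resolve the schema."""
    return f"SELECT * FROM {table} WHERE 1=0"


if __name__ == "__main__":
    query = "SELECT Employee_ID_ALT FROM dbo.V_Worker_CUR;"
    table = as_derived_table(query, "pr")
    # Shows the WITH-free, semicolon-free query sitting inside a FROM clause.
    print(schema_probe(table))
```

    The wrapped string would then be passed as `table=` to `spark.read.jdbc`, together with the `jdbcUrl` and `connectionProperties` from the question.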
    

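    【Comments】:

    • Thanks for the reply, Markov. This query has to be executed in PySpark (Databricks), and I still get the same error: "com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near the keyword 'SELECT'." At a plain SQL prompt, even my earlier query worked fine.
    • Someone needs to look at this right away.
    • Can you use sqlContext.sql("your sql query").collect()?
    • That code doesn't run either; I get a similar error from PySpark.
    • @bugfoot I removed the trailing semicolon from my command and now it runs smoothly.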
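
    On the `sqlContext.sql` suggestion above: Spark SQL runs against Spark's own catalog, not against SQL Server, so the source tables have to be loaded first. One possible route, sketched below, is to read each base view over JDBC, register it as a temp view, and run the query in Spark SQL, which does accept `WITH`. The helpers `view_name_for` and `register_jdbc_views` are hypothetical names. The T-SQL-specific parts of the original query (for example `sysdatetime()` and `N''` literals) would still need translating to Spark SQL, and I have not verified that the full query runs there.

```python
def view_name_for(qualified_name: str) -> str:
    """'[DM_GlobalStaff].[dbo].[V_Worker_CUR]' -> 'V_Worker_CUR'.

    Assumes there are no dots inside the bracketed name parts.
    """
    last_part = qualified_name.split(".")[-1]
    return last_part.strip("[]")


def register_jdbc_views(spark, jdbc_url, properties, tables):
    """Load each SQL Server object over JDBC and expose it as a Spark temp view."""
    names = []
    for table in tables:
        df = spark.read.jdbc(url=jdbc_url, table=table, properties=properties)
        name = view_name_for(table)
        df.createOrReplaceTempView(name)
        names.append(name)
    return names


# Usage on a cluster (not executed here); jdbcUrl and connectionProperties
# are the variables from the question:
#
# register_jdbc_views(
#     spark, jdbcUrl, connectionProperties,
#     ["[DM_GlobalStaff].[dbo].[V_Worker_CUR]",
#      "[DM_GlobalStaff].[dbo].[V_Worker_PIT]"],
# )
# promotions = spark.sql(
#     "WITH cur AS (SELECT Employee_ID_ALT FROM V_Worker_CUR) SELECT * FROM cur"
# )
```

    The trade-off is that the base views are read into Spark and the joins run there, instead of being pushed down to SQL Server as in the derived-table approach.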