【Question Title】:Need to convert a query from Redshift to Presto
【Posted】:2018-07-22 05:29:58
【Question】:

Given the following query written for AWS Redshift:

SELECT session_date,'min' as stats,mini as value,product,endpoint
from 
(select 
distinct trunc(joinstart_ev_timestamp) as session_date, 
PERCENTILE_DISC(0.02) WITHIN GROUP (ORDER BY join_time) over(partition by 
trunc(joinstart_ev_timestamp))/1000 as mini,
PERCENTILE_DISC(0.25) WITHIN GROUP (ORDER BY join_time) OVER (partition by         
trunc(joinstart_ev_timestamp))/1000 as first_quartile,
median(join_time) over(partition by trunc(joinstart_ev_timestamp))/1000 as 
jt,
PERCENTILE_DISC(0.75) WITHIN GROUP (ORDER BY join_time) OVER (partition by 
trunc(joinstart_ev_timestamp))/1000 as third_quartile,
PERCENTILE_DISC(0.98) WITHIN GROUP (ORDER BY join_time) over(partition by 
trunc(joinstart_ev_timestamp))/1000 as maxi,
product_name as product,
endpoint as endpoint
from qe_datawarehouse.join_session_fact
where  
trunc(joinstart_ev_timestamp)  between '2018-01-18' and '2018-01-30'
and lower(product_name) LIKE 'gotowebinar%' 
and join_time>0 and join_time <= 600000 and join_time is not null 
and audio_connect_time >= 0  
and (entrypoint_access_time >= 0 or entrypoint_access_time is null)
and (panel_connect_time >= 0  or panel_connect_time is null) and version = 
'V2');

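I need to convert it into the equivalent Presto query.

Here is what I have tried:

  • When I run the query above in Presto (against Hive), I get the error: "Query failed (#20180212_044343_00014_jb834): line 5:36: missing 'BY' at '('"
  • I know I have to use approx_percentile() in Presto somehow, but I have not been able to get it to work.

Note: in the Redshift query every column is treated as a string, but in Presto the data types are as follows: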

create external table if not exists join_session_fact (
 join_session_fact_id string
,session_tracking_id string
,user_id string
,participant_id string
,meeting_id string
,session_mcs_id string
,browser_name string
,browser_version string
,endpoint string
,entrypoint string
,build_number string
,model_id string
,model_name string
,hardware_net string
,ip_address string
,country string
,region string
,city string
,os_type string
,os_architecture string
,os_locale string
,os_timezone string
,product_name string
,product_version string
,product_tier string
,participant_role string
,timezone string
,joinstart_ev_timestamp timestamp
,joinLaunch_ev_timestamp timestamp
,joinSession_ev_timestamp timestamp
,joinTime_ev_timestamp timestamp
,audioConnect_ev_timestamp timestamp
,connection_type string
,download_start_timestamp timestamp
,download_end_timestamp timestamp
,install_start_timestamp timestamp
,install_end_timestamp timestamp
,password_start_timestamp timestamp
,password_end_timestamp timestamp
,login_start_timestamp timestamp
,login_end_timestamp timestamp
,audioWait_start_timestamp timestamp
,audioWait_end_timestamp timestamp
,hallway_start_timestamp timestamp
,hallway_end_timestamp timestamp
,entrypoint_access_time double
,endpoint_access_time double
,panel_connect_time double
,audio_connect_time double
,install_time_endpoint double
,download_time_endpoint double
,install_time_launcher double
,download_time_launcher double
,join_time double
,process_data_timestamp timestamp
,source_date timestamp
,version string
,event_date timestamp
)
PARTITIONED BY (data_input_date string) 
stored as orc
location '${hiveconf:s3bucket}/${hiveconf:fact_path}/${hiveconf:join_session_fact}/'
TBLPROPERTIES ("orc.compress"="snappy");

Note that the following query works fine when I run it in Presto:

select 
distinct cast(joinstart_ev_timestamp as date) as session_date, 
approx_percentile(cast(join_time as double),0.50) over (partition by 
cast(joinstart_ev_timestamp as date)) /1000 as jt,
product_name as product,
endpoint as endpoint
from datawarehouse.join_session_fact
where  
cast(joinstart_ev_timestamp as date)  between date '2018-01-18' and date '2018-01-30'
and lower(product_name) LIKE 'gotowebinar%' 
and join_time > 0 and join_time <= 600000 and join_time is not null 
and audio_connect_time >= 0 
and (entrypoint_access_time >= 0 or entrypoint_access_time is null)
and (panel_connect_time >= 0  or panel_connect_time is null) and version = 'V2'

【Question Discussion】:

    Tags: sql amazon-s3 amazon-redshift presto


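    【Answer 1】:

    It is probably the WITHIN GROUP. AFAIK, those percentile functions are not supported in Presto. The error is most likely caused by the grammar not recognizing the WITHIN GROUP () clause.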
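
To make this concrete: the reported position, line 5:36, appears to fall on the parenthesis right after `WITHIN GROUP` in the first `PERCENTILE_DISC` line of the question, which is consistent with the parser reading `GROUP` and expecting `BY`. Below is a minimal sketch of the rewrite for that single column. It assumes the table and column names from the question and that an approximate result is acceptable, since `approx_percentile` is an approximation while `PERCENTILE_DISC` is exact. The `WHERE` clause is shortened for illustration.

```sql
-- Redshift form from the question, which the Presto parser rejected:
--   PERCENTILE_DISC(0.02) WITHIN GROUP (ORDER BY join_time)
--     OVER (PARTITION BY trunc(joinstart_ev_timestamp)) / 1000 AS mini

-- Presto sketch: approx_percentile used as a window function,
-- with trunc(timestamp) replaced by a cast to date.
SELECT DISTINCT
    cast(joinstart_ev_timestamp AS date) AS session_date,
    approx_percentile(join_time, 0.02)
        OVER (PARTITION BY cast(joinstart_ev_timestamp AS date)) / 1000 AS mini
FROM datawarehouse.join_session_fact
WHERE join_time > 0  -- shortened filter, for illustration only
```

The remaining percentile columns would follow the same pattern with a different fraction.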

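    【Comments】:

    • Hi Matt, thanks for your reply. I have updated the question above with a query that runs fine. Now, could you help me find the corresponding Presto syntax for the following part of the Redshift query: "PERCENTILE_DISC(0.02) WITHIN GROUP (ORDER BY cast(join_time as double)) over(partition by cast(joinstart_ev_timestamp as date))/1000 as mini"
    【Answer 2】:

    I found the correct conversion to Presto: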

    SELECT session_date,'min' as stats,mini as value,product,endpoint
    from 
    (select 
    distinct cast(joinstart_ev_timestamp as date) as session_date, 
    approx_percentile(cast(join_time as double),0.02) over (partition by 
    cast(joinstart_ev_timestamp as date))/1000 as mini,
    approx_percentile(cast(join_time as double),0.25) over (partition by 
    cast(joinstart_ev_timestamp as date))/1000 as first_quartile,
    approx_percentile(cast(join_time as double),0.50) over (partition by 
    cast(joinstart_ev_timestamp as date))/1000 as jt,
    approx_percentile(cast(join_time as double),0.75) over (partition by 
    cast(joinstart_ev_timestamp as date))/1000 as third_quartile,
    approx_percentile(cast(join_time as double),0.98) over (partition by 
    cast(joinstart_ev_timestamp as date))/1000 as maxi,
    product_name as product,
    endpoint as endpoint
    from datawarehouse.join_session_fact
    where  
    cast(joinstart_ev_timestamp as date)  between date_add('day', -16, now()) 
    and  date_add('day', -1, now())
     and lower(product_name) LIKE 'gotowebinar%' 
     and join_time>0 and join_time <= 600000 and join_time is not null 
     and audio_connect_time >= 0  
    and (entrypoint_access_time >= 0 or entrypoint_access_time is null)
    and (panel_connect_time >= 0  or panel_connect_time is null) and version = 'V2')
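
One caveat about the conversion above: `approx_percentile` returns an approximation, while `PERCENTILE_DISC` in Redshift is exact. If exact values matter, one possible alternative is to apply the definition of a discrete percentile directly (the smallest value whose cumulative distribution is at least the requested fraction) using `cume_dist()`. This is a sketch under assumptions, not part of the original answer: it returns one row per day, omits the `product` and `endpoint` columns and some of the filters for brevity, and the names `ranked` and `cd` are made up for illustration.

```sql
SELECT
    session_date,
    -- smallest join_time whose cumulative distribution reaches the fraction
    min(join_time) FILTER (WHERE cd >= 0.02) / 1000 AS mini,
    min(join_time) FILTER (WHERE cd >= 0.25) / 1000 AS first_quartile,
    min(join_time) FILTER (WHERE cd >= 0.75) / 1000 AS third_quartile,
    min(join_time) FILTER (WHERE cd >= 0.98) / 1000 AS maxi
FROM (
    SELECT
        cast(joinstart_ev_timestamp AS date) AS session_date,
        join_time,
        cume_dist() OVER (
            PARTITION BY cast(joinstart_ev_timestamp AS date)
            ORDER BY join_time
        ) AS cd
    FROM datawarehouse.join_session_fact
    WHERE join_time > 0 AND join_time <= 600000
      AND lower(product_name) LIKE 'gotowebinar%'
      AND version = 'V2'
) AS ranked  -- hypothetical alias
GROUP BY session_date
```

The median is left out on purpose: Redshift's `median` interpolates between values, so a discrete percentile at 0.5 would not necessarily match it. Sorting every partition can also be noticeably more expensive than `approx_percentile` on large tables.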
    
