【Question title】: HIVE - INSERT OVERWRITE using WITH CLAUSE
【Posted】: 2016-07-07 12:54:21
【Question】:

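I have a generated query that starts with a WITH clause. It works fine when I run it in the console, but it fails when I try to run it with INSERT OVERWRITE to load the output into a separate Hive table: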

INSERT OVERWRITE TABLE $proc_db.$master_table PARTITION(created_dt, country) $master_query

It throws the following error:

cannot recognize input near 'WITH' 't' 'as' in statement

The query is as follows:

master_query="
WITH t
AS (
SELECT subscription_id
    ,country
    ,email_type
    ,email_priority
    ,created_dt
FROM crm_arrow.birthday
WHERE created_dt = '2016-07-07'
    AND (COUNTRY = 'SG')
GROUP BY subscription_id
    ,country
    ,email_type
    ,email_priority
    ,created_dt

UNION ALL

SELECT subscription_id
    ,country
    ,email_type
    ,email_priority
    ,created_dt
FROM crm_arrow.wishlist
WHERE created_dt = '2016-07-07'
    AND (COUNTRY = 'SG')
GROUP BY subscription_id
    ,country
    ,email_type
    ,email_priority
    ,created_dt

UNION ALL
.....
)
SELECT q.subscription_id
,q.country
,q.email_type
FROM (
SELECT t1.subscription_id
    ,t1.country
    ,DENSE_RANK() OVER (
        PARTITION BY t1.subscription_id
        ,t1.country ORDER BY t1.email_priority
        ) global_rank
    ,CASE 
        WHEN t1.email_type = t2.email_type
            THEN t1.email_type
        END email_type
FROM t t1
LEFT JOIN t t2 ON t1.country = t2.country
    AND t1.subscription_id = t2.subscription_id
) q
WHERE q.email_type IS NOT NULL
AND (
    q.global_rank <= 2
    AND country = 'SG'
    )
"

How can I do an efficient self-join against such a huge inner query? I also tried including the SELECT statement inside master_query, but it still does not work.

【Comments】:

    Tags: hadoop hive


    【Answer 1】:

    The problem is only where you placed the INSERT statement. Here is an example of how to combine INSERT with a WITH clause:

    CREATE TABLE ramesh_test
    (key          BIGINT,
     text_value   STRING,
     roman_value  STRING)
    ROW FORMAT DELIMITED 
    FIELDS TERMINATED BY '\t' 
    LINES TERMINATED BY '\n' 
    STORED AS TEXTFILE;
    
    WITH v_text
    AS
    (SELECT 1 AS key, 'One' AS value),
    v_roman
    AS
    (SELECT 1 AS key, 'I' AS value)
    INSERT OVERWRITE TABLE ramesh_test
    SELECT v_text.key, v_text.value, v_roman.value
      FROM v_text JOIN v_roman
                    ON (v_text.key = v_roman.key);
    

    Place the INSERT after the WITH clause, directly above the main SELECT.
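Applied to the shell wrapper from the question, this could look roughly like the sketch below. It assumes the query generator can emit the WITH clause and the main SELECT as two separate strings. The names `with_clause`, `select_query` and `build_insert`, the placeholder values for `proc_db` and `master_table`, and the shortened query text are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: with_clause, select_query and build_insert are hypothetical names.
# proc_db and master_table come from the question; their values here are placeholders.
proc_db="proc"
master_table="master"

# Assumption: the generator emits the CTE definitions and the main SELECT separately.
with_clause="WITH t AS (
  SELECT subscription_id, country, email_type, email_priority, created_dt
  FROM crm_arrow.birthday
  WHERE created_dt = '2016-07-07' AND country = 'SG'
)"
# Illustrative column list, not the real one from the question.
select_query="SELECT t1.subscription_id, t1.email_type, t1.created_dt, t1.country
FROM t t1"

# $1 = WITH clause (may be empty), $2 = main SELECT.
# The INSERT line goes between the two, never in front of WITH.
build_insert() {
  insert_line="INSERT OVERWRITE TABLE ${proc_db}.${master_table} PARTITION(created_dt, country)"
  if [ -n "$1" ]; then
    printf '%s\n%s\n%s\n' "$1" "$insert_line" "$2"
  else
    printf '%s\n%s\n' "$insert_line" "$2"
  fi
}

statement="$(build_insert "$with_clause" "$select_query")"
printf '%s\n' "$statement"
# To execute it, something like: hive -e "$statement"
```

With an empty first argument the function falls back to the plain `INSERT ... SELECT` form, so the same wrapper also covers generated queries that have no WITH clause.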

    Hope this helps!


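      【Answer 2】:

      You need to change your query so that INSERT OVERWRITE comes right before the SELECT q.subscription_id clause.

      See the example below. Put one or more WITH definitions at the top, then write INSERT OVERWRITE (or INSERT INTO), immediately followed by the SELECT query: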

      WITH TABLE1 
      AS
      (
          SELECT 
          cod_index,
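          -- alias each CAST so the outer SELECT can reference the columns by name
          CAST(test_1 AS VARCHAR(200)) AS test_1,
          CAST(test_2 AS VARCHAR(200)) AS test_2,
          CAST(test_3 AS VARCHAR(200)) AS test_3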
          FROM db_h_gss.tb_h_test_orig
      )
      INSERT INTO TABLE db_h_gss.tb_h_test_insert PARTITION (cod_index = 1)
      SELECT
          test_1,
          test_2,
          test_3
      FROM TABLE1 WHERE cod_index = 1;
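The example above uses a static partition (`cod_index = 1`), while the statement in the question uses dynamic partitions, `PARTITION(created_dt, country)`. A rough sketch of that variant is below, written as a shell string like the question's `master_query`. The target table's schema is not shown in the question, so the column list is an assumption, as are the placeholder values for `proc_db` and `master_table` and the shortened CTE. In a dynamic-partition insert Hive takes the partition values from the trailing SELECT columns, in the order given in the PARTITION clause, and the two `SET` lines may or may not be needed depending on your Hive defaults.

```shell
#!/usr/bin/env bash
# Sketch only. Assumption: the target table has data columns
# (subscription_id, email_type) and is partitioned by (created_dt, country).
proc_db="proc"          # placeholder value
master_table="master"   # placeholder value

statement="SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

WITH t AS (
  SELECT subscription_id, country, email_type, created_dt
  FROM crm_arrow.birthday
  WHERE created_dt = '2016-07-07' AND country = 'SG'
)
INSERT OVERWRITE TABLE ${proc_db}.${master_table} PARTITION(created_dt, country)
SELECT subscription_id,
       email_type,
       created_dt,   -- first dynamic partition column
       country       -- second dynamic partition column
FROM t;"

printf '%s\n' "$statement"
# To execute it, something like: hive -e "$statement"
```

Note that the final SELECT in the question returns only subscription_id, country and email_type, so it would probably need created_dt and country added at the end to match the PARTITION clause.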
      


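        【Answer 3】:

        Assuming your big query really works, you just need to remove the WITH t AS part: directly after the INSERT OVERWRITE clause it is not valid Hive syntax, which is what the error is telling you.

        So your query should look like this: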

        INSERT OVERWRITE TABLE $proc_db.$master_table PARTITION(created_dt, country)
        SELECT subscription_id ...
        
