[Question title]: Create table as select percentage subquery in Impala DB
[Posted]: 2020-07-27 13:27:53
[Question]:

I'm new to Impala and need to create a table from the result set of a SELECT. The SQL is run from Java over JDBC. Here is the query:

create table if not exists my_temp_table as select 
41 as rule_id,49 as record_id,
(select count(1) as val from dirty_table where msg regexp '^[1]([3-9])[0-9]{9}$' )/(select count(1) from dirty_table);

I need to create the table my_temp_table and insert the data into it with a single SQL statement. The statement fails with the following error:

[HY000][500051] [Cloudera][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:ParseException: Syntax error

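After checking, I found that Impala does not support subqueries in the SELECT clause; subqueries can only be used in the FROM or WHERE clause. See the Impala documentation: https://impala.apache.org/docs/build/html/topics/impala_subqueries.html

So how can I solve this?

My ideas:

  1. Rewrite the SQL so that it executes. I tried WITH, as in the SQL below; it works on its own but not inside CREATE TABLE ... AS ...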
    WITH q1 AS (
      select count(1) as val from dirty_table where msg regexp '^[1]([3-9])[0-9]{9}$'
    ),
    q2 AS (
      select count(1) val2 from dirty_table
    )
    SELECT 100 * q1.val / q2.val2  result
    FROM q1, q2
  2. Alternatively, is there a statement similar to BEGIN ... END in MySQL or Oracle, so that I can run these SQL statements separately?

[Comments]:

    Tags: java sql hadoop count impala


    [Solution 1]:

    Based on your example, I would try the two approaches below. I checked both of them with Impala and they work.

    CREATE TABLE dirty_table (
     id INT,
     msg STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED  BY ','
    STORED AS TEXTFILE;
    
    
    [localhost.localdomain:21000] > SELECT * FROM dirty_table;
    Query: SELECT * FROM dirty_table
    Query submitted at: 2020-07-28 17:05:24 (Coordinator: http://localhost.localdomain:25000)
    Query progress can be monitored at: http://localhost.localdomain:25000/query_plan?query_id=5441d6a46ce61e7b:8e49432600000000
    +----+-------------+
    | id | msg         |
    +----+-------------+
    | 1  | 13321512121 |
    | 2  | 13121212121 |
    | 3  | 03121212121 |
    | 4  | 13321512121 |
    | 5  | 13121212121 |
    | 6  | 03121212121 |
    | 7  | 13121212121 |
    +----+-------------+
    Fetched 7 row(s) in 0.14s
    

    First example

    CREATE TABLE IF NOT EXISTS my_temp_table AS
    SELECT 41 AS rule_id, 49 AS record_id, val1 / val2 AS result
    FROM (SELECT COUNT(1) AS val1 FROM dirty_table WHERE msg regexp '^[1]([3-9])[0-9]{9}$' ) a,
         (SELECT COUNT(1) AS val2 FROM dirty_table) b;
    
    [localhost.localdomain:21000] > CREATE TABLE IF NOT EXISTS my_temp_table AS
                                  > SELECT 41 AS rule_id, 49 AS record_id, val1 / val2 AS result
                                  > FROM (SELECT COUNT(1) AS val1 FROM dirty_table WHERE msg regexp '^[1]([3-9])[0-9]{9}$' ) a,
                                  >      (SELECT COUNT(1) AS val2 FROM dirty_table) b;
    Query: CREATE TABLE IF NOT EXISTS my_temp_table AS
    SELECT 41 AS rule_id, 49 AS record_id, val1 / val2 AS result
    FROM (SELECT COUNT(1) AS val1 FROM dirty_table WHERE msg regexp '^[1]([3-9])[0-9]{9}$' ) a,
         (SELECT COUNT(1) AS val2 FROM dirty_table) b
    +-------------------+
    | summary           |
    +-------------------+
    | Inserted 0 row(s) |
    +-------------------+
    Fetched 1 row(s) in 0.21s
    
    [localhost.localdomain:21000] > invalidate metadata;
    
    [localhost.localdomain:21000] > SELECT * FROM my_temp_table;
    Query: select * from my_temp_table
    Query submitted at: 2020-07-28 17:03:44 (Coordinator: http://localhost.localdomain:25000)
    Query progress can be monitored at: http://localhost.localdomain:25000/query_plan?query_id=47370bf793a09b:29c4dfa000000000
    +---------+-----------+--------------------+
    | rule_id | record_id | result             |
    +---------+-----------+--------------------+
    | 41      | 49        | 0.7142857142857143 |
    +---------+-----------+--------------------+
    Fetched 1 row(s) in 0.13s
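
As a quick sanity check of the 0.7142857142857143 above, the same pattern can be applied to the seven sample rows outside Impala. This is only a sketch: it uses Python's `re` module rather than Impala's regex engine, and `SAMPLE_MSGS` and `match_ratio` are names made up for illustration.

```python
import re

# Same pattern as in the Impala query. Python's `re` is assumed to behave
# like Impala's regexp operator for this simple anchored pattern.
PATTERN = re.compile(r"^[1]([3-9])[0-9]{9}$")

# The msg column of the seven rows shown in the dirty_table listing above.
SAMPLE_MSGS = [
    "13321512121",
    "13121212121",
    "03121212121",
    "13321512121",
    "13121212121",
    "03121212121",
    "13121212121",
]


def match_ratio(msgs):
    """Return (matching rows, total rows, matching / total)."""
    matched = sum(1 for msg in msgs if PATTERN.search(msg))
    return matched, len(msgs), matched / len(msgs)


if __name__ == "__main__":
    print(match_ratio(SAMPLE_MSGS))
```

Five of the seven values start with 1 followed by a digit from 3 to 9 and nine more digits; the two values starting with 0 do not match, so the ratio is 5 / 7.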
    

    Second example

    DROP TABLE my_temp_table;
    
    CREATE TABLE IF NOT EXISTS my_temp_table AS 
    SELECT result FROM
        (WITH q1 AS (
          SELECT COUNT(1) AS val FROM dirty_table WHERE msg regexp '^[1]([3-9])[0-9]{9}$'
        ),
        q2 AS (
          SELECT COUNT(1) val2 FROM dirty_table
        )
        SELECT 100 * q1.val / q2.val2 AS result
        FROM q1, q2) t;
    
    [localhost.localdomain:21000] > CREATE TABLE IF NOT EXISTS my_temp_table AS 
                                  > SELECT result FROM
                                  >     (WITH q1 AS (
                                  >       SELECT COUNT(1) AS val FROM dirty_table WHERE msg regexp '^[1]([3-9])[0-9]{9}$'
                                  >     ),
                                  >     q2 AS (
                                  >       SELECT COUNT(1) val2 FROM dirty_table
                                  >     )
                                  >     SELECT 100 * q1.val / q2.val2 AS result
                                  >     FROM q1, q2) t;
    Query: CREATE TABLE IF NOT EXISTS my_temp_table AS
    SELECT result FROM
        (WITH q1 AS (
          SELECT COUNT(1) AS val FROM dirty_table WHERE msg regexp '^[1]([3-9])[0-9]{9}$'
        ),
        q2 AS (
          SELECT COUNT(1) val2 FROM dirty_table
        )
        SELECT 100 * q1.val / q2.val2 AS result
        FROM q1, q2) t
    +-------------------+
    | summary           |
    +-------------------+
    | Inserted 1 row(s) |
    +-------------------+
    Fetched 1 row(s) in 0.40s
    
    [localhost.localdomain:21000] > invalidate metadata;
    
    [localhost.localdomain:21000] > SELECT * FROM my_temp_table;
    Query: SELECT * FROM my_temp_table
    Query submitted at: 2020-07-28 17:08:17 (Coordinator: http://localhost.localdomain:25000)
    Query progress can be monitored at: http://localhost.localdomain:25000/query_plan?query_id=3447684ef59d0c4:f70779200000000
    +-------------------+
    | result            |
    +-------------------+
    | 71.42857142857143 |
    +-------------------+
    Fetched 1 row(s) in 0.74s
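
For anyone without an Impala instance at hand, the shape of the second example (a WITH block wrapped in a FROM-clause subquery under CREATE TABLE ... AS) can be tried in SQLite through Python's `sqlite3` module. Treat this as a stand-in sketch only: SQLite is not Impala, `build_temp_table` is a made-up helper, the `regexp` function has to be registered by hand, and `100.0` is used instead of `100` because SQLite would otherwise do integer division.

```python
import re
import sqlite3

PATTERN = "^[1]([3-9])[0-9]{9}$"
MSGS = [
    "13321512121", "13121212121", "03121212121", "13321512121",
    "13121212121", "03121212121", "13121212121",
]


def regexp(pattern, value):
    # SQLite evaluates `value REGEXP pattern` by calling regexp(pattern, value).
    return int(value is not None and re.search(pattern, value) is not None)


def build_temp_table():
    """Create my_temp_table from the WITH-in-subquery form and return its rows."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("regexp", 2, regexp)
    conn.execute("CREATE TABLE dirty_table (id INTEGER, msg TEXT)")
    conn.executemany(
        "INSERT INTO dirty_table VALUES (?, ?)", list(enumerate(MSGS, start=1))
    )
    # Same structure as the Impala statement above; 100.0 forces float division
    # in SQLite (an adaptation for the stand-in engine, not part of the answer).
    conn.execute(f"""
        CREATE TABLE IF NOT EXISTS my_temp_table AS
        SELECT result FROM
            (WITH q1 AS (
               SELECT COUNT(1) AS val FROM dirty_table WHERE msg REGEXP '{PATTERN}'
             ),
             q2 AS (
               SELECT COUNT(1) AS val2 FROM dirty_table
             )
             SELECT 100.0 * q1.val / q2.val2 AS result
             FROM q1, q2) t
    """)
    rows = conn.execute("SELECT result FROM my_temp_table").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(build_temp_table())
```

The point of the sketch is only that the subqueries live in the FROM clause, so the outer select list contains plain column references.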
    

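    [Comments]:

    • Hi @Chema, as with the other answer, both SQL statements work only as a plain SELECT; once CREATE TABLE is added, the SQL fails.
    • Adding the exception log (both report the same error): [HY000][500051] [Cloudera][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:ParseException: Syntax error in line 1: CREATE TABLE IF NOT EXISTS my_temp_table AS ^ Encountered: EOF Expected: SELECT, VALUES, W ...
    • Hi @KD Final, I've changed the solution, please check it now.
    • Hi @KD Final, I checked the solution with the Cloudera distribution and Impala, and it works fine. I've updated the post with a step-by-step solution. Perhaps you are facing some other problem. Regards.
    • This general approach is correct: put the subqueries in the FROM clause and then reference them in the select list. In the upcoming Impala 4.0 (and other Cloudera releases of Impala), we do support select-list subqueries. Internally, they are rewritten into a query exactly like this one.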
    [Solution 2]:

    I think a conditional average does what you want simply and efficiently, with a single table scan:

    select avg(case when msg regexp '^[1]([3-9])[0-9]{9}$' then 100.0 else 0 end) result
    from dirty_table
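
To see why the conditional average gives the same number as dividing two counts, both forms can be run side by side on the sample rows. The sketch below uses SQLite through Python's `sqlite3` as a stand-in for Impala (an assumption, not something tested on Impala); `percentages` and the hand-registered `regexp` function are illustrative names, and `100.0` is used in the two-subquery form to avoid SQLite's integer division.

```python
import re
import sqlite3

PATTERN = "^[1]([3-9])[0-9]{9}$"
ROWS = [
    (1, "13321512121"), (2, "13121212121"), (3, "03121212121"),
    (4, "13321512121"), (5, "13121212121"), (6, "03121212121"),
    (7, "13121212121"),
]


def regexp(pattern, value):
    # SQLite evaluates `value REGEXP pattern` by calling regexp(pattern, value).
    return int(value is not None and re.search(pattern, value) is not None)


def percentages():
    """Return (two-subquery percentage, conditional-average percentage)."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("regexp", 2, regexp)
    conn.execute("CREATE TABLE dirty_table (id INTEGER, msg TEXT)")
    conn.executemany("INSERT INTO dirty_table VALUES (?, ?)", ROWS)

    # Two subqueries in the FROM clause: the table is read twice.
    two_scans = conn.execute(
        "SELECT 100.0 * a.val1 / b.val2 "
        "FROM (SELECT COUNT(1) AS val1 FROM dirty_table WHERE msg REGEXP ?) a, "
        "     (SELECT COUNT(1) AS val2 FROM dirty_table) b",
        (PATTERN,),
    ).fetchone()[0]

    # Conditional average: each row contributes 100 or 0, read once.
    one_scan = conn.execute(
        "SELECT AVG(CASE WHEN msg REGEXP ? THEN 100.0 ELSE 0 END) FROM dirty_table",
        (PATTERN,),
    ).fetchone()[0]

    conn.close()
    return two_scans, one_scan


if __name__ == "__main__":
    print(percentages())
```

Averaging a column that is 100 for matching rows and 0 otherwise is the same as 100 * (matching rows) / (all rows), which is why no subquery is needed at all.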
    

    You can turn this into a create table statement:

    create table my_temp_table as
    select avg(case when msg regexp '^[1]([3-9])[0-9]{9}$' then 100.0 else 0 end) result
    from dirty_table
    

    [Comments]:

    • Hi @GMB, I tested your SQL: the first SELECT statement works, but the second one, with CREATE TABLE, does not. Cloudera Impala has many limitations that are not detailed in the official documentation.
    • Adding the exception log: [HY000][500051] [Cloudera][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:ParseException: Syntax error in line 1: create table if not exists my_temp_table as ^ Encountered: EOF Expected: SELECT, VALUES, W ...
    • @KDFinal: It looks like Impala does not support if not exists in create table ... I changed the query.
    • I tested if not exists in Impala and it works. I used this simple SQL: create table if not exists my_temp_table as select * from dirty_table;. But it fails when there is a subquery in the SELECT clause.