【Question Title】: How to filter in the where clause, on a column that is within the select statement and contains count/distinct/case/when
【Posted】: 2017-07-13 13:53:50
【Question Description】:

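Using SQL on Hadoop.

I have a list of IDs for which I am trying to count totals for 2 different guest review data points. For guest_review_1, I return the total count. For guest_review_2, I split the totals into 5 ranges.

What I am struggling with is putting a filter on guest_review_1 in the where clause so that I exclude properties with fewer than 5 in total.

Any ideas for a workaround? Is a nested Select statement a possibility?

Example query included below: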

Select 
id,
count(distinct guest_review_1) as "Guest_Reviews",
count(distinct(case when guest_review_2 < 1 then guest_review_1 end)) as Group1,
Count(distinct(case when guest_review_2 >=2 AND guest_review_2 <3 then guest_review_1 end)) as Group2
From  table_name
Where
guest_review_2 IS NOT NULL
AND guest_review_1 >=5
AND date BETWEEN '2017-01-01' AND '2017-01-31'
Group By id

【Question Discussion】:

    Tags: sql hadoop where-clause


    【Solution 1】:

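    I'm not entirely sure what the group_1 and group_2 aggregates in your sample query are meant to represent. However, the essence of your question seems to be how to filter the result set on the result of an aggregate function (count), rather than on the values of individual input rows. The WHERE clause is applied to each input row before grouping, so it cannot see the count. The HAVING clause is applied after GROUP BY, and Apache Hive supports it.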

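    In the following example, the input relation contains 6 rows with id set to 1 and 4 rows with id set to 2. The query includes the clause HAVING guest_reviews >= 5. Because of the HAVING clause, the result set contains only the row for id 1. There is no output row for id 2.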

    WITH table_name AS (
        SELECT 1 AS id, 1 AS guest_review_1, 1 AS guest_review_2 UNION ALL
        SELECT 1 AS id, 2 AS guest_review_1, 2 AS guest_review_2 UNION ALL
        SELECT 1 AS id, 3 AS guest_review_1, 3 AS guest_review_2 UNION ALL
        SELECT 1 AS id, 4 AS guest_review_1, 4 AS guest_review_2 UNION ALL
        SELECT 1 AS id, 5 AS guest_review_1, 5 AS guest_review_2 UNION ALL
        SELECT 1 AS id, 6 AS guest_review_1, 6 AS guest_review_2 UNION ALL
        SELECT 2 AS id, 1 AS guest_review_1, 1 AS guest_review_2 UNION ALL
        SELECT 2 AS id, 2 AS guest_review_1, 2 AS guest_review_2 UNION ALL
        SELECT 2 AS id, 3 AS guest_review_1, 3 AS guest_review_2 UNION ALL
        SELECT 2 AS id, 4 AS guest_review_1, 4 AS guest_review_2
    )
    SELECT
        id,
        count(DISTINCT guest_review_1) AS guest_reviews,
        count(DISTINCT(CASE WHEN guest_review_2 < 1 THEN guest_review_1 END)) AS group_1,
        count(DISTINCT(CASE WHEN guest_review_2 >= 2 AND guest_review_2 < 3 THEN guest_review_1 END)) as group_2
    FROM table_name
    WHERE guest_review_2 IS NOT NULL
    GROUP BY id
    HAVING guest_reviews >= 5
    ;
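
    Since you asked about a nested SELECT: that should work too. The sketch below is an alternative to HAVING, not part of the original query. It computes the aggregates in an inner query and then filters on guest_reviews in the outer WHERE. The subquery alias per_id is a name made up for illustration, and table_name is the same placeholder as above. I have not run this against your data.

    ```sql
    -- Outer query: filter on the already-computed aggregate column.
    SELECT id, guest_reviews, group_1, group_2
    FROM (
        -- Inner query: one row per id with its distinct counts.
        SELECT
            id,
            count(DISTINCT guest_review_1) AS guest_reviews,
            count(DISTINCT CASE WHEN guest_review_2 < 1 THEN guest_review_1 END) AS group_1,
            count(DISTINCT CASE WHEN guest_review_2 >= 2 AND guest_review_2 < 3 THEN guest_review_1 END) AS group_2
        FROM table_name
        WHERE guest_review_2 IS NOT NULL   -- row-level filter, applied before grouping
        GROUP BY id
    ) per_id                               -- hypothetical alias; Hive requires one on a FROM subquery
    WHERE guest_reviews >= 5               -- aggregate-level filter, applied after grouping
    ;
    ```

    If an engine does not accept the select alias inside HAVING, repeating the aggregate expression, i.e. HAVING count(DISTINCT guest_review_1) >= 5, is the portable form.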
    

    【Discussion】:
