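【Posted】: 2020-06-24 15:57:43
【Question】:
I want to change an existing query and get the following error:
Unsupported SubQuery Expression 'deleted': Only SubQuery expressions that are top level conjuncts are allowed
The existing query is: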
SELECT DISTINCT
*
FROM
geoposition_import AS geo
-- do not take into account data for deleted users
WHERE
EXISTS (
SELECT 1
FROM geoposition_import_users AS u
WHERE u.id = geo.userId
AND NOT u.deleted
);
After our change, userId in geoposition_import can be NULL, because machines can now create geopositions as well. So I changed the query to
SELECT DISTINCT
*
FROM
geoposition_import AS geo
-- do not take into account data for deleted users
WHERE
geo.userId IS NULL -- data from non users (e.g. machines) is still fine
OR
EXISTS (
SELECT 1
FROM geoposition_import_users AS u
WHERE u.id = geo.userId
AND NOT u.deleted
);
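and got the error mentioned above.
I googled and found the documented limitations: https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/using-hiveql/content/hive_hive_subquery_limitations.html
So my guess is that the OR is the problem: it turns the EXISTS subquery into one branch of a disjunction instead of a top-level conjunct of the WHERE clause.
Now my questions:
- Why does the error message suggest that 'deleted' is the problem?
- How can I rewrite the query so that it works?
The only solution I can think of is to split the conditions into separate views and then do a UNION ALL.
Something like: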
CREATE VIEW IF NOT EXISTS geoposition_import_from_non_users AS
SELECT DISTINCT
*
FROM
geoposition_import AS geo
WHERE
geo.userId IS NULL;
CREATE VIEW IF NOT EXISTS geoposition_import_from_users AS
SELECT DISTINCT
*
FROM
geoposition_import AS geo
-- do not take into account data for deleted users
WHERE
EXISTS (
SELECT 1
FROM geoposition_import_users AS u
WHERE u.id = geo.userId
AND NOT u.deleted
);
-- staged data with possible duplicates removed
CREATE VIEW IF NOT EXISTS geoposition_import_distinct AS
SELECT * FROM geoposition_import_from_non_users
UNION ALL
SELECT * FROM geoposition_import_from_users;
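For comparison, the same filter can be sketched as a single statement by replacing the EXISTS subquery with a LEFT JOIN, so that no subquery expression is left in the WHERE clause at all. This is only a sketch under assumptions: it has not been run against a specific Hive version, it assumes that `SELECT DISTINCT geo.*` is accepted wherever `SELECT DISTINCT *` is, and the alias `u` for the derived table is chosen here for illustration.

```sql
-- Sketch only: LEFT JOIN rewrite of "userId IS NULL OR EXISTS (...)".
-- The derived table keeps only users that are not deleted, so a row
-- from geo finds a join partner exactly when the EXISTS would be true.
SELECT DISTINCT
  geo.*
FROM
  geoposition_import AS geo
LEFT JOIN (
  SELECT id
  FROM geoposition_import_users
  WHERE NOT deleted
) AS u
  ON u.id = geo.userId
WHERE
  geo.userId IS NULL     -- data from non users (e.g. machines) is still fine
  OR u.id IS NOT NULL;   -- a matching, non-deleted user exists
```

Because only the columns of `geo` are selected and the result is DISTINCT, several matching user rows for one geoposition would still collapse to a single output row. A user row whose `deleted` value is NULL is filtered out by `WHERE NOT deleted`, which matches how the original EXISTS condition treats it.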
Any comments?
【Comments】:
-
What is the type of the u.deleted column? Or what data does this column contain?
-
@leftjoin Boolean. As I said: the first version, without the NULL check, worked fine for years :)