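Posted: 2021-08-14 03:40:17
Question:
I'm trying to make my dataset smaller. I'm currently pulling in data from 8 different tables. To do that, I want to use a WHERE clause to filter out the data I don't need, but I'm not sure how to apply it across all 8 tables. Here is my current query: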
--GroupA first, to join the hits and sessions tables
SELECT
GroupA_hits.session_id, GroupA_hits.hits_eventInfo_eventCategory, GroupA_hits.hits_eventInfo_eventAction, GroupA_hits.hits_eventInfo_eventLabel, GroupA_hits.cd126_hit_placeholder,
GroupA_sessions.session_id, GroupA_sessions.userId, GroupA_sessions.fullVisitorId, GroupA_sessions.visitNumber, GroupA_sessions.date,
GroupB_hits.session_id, GroupB_hits.hits_eventInfo_eventCategory, GroupB_hits.hits_eventInfo_eventAction, GroupB_hits.hits_eventInfo_eventLabel, GroupB_hits.cd126_hit_placeholder,
GroupB_sessions.session_id, GroupB_sessions.userId, GroupB_sessions.fullVisitorId, GroupB_sessions.visitNumber, GroupB_sessions.date,
GroupC_hits.session_id, GroupC_hits.hits_eventInfo_eventCategory, GroupC_hits.hits_eventInfo_eventAction, GroupC_hits.hits_eventInfo_eventLabel, GroupC_hits.cd126_hit_placeholder,
GroupC_sessions.session_id, GroupC_sessions.userId, GroupC_sessions.fullVisitorId, GroupC_sessions.visitNumber, GroupC_sessions.date,
GroupD_hits.session_id, GroupD_hits.hits_eventInfo_eventCategory, GroupD_hits.hits_eventInfo_eventAction, GroupD_hits.hits_eventInfo_eventLabel, GroupD_hits.cd126_hit_placeholder,
GroupD_sessions.session_id, GroupD_sessions.userId, GroupD_sessions.fullVisitorId, GroupD_sessions.visitNumber, GroupD_sessions.date
FROM `GroupA-bigquery.170369603.ga_flat_hits_202104*` GroupA_hits
LEFT JOIN `GroupA-bigquery.170369603.ga_flat_sessions_202104*` GroupA_sessions
ON (
GroupA_hits.session_id = GroupA_sessions.session_id
)
--Next, join GroupB to GroupA
LEFT JOIN `GroupB-bigquery.170359716.ga_flat_hits_202104*` GroupB_hits
ON (
GroupB_hits.session_id = GroupA_hits.session_id
)
LEFT JOIN `GroupB-bigquery.170359716.ga_flat_sessions_202104*` GroupB_sessions
ON (
GroupB_sessions.session_id = GroupA_sessions.session_id
)
--Now, join GroupC to GroupA
LEFT JOIN `GroupC-bigquery.170726426.ga_flat_hits_202104*` GroupC_hits
ON (
GroupC_hits.session_id = GroupA_hits.session_id
)
LEFT JOIN `GroupC-bigquery.170726426.ga_flat_sessions_202104*` GroupC_sessions
ON (
GroupC_sessions.session_id = GroupA_sessions.session_id
)
--Next, join GroupD to GroupA
LEFT JOIN `GroupD-bigquery.170374765.ga_flat_hits_202104*` GroupD_hits
ON (
GroupD_hits.session_id = GroupA_hits.session_id
)
LEFT JOIN `GroupD-bigquery.170374765.ga_flat_sessions_202104*` GroupD_sessions
ON (
GroupD_sessions.session_id = GroupA_sessions.session_id
)
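I'd also like to include the following conditions, which use the same column names in each of the _hits tables. This is what I tried, but I got "This query returned no results." My assumption is that, because of the way the query is written, BigQuery is looking for rows where all of these conditions hold within a single hit, and there won't be any. What I want instead is for it to go through the four tables and return every row that matches.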
WHERE GroupA_hits.hits_eventInfo_eventCategory = 'rewards'
AND GroupB_hits.hits_eventInfo_eventCategory = 'rewards'
AND GroupC_hits.hits_eventInfo_eventCategory = 'rewards'
AND GroupD_hits.hits_eventInfo_eventCategory = 'rewards'
AND GroupA_hits.hits_eventInfo_eventAction = 'redeem points confirm'
AND GroupB_hits.hits_eventInfo_eventAction = 'redeem points confirm'
AND GroupC_hits.hits_eventInfo_eventAction = 'redeem points confirm'
AND GroupD_hits.hits_eventInfo_eventAction = 'redeem points confirm'
AND GroupA_hits.hits_eventInfo_eventLabel = 'gas savings'
AND GroupB_hits.hits_eventInfo_eventLabel = 'gas savings'
AND GroupC_hits.hits_eventInfo_eventLabel = 'gas savings'
AND GroupD_hits.hits_eventInfo_eventLabel = 'gas savings'
AND GroupA_hits.cd126_hit_placeholder IS NOT NULL
AND GroupB_hits.cd126_hit_placeholder IS NOT NULL
AND GroupC_hits.cd126_hit_placeholder IS NOT NULL
AND GroupD_hits.cd126_hit_placeholder IS NOT NULL
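A likely explanation, assuming standard SQL semantics: the predicates are combined with AND, so a joined row survives only if the GroupA, GroupB, GroupC, and GroupD hits all match in that same row. A LEFT JOIN fills unmatched columns with NULL, and a comparison such as `NULL = 'rewards'` is never true, so this WHERE clause effectively turns the LEFT JOINs into INNER JOINs. If the goal is every matching hit from any of the groups, one option is to stack the hits tables with UNION ALL and apply the filter once. The sketch below is only an illustration: `groupa_hits`, `groupb_hits`, and `source_group` are hypothetical names, and the inline rows are made up rather than real data. In practice each CTE would be a SELECT from the corresponding `ga_flat_hits_202104*` wildcard table, with GroupC and GroupD added the same way.

```sql
-- Hypothetical stand-ins for the GroupA/GroupB *_hits tables (made-up rows).
WITH groupa_hits AS (
  SELECT 's1' AS session_id,
         'rewards' AS hits_eventInfo_eventCategory,
         'redeem points confirm' AS hits_eventInfo_eventAction,
         'gas savings' AS hits_eventInfo_eventLabel,
         'x' AS cd126_hit_placeholder
  UNION ALL
  -- A non-matching hit that the filter below should drop.
  SELECT 's2', 'nav', 'click', 'menu', NULL
),
groupb_hits AS (
  SELECT 's3' AS session_id,
         'rewards' AS hits_eventInfo_eventCategory,
         'redeem points confirm' AS hits_eventInfo_eventAction,
         'gas savings' AS hits_eventInfo_eventLabel,
         'y' AS cd126_hit_placeholder
),
-- Stack the hits tables (same column order in each branch)
-- and tag every row with the group it came from.
all_hits AS (
  SELECT 'GroupA' AS source_group, * FROM groupa_hits
  UNION ALL
  SELECT 'GroupB' AS source_group, * FROM groupb_hits
)
-- Filter once: a row is kept if it matches, regardless of which group it came from.
SELECT source_group, session_id
FROM all_hits
WHERE hits_eventInfo_eventCategory = 'rewards'
  AND hits_eventInfo_eventAction = 'redeem points confirm'
  AND hits_eventInfo_eventLabel = 'gas savings'
  AND cd126_hit_placeholder IS NOT NULL
ORDER BY source_group, session_id;
```

With this sample data the query would return one GroupA row (`s1`) and one GroupB row (`s3`). If the session columns are also needed, the filtered hits could then be joined to each group's `_sessions` table on `session_id`; that step is not shown here.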
Comments:
- You need to provide sample data and the desired output, or create a fiddle: dbfiddle.uk
Tags: sql join google-bigquery left-join