【Question title】: How to use WHERE clauses for a complex JOIN query
【Posted】: 2021-08-14 03:40:17
【Question】:

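I'm trying to make my dataset smaller. I'm currently pulling in data from 8 different tables, and I'd like to use WHERE clauses to filter out the data I don't need, but I'm not sure how to do that across all 8 tables. Here is my current query: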

--GroupA first, to join the hits and sessions tables
SELECT 
GroupA_hits.session_id, GroupA_hits.hits_eventInfo_eventCategory, GroupA_hits.hits_eventInfo_eventAction, GroupA_hits.hits_eventInfo_eventLabel, GroupA_hits.cd126_hit_placeholder,
GroupA_sessions.session_id, GroupA_sessions.userId, GroupA_sessions.fullVisitorId, GroupA_sessions.visitNumber, GroupA_sessions.date,
GroupB_hits.session_id, GroupB_hits.hits_eventInfo_eventCategory, GroupB_hits.hits_eventInfo_eventAction, GroupB_hits.hits_eventInfo_eventLabel, GroupB_hits.cd126_hit_placeholder,
GroupB_sessions.session_id, GroupB_sessions.userId, GroupB_sessions.fullVisitorId, GroupB_sessions.visitNumber, GroupB_sessions.date,
GroupC_hits.session_id, GroupC_hits.hits_eventInfo_eventCategory, GroupC_hits.hits_eventInfo_eventAction, GroupC_hits.hits_eventInfo_eventLabel, GroupC_hits.cd126_hit_placeholder,
GroupC_sessions.session_id, GroupC_sessions.userId, GroupC_sessions.fullVisitorId, GroupC_sessions.visitNumber, GroupC_sessions.date,
GroupD_hits.session_id, GroupD_hits.hits_eventInfo_eventCategory, GroupD_hits.hits_eventInfo_eventAction, GroupD_hits.hits_eventInfo_eventLabel, GroupD_hits.cd126_hit_placeholder,
GroupD_sessions.session_id, GroupD_sessions.userId, GroupD_sessions.fullVisitorId, GroupD_sessions.visitNumber, GroupD_sessions.date
FROM `GroupA-bigquery.170369603.ga_flat_hits_202104*` GroupA_hits
LEFT JOIN `GroupA-bigquery.170369603.ga_flat_sessions_202104*` GroupA_sessions
ON (
    GroupA_hits.session_id = GroupA_sessions.session_id
)
--Next, join GroupB to GroupA
LEFT JOIN `GroupB-bigquery.170359716.ga_flat_hits_202104*` GroupB_hits
ON (
    GroupB_hits.session_id = GroupA_hits.session_id
)
LEFT JOIN `GroupB-bigquery.170359716.ga_flat_sessions_202104*` GroupB_sessions
ON (
    GroupB_sessions.session_id = GroupA_sessions.session_id
)
--Now, join GroupC to GroupA
LEFT JOIN `GroupC-bigquery.170726426.ga_flat_hits_202104*` GroupC_hits
ON (
    GroupC_hits.session_id = GroupA_hits.session_id
)
LEFT JOIN `GroupC-bigquery.170726426.ga_flat_sessions_202104*` GroupC_sessions
ON (
    GroupC_sessions.session_id = GroupA_sessions.session_id
)
--Next, join GroupD to GroupA
LEFT JOIN `GroupD-bigquery.170374765.ga_flat_hits_202104*` GroupD_hits
ON (
    GroupD_hits.session_id = GroupA_hits.session_id
)
LEFT JOIN `GroupD-bigquery.170374765.ga_flat_sessions_202104*` GroupD_sessions
ON (
    GroupD_sessions.session_id = GroupA_sessions.session_id
) 

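I also want to include the following clauses, which use the same column names in each of the _hits tables. This is what I tried, but I got "This query returned no results." I think that's because, the way the query is written, BigQuery is looking for rows where all of these conditions hold in a single hit (that's my assumption), and there won't be any. What I want instead is for it to go through the four tables and return all the matching rows.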

WHERE GroupA_hits.hits_eventInfo_eventCategory = 'rewards'
AND GroupB_hits.hits_eventInfo_eventCategory = 'rewards'
AND GroupC_hits.hits_eventInfo_eventCategory = 'rewards'
AND GroupD_hits.hits_eventInfo_eventCategory = 'rewards'
AND GroupA_hits.hits_eventInfo_eventAction = 'redeem points confirm'
AND GroupB_hits.hits_eventInfo_eventAction = 'redeem points confirm'
AND GroupC_hits.hits_eventInfo_eventAction = 'redeem points confirm'
AND GroupD_hits.hits_eventInfo_eventAction = 'redeem points confirm'
AND GroupA_hits.hits_eventInfo_eventLabel = 'gas savings'
AND GroupB_hits.hits_eventInfo_eventLabel = 'gas savings'
AND GroupC_hits.hits_eventInfo_eventLabel = 'gas savings'
AND GroupD_hits.hits_eventInfo_eventLabel = 'gas savings'
AND GroupA_hits.cd126_hit_placeholder Is Not NULL
AND GroupB_hits.cd126_hit_placeholder Is Not NULL
AND GroupC_hits.cd126_hit_placeholder Is Not NULL
AND GroupD_hits.cd126_hit_placeholder Is Not NULL 

【Comments】:

  • You need to provide sample data and the desired output, or set up a fiddle: dbfiddle.uk

Tags: sql join google-bigquery left-join


【Solution 1】:

Consider moving the WHERE conditions into the ON clauses, so that those tables are filtered during the LEFT JOIN operations:

...
FROM `GroupA-bigquery.170369603.ga_flat_hits_202104*` GroupA_hits
LEFT JOIN `GroupA-bigquery.170369603.ga_flat_sessions_202104*` GroupA_sessions
ON GroupA_hits.session_id = GroupA_sessions.session_id
AND GroupA_hits.hits_eventInfo_eventCategory = 'rewards'
AND GroupA_hits.hits_eventInfo_eventAction = 'redeem points confirm'
AND GroupA_hits.hits_eventInfo_eventLabel = 'gas savings'
AND GroupA_hits.cd126_hit_placeholder Is Not NULL

--Next, join GroupB to GroupA
LEFT JOIN `GroupB-bigquery.170359716.ga_flat_hits_202104*` GroupB_hits
ON GroupB_hits.session_id = GroupA_hits.session_id
AND GroupB_hits.hits_eventInfo_eventCategory = 'rewards'
AND GroupB_hits.hits_eventInfo_eventAction = 'redeem points confirm'
AND GroupB_hits.hits_eventInfo_eventLabel = 'gas savings'
AND GroupB_hits.cd126_hit_placeholder Is Not NULL

LEFT JOIN `GroupB-bigquery.170359716.ga_flat_sessions_202104*` GroupB_sessions
ON  GroupB_sessions.session_id = GroupA_sessions.session_id

--Now, join GroupC to GroupA
LEFT JOIN `GroupC-bigquery.170726426.ga_flat_hits_202104*` GroupC_hits
ON GroupC_hits.session_id = GroupA_hits.session_id
AND GroupC_hits.hits_eventInfo_eventCategory = 'rewards'
AND GroupC_hits.hits_eventInfo_eventAction = 'redeem points confirm'
AND GroupC_hits.hits_eventInfo_eventLabel = 'gas savings'
AND GroupC_hits.cd126_hit_placeholder Is Not NULL

LEFT JOIN `GroupC-bigquery.170726426.ga_flat_sessions_202104*` GroupC_sessions
ON GroupC_sessions.session_id = GroupA_sessions.session_id

--Next, join GroupD to GroupA
LEFT JOIN `GroupD-bigquery.170374765.ga_flat_hits_202104*` GroupD_hits
ON GroupD_hits.session_id = GroupA_hits.session_id
AND GroupD_hits.hits_eventInfo_eventCategory = 'rewards'
AND GroupD_hits.hits_eventInfo_eventAction = 'redeem points confirm'
AND GroupD_hits.hits_eventInfo_eventLabel = 'gas savings'
AND GroupD_hits.cd126_hit_placeholder Is Not NULL 

LEFT JOIN `GroupD-bigquery.170374765.ga_flat_sessions_202104*` GroupD_sessions
ON GroupD_sessions.session_id = GroupA_sessions.session_id
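
One caveat worth checking with this approach: in a LEFT JOIN, a predicate in ON only decides whether a right-hand row matches; it never removes rows of the left-hand table. So the GroupA_hits conditions placed in the first ON clause would not, by themselves, drop non-matching GroupA_hits rows. The sketch below illustrates the difference on toy data. It uses Python's built-in sqlite3 as a stand-in for BigQuery (an assumption), and the tables a_hits and b_hits and their rows are hypothetical, not taken from the question.

```python
import sqlite3

# Hypothetical toy data; sqlite3 stands in for BigQuery here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a_hits (session_id TEXT, category TEXT);
CREATE TABLE b_hits (session_id TEXT, category TEXT);
INSERT INTO a_hits VALUES ('s1', 'rewards'), ('s2', 'rewards'), ('s3', 'other');
INSERT INTO b_hits VALUES ('s1', 'rewards'), ('s2', 'other');
""")

# 1) Everything in WHERE: a row survives only if BOTH tables match,
#    so the LEFT JOIN effectively behaves like an INNER JOIN.
where_rows = conn.execute("""
    SELECT a.session_id, a.category, b.category
    FROM a_hits a
    LEFT JOIN b_hits b ON b.session_id = a.session_id
    WHERE a.category = 'rewards' AND b.category = 'rewards'
    ORDER BY a.session_id
""").fetchall()

# 2) Everything in ON: every a_hits row is kept, including 's3',
#    because ON conditions never filter the left-hand table.
on_rows = conn.execute("""
    SELECT a.session_id, a.category, b.category
    FROM a_hits a
    LEFT JOIN b_hits b
      ON b.session_id = a.session_id
     AND b.category = 'rewards'
     AND a.category = 'rewards'
    ORDER BY a.session_id
""").fetchall()

# 3) Right-table condition in ON, left-table condition in WHERE:
#    a_hits is filtered, b_hits is matched only where it qualifies.
mixed_rows = conn.execute("""
    SELECT a.session_id, a.category, b.category
    FROM a_hits a
    LEFT JOIN b_hits b
      ON b.session_id = a.session_id
     AND b.category = 'rewards'
    WHERE a.category = 'rewards'
    ORDER BY a.session_id
""").fetchall()

print(where_rows)
print(on_rows)
print(mixed_rows)
```

If the goal is also to restrict GroupA_hits itself, keeping its conditions in WHERE and moving only the other tables' conditions into ON, as in the third query, is one option to consider.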

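【Discussion】:

    【Solution 2】:

    BigQuery is looking for rows where all of these exist in a single hit (my assumption), and there won't be any.

    It sounds like you want an OR across the options in each group, which can be simplified further as follows: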

    WHERE 
           'rewards' IN (GroupA_hits.hits_eventInfo_eventCategory, GroupB_hits.hits_eventInfo_eventCategory, GroupC_hits.hits_eventInfo_eventCategory, GroupD_hits.hits_eventInfo_eventCategory)
       AND 'redeem points confirm' IN (GroupA_hits.hits_eventInfo_eventAction, GroupB_hits.hits_eventInfo_eventAction, GroupC_hits.hits_eventInfo_eventAction, GroupD_hits.hits_eventInfo_eventAction)
       AND 'gas savings' IN (GroupA_hits.hits_eventInfo_eventLabel, GroupB_hits.hits_eventInfo_eventLabel, GroupC_hits.hits_eventInfo_eventLabel, GroupD_hits.hits_eventInfo_eventLabel)
       AND COALESCE(GroupA_hits.cd126_hit_placeholder, GroupB_hits.cd126_hit_placeholder, GroupC_hits.cd126_hit_placeholder, GroupD_hits.cd126_hit_placeholder) Is Not NULL 
    

    Note that I'm making some assumptions about how BigQuery handles ANSI-standard SQL, since I'm not a regular BigQuery user.
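
To make the semantics of the IN form concrete: each IN list is evaluated independently, so a row can pass when one table supplies the category and a different table supplies the action. Whether that is acceptable depends on what "matching" should mean here. The sketch below compares it with a stricter per-table OR on toy data. It again uses Python's built-in sqlite3 as a stand-in for BigQuery (an assumption); the table joined, its columns, and the shortened value 'redeem' are hypothetical and simply model the output of the joins.

```python
import sqlite3

# Hypothetical table modelling already-joined rows from two hits tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE joined (
    row_id INTEGER,
    a_category TEXT, b_category TEXT,
    a_action TEXT,   b_action TEXT
);
INSERT INTO joined VALUES
    (1, 'rewards', NULL,      'redeem', NULL),      -- table A satisfies both
    (2, 'rewards', NULL,      NULL,     'redeem'),  -- category from A, action from B
    (3, NULL,      NULL,      NULL,     NULL),      -- nothing matches
    (4, 'other',   'rewards', 'other',  'redeem');  -- table B satisfies both
""")

# Column-wise IN lists: each condition may be met by a different table.
in_rows = [r[0] for r in conn.execute("""
    SELECT row_id FROM joined
    WHERE 'rewards' IN (a_category, b_category)
      AND 'redeem'  IN (a_action,   b_action)
    ORDER BY row_id
""")]

# Per-table OR: one table must satisfy all of its conditions together.
per_table_rows = [r[0] for r in conn.execute("""
    SELECT row_id FROM joined
    WHERE (a_category = 'rewards' AND a_action = 'redeem')
       OR (b_category = 'rewards' AND b_action = 'redeem')
    ORDER BY row_id
""")]

print(in_rows)
print(per_table_rows)
```

Row 2 is the one that separates the two forms: it passes the IN version but not the per-table version, because no single table matches both conditions.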

    【Discussion】:
