【Question title】: BigQuery - Select multiple columns and want to exclude two double nested columns
【Posted】: 2021-11-10 22:54:33
【Question description】:

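Hello, I am working with the lt-pcf-analytics-exp.90676036.ga_sessions_* table and need to extract various fields from the nested hits column: everything except the hits.customDimensions.value and hits.customDimensions.index columns. I believe both hits and hits.customDimensions are ARRAYs. How can I do this in standard SQL?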

I have already found a question about a similar problem (BigQuery except double nested column), but in my case the column is a doubly nested array and I could not adapt that code.

Basically, this is what I want to extract. How can I modify it so that hits.customDimensions.value and hits.customDimensions.index are excluded? Thank you.

SELECT fullVisitorId,
   visitId,
   visitNumber,
   cd.value as PCF_CUST_ID,
   date,
   TIMESTAMP_SECONDS(visitStartTime) as visitStartTime,
   totals.visits as visits,
   totals.hits as total_hits,
   hits.* (EXCEPT hits.customDimensions.value and hits.customDimensions.index)
FROM `lt-pcf-analytics-exp.90676036.ga_sessions_*` as t
          left join unnest(customDimensions) as cd
          left join unnest(hits) as hits
WHERE _TABLE_SUFFIX between '20210101' and '20210131' 
   and cd.index = 4 and cd.value is not null
ORDER BY PCF_CUST_ID, visitStartTime, hitNumber

【Question discussion】:

    Tags: select google-bigquery


    【Solution 1】:

    As @martinus noted, your EXCEPT syntax is incorrect. The BigQuery documentation shows that the correct way to write a query with EXCEPT is:

    SELECT 
       field.* EXCEPT (nested_field1, nested_field2)
    FROM `my_table`
    

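    However, EXCEPT only accepts column names of the expression being expanded, so it cannot reach into nested fields directly. As a workaround, exclude customDimensions as a whole from hits.*, and then, if you still need parts of it, select from hits.customDimensions separately, leaving out the nested elements you want to drop, such as index and value.

    A query like the following should work: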

    SELECT fullVisitorId,
       visitId,
       visitNumber,
       cd.value as PCF_CUST_ID,
       date,
       TIMESTAMP_SECONDS(visitStartTime) as visitStartTime,
       totals.visits as visits,
       totals.hits as total_hits,
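       -- EXCEPT takes bare column names, so the hits. prefix is dropped here.
       -- hits.customDimensions is an ARRAY of STRUCTs, so .* cannot be applied to it;
       -- assuming the standard GA export schema, index and value are its only fields,
       -- so excluding the whole column already removes both.
       hits.* EXCEPT (customDimensions)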
    FROM `lt-pcf-analytics-exp.90676036.ga_sessions_*` as t
              left join unnest(customDimensions) as cd
              left join unnest(hits) as hits
    WHERE _TABLE_SUFFIX between '20210101' and '20210131' 
       and cd.index = 4 and cd.value is not null
    ORDER BY PCF_CUST_ID, visitStartTime, hitNumber
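
    If you want to keep customDimensions but drop only one of its nested fields, one option is to rebuild the array with REPLACE. This is a minimal sketch and not part of the answer above: the sessions CTE is made-up inline data standing in for a ga_sessions row, and it assumes customDimensions is an ARRAY of STRUCTs with index and value fields.

    -- Hypothetical sample data: one session with one hit and two custom dimensions.
    WITH sessions AS (
      SELECT
        'v1' AS fullVisitorId,
        [
          STRUCT(
            1 AS hitNumber,
            [STRUCT(4 AS index, 'abc' AS value),
             STRUCT(7 AS index, 'xyz' AS value)] AS customDimensions
          )
        ] AS hits
    )
    SELECT
      fullVisitorId,
      -- Keep every field of hits, but swap customDimensions for a rebuilt array
      -- whose elements no longer carry the value field.
      hits.* REPLACE (
        ARRAY(
          SELECT AS STRUCT d.* EXCEPT (value)
          FROM UNNEST(hits.customDimensions) AS d
        ) AS customDimensions
      )
    FROM sessions
    LEFT JOIN UNNEST(hits) AS hits

    Because EXCEPT runs inside the ARRAY subquery, it is applied to each STRUCT element (d) rather than to the array itself, which is why d.* is allowed there.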
    

    【Discussion】:

      【Solution 2】:

      According to the BigQuery documentation for EXCEPT, the syntax in your query is not valid. The documented form is:

      SELECT [ AS { typename | STRUCT | VALUE } ] [{ ALL | DISTINCT }]
          { [ expression. ]* [ EXCEPT ( column_name [, ...] ) ]
              [ REPLACE ( expression [ AS ] column_name [, ...] ) ]
          | expression [ [ AS ] alias ] } [, ...]
      

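      So use it like this, listing only direct fields of hits (value and index live inside customDimensions, so the whole column is excluded here):

      SELECT hits.* EXCEPT (customDimensions)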
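
      To try this form without access to the GA export, here is a small self-contained sketch. The sessions CTE and its values are made-up sample data, and hits carries only hitNumber, type and customDimensions here, far fewer fields than the real schema.

      -- Hypothetical sample data: one session with two hits.
      WITH sessions AS (
        SELECT
          'v1' AS fullVisitorId,
          [
            STRUCT(
              1 AS hitNumber,
              'PAGE' AS type,
              [STRUCT(4 AS index, 'abc' AS value)] AS customDimensions
            ),
            STRUCT(
              2 AS hitNumber,
              'EVENT' AS type,
              [STRUCT(7 AS index, 'xyz' AS value)] AS customDimensions
            )
          ] AS hits
      )
      SELECT
        fullVisitorId,
        -- Expands to hitNumber and type; customDimensions is left out entirely.
        hits.* EXCEPT (customDimensions)
      FROM sessions
      LEFT JOIN UNNEST(hits) AS hits
      ORDER BY hitNumber

      This should return one row per hit with the columns fullVisitorId, hitNumber and type.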
      

      【Discussion】:
