【Posted】:2019-02-14 05:17:50
【Question】:
I'm trying to collapse a column into an array using the PARTITION BY clause and the ARRAY_AGG() function.
My standard SQL in BigQuery is as follows:
WITH initial_30days
AS (
SELECT
date,
fullvisitorId AS user_id,
visitNumber,
CONCAT(fullvisitorid, CAST(VisitId AS STRING)) AS session_id
FROM
`my-data.XXXXXXX.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20181004' AND '20181103'
GROUP BY 1,2,3,4
)
SELECT
date,
ARRAY_AGG(sessions) OVER (PARTITION BY date ROWS BETWEEN 5 PRECEDING
AND CURRENT ROW) AS agg_array
FROM(
SELECT
date,
user_id,
COUNT(DISTINCT( session_id)) AS sessions
FROM initial_30days
GROUP BY date,user_id)
GROUP BY date,sessions
My expected output is:
+----------+--------------------------+
| date | agg_array |
+----------+--------------------------+
| 20181004 | [34,21,34,21,6,7,4,43] |
| 20181005 | [1,5,56,76,23,1,3,54,45] |
| 20181006 | [22,67,43,1,2,67,3,24] |
| 20181007 | [34,21,34,21,6,7,4,43] |
+----------+--------------------------+
My current output looks like this, using a single date value as an example:
+----------+------------------------+
| date | agg_array |
+----------+------------------------+
| 20181004 | [34] |
| 20181004 | [34,21] |
| 20181004 | [34,21,34] |
| 20181004 | [34,21,34,21] |
| 20181004 | [34,21,34,21,6] |
| 20181004 | [34,21,34,21,6,7] |
| 20181004 | [34,21,34,21,6,7,4] |
| 20181004 | [34,21,34,21,6,7,4,43] |
+----------+------------------------+
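As you can see, partitioning the array by date produces a separate, incrementally growing row for each value in the array, instead of one row per date.
The dataset that ARRAY_AGG() is applied to looks like this: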
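The behavior described above can be reproduced with a minimal, self-contained BigQuery sketch. It uses inline rows copied from the sample dataset in this question; the CTE name `daily_sessions` and the `ORDER BY user_id` inside the window are hypothetical choices made only for illustration. With an OVER clause, ARRAY_AGG runs as a window (analytic) function, which returns one output row per input row, so a running frame yields one growing array per row.

```sql
-- Hypothetical inline stand-in for the (date, user_id, sessions) subquery.
WITH daily_sessions AS (
  SELECT '20181004' AS date, '2526262363754747' AS user_id, 34 AS sessions UNION ALL
  SELECT '20181004', '2525626325173256', 21 UNION ALL
  SELECT '20181004', '7436783255747736', 34 UNION ALL
  SELECT '20181004', '6526241526363536', 21 UNION ALL
  SELECT '20181004', '4252636353637423', 6 UNION ALL
  SELECT '20181004', '3636325636673563', 7
)
SELECT
  date,
  -- Window form: evaluated once per input row, so all six rows survive.
  -- With ORDER BY and no explicit frame, each row sees itself plus all
  -- earlier rows in its partition, giving a running (growing) array.
  ARRAY_AGG(sessions) OVER (PARTITION BY date ORDER BY user_id) AS running_array
FROM daily_sessions
```

Six input rows produce six output rows for the same date; only the length of each array differs.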
+----------+------------------+----------+
| date | user_id | sessions |
+----------+------------------+----------+
| 20181004 | 2526262363754747 | 34 |
| 20181004 | 2525626325173256 | 21 |
| 20181004 | 7436783255747736 | 34 |
| 20181004 | 6526241526363536 | 21 |
| 20181004 | 4252636353637423 | 6 |
| 20181004 | 3636325636673563 | 7 |
+----------+------------------+----------+
I suspect this happens because I group by sessions above, but I only do that because, if I don't, I get a validation error like this:
SELECT list expression references column sessions which is neither grouped nor aggregated at
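If the goal is one array per date holding every user's session count, as the expected output suggests, one option is to call ARRAY_AGG as a plain aggregate function (no OVER clause) and group by date only. This is a sketch under that assumption, not verified against the original dataset; the inline rows and the CTE name `daily_sessions` are hypothetical stand-ins for the subquery in the question. Because `sessions` then appears only inside an aggregate, the "neither grouped nor aggregated" error should not arise.

```sql
-- Hypothetical inline stand-in for the (date, user_id, sessions) subquery.
WITH daily_sessions AS (
  SELECT '20181004' AS date, '2526262363754747' AS user_id, 34 AS sessions UNION ALL
  SELECT '20181004', '2525626325173256', 21 UNION ALL
  SELECT '20181004', '7436783255747736', 34 UNION ALL
  SELECT '20181004', '6526241526363536', 21 UNION ALL
  SELECT '20181004', '4252636353637423', 6 UNION ALL
  SELECT '20181004', '3636325636673563', 7
)
SELECT
  date,
  -- Aggregate form: collapses all rows of a date into a single array.
  -- Element order is not guaranteed unless an ORDER BY is added inside ARRAY_AGG.
  ARRAY_AGG(sessions) AS agg_array
FROM daily_sessions
GROUP BY date
ORDER BY date
```

In the original query this corresponds to replacing `ARRAY_AGG(sessions) OVER (...)` with `ARRAY_AGG(sessions)` and `GROUP BY date,sessions` with `GROUP BY date`. Grouping by both date and sessions yields one row per distinct (date, sessions) pair, so repeated counts for the same date would be collapsed before the array is built.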
【Discussion】:
Tags: sql google-bigquery