【Question title】: ARRAY_AGG not allowed in user-defined function (Standard SQL)
【Posted】: 2020-11-09 06:45:25
【Question description】:

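While working on a user-defined function in BigQuery to extract emails from a messy dataset, I ran into a problem: ARRAY_AGG() is not allowed in the body of a temporary user-defined function (UDF).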

CREATE TEMP FUNCTION GET_EMAIL(emails ARRAY<STRING>, index INT64) AS (
    ARRAY_AGG(
        DISTINCT 
        (SELECT * FROM 
            UNNEST(
                SPLIT(
                    REPLACE(
                        LOWER(
                            ARRAY_TO_STRING(emails, ",")
                        )," ", ""
                    )
                )
            ) AS e where e like '%@%'
        ) IGNORE NULLS
    )[SAFE_OFFSET(index)]
);

SELECT GET_EMAIL(["bob@hotmail.com,test@gmail.com", "12345", "bon@yahoo.com"],1) as email_1

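I tried to work around ARRAY_AGG by selecting the OFFSET from the UNNEST and then filtering with WHERE so that the offset equals the index.

But then I hit a column limit (a scalar subquery cannot have more than one column in its SELECT clause), with a suggestion to use SELECT AS STRUCT instead.

So I tried SELECT AS STRUCT: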

CREATE TEMP FUNCTION GET_EMAIL(emails ARRAY<STRING>, index INT64) AS (
   
    (SELECT AS STRUCT DISTINCT list.e, list.o FROM 
        UNNEST(
            SPLIT(
                REPLACE(
                    LOWER(
                        ARRAY_TO_STRING(emails, ", ")
                    )," ", ""
                )
            )
        ) AS list
        WITH OFFSET as list.o
        WHERE list.e like '%@%' AND list.o = index)
);

SELECT GET_EMAIL(["bob@hotmail.com,test@gmail.com", "12345", "bob@yahoo.com"],1) as email_1

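But it doesn't like my DISTINCT, and even after I remove it, it complains that it cannot resolve e and o.

So I'm out of ideas here, and I may have tied myself in a knot. Can anyone suggest how to get this done inside a UDF? Thanks.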

【Question comments】:

    Tags: sql arrays google-bigquery user-defined-functions


    【Solution 1】:

    The following version works:

    CREATE TEMP FUNCTION GET_EMAIL(emails ARRAY<STRING>, index INT64) AS ((
        SELECT ARRAY(
            SELECT * 
              FROM UNNEST(
                    SPLIT(
                        REPLACE(
                            LOWER(
                                ARRAY_TO_STRING(emails, ",")
                            )," ", ""
                        )
                    )
                ) AS e WHERE e LIKE '%@%'
        )[SAFE_OFFSET(index)]
    ));
    SELECT GET_EMAIL(["bob@hotmail.com,test@gmail.com", "12345", "bon@yahoo.com"], 1) AS email_1
    

    Result:

    Row email_1  
    1   test@gmail.com   
    

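    Or the version below, which is just a slight correction of your original query: an aggregate function is not allowed directly in a SQL UDF body, so ARRAY_AGG moves into a SELECT wrapped in a scalar subquery (note the double parentheses):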

    CREATE TEMP FUNCTION GET_EMAIL(emails ARRAY<STRING>, index INT64) AS ((
      SELECT ARRAY_AGG(e)[SAFE_OFFSET(index)] 
      FROM UNNEST(
            SPLIT(
                REPLACE(
                    LOWER(
                        ARRAY_TO_STRING(emails, ",")
                    )," ", ""
                )
            )
        ) AS e WHERE e LIKE '%@%'
    ));
    SELECT GET_EMAIL(["bob@hotmail.com,test@gmail.com", "12345", "bon@yahoo.com"], 1) AS email_1     
    

    The result is, of course, the same.
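
Neither version above specifies an ordering, and BigQuery does not guarantee element order from UNNEST, ARRAY(subquery), or ARRAY_AGG without an explicit ORDER BY. If the index needs to follow the input order, and duplicates should be dropped as the DISTINCT in the question intended, a sketch along the following lines may help. The function name GET_DISTINCT_EMAIL and the alias pos are hypothetical, not from the original post. It also shows the alias form `AS e WITH OFFSET AS pos`: both aliases have to be plain identifiers, which is why `AS list WITH OFFSET as list.o` in the question's second attempt fails to parse.

```sql
-- Hypothetical variant: de-duplicates emails and keeps first-seen input order.
CREATE TEMP FUNCTION GET_DISTINCT_EMAIL(emails ARRAY<STRING>, index INT64) AS ((
  SELECT ARRAY(
    SELECT e
    FROM UNNEST(
      SPLIT(REPLACE(LOWER(ARRAY_TO_STRING(emails, ",")), " ", ""))
    ) AS e WITH OFFSET AS pos   -- pos is the position of e after the split
    WHERE e LIKE '%@%'          -- keep only values that look like emails
    GROUP BY e                  -- plays the role of DISTINCT
    ORDER BY MIN(pos)           -- order by first appearance in the input
  )[SAFE_OFFSET(index)]         -- NULL instead of an error when out of range
));

SELECT GET_DISTINCT_EMAIL(
  ["bob@hotmail.com,test@gmail.com", "12345", "Bob@Hotmail.com", "bon@yahoo.com"], 2
) AS email_3;
```

GROUP BY is used instead of ARRAY_AGG(DISTINCT ...) here because, with DISTINCT, ARRAY_AGG only accepts an ORDER BY on the aggregated expression itself, so ordering by the offset would not be possible.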

    【Comments】:
