【Question title】: Passing ARRAY of STRUCTs into user-defined function for standard BigQuery SQL
【Posted】: 2017-01-17 10:53:52
【Question】:

How do I pass an ARRAY of STRUCTs into my user-defined function (using standard SQL)?

First, a bit of context:

Table schema:

id STRING
customer STRING
request STRUCT<
  headers STRING,
  body STRING,
  url STRING
>
response STRUCT<
  size INT64,
  body STRING
>
outgoing ARRAY<
  STRUCT<
    request STRUCT<
      url STRING,
      body STRING,
      headers STRING
    >,
    response STRUCT<
      size INT64,
      body STRING
    >
  >
>

User-defined function:

CREATE TEMPORARY FUNCTION extractDetailed(
  customer STRING,
  request STRUCT<
    headers STRING,
    body STRING
  >,
  outgoing ARRAY<
    STRUCT<
      request STRUCT<url STRING>,
      response STRUCT<body STRING>
    >
  >
)
RETURNS STRING
LANGUAGE js AS """

""";

SELECT extractDetailed(customer, STRUCT(request.headers, request.body), outgoing)
FROM request_logs

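As for my question: I can't figure out how to select only part of each element of the outgoing ARRAY and pass the result to the user-defined function as an array.

In effect, I'm trying to emulate the following user-defined function call: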

extractDetailed(
  "customer id",
  { "headers": "", "body": "" },
  [
    {
      "request": { "url": "" },
      "response": { "body": "" }
    },
    {
      "request": { "url": "" },
      "response": { "body": "" }
    }
  ]
);

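I recently stumbled upon some documentation that might help unlock this, but I can't figure out how to make it fit. I'm really struggling with this and would appreciate any help.

【Question comments】:

    Tags: google-bigquery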


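    【Answer 1】:

    Try the query below. It extracts the needed pieces from each element of your array and puts them into a new array before passing it to the function, so that the argument matches the function signature.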

    CREATE TEMPORARY FUNCTION extractDetailed(
    customer STRING,
    request STRUCT<headers STRING, body STRING>,
    outgoing ARRAY<STRUCT<request STRUCT<url STRING>, response STRUCT<body STRING>>>
    )
    RETURNS STRING
    LANGUAGE js AS """
    
    """;
    
    SELECT 
      extractDetailed(
        customer, 
        STRUCT(request.headers, request.body), 
        ARRAY(
          SELECT STRUCT<request STRUCT<url STRING>,response STRUCT<body STRING>>
              (STRUCT(request.url), STRUCT(response.body)) 
          FROM UNNEST(outgoing)
        )
      ) AS details
    FROM request_logs  
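
The ARRAY(SELECT ... FROM UNNEST(outgoing)) subquery above narrows every element of the array to just request.url and response.body. As a rough illustration only, the same projection can be sketched in plain JavaScript. The sample data and the helper name narrowOutgoing are made up for this sketch and are not part of BigQuery or of the original answer.

```javascript
// Hypothetical helper: mirrors what the ARRAY(SELECT STRUCT(...) FROM UNNEST(outgoing))
// subquery does, keeping only request.url and response.body from each element.
function narrowOutgoing(outgoing) {
  return outgoing.map(function (call) {
    return {
      request: { url: call.request.url },
      response: { body: call.response.body }
    };
  });
}

// Made-up sample value shaped like the table's `outgoing` column.
const sampleOutgoing = [
  {
    request: { url: "https://example.com/a", body: "{}", headers: "h1" },
    response: { size: 5, body: "first" }
  },
  {
    request: { url: "https://example.com/b", body: "{}", headers: "h2" },
    response: { size: 6, body: "second" }
  }
];

const narrowed = narrowOutgoing(sampleOutgoing);
console.log(JSON.stringify(narrowed, null, 2));
```

Each element of the result has exactly the shape the function signature declares, which is why the SQL version can be passed straight to extractDetailed.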
    

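    To further "optimize" the query above and make it more portable, you can move the step that extracts the needed parts of the original array into a new array into a separate SQL UDF: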

    CREATE TEMPORARY FUNCTION extractParts (
      outgoing ARRAY<STRUCT<request STRUCT<url STRING, body STRING, headers STRING>,
                            response STRUCT<size INT64, body STRING>>>
    )
    RETURNS ARRAY<STRUCT<request STRUCT<url STRING>, response STRUCT<body STRING>>>
    AS ((
      SELECT ARRAY(
          SELECT STRUCT<request STRUCT<url STRING>,response STRUCT<body STRING>>
              (STRUCT(request.url), STRUCT(response.body)) 
          FROM UNNEST(outgoing)
        )
    ));
    
    CREATE TEMPORARY FUNCTION extractDetailed(
      customer STRING,
      request STRUCT<headers STRING, body STRING>,
      outgoing ARRAY<STRUCT<request STRUCT<url STRING>, response STRUCT<body STRING>>>
    )
    RETURNS STRING
    LANGUAGE js AS """
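      // Placeholder body: report how many outgoing calls were passed in.
      // Converted explicitly because the function is declared RETURNS STRING.
      return String(outgoing.length);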
    """;
    
    SELECT 
      extractDetailed(
        customer, 
        STRUCT(request.headers, request.body),
        extractParts(outgoing)
      ) AS details
    FROM request_logs
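
The JS body above is only a placeholder. Here is a hedged sketch of what a fuller body might look like, written as a standalone JavaScript function so it can be run outside BigQuery. It relies on the fact that BigQuery hands STRUCT arguments to a JavaScript UDF as objects and ARRAY arguments as arrays. The summary format and the sample arguments are assumptions made for illustration, not something from the original answer.

```javascript
// Standalone stand-in for the UDF body: in BigQuery, only the statements inside
// the function would go between the triple quotes of LANGUAGE js AS """ ... """.
function extractDetailed(customer, request, outgoing) {
  // Collect the URL of every outgoing call.
  const urls = outgoing.map(function (call) {
    return call.request.url;
  });
  // Hypothetical summary format: a JSON string, since the UDF is declared RETURNS STRING.
  return JSON.stringify({
    customer: customer,
    requestBodyLength: request.body.length,
    outgoingCount: outgoing.length,
    urls: urls
  });
}

// Made-up arguments in the shape of the emulated call from the question.
const result = extractDetailed(
  "customer id",
  { headers: "", body: "" },
  [
    { request: { url: "https://example.com/a" }, response: { body: "" } },
    { request: { url: "https://example.com/b" }, response: { body: "" } }
  ]
);
console.log(result);
```

If a row's outgoing array can be NULL, the body would also need a guard before calling map; that case is left out of this sketch.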
    

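    【Comments】:

    • So, performance-wise, is there any difference between the two? My first thought about having two UDFs was that it would cause a slight slowdown. But I guess it may be a trade-off between portability and performance.
    • In my experience, SQL UDFs don't add any slowdown; JS UDFs can. So I think the two queries above should perform the same.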