【Question title】: BigQuery throwing "Resources exceeded during query execution"
【Posted】: 2020-08-20 13:30:44
【Question】:

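Using Google BigQuery, I am running a query with group by and getting the error message "Resources exceeded during query execution: The query could not be executed in the allotted memory. Peak usage: 152% of limit. Top memory consumer(s): sort operations used for analytic OVER() clauses: 99% other/unattributed: 1%".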

This is the query I am using:

    SELECT
      CASE
        WHEN (sourceId = 1 AND web_id IS NOT NULL) THEN
          LAST_VALUE(Name IGNORE NULLS) OVER (PARTITION BY dgId, web_id ORDER BY event_timestamp ASC)
        WHEN (sourceId IN (2, 4) AND zc IS NOT NULL) THEN
          COALESCE(
            LAST_VALUE(Name IGNORE NULLS) OVER (PARTITION BY dgId, zc ORDER BY event_timestamp ASC),
            FIRST_VALUE(Name IGNORE NULLS) OVER (PARTITION BY dgId, zc ORDER BY event_timestamp ASC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING))
        ELSE Name
      END AS Name,
      session_id,
      CASE
        WHEN (sourceId = 1 AND web_id IS NOT NULL) THEN
          LAST_VALUE(user_id IGNORE NULLS) OVER (PARTITION BY dgId, web_id ORDER BY event_timestamp ASC)
        WHEN (sourceId IN (2, 4) AND zc IS NOT NULL) THEN
          COALESCE(
            LAST_VALUE(user_id IGNORE NULLS) OVER (PARTITION BY dgId, zc ORDER BY event_timestamp ASC),
            FIRST_VALUE(user_id IGNORE NULLS) OVER (PARTITION BY dgId, zc ORDER BY event_timestamp ASC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING))
        ELSE user_id
      END AS user_id
    FROM (
      SELECT
        CASE
          WHEN (sourceId = 1 AND web_id IS NOT NULL) THEN
            FIRST_VALUE(consent_resolved IGNORE NULLS) OVER (PARTITION BY dgId, web_id ORDER BY event_timestamp ASC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
          WHEN (sourceId IN (2, 4) AND zc IS NOT NULL) THEN
            FIRST_VALUE(consent_resolved IGNORE NULLS) OVER (PARTITION BY dgId, zc ORDER BY event_timestamp ASC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
          ELSE consent_resolved
        END AS consent_resolved,
        * EXCEPT(consent_resolved)
      FROM (
        SELECT
          CASE
            WHEN (LOWER(consent) = 'no') THEN consent
            ELSE NULL
          END AS consent_resolved,
          *
        FROM
          `table_name`))
    WHERE
      consent_resolved IS NULL;

Any suggestions on how to solve this? My BigQuery table has 50 million rows.

【Comments】:

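  • You could try implementing several CTEs, since you are using multiple analytic functions that in turn use ORDER BY clauses. ORDER BY is a memory-intensive operation and is most likely the cause of the error.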
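To make the CTE suggestion concrete, here is a rough sketch of the question's query restructured with WITH. It is meant to keep the same logic, with two changes that are my own assumptions rather than anything from the thread: `SELECT *` is replaced by only the columns the query actually references (so less data is carried through each sort), and the two window definitions are named once in a WINDOW clause. The names `flagged`, `resolved`, `kept`, `consent_flag`, `by_web` and `by_zc` are made up for illustration. Restructuring alone may not be enough to stay under the memory limit.

```sql
-- Sketch only: same steps as the original query, one CTE per step.
WITH flagged AS (
  -- Keep only the columns used later instead of SELECT *.
  SELECT
    dgId, sourceId, web_id, zc, event_timestamp, session_id, Name, user_id,
    IF(LOWER(consent) = 'no', consent, NULL) AS consent_flag
  FROM `table_name`
),
resolved AS (
  -- Look ahead within each partition for the next non-NULL consent flag.
  SELECT
    * EXCEPT (consent_flag),
    CASE
      WHEN sourceId = 1 AND web_id IS NOT NULL THEN
        FIRST_VALUE(consent_flag IGNORE NULLS)
          OVER (by_web ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
      WHEN sourceId IN (2, 4) AND zc IS NOT NULL THEN
        FIRST_VALUE(consent_flag IGNORE NULLS)
          OVER (by_zc ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
      ELSE consent_flag
    END AS consent_resolved
  FROM flagged
  WINDOW
    by_web AS (PARTITION BY dgId, web_id ORDER BY event_timestamp ASC),
    by_zc AS (PARTITION BY dgId, zc ORDER BY event_timestamp ASC)
),
kept AS (
  -- Same filter as the outer WHERE in the original query.
  SELECT * FROM resolved WHERE consent_resolved IS NULL
)
SELECT
  CASE
    WHEN sourceId = 1 AND web_id IS NOT NULL THEN
      LAST_VALUE(Name IGNORE NULLS) OVER by_web
    WHEN sourceId IN (2, 4) AND zc IS NOT NULL THEN
      COALESCE(
        LAST_VALUE(Name IGNORE NULLS) OVER by_zc,
        FIRST_VALUE(Name IGNORE NULLS)
          OVER (by_zc ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING))
    ELSE Name
  END AS Name,
  session_id,
  CASE
    WHEN sourceId = 1 AND web_id IS NOT NULL THEN
      LAST_VALUE(user_id IGNORE NULLS) OVER by_web
    WHEN sourceId IN (2, 4) AND zc IS NOT NULL THEN
      COALESCE(
        LAST_VALUE(user_id IGNORE NULLS) OVER by_zc,
        FIRST_VALUE(user_id IGNORE NULLS)
          OVER (by_zc ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING))
    ELSE user_id
  END AS user_id
FROM kept
WINDOW
  by_web AS (PARTITION BY dgId, web_id ORDER BY event_timestamp ASC),
  by_zc AS (PARTITION BY dgId, zc ORDER BY event_timestamp ASC);
```

If this still fails, one option is to write `kept` to an intermediate table and run the final SELECT against that table, so that each job sorts less data at once.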

Tags: google-cloud-platform google-bigquery


【Solution 1】:

I don't have sample data to optimize your query against, but I can explain the main points.

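Some operations in BigQuery require their data to be present on a single node. When the data no longer fits on that node, you get the "Resources exceeded during query execution" error, and OVER() is one of those operations. As far as I can see, your query runs a large number of OVER() ... ORDER BY clauses, which are also expensive in terms of resources.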
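One way to see whether a single partition is too large for one node is to count rows per window partition. The two queries below are only a diagnostic sketch against the question's `table_name`, using the same PARTITION BY keys as the original query; the `row_count` alias is mine. Note that rows whose `web_id` (or `zc`) is NULL all land in the same partition per `dgId`, and the original query still computes the window over them even though the CASE later discards that result, so a NULL key is a likely candidate for an oversized partition.

```sql
-- Largest partitions for the (dgId, web_id) windows.
SELECT dgId, web_id, COUNT(*) AS row_count
FROM `table_name`
GROUP BY dgId, web_id
ORDER BY row_count DESC
LIMIT 10;

-- Largest partitions for the (dgId, zc) windows.
SELECT dgId, zc, COUNT(*) AS row_count
FROM `table_name`
GROUP BY dgId, zc
ORDER BY row_count DESC
LIMIT 10;
```

If the top rows show a NULL key with millions of rows, filtering those rows out before the window step (and adding them back unchanged afterwards) would be worth trying.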

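So, to optimize your query, you can use the WITH clause to split the data into smaller pieces. In addition, according to the documentation, there are several points to consider for better query performance, such as: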

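  • Input data and data sources (I/O): How many bytes does your query read?

  • Communication between nodes (shuffling): How many bytes does your query pass to the next stage?

  • How many bytes does your query pass to each slot?

  • Computation: How much CPU work does your query require?

  • Outputs (materialization): How many bytes does your query write?

  • Query anti-patterns: Are your queries following SQL best practices?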
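Several of the numbers in this checklist can be read back from job metadata. The query below is a sketch that assumes the jobs ran in the `US` multi-region (hence the `region-us` qualifier) and that you have permission to read `INFORMATION_SCHEMA.JOBS_BY_PROJECT`; adjust both to your project. The one-day window and the LIMIT are arbitrary choices of mine.

```sql
-- Bytes read, slot time and shuffle volume for recent query jobs.
SELECT
  job_id,
  creation_time,
  total_bytes_processed,   -- I/O: bytes read by the query
  total_slot_ms,           -- computation: slot time consumed
  (SELECT SUM(s.shuffle_output_bytes)
   FROM UNNEST(job_stages) AS s) AS shuffle_output_bytes,
  (SELECT SUM(s.shuffle_output_bytes_spilled)
   FROM UNNEST(job_stages) AS s) AS shuffle_bytes_spilled
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND job_type = 'QUERY'
ORDER BY creation_time DESC
LIMIT 20;
```

Comparing these values before and after a rewrite gives a rough indication of whether the change actually reduced the data being read and shuffled.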

【Comments】:
