Google BigQuery: New Column of Aggregate Based on Condition of Current Row (Answers)

【Question title】: Google Big Query: New Column of Aggregate Based On Condition of Current Row
【Posted】: 2019-12-04 20:14:02
【Question description】:

Using the Google BigQuery table bigquery-public-data.crypto_ethereum_classic.transactions as a reference.

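For each transaction row, I want to calculate the count of all transactions to the same address that occurred before that transaction, and the sum of their gas usage. I'm sure I can do this with a join, since I've tried it and Google accepted my old query, but because the (inner) join produces so much data, I almost always get a "quota limit exceeded" error. Meanwhile, I think a subquery solution is inefficient, since it queries almost the same thing in two aggregate functions.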

In a perfect world, the query would aggregate based on a condition (where to_address = table_1.to_address and block_timestamp < table_1.block_timestamp).

What I have so far, and what I'm looking for, is something like...:

SELECT 
    table_1.*,
    COUNT(
        DISTINCT IF(block_timestamp < table_1.block_timestamp and to_address = table_1.to_address, `hash`, NULL)
    ) as txn_count,
    SUM(
        IF(block_timestamp < table_1.block_timestamp and to_address = table_1.to_address, `receipt_gas_used`, NULL)
    ) as total_gas_used
from 
    `bigquery-public-data.crypto_ethereum_classic.transactions` as table_1 
where block_number >= 3000000 and block_number <= 3500000 #just to subset the data a bit

【Question discussion】:

Tags: sql google-bigquery

【Solution 1】:

I think you want window functions:

select t.*,
       row_number() over (partition by to_address order by block_timestamp) as txn_seqnum,
       sum(receipt_gas_used) over (partition by to_address order by block_timestamp) as total_gas_used
from `bigquery-public-data.crypto_ethereum_classic.transactions` as t 
where block_number >= 3000000 and block_number <= 3500000 #just to subset the data a bit

If you actually have ties and need distinct values, use dense_rank() instead of row_number().
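
Note that the running window in the query above includes the current row, while the question asks only about transactions that happened before it. A minimal sketch of one way to exclude the current row (and any rows sharing its timestamp), assuming block_timestamp is a TIMESTAMP column so that it can be converted to seconds for a numeric RANGE frame. The window name prior_txns is made up for illustration:

```sql
SELECT
  t.*,
  -- Number of earlier transactions to the same address (0 when there are none).
  COUNT(*) OVER prior_txns AS txn_count,
  -- Gas used by those earlier transactions; SUM over an empty frame is NULL, so map it to 0.
  IFNULL(SUM(receipt_gas_used) OVER prior_txns, 0) AS total_gas_used
FROM `bigquery-public-data.crypto_ethereum_classic.transactions` AS t
WHERE block_number >= 3000000 AND block_number <= 3500000
WINDOW prior_txns AS (
  PARTITION BY to_address
  -- RANGE with a numeric offset needs a single numeric ordering key.
  ORDER BY UNIX_SECONDS(block_timestamp)
  -- Frame ends one second before the current row's timestamp: strictly earlier rows only.
  RANGE BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
```

With a RANGE frame, transactions in the same block (same timestamp) do not count toward each other. If ties should instead be broken by row order, ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING is the alternative, though the order among tied rows is then arbitrary.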

【Discussion】:

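Thanks! This mostly works. The problem is when I increase the amount of data (block_number = 3500000 and block_number = 3000000? (other than just moving the upper limit and then filtering down to the desired block_number range).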
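@javascrub . . . You may want to ask another question. I think this answers the question you asked here. Dealing with resource issues requires knowing more about your data, and it would complicate the query.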
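
For reference, the workaround mentioned in the comment (apply only the upper limit when computing the window values, then filter to the desired block_number range) might look like the sketch below. It reuses the answer's window expressions and the block numbers from the question. It is only an illustration of the idea: it still processes all earlier history, so it may well run into the same resource limits.

```sql
SELECT *
FROM (
  SELECT
    t.*,
    ROW_NUMBER() OVER (PARTITION BY to_address ORDER BY block_timestamp) AS txn_seqnum,
    SUM(receipt_gas_used) OVER (PARTITION BY to_address ORDER BY block_timestamp) AS total_gas_used
  FROM `bigquery-public-data.crypto_ethereum_classic.transactions` AS t
  -- Only the upper bound is applied here, so the running values cover all earlier history.
  WHERE block_number <= 3500000
)
-- Keep only the desired range after the window values have been computed.
WHERE block_number >= 3000000
```

Filtering in the outer query matters: a lower bound placed in the inner WHERE clause would be applied before the window functions run, and the running values would restart at block 3000000.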