【Question title】: JS UDF with partition of input data from BigQuery standard SQL
【Posted】: 2018-04-04 22:16:25
【Question】:

I am trying to pass two sets of cash flows (partitioned by the field "id") to the JS UDF IRRCalc and compute one IRR number for each set of cash flows.

  CREATE TEMPORARY FUNCTION IRRCalc(cash_flow ARRAY<FLOAT64>, date_delta ARRAY<INT64>)
    RETURNS FLOAT64
    LANGUAGE js AS """
      min = 0.0;
      max = 100.0;
      iter_cnt = 0;
      do {
        guess = (min + max) / 2;
        NPV = 0.0;
        for (var j=0; j<cash_flow.length; j++){
          NPV += cash_flow[j]/Math.pow((1+guess),date_delta[j]/365);
        }
        if (cash_flow[0] > 0){
          if (NPV > 0){
            max = guess;
          }
          else {
            min = guess;
          }
        }
        if (cash_flow[0] < 0){
          if (NPV > 0){
            min = guess;
          }
          else {
            max = guess;
          }
        }
        iter_cnt = iter_cnt+1;
      } while (Math.abs(NPV) > 0.00000001 && iter_cnt<8192);
      return guess;

    """;
WITH Input AS
 (
  select
    id,
    scenario_date,
    cash_flow_date,
    date_diff(cash_flow_date, min(cash_flow_date) over (partition by id),day) as date_delta,
    sum(cash_flow) as cash_flow
  from cash_flow_table
  where id in ('1','2')
  group by 1,2,3
  order by 1,2,3
 )

 select 
    id, 
    IRRCalc(array(select cash_flow from input), array(select date_delta from input)) as IRR
 from input
 group by 1

Input data:

Row  id  scenario_date  cash_flow_date  date_delta  cash_flow
1    1   2018-04-02     2016-07-01      0           5979008.899131917
2    1   2018-04-02     2016-08-03      33          -2609437.0145417987
3    1   2018-04-02     2016-08-29      59          -21682.04267909576
4    1   2018-04-02     2016-09-16      77          -4968554.060201097
5    1   2018-04-02     2018-04-02      640         0.0
6    2   2018-04-02     2017-09-08      0           -320912.83786916407
7    2   2018-04-02     2017-09-27      19          3015.2821677139805
8    2   2018-04-02     2018-03-28      201         3204.6920948425554
9    2   2018-04-02     2018-04-02      206         440424.3826431843

Ideally, I would expect the output table to look like this:

Row id  IRR  
1   1   3.2
2   2   0.8 

However, I end up with the output table below:

Row id  IRR  
1   1   3.8
2   2   3.8 

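I think the problem is that when I call IRRCalc, everything is put into a single array instead of being partitioned by id. If you run the query below, you will see what I mean: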

 select 
    array(select cash_flow from input), 
    array(select date_delta from input)
 from input

in place of IRRCalc(array(select cash_flow from input), array(select date_delta from input)). Could someone take a look and let me know how to apply the partition by id logic to the two arrays cash_flow and date_delta before passing them to the function IRRCalc?

【Comments】:

    Tags: javascript google-bigquery user-defined-functions


    【Solution 1】:

    Below is the outermost SELECT statement you are looking for:

    SELECT 
      id, 
      IRRCalc(ARRAY_AGG(cash_flow), ARRAY_AGG(date_delta)) AS IRR
    FROM input
    GROUP BY id 
    

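    It groups by id and builds the corresponding arrays that are passed to your UDF, so the result is specific to each id.
    Assuming the logic in WITH input AS is correct, you should get the expected result.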
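To see why grouping changes the result, the following is a minimal sketch in plain JavaScript, outside BigQuery. It ports the body of IRRCalc to an ordinary function and feeds it the nine input rows from the question. The names `irrCalc`, `aggregateById`, `rows`, `irrById`, and `irrAllRows` are made up for this illustration. `aggregateById` only imitates what `ARRAY_AGG(...) ... GROUP BY id` does; it is not how BigQuery implements it. The last call imitates the original query, where `array(select cash_flow from input)` is an uncorrelated subquery and therefore builds the same array from all rows for every id.

```javascript
// Port of the question's IRRCalc body to a plain function (same bisection logic).
function irrCalc(cashFlow, dateDelta) {
  let min = 0.0, max = 100.0, guess = 0.0, npv = 0.0, iterCnt = 0;
  do {
    guess = (min + max) / 2;
    npv = 0.0;
    for (let j = 0; j < cashFlow.length; j++) {
      npv += cashFlow[j] / Math.pow(1 + guess, dateDelta[j] / 365);
    }
    // Same update rule as the UDF: the direction depends on the sign of the first cash flow.
    if (cashFlow[0] > 0) {
      if (npv > 0) max = guess; else min = guess;
    } else if (cashFlow[0] < 0) {
      if (npv > 0) min = guess; else max = guess;
    }
    iterCnt++;
  } while (Math.abs(npv) > 1e-8 && iterCnt < 8192);
  return guess;
}

// The input rows from the question (only the columns the UDF needs).
const rows = [
  { id: "1", dateDelta: 0,   cashFlow: 5979008.899131917 },
  { id: "1", dateDelta: 33,  cashFlow: -2609437.0145417987 },
  { id: "1", dateDelta: 59,  cashFlow: -21682.04267909576 },
  { id: "1", dateDelta: 77,  cashFlow: -4968554.060201097 },
  { id: "1", dateDelta: 640, cashFlow: 0.0 },
  { id: "2", dateDelta: 0,   cashFlow: -320912.83786916407 },
  { id: "2", dateDelta: 19,  cashFlow: 3015.2821677139805 },
  { id: "2", dateDelta: 201, cashFlow: 3204.6920948425554 },
  { id: "2", dateDelta: 206, cashFlow: 440424.3826431843 },
];

// Imitates ARRAY_AGG(...) with GROUP BY id: one pair of arrays per id.
function aggregateById(inputRows) {
  const groups = {};
  for (const r of inputRows) {
    if (!groups[r.id]) groups[r.id] = { cashFlow: [], dateDelta: [] };
    groups[r.id].cashFlow.push(r.cashFlow);
    groups[r.id].dateDelta.push(r.dateDelta);
  }
  return groups;
}

// Grouped version: each id gets an IRR computed from its own rows only.
const groups = aggregateById(rows);
const irrById = {};
for (const id of Object.keys(groups)) {
  irrById[id] = irrCalc(groups[id].cashFlow, groups[id].dateDelta);
}

// Imitates the original query: all rows end up in one array, whatever the id.
const irrAllRows = irrCalc(rows.map(r => r.cashFlow), rows.map(r => r.dateDelta));

console.log("per id:", irrById);
console.log("all rows in one array:", irrAllRows);
```

With the data above, the per-id values should come out close to the 3.2 and 0.8 the question expects, while the single-array call should land near the 3.8 that the original query returned for both ids. Note also that the UDF looks at the sign of `cash_flow[0]`, so element order matters. BigQuery's ARRAY_AGG accepts an ORDER BY clause, so writing `ARRAY_AGG(cash_flow ORDER BY cash_flow_date)` and `ARRAY_AGG(date_delta ORDER BY cash_flow_date)` is one way to make the order explicit, assuming `cash_flow_date` is the intended sort key.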

    【Discussion】:
