【Question】: Calculating growth and retention rate with SQL
【Posted】: 2020-05-26 09:28:48
【Description】:

I wrote a query to calculate the retention rate and the growth rates of new and returning students. The code below returns a result like this.

Row  visit_month    student_type    numberofstd  growth 
1      2013          new                574       null
2      2014          new                220       -62%
3      2014        retained             442       245%
4      2015          new                199       -10%
5      2015        retained             533        21%
6      2016          new                214        8%
7      2016        retained             590        11%
8      2016        returning            1         -100%

The query I tried:

with visit_log AS (
    SELECT studentid,
            cast(substr(session, 1, 4) as numeric) as visit_month,
    FROM abaresult
    GROUP BY 1,
             2
    ORDER BY 1,
             2),
  time_lapse_2 AS (
        SELECT studentid,
               Visit_month,
               lag(visit_month, 1) over (partition BY studentid ORDER BY studentid, visit_month) lag
         FROM visit_log),
  time_diff_calculated_2 AS (
        SELECT studentid,
               visit_month,
               lag,
               visit_month - lag AS time_diff
         FROM time_lapse_2),

  student_categorized AS (
        SELECT studentid,
               visit_month,
               CASE
                        WHEN time_diff=1 THEN 'retained'
                        WHEN time_diff>1 THEN 'returning'
                        WHEN time_diff IS NULL THEN 'new'
               END AS student_type,
    FROM time_diff_calculated_2)

SELECT visit_month,
         student_type,
         Count(distinct studentid) as numberofstd,
         ROUND(100 * (COUNT(student_type) - LAG(COUNT(student_type), 1) OVER (ORDER BY student_type)) / LAG(COUNT(student_type), 1) OVER (ORDER BY student_type),0) || '%' AS growth
  FROM student_categorized
group by 1,2
order by 1,2

The query above calculates the retention, new, and returning rates relative to the previous period's count for each student_type category.
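For reference, the growth values on rows that have an earlier row of the same student_type follow the usual period-over-period formula, (current - previous) / previous. The following is a minimal Python sketch of that arithmetic, with the counts copied from the result table above; the helper name `growth_pct` is chosen only for illustration.

```python
def growth_pct(current, previous):
    """Percentage change from previous to current, rounded to a whole number."""
    return round(100 * (current - previous) / previous)

# numberofstd per year, copied from the result table above
new = {2013: 574, 2014: 220, 2015: 199, 2016: 214}
retained = {2014: 442, 2015: 533, 2016: 590}

for label, counts in (("new", new), ("retained", retained)):
    years = sorted(counts)
    # compare each year with the previous year of the same student_type
    for prev_year, year in zip(years, years[1:]):
        print(year, label, f"{growth_pct(counts[year], counts[prev_year])}%")
```

This reproduces -62%, -10% and 8% for new, and 21% and 11% for retained.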

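I am looking for a way to calculate these figures based on the total number of students in each visit month rather than per category. Is there a way to do this?

I am trying to get a table similar to this: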

Row  visit_month    student_type  totalstd  numberofstd  growth 
1      2013          new           574         574       null
2      2014          new           662         220       62%
3      2014        retained        662         442       22%
4      2015          new           732         199       10%
5      2015        retained        732         533       21%
6      2016          new           804         214       8%
7      2016        retained        804         590       11%
8      2016        returning       804         1         100%

Notes:

totalstd is the total number of students in each session, obtained as new + retained + returning.
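As a quick check of that definition, the totals can be recomputed from the per-type counts. The following is a plain Python sketch with the numberofstd values hard-coded from the table above; the variable names are only illustrative.

```python
from collections import defaultdict

# (visit_month, student_type, numberofstd) rows copied from the table above
rows = [
    (2013, "new", 574),
    (2014, "new", 220),
    (2014, "retained", 442),
    (2015, "new", 199),
    (2015, "retained", 533),
    (2016, "new", 214),
    (2016, "retained", 590),
    (2016, "returning", 1),
]

# totalstd per visit_month = new + retained + returning
totalstd = defaultdict(int)
for visit_month, _student_type, numberofstd in rows:
    totalstd[visit_month] += numberofstd

for visit_month in sorted(totalstd):
    print(visit_month, totalstd[visit_month])
```

The sums come out as 574, 662, 732 and 805. For 2016 the three counts add up to 805, one more than the 804 shown in the table, which matches new + retained only.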

The growth values are assumed (illustrative) figures.

Please help! Thanks.

【Discussion】:

    Tags: google-bigquery analytics retention


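    【Answer 1】:

    I don't have your source data, so I relied on the query and output you shared.

    I wrote some additional code to produce the desired result. Note that I could not run it in BigQuery because I don't have the data, so I tried to guard against possible errors by hand. The query between the ** lines is unchanged and copied from your code. Here is the code (a mix of yours and what I added):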

    #*****************************************************************
    with visit_log AS (
        SELECT studentid,
                cast(substr(session, 1, 4) as numeric) as visit_month,
        FROM abaresult
        GROUP BY 1,
                 2
        ORDER BY 1,
                 2),
      time_lapse_2 AS (
            SELECT studentid,
                   Visit_month,
                   lag(visit_month, 1) over (partition BY studentid ORDER BY studentid, visit_month) lag
             FROM visit_log),
      time_diff_calculated_2 AS (
            SELECT studentid,
                   visit_month,
                   lag,
                   visit_month - lag AS time_diff
             FROM time_lapse_2),
    
      student_categorized AS (
            SELECT studentid,
                   visit_month,
                   CASE
                            WHEN time_diff=1 THEN 'retained'
                            WHEN time_diff>1 THEN 'returning'
                            WHEN time_diff IS NULL THEN 'new'
                   END AS student_type,
        FROM time_diff_calculated_2),
    #**************************************************************
    
    #Code I added (continues the same WITH list, so no second WITH keyword)
    #each unique visit_month will have its count
    total_stud AS (
    SELECT visit_month, count(distinct studentid) as totalstd FROM visit_log 
    GROUP BY 1
    ORDER BY visit_month
    ),
    
    #After you have your student_categorized temp table, create another one
    #It will have the count of the number of students per visit_month per student_type
    number_std_monthType AS (
    SELECT visit_month,student_type, Count(distinct studentid) as numberofstd from student_categorized
    GROUP BY 1, 2
    ),
    
    #You will have one row per combination of visit_month and student_type
    student_categorized2 AS(
    SELECT DISTINCT visit_month,student_type FROM student_categorized
    GROUP BY 1,2
    ),
    
    #before calculation, create the table with all the necessary data
    #you have the desired table without the growth
    #notice that I used two keys to join t1 and t3 so the results are correct
    final_t AS (
    SELECT t1.visit_month, 
           t1.student_type, 
           t2.totalstd as totalstd, 
           t3.numberofstd 
    FROM student_categorized2 t1 
           LEFT JOIN total_stud AS t2 ON t1.visit_month = t2.visit_month
           LEFT JOIN number_std_monthType t3 ON (t1.visit_month = t3.visit_month and t1.student_type = t3.student_type)
    )
    
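    #Now all the necessary values to calculate growth are in the temp table final_t
    #growth compares totalstd with the totalstd of the previous visit_month
    #in which the same student_type appears (NULL when there is none)
    SELECT visit_month, student_type, totalstd, numberofstd,
           CAST(CAST(ROUND(100 * (totalstd - LAG(totalstd) OVER w) / LAG(totalstd) OVER w) AS INT64) AS STRING) || '%' AS growth
    FROM final_t
    WINDOW w AS (PARTITION BY student_type ORDER BY visit_month ASC)
    ORDER BY visit_month, student_type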
    

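    Note that I used LEFT JOIN so that the final table has the correct counts, since each count is computed in a different temporary table. Also, I did not use your final SELECT statement.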
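As an alternative to the joins (not what the code above does), the per-month total can also be attached with a window sum over the per-type counts. This works because each student has exactly one student_type per visit_month. The sketch below runs the SQL under SQLite through Python's `sqlite3` module as a stand-in for BigQuery. That is an assumption: it needs SQLite 3.25 or later for window functions, and the rows in `student_categorized` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student_categorized "
    "(studentid INTEGER, visit_month INTEGER, student_type TEXT)"
)
# Hypothetical rows: one row per student per visit_month
conn.executemany(
    "INSERT INTO student_categorized VALUES (?, ?, ?)",
    [
        (1, 2013, "new"), (2, 2013, "new"),
        (1, 2014, "retained"), (3, 2014, "new"), (4, 2014, "new"),
        (3, 2015, "retained"), (2, 2015, "returning"),
    ],
)

query = """
WITH counts AS (
    -- number of students per visit_month and student_type
    SELECT visit_month, student_type, COUNT(DISTINCT studentid) AS numberofstd
    FROM student_categorized
    GROUP BY visit_month, student_type
)
SELECT visit_month,
       student_type,
       -- total for the month = sum of the per-type counts in that month
       SUM(numberofstd) OVER (PARTITION BY visit_month) AS totalstd,
       numberofstd
FROM counts
ORDER BY visit_month, student_type
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The window sum repeats the monthly total on every row of that month, which gives the totalstd column of the desired table without a separate temporary table for the totals.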

    If you have any questions about the code, feel free to ask.
