【Question title】: Rank() Over Partition assigning everything as 1
【Posted】: 2020-10-22 11:55:20
【Question】:

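I am trying to rank a data set to determine how many times an account number appears on each day, so that I can act on that count.

My data looks like this: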


+---------------+-----------+-----------+------------------+-----------+---------------+-----------+-----------+-----------+-------------+
| accountnumber | ctry_code | prod_code | comm_file_postdt |  post_dt  | comm_file_pay |  payment  | comm_diff | days_diff | mindue_diff |
+---------------+-----------+-----------+------------------+-----------+---------------+-----------+-----------+-----------+-------------+
|          1234 | MX        | PR        | 6/29/2020        | 6/26/2020 |        -583.5 |    -583.5 |      0.01 |       105 |             |
|          1234 | MX        | PR        | 6/29/2020        | 6/27/2020 |       -443.85 |   -443.85 |      0.01 |       138 |             |
|          1234 | MX        | GL        | 6/30/2020        | 6/26/2020 |      -2783.25 |  -2783.25 |      0.01 |       141 |             |
|          1234 | MX        | OP        | 6/30/2020        | 6/26/2020 |         -4000 |     -4000 |      0.01 |        57 |           0 |
|          1235 | MX        | OP        | 6/29/2020        | 6/27/2020 |      -3794.65 |  -3794.65 |    -35.84 |       102 |         239 |
|          1236 | MX        | OP        | 6/29/2020        | 6/27/2020 |          -239 |      -239 |     35.85 |       102 |      -537.5 |
|          1237 | MX        | OP        | 6/29/2020        | 6/27/2020 |       -345.67 |   -345.67 |    -34.57 |        38 |      345.67 |
|          1238 | MX        | OP        | 6/29/2020        | 6/26/2020 |         -3000 |     -3000 |    371.91 |        63 |     -2479.4 |
|          1238 | MX        | OP        | 6/29/2020        | 6/26/2020 |       -1661.5 |   -1661.5 |      0.01 |        41 |   -11950.16 |
|          1238 | MX        | OP        | 6/29/2020        | 6/27/2020 |     -15466.24 | -15466.24 |  -1091.34 |        12 |    10913.46 |
+---------------+-----------+-----------+------------------+-----------+---------------+-----------+-----------+-----------+-------------+

What I want to do is rank each account number within each distinct comm_file_postdt.

Based on the table above, my expected result would be:


+---------------+------------------+------+
| accountnumber | comm_file_postdt | rank |
+---------------+------------------+------+
|          1234 | 6/29/2020        |    1 |
|          1234 | 6/29/2020        |    2 |
|          1234 | 6/30/2020        |    1 |
|          1234 | 6/30/2020        |    2 |
|          1235 | 6/29/2020        |    1 |
|          1236 | 6/29/2020        |    1 |
|          1237 | 6/29/2020        |    1 |
|          1238 | 6/29/2020        |    1 |
|          1238 | 6/29/2020        |    2 |
|          1238 | 6/29/2020        |    3 |
+---------------+------------------+------+

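However, every iteration I have tried returns a rank of 1.

I have tried the following:

select *,
rank() over(partition by accountnumber order by comm_file_postdt) as rank from tableA

select *,
rank() over(partition by accountnumber, comm_file_postdt order by post_dt) as rank from tableA

and a few others like these, but whatever combination of values I put in the partition by and order by clauses, everything is ranked 1.

Any guidance on what I might be doing wrong would be very helpful.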

【Comments】:

    Tags: sql date select hive window-functions


    【Solution 1】:

    Here is your code:

    rank() over(partition by accountnumber order by comm_file_postdt)
    

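    Your data has multiple rows with the same accountnumber and comm_file_postdt: these are ties, so rank() assigns them the same value.

    The cleanest solution is to use another column to break the ties, possibly post_dt: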

    rank() over(partition by accountnumber order by comm_file_postdt, post_dt)
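
To make the tie behaviour concrete, here is a minimal sketch that runs rank() over a subset of the question's rows. It uses Python's built-in sqlite3 module rather than Hive, which is an assumption made only so the example is runnable (it needs SQLite 3.25 or later for window functions). The ISO date strings, the `rnk` alias and the `ranks` helper are illustrative and not part of the original post.

```python
import sqlite3

# Subset of the question's rows: (accountnumber, comm_file_postdt, post_dt, payment).
# Dates are rewritten as ISO strings so that text ordering matches date ordering.
rows = [
    (1234, "2020-06-29", "2020-06-26", -583.5),
    (1234, "2020-06-29", "2020-06-27", -443.85),
    (1234, "2020-06-30", "2020-06-26", -2783.25),
    (1234, "2020-06-30", "2020-06-26", -4000.0),
    (1238, "2020-06-29", "2020-06-26", -3000.0),
    (1238, "2020-06-29", "2020-06-26", -1661.5),
    (1238, "2020-06-29", "2020-06-27", -15466.24),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tableA "
    "(accountnumber int, comm_file_postdt text, post_dt text, payment real)"
)
conn.executemany("insert into tableA values (?, ?, ?, ?)", rows)


def ranks(window):
    """Return rank() for the given window spec, listed in a fixed row order."""
    sql = (
        f"select rank() over ({window}) as rnk from tableA "
        "order by accountnumber, comm_file_postdt, post_dt, payment"
    )
    return [r[0] for r in conn.execute(sql)]


# Ordering by a column that is constant inside the partition: every row ties.
all_ones = ranks("partition by accountnumber, comm_file_postdt order by comm_file_postdt")
print(all_ones)    # [1, 1, 1, 1, 1, 1, 1]

# The tie-breaker suggested above: fewer ties, but post_dt is not unique either.
answer_fix = ranks("partition by accountnumber order by comm_file_postdt, post_dt")
print(answer_fix)  # [1, 2, 3, 3, 1, 1, 3]

# Partition per account and file date, and order by columns that are unique together.
per_file = ranks("partition by accountnumber, comm_file_postdt order by post_dt, payment")
print(per_file)    # [1, 2, 1, 2, 1, 2, 3]
```

In this sample, post_dt alone still leaves ties (two rows for account 1238 share both dates), so a further column such as payment would be needed for rank() to produce distinct values. Whether payment is a sensible tie-breaker for the real data is an assumption.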
    

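    Or you can use row_number(), which guarantees no duplicates. However, without a deterministic order by clause, it is undefined which of the tied rows comes first. This may or may not be what you want: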

    row_number() over(partition by accountnumber order by comm_file_postdt)
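
A small sketch of what this row_number() call yields, again using Python's sqlite3 module as a stand-in for Hive (an assumption; SQLite 3.25 or later is required). Because the order among tied rows is undefined, the example only checks which set of numbers each (account, file date) group receives, not which row gets which number. The `rn` alias and the `numbers_by_group` name are illustrative.

```python
import sqlite3
from collections import defaultdict

# A few rows from the question: (accountnumber, comm_file_postdt, post_dt).
rows = [
    (1234, "2020-06-29", "2020-06-26"),
    (1234, "2020-06-29", "2020-06-27"),
    (1234, "2020-06-30", "2020-06-26"),
    (1234, "2020-06-30", "2020-06-26"),
    (1235, "2020-06-29", "2020-06-27"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tableA (accountnumber int, comm_file_postdt text, post_dt text)"
)
conn.executemany("insert into tableA values (?, ?, ?)", rows)

sql = """
select accountnumber, comm_file_postdt,
       row_number() over (partition by accountnumber order by comm_file_postdt) as rn
from tableA
"""

# Collect the row numbers handed out to each (account, file date) group.
collected = defaultdict(list)
for account, file_dt, rn in conn.execute(sql):
    collected[(account, file_dt)].append(rn)
numbers_by_group = {key: sorted(values) for key, values in collected.items()}

for key, values in sorted(numbers_by_group.items()):
    print(key, values)
# (1234, '2020-06-29') [1, 2]
# (1234, '2020-06-30') [3, 4]
# (1235, '2020-06-29') [1]
```

The numbers are unique within each account, but they keep counting across file dates. To restart the numbering for each comm_file_postdt, as in the expected output of the question, the date would also have to go into the partition by clause.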
    

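    【Comments】:

    • I think row_number() is what I am ultimately looking for. I don't need a deciding column to break ties. The end goal is to select the account numbers whose maximum "rank" is 1. Basically, I am trying to find accounts that appear only once in this file. So I think I need to use row_number and then select those where Max(Row_number) = 1.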
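
The goal described in this comment can be sketched in two ways: the Max(Row_number) = 1 idea from the comment, and a plain group by with having count(*) = 1, which gives the same accounts without a window function. The sketch assumes that "appears only once in this file" means once per accountnumber and comm_file_postdt, and it uses Python's sqlite3 module instead of Hive so that it can be run as-is (SQLite 3.25 or later). The `numbered` alias and the variable names are illustrative.

```python
import sqlite3

# All ten rows of the question, reduced to (accountnumber, comm_file_postdt, post_dt).
rows = [
    (1234, "2020-06-29", "2020-06-26"),
    (1234, "2020-06-29", "2020-06-27"),
    (1234, "2020-06-30", "2020-06-26"),
    (1234, "2020-06-30", "2020-06-26"),
    (1235, "2020-06-29", "2020-06-27"),
    (1236, "2020-06-29", "2020-06-27"),
    (1237, "2020-06-29", "2020-06-27"),
    (1238, "2020-06-29", "2020-06-26"),
    (1238, "2020-06-29", "2020-06-26"),
    (1238, "2020-06-29", "2020-06-27"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tableA (accountnumber int, comm_file_postdt text, post_dt text)"
)
conn.executemany("insert into tableA values (?, ?, ?)", rows)

# Variant 1: number the rows per group, keep groups whose highest number is 1.
sql_row_number = """
select accountnumber, comm_file_postdt
from (
    select accountnumber, comm_file_postdt,
           row_number() over (partition by accountnumber, comm_file_postdt
                              order by post_dt) as rn
    from tableA
) numbered
group by accountnumber, comm_file_postdt
having max(rn) = 1
order by accountnumber
"""

# Variant 2: count the rows per group directly.
sql_count = """
select accountnumber, comm_file_postdt
from tableA
group by accountnumber, comm_file_postdt
having count(*) = 1
order by accountnumber
"""

once_via_row_number = conn.execute(sql_row_number).fetchall()
once_via_count = conn.execute(sql_count).fetchall()
print(once_via_count)
# [(1235, '2020-06-29'), (1236, '2020-06-29'), (1237, '2020-06-29')]
```

Both queries return the same three accounts here. The count(*) form avoids the ordering question entirely, since no tie-breaker is needed just to count rows.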
    【Solution 2】:

    If you want to know how many times, what you are looking for is probably row_number().

    *Sample code*

    ;with mycte as
    (
        -- sample rows: one row per occurrence of an account in a file
        select 1234 as account_number, '6/29/2020' as comm_file_postdt
        union all select 1234, '6/29/2020'
        union all select 1234, '6/30/2020'
        union all select 1234, '6/30/2020'
        union all select 1235, '6/29/2020'
        union all select 1236, '6/29/2020'
        union all select 1237, '6/29/2020'
        union all select 1238, '6/29/2020'
        union all select 1238, '6/29/2020'
        union all select 1238, '6/29/2020'
    )
    -- number the rows within each (account_number, comm_file_postdt) group;
    -- [rank] is SQL Server bracket quoting, Hive quotes identifiers with backticks
    select *,
           row_number() over (partition by account_number, comm_file_postdt
                              order by comm_file_postdt) as [rank]
    from mycte
    

