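【Posted】: 2020-10-22 11:55:20
【Question】:
I am trying to rank a dataset to determine how many times an account number appears each day, so that I can take action based on that count.
My data looks like this: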
+---------------+-----------+-----------+------------------+-----------+---------------+-----------+-----------+-----------+-------------+
| accountnumber | ctry_code | prod_code | comm_file_postdt | post_dt | comm_file_pay | payment | comm_diff | days_diff | mindue_diff |
+---------------+-----------+-----------+------------------+-----------+---------------+-----------+-----------+-----------+-------------+
| 1234 | MX | PR | 6/29/2020 | 6/26/2020 | -583.5 | -583.5 | 0.01 | 105 | |
| 1234 | MX | PR | 6/29/2020 | 6/27/2020 | -443.85 | -443.85 | 0.01 | 138 | |
| 1234 | MX | GL | 6/30/2020 | 6/26/2020 | -2783.25 | -2783.25 | 0.01 | 141 | |
| 1234 | MX | OP | 6/30/2020 | 6/26/2020 | -4000 | -4000 | 0.01 | 57 | 0 |
| 1235 | MX | OP | 6/29/2020 | 6/27/2020 | -3794.65 | -3794.65 | -35.84 | 102 | 239 |
| 1236 | MX | OP | 6/29/2020 | 6/27/2020 | -239 | -239 | 35.85 | 102 | -537.5 |
| 1237 | MX | OP | 6/29/2020 | 6/27/2020 | -345.67 | -345.67 | -34.57 | 38 | 345.67 |
| 1238 | MX | OP | 6/29/2020 | 6/26/2020 | -3000 | -3000 | 371.91 | 63 | -2479.4 |
| 1238 | MX | OP | 6/29/2020 | 6/26/2020 | -1661.5 | -1661.5 | 0.01 | 41 | -11950.16 |
| 1238 | MX | OP | 6/29/2020 | 6/27/2020 | -15466.24 | -15466.24 | -1091.34 | 12 | 10913.46 |
+---------------+-----------+-----------+------------------+-----------+---------------+-----------+-----------+-----------+-------------+
What I want to do is rank each account number within each individual comm_file_postdt.
Based on the table above, my expected result would be:
+---------------+------------------+------+
| accountnumber | comm_file_postdt | rank |
+---------------+------------------+------+
| 1234 | 6/29/2020 | 1 |
| 1234 | 6/29/2020 | 2 |
| 1234 | 6/30/2020 | 1 |
| 1234 | 6/30/2020 | 2 |
| 1235 | 6/29/2020 | 1 |
| 1236 | 6/29/2020 | 1 |
| 1237 | 6/29/2020 | 1 |
| 1238 | 6/29/2020 | 1 |
| 1238 | 6/29/2020 | 2 |
| 1238 | 6/29/2020 | 3 |
+---------------+------------------+------+
However, every iteration I have tried returns a rank of 1.
I have tried the following:
select *,
       rank() over (partition by accountnumber order by comm_file_postdt) as rank
from tableA;

select *,
       rank() over (partition by accountnumber, comm_file_postdt order by post_dt) as rank
from tableA;
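I have also tried a few other variations, but no matter which combination of columns I use in the partition by and order by clauses, every row comes back ranked 1.
Any guidance on what I might be doing wrong would be very helpful.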
【Discussion】:
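One thing that may be worth checking: rank() gives the same value to rows that tie on the order by columns, so rows with equal sort keys inside a partition share a rank. row_number() numbers the rows of each partition 1, 2, 3, ... even when they tie, which looks closer to the expected result in the question. Below is a minimal sketch, not a confirmed fix. It assumes the table and column names from the question; the prod_code tie-breaker, the aliases rn and daily_cnt, and the extra count(*) column are my own additions for illustration.

```sql
-- Sketch only: tableA and its columns are taken from the question.
-- rn and daily_cnt are hypothetical aliases chosen for this example.
select
    accountnumber,
    comm_file_postdt,
    -- Consecutive numbering inside each (account, file date) group.
    -- Ordering by post_dt, prod_code is an assumption; rows that still
    -- tie are numbered in an arbitrary order, but never share a number.
    row_number() over (
        partition by accountnumber, comm_file_postdt
        order by post_dt, prod_code
    ) as rn,
    -- Total rows for the same account and file date, repeated on every row.
    count(*) over (
        partition by accountnumber, comm_file_postdt
    ) as daily_cnt
from tableA;
```

If the goal is only to act on how often an account appears per day, daily_cnt may be enough on its own, since it gives the total directly rather than a running position.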
Tags: sql date select hive window-functions