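【Posted】: 2018-11-12 03:17:44
【Question】:
I have a price table with two main variables: date (sales_date) and sales channel (channel). I need to fill the gaps in both for every possible combination of SKU (ean) and client (id_client).
So far I have been able to fill in the dates and the channels, but in some cases several channels have a price on the same date, and in those "odd" cases my approach duplicates every row.
Tables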
create table prices_master (
    id_price serial primary key,
    sales_date date,
    ean varchar(15),
    id_client int,
    channel varchar(15),
    price float
);
create table channels_master (
    id_channel serial primary key,
    channel varchar(15)
);
insert into prices_master (sales_date, ean, id_client, channel, price)
values
    ('2015-07-01', '7506205801143', 7, 'COMERCIAL', 47655),
    ('2015-08-01', '7506205801143', 7, 'COMERCIAL', 51655),
    ('2015-12-01', '7506205801143', 7, 'COMERCIAL', 55667),
    ('2015-12-01', '7506205801143', 7, 'DISTRIBUIDOR', 35667),
    ('2015-07-01', '5052197008555', 7, 'DISTRIBUIDOR', 7224),
    ('2015-10-01', '5052197008555', 7, 'DISTRIBUIDOR', 8224);
insert into channels_master (channel) values
    ('DISTRIBUIDOR'), ('INSTITUCIONAL'), ('NON_TRADE'), ('COMERCIAL');
My approach
WITH full_dates AS (
    WITH min_max AS (
        SELECT min(prm.sales_date) AS min_date,
               max(prm.sales_date)::date AS max_date
        FROM prices_master prm
    )
    SELECT generate_series(min_max.min_date::timestamp with time zone,
                           min_max.max_date::timestamp with time zone,
                           '1 mon'::interval) AS sales_date
    FROM min_max
),
completechannels AS (
    SELECT DISTINCT channel
    FROM channels_master
),
temp AS (
    SELECT prices_master.sales_date,
           prices_master.id_client,
           prices_master.ean,
           prices_master.channel,
           prices_master.price,
           lead(prices_master.sales_date) OVER (
               PARTITION BY prices_master.id_client, prices_master.ean, prices_master.channel
               ORDER BY prices_master.sales_date
           ) AS next_sales_date
    FROM prices_master
    ORDER BY prices_master.id_client, prices_master.ean, prices_master.channel, prices_master.sales_date
)
SELECT full_dates.sales_date::date AS sales_date,
       temp.id_client,
       temp.ean,
       completechannels.channel,
       price
FROM full_dates
JOIN temp
  ON full_dates.sales_date >= temp.sales_date
 AND (full_dates.sales_date < temp.next_sales_date OR temp.next_sales_date IS NULL)
JOIN completechannels ON 1=1
ORDER BY temp.id_client, temp.ean, completechannels.channel,
         full_dates.sales_date;
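My problem shows up on sales_date 2015-12-01 for ean 7506205801143: because this ean has a price for both the DISTRIBUIDOR and COMERCIAL channels on that date, my approach duplicates the rows:
Result of my approach (bad)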
+------------+-----------+---------------+---------------+-------+
| sales_date | id_client | ean | channel | price |
+------------+-----------+---------------+---------------+-------+
| 2015-12-01 | 7 | 7506205801143 | COMERCIAL | 55667 |
| 2015-12-01 | 7 | 7506205801143 | COMERCIAL | 35667 |
| 2015-12-01 | 7 | 7506205801143 | DISTRIBUIDOR | 55667 |
| 2015-12-01 | 7 | 7506205801143 | DISTRIBUIDOR | 35667 |
| 2015-12-01 | 7 | 7506205801143 | INSTITUCIONAL | 35667 |
| 2015-12-01 | 7 | 7506205801143 | INSTITUCIONAL | 55667 |
| 2015-12-01 | 7 | 7506205801143 | NON_TRADE | 55667 |
| 2015-12-01 | 7 | 7506205801143 | NON_TRADE | 35667 |
+------------+-----------+---------------+---------------+-------+
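A quick way to see where the duplicates come from is to count, per month and per (id_client, ean), how many price rows are in force at once; every one of them is then crossed with all four channels. The query below is only a diagnostic sketch against the sample data: the CTE name ranges, the alias d, and the column rows_in_force are illustrative, and the series bounds are hard-coded to the sample's first and last sales_date.

```sql
-- Diagnostic sketch (PostgreSQL): find the dates on which more than one
-- channel has a price in force for the same client and ean.
WITH ranges AS (
    SELECT id_client, ean, channel, sales_date,
           lead(sales_date) OVER (
               PARTITION BY id_client, ean, channel
               ORDER BY sales_date
           ) AS next_sales_date
    FROM prices_master
)
SELECT d.sales_date::date AS sales_date,
       r.id_client,
       r.ean,
       count(*) AS rows_in_force
-- Bounds hard-coded from the sample data (assumption for illustration).
FROM generate_series(timestamp '2015-07-01',
                     timestamp '2015-12-01',
                     interval '1 month') AS d(sales_date)
JOIN ranges r
  ON d.sales_date >= r.sales_date
 AND (d.sales_date < r.next_sales_date OR r.next_sales_date IS NULL)
GROUP BY d.sales_date, r.id_client, r.ean
HAVING count(*) > 1;
```

With the sample rows, the only combination that has two prices in force at once is 2015-12-01 for ean 7506205801143, which accounts for the 2 x 4 = 8 rows shown above.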
Expected result (good)
+------------+-----------+---------------+---------------+-------+
| sales_date | id_client | ean | channel | price |
+------------+-----------+---------------+---------------+-------+
| 2015-12-01 | 7 | 7506205801143 | COMERCIAL | 55667 |
| 2015-12-01 | 7 | 7506205801143 | DISTRIBUIDOR | 35667 |
| 2015-12-01 | 7 | 7506205801143 | INSTITUCIONAL | 55667 |
| 2015-12-01 | 7 | 7506205801143 | NON_TRADE | 55667 |
+------------+-----------+---------------+---------------+-------+
For INSTITUCIONAL and NON_TRADE, which have no price of their own, the highest available price can be used to fill the gap.
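The rule above could be expressed roughly as follows. This is a minimal sketch, not a tested solution: it assumes that "highest price" means the maximum of the prices in force on that date across all channels for the same id_client and ean, and the CTE names (ranges, active, fallback) and the column month_start are made up for illustration. A channel with its own price keeps it; any other channel falls back to that maximum.

```sql
-- Sketch (PostgreSQL): one row per date / client / ean / channel.
WITH full_dates AS (
    SELECT gs.month_start::date AS sales_date
    FROM (SELECT min(sales_date) AS min_date,
                 max(sales_date) AS max_date
          FROM prices_master) mm
    CROSS JOIN LATERAL generate_series(mm.min_date::timestamp,
                                       mm.max_date::timestamp,
                                       interval '1 month') AS gs(month_start)
),
ranges AS (
    -- Each price is valid until the next price of the same channel.
    SELECT id_client, ean, channel, price, sales_date,
           lead(sales_date) OVER (
               PARTITION BY id_client, ean, channel
               ORDER BY sales_date
           ) AS next_sales_date
    FROM prices_master
),
active AS (
    -- Price in force per date, client, ean and channel.
    SELECT d.sales_date, r.id_client, r.ean, r.channel, r.price
    FROM full_dates d
    JOIN ranges r
      ON d.sales_date >= r.sales_date
     AND (d.sales_date < r.next_sales_date OR r.next_sales_date IS NULL)
),
fallback AS (
    -- Assumed fallback: highest price in force on that date.
    SELECT sales_date, id_client, ean, max(price) AS max_price
    FROM active
    GROUP BY sales_date, id_client, ean
)
SELECT f.sales_date,
       f.id_client,
       f.ean,
       c.channel,
       coalesce(a.price, f.max_price) AS price
FROM fallback f
CROSS JOIN (SELECT DISTINCT channel FROM channels_master) c
LEFT JOIN active a
  ON a.sales_date = f.sales_date
 AND a.id_client  = f.id_client
 AND a.ean        = f.ean
 AND a.channel    = c.channel
ORDER BY f.id_client, f.ean, c.channel, f.sales_date;
```

The difference from the original query is that the channel list is joined to one row per date, client and ean (fallback) rather than to every price row in force, so a date with prices in two channels no longer multiplies the output. For 2015-12-01 and ean 7506205801143 this works out to the four rows in the expected result: COMERCIAL 55667, DISTRIBUIDOR 35667, and 55667 for both INSTITUCIONAL and NON_TRADE.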
【Discussion】:
Tags: sql postgresql gaps-and-islands