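【Posted at】: 2018-10-01 20:19:21
【Question】:
I'm trying to do some analytics with Postgres. I have 2 tables, called predictionstate and pageviews.
predictionstate table:
This table contains columns holding the results of our algorithm, with the following structure:
- id ({company_identifier}:{user_identifier})
- model (a reference string value)
- prediction (a float between 0.0 and 1.0)
pageviews table:
This table contains user information, with the following structure:
- company_identifier
- user_identifier
- pageview_current_url_type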
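For anyone who wants to reproduce the setup, the two structures above might be declared roughly like this. It is only a sketch: the column names come from the lists above, while the column types and constraints are assumptions, and the real tables may well have more columns.

```sql
-- Hypothetical DDL: column names are from the post, types and constraints are assumed.
CREATE TABLE predictionstate (
    id         text NOT NULL,              -- '{company_identifier}:{user_identifier}'
    model      text NOT NULL,              -- reference string value
    prediction double precision NOT NULL   -- score between 0.0 and 1.0
        CHECK (prediction BETWEEN 0.0 AND 1.0)
);

CREATE TABLE pageviews (
    company_identifier        text NOT NULL,
    user_identifier           text NOT NULL,
    pageview_current_url_type text         -- e.g. 'BUYSUCCESS'
);
```

No primary key is declared on id here, since the same user may appear once per model.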
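Problem
I'm trying to pull data for our best model in order to analyze how accurate it is. Basically, I need to split the predictions into segments and count how many members fall into each one. The code below does that: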
WITH ranges AS (
    SELECT
        myrange::text || '-' || (myrange + 0.1)::text AS segment,
        myrange as r_min, myrange + 0.1 as r_max
    FROM generate_series(0.0, 0.9, 0.1) AS myrange
)
SELECT
    SPLIT_PART(p.id, ':', 1) as company_identifier,
    p.model,
    r.segment,
    COUNT(DISTINCT(SPLIT_PART(p.id, ':', 2))) as "segment_users"
FROM
    ranges r
    INNER JOIN predictionstate p ON p.prediction BETWEEN r.r_min AND r.r_max
GROUP BY company_identifier, p.model, r.segment
ORDER BY company_identifier, p.model, r.segment;
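A side note on the segment boundaries: BETWEEN is inclusive at both ends, so a prediction that sits exactly on a boundary (0.3, say) matches two adjacent segments and is counted in both. The sketch below counts how many segments the literal value 0.3 falls into under each comparison; the half-open variant is a suggested alternative, not something from the original query.

```sql
-- Same ranges CTE as in the query above, checked against the literal value 0.3.
WITH ranges AS (
    SELECT myrange AS r_min, myrange + 0.1 AS r_max
    FROM generate_series(0.0, 0.9, 0.1) AS myrange
)
SELECT
    -- Closed interval [r_min, r_max]: 0.3 matches both 0.2-0.3 and 0.3-0.4.
    COUNT(*) FILTER (WHERE 0.3 BETWEEN r_min AND r_max)  AS closed_matches,
    -- Half-open interval [r_min, r_max): 0.3 matches only 0.3-0.4.
    COUNT(*) FILTER (WHERE 0.3 >= r_min AND 0.3 < r_max) AS half_open_matches
FROM ranges;
```

With a half-open comparison, a prediction of exactly 1.0 would fall outside every segment, so the last segment would need to be treated as closed if such values can occur.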
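But here is where I'm stuck, since I don't know exactly how to do it: for each (company, model, segment), I need to get the accuracy data by querying the pageviews table and identifying the rows where pageview_current_url_type = 'BUYSUCCESS'.
I tried this, but it didn't work: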
WITH ranges AS (
    SELECT
        myrange::text || '-' || (myrange + 0.1)::text AS segment,
        myrange as r_min, myrange + 0.1 as r_max
    FROM generate_series(0.0, 0.9, 0.1) AS myrange
)
SELECT
    SPLIT_PART(p.id, ':', 1) as company_identifier,
    p.model,
    r.segment,
    COUNT(DISTINCT(SPLIT_PART(p.id, ':', 2))) as "segment_users",
    b.n as "converted_users"
FROM
    ranges r,
    (
        SELECT COUNT(DISTINCT(pvs.user_identifier)) as n
        FROM pageviews pvs
        INNER JOIN (
            SELECT
                SPLIT_PART(id, ':', 1) as company_identifier,
                SPLIT_PART(id, ':', 2) as user_identifier
            FROM predictionstate ps
            WHERE prediction BETWEEN r.r_min AND r.r_max
        ) users
        ON (
            pvs.user_identifier = users.user_identifier AND
            pvs.company_identifier = users.company_identifier
        )
        WHERE pageview_current_url_type = 'BUYSUCCESS'
    ) b
    INNER JOIN predictionstate p ON p.prediction BETWEEN r.r_min AND r.r_max
GROUP BY company_identifier, p.model, r.segment
ORDER BY company_identifier, p.model, r.segment;
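A likely reason this attempt is rejected: a subquery in FROM cannot see columns of a sibling FROM item such as r unless it is marked LATERAL. In addition, INNER JOIN binds more tightly than the comma, so r is also out of scope in the final ON clause. The sketch below only illustrates the scoping rule. It uses inline VALUES with made-up names (scores, u1, u2, u3) instead of the real tables, and it is not a full solution to the question.

```sql
-- Self-contained sketch with inline, hypothetical data (no real tables needed).
WITH ranges(r_min, r_max) AS (
    VALUES (0.0, 0.5), (0.5, 1.0)
),
scores(user_identifier, prediction) AS (
    VALUES ('u1', 0.2), ('u2', 0.7), ('u3', 0.9)
)
SELECT r.r_min, r.r_max, b.n
FROM ranges r
CROSS JOIN LATERAL (
    -- LATERAL lets this subquery reference r.r_min and r.r_max from the
    -- FROM item to its left; without it PostgreSQL rejects the reference.
    SELECT COUNT(*) AS n
    FROM scores s
    WHERE s.prediction >= r.r_min AND s.prediction < r.r_max
) b
ORDER BY r.r_min;
```

Using an explicit CROSS JOIN LATERAL instead of a comma also keeps r in scope for any joins that follow it.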
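TL;DR: I need to compute a JOIN based on the users from the main query.
Edit:
I added a SQL Fiddle: https://www.db-fiddle.com/f/5sQiZD6mHwdnwvVfvL9MAh/0
For those segment_users, I'd like to know how many have pageview_current_url_type = 'BUYSUCCESS', adding one more column to the result: segmented_really_bought.
Edit 2: One more attempt that doesn't work (ERROR: column "p.id" must appear in the GROUP BY clause or be used in an aggregate function)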
WITH ranges AS (
    SELECT
        myrange::text || '-' || (myrange + 0.1)::text AS segment,
        myrange as r_min, myrange + 0.1 as r_max
    FROM generate_series(0.0, 0.9, 0.1) AS myrange
)
SELECT
    SPLIT_PART(p.id, ':', 1) as company_identifier,
    p.model,
    r.segment,
    COUNT(DISTINCT(SPLIT_PART(p.id, ':', 2))) as "segment_users",
    COUNT(b.*) as "converted_users"
FROM
    ranges r
    INNER JOIN predictionstate p ON p.prediction BETWEEN r.r_min AND r.r_max
    INNER JOIN (
        SELECT users.company_identifier, COUNT(users.user_identifier) AS n
        FROM pageviews
        INNER JOIN (
            SELECT SPLIT_PART(ps.id, ':', 2) AS user_identifier,
                   SPLIT_PART(ps.id, ':', 1) AS company_identifier
            FROM predictionstate ps
            WHERE provider_id=47 AND
                  prediction > 0.7
        ) users ON (
            pageviews.user_identifier=users.user_identifier AND
            pageviews.company_identifier=users.company_identifier
        )
        WHERE pageview_current_url_type='BUYSUCCESS'
        GROUP BY users.company_identifier
    ) AS b
    ON (
        b.company_identifier = company_identifier
    )
GROUP BY company_identifier, p.model, r.segment
ORDER BY company_identifier, p.model, r.segment;
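A probable cause of that error: when a name in GROUP BY is ambiguous, PostgreSQL reads it as an input column rather than as an output alias. Once b is joined in, company_identifier resolves to b.company_identifier, which leaves the SPLIT_PART(p.id, ...) expression ungrouped. For the same reason, the ON clause ends up comparing b.company_identifier with itself. The sketch below shows one way around the ambiguity, using inline VALUES with made-up rows in place of the real tables. It is not a full solution to the question.

```sql
-- Self-contained sketch with inline, hypothetical data.
WITH p(id, model) AS (
    VALUES ('company_a:u1', 'model_a'), ('company_a:u2', 'model_a')
),
b(company_identifier, n) AS (
    VALUES ('company_a', 3)
)
SELECT
    SPLIT_PART(p.id, ':', 1) AS company_identifier,
    p.model,
    COUNT(DISTINCT SPLIT_PART(p.id, ':', 2)) AS segment_users,
    MAX(b.n) AS converted_users
FROM p
-- Join on the expression itself; output aliases are not visible in ON.
INNER JOIN b ON b.company_identifier = SPLIT_PART(p.id, ':', 1)
-- GROUP BY 1 refers to the first output column (the SPLIT_PART expression).
-- GROUP BY company_identifier would pick b.company_identifier instead,
-- because an input column of that name is in scope.
GROUP BY 1, p.model;
```

Repeating the full SPLIT_PART expression in GROUP BY, or giving the output column a name that no input column shares, would avoid the clash as well.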
Edit 3: Added the desired output
Generated with this code: https://gist.github.com/brunoalano/479265b934a67dc02092fb54a846fe1e
company, model, segment, segment_users, really_bought
company_a, model_a, 0.3-0.4, 1, 3
company_a, model_a, 0.5-0.6, 1, 1
company_a, model_b, 0.2-0.3, 1, 3
company_a, model_c, 0.2-0.3, 1, 1
company_a, model_c, 0.7-0.8, 1, 3
company_b, model_a, 0.3-0.4, 3, 2
company_b, model_b, 0.5-0.6, 2, 1
company_b, model_b, 0.6-0.7, 1, 1
company_b, model_c, 0.5-0.6, 1, 0
company_b, model_c, 0.8-0.9, 1, 1
【Comments】:
-
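1. Why is your ID a concatenated string? It would be much easier in your code if you used the two columns as the primary key. 2. This looks quite complex. Could you add a sample table and the expected output?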
-
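@S-Man I created it here: db-fiddle.com/f/5sQiZD6mHwdnwvVfvL9MAh/0
-
What is the expected result for the sample you posted? Please add it to your question.
-
@KamilGosciminski I added the desired output and the code I used to generate it. Sorry about that.
-
My answer seems to be exactly what you're looking for, but I don't know why your output has fewer segments than the data produces.
Tags: sql postgresql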