【Question title】: Duplicates being generated by left join
【Posted】: 2020-03-04 00:21:20
【Question】:

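I'm currently trying to reshape a table to summarize email metrics at the subscriber level. This is what the table I'm working with looks like: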

 SELECT accountid,
       jobid,
       listid,
       batchid,
       subscriberkey,
       eventdate,
       eventtype,
       isunique,
       triggerersenddefinitionobjectid,
       triggeredsendcustomerkey,
       url,
       linkname,
       linkcontent,
       emailid,
       schedtime,
       pickuptime,
       deliveredtime,
       eventid,
       jobtype,
       jobstatus,
       emailname,
       emailsubject,
       sendtype,
       dynamicemailsubject,
       emailsenddefinition
FROM   email_metrics;  

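I'd like to reshape it so that, for each unique combination of (subscriberkey + emailid), I have data on whether the subscriber opened that email and whether they clicked it.

An example of what the current data looks like (to keep the question simple I've condensed the table structure to 3 columns; sorry, I don't know how to insert a table here, so it may look confusing):

Example record 1: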

Subscriberkey | EmailID | Eventtype
1234          | 2       | Open
1234          | 2       | Click

I'd essentially like to reshape this into a single record for each unique (SubscriberKey, EmailName) combination:

SubscriberKey | EmailID2 | Is_Open | Is_Click 
1234          | 2        | True    | True

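This would condense all the data related to a particular subscriber + email send combination and show me the relevant metrics on a single record.

I was able to do this successfully before, but my laptop recently died and unfortunately my script can't be recovered :(

So far I've come up with the following, but I'm seeing duplicates in the data generated by the left joins, and I'm having trouble understanding how to make sure this doesn't happen in my data: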

WITH email_sent AS (
    SELECT *
    FROM email_metrics em 
    WHERE eventtype ='Sent'
),
    email_open AS (
    SELECT *
    FROM email_metrics em2 
    WHERE eventtype ='Open'
    AND isunique = True),

    email_click AS (
    SELECT * 
    FROM email_metrics em3 
    WHERE eventtype='Click'
    AND isunique = True
)

SELECT DISTINCT a.jobid, 
    a.subscriberkey,
    a.send_time,
    a.emailid,
    a.emailname,
    a.emailsubject,
    a.dynamicemailsubject,
    a.emailsenddefinition,
    a.is_opened,
    a.open_date,
    COALESCE (c.eventtype,'Not Clicked') AS is_click,
    c.eventdate AS click_date,
    c.url,
    c.linkname,
    c.linkcontent
FROM
(SELECT DISTINCT s.jobid,
    s.subscriberkey,
    (s.eventdate) AS send_time,
    s.emailid,
    s.emailname,
    s.emailsubject,
    s.dynamicemailsubject,
    s.emailsenddefinition,
    COALESCE (o.eventtype, 'Not Opened') AS is_opened,
    (o.eventdate) AS open_date
FROM email_sent s 
LEFT JOIN email_open o ON (s.jobid=o.jobid AND s.subscriberkey=o.subscriberkey)) a
LEFT JOIN email_click c ON (a.jobid=c.jobid AND a.subscriberkey=c.subscriberkey);

【Question comments】:

    Tags: sql postgresql group-by pivot


    【Solution 1】:

    I'd suggest just using conditional aggregation for this:

    select
        subscriberkey,
        emailid,
        bool_or(eventtype = 'Open') Is_Open,
        bool_or(eventtype = 'Click') Is_Click
    from email_metrics
    group by subscriberkey, emailid
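
    To illustrate why this avoids the duplicates, here is a minimal self-contained sketch in PostgreSQL. The rows in the VALUES list are hypothetical sample data, not taken from the real email_metrics table, and the CTE simply shadows the table name so the query can run on its own. A join returns one output row per matching pair, so if several Open or Click rows share the same key, each Sent row is repeated for every match. Grouping instead collapses all events for a (subscriberkey, emailid) pair into one row.

```sql
-- Hypothetical sample rows standing in for email_metrics (assumption).
WITH email_metrics (subscriberkey, emailid, eventtype) AS (
    VALUES
        ('1234', 2, 'Sent'),
        ('1234', 2, 'Open'),
        ('1234', 2, 'Click'),
        ('1234', 2, 'Click'),  -- a second click does not add an output row
        ('5678', 2, 'Sent')    -- sent, but never opened or clicked
)
SELECT
    subscriberkey,
    emailid,
    -- true if at least one row in the group is an Open / Click event
    bool_or(eventtype = 'Open')  AS is_open,
    bool_or(eventtype = 'Click') AS is_click
FROM email_metrics
GROUP BY subscriberkey, emailid
ORDER BY subscriberkey;
```

    With this sample data the result has exactly two rows: subscriber 1234 with both flags true, and subscriber 5678 with both flags false.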
    

    【Comments】:

    • Thanks for the quick reply! Should I aggregate the other 10-15 columns with max? I don't believe that's something I did before.
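
    One possible way to carry extra columns through the aggregation, as a sketch rather than a confirmed answer from the thread: wrap descriptive columns in max(), and use the aggregate FILTER clause (available in PostgreSQL 9.4 and later) to pick event-specific dates. This assumes emailname has a single value per (subscriberkey, emailid) pair, so max() just returns that value, and that eventdate is a timestamp. The sample rows and the aliases send_time, open_date, and click_date are hypothetical.

```sql
-- Hypothetical sample rows standing in for email_metrics (assumption).
WITH email_metrics (subscriberkey, emailid, emailname, eventtype, eventdate) AS (
    VALUES
        ('1234', 2, 'Welcome', 'Sent',  timestamp '2020-03-01 09:00:00'),
        ('1234', 2, 'Welcome', 'Open',  timestamp '2020-03-01 10:15:00'),
        ('1234', 2, 'Welcome', 'Click', timestamp '2020-03-01 10:16:00'),
        ('5678', 2, 'Welcome', 'Sent',  timestamp '2020-03-01 09:00:00')
)
SELECT
    subscriberkey,
    emailid,
    -- Assumed constant within the group, so max() only picks "the" value.
    max(emailname) AS emailname,
    -- FILTER restricts each aggregate to rows of one event type.
    min(eventdate) FILTER (WHERE eventtype = 'Sent')  AS send_time,
    bool_or(eventtype = 'Open')                       AS is_open,
    min(eventdate) FILTER (WHERE eventtype = 'Open')  AS open_date,
    bool_or(eventtype = 'Click')                      AS is_click,
    min(eventdate) FILTER (WHERE eventtype = 'Click') AS click_date
FROM email_metrics
GROUP BY subscriberkey, emailid
ORDER BY subscriberkey;
```

    For a subscriber with no Open or Click rows, the filtered dates come back as NULL while the boolean flags are false. If a column can vary within the group (for example url across several clicks), max() would silently keep only one value, so that case probably needs a different aggregate or a separate query.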