【Question title】: Retrieving data from multiple tables - SQL
【Posted】: 2017-03-03 12:25:11
【Question】:

I have the following tables:

Table searches:

Date        Product    Search_ID
2017-01-01    Nike            101
2017-01-01    Reebok          292
2017-01-01    Nike            103
2017-01-01    Adidas          385
2017-01-02    Nike            284

Table purchases:

Date        Product    Total_sale
2017-01-01    Adidas        4
2017-01-01    Nike          1
2017-01-01    Adidas        2
2017-01-02    Nike          3

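A product can have multiple rows on the same day. The total number of purchases of a product on a given day = sum(total_sale).

I need to find the purchase rate for each product on each day, i.e. number of purchases / number of searches.

For reference, Nike on 2017-01-01 has 702 searches and 47 purchases in total, so its purchase rate is 47/702 ≈ 0.0670.

I tried: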

select t1.product, sum(t1.Total_sale), count(t2.Search_ID)
from db.purchases t1 join db.searches t2
on t1.date = t2.date and t1.product = t2.product
where t1.date = '2017-01-01' and t1.product = 'Nike'
group by t1.product, t1.date
;

This gives me a strange result:

 product  |  sum  | count 
----------+-------+-------
   Nike   | 32994 | 32994

...What am I doing wrong here?

【Comments】:

    Tags: sql join hive


    【Solution 1】:

    The join has multiplied your result set. You can see this if you remove the GROUP BY and select * instead of the columns you specified.

    select * from db.purchases t1 join db.searches t2
    on t1.date = t2.date and t1.product = t2.product
    where t1.date = '2017-01-01' and t1.product = 'Nike'
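
To make the multiplication concrete, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database and runs the same kind of unaggregated join. SQLite stands in for Hive here (an assumption made only so the example is runnable), and the `db.` schema prefix is dropped.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Hive (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE searches  (date TEXT, product TEXT, search_id INTEGER);
    CREATE TABLE purchases (date TEXT, product TEXT, total_sale INTEGER);
    INSERT INTO searches VALUES
        ('2017-01-01', 'Nike',   101),
        ('2017-01-01', 'Reebok', 292),
        ('2017-01-01', 'Nike',   103),
        ('2017-01-01', 'Adidas', 385),
        ('2017-01-02', 'Nike',   284);
    INSERT INTO purchases VALUES
        ('2017-01-01', 'Adidas', 4),
        ('2017-01-01', 'Nike',   1),
        ('2017-01-01', 'Adidas', 2),
        ('2017-01-02', 'Nike',   3);
""")

# Unaggregated join: every purchase row pairs with every matching search row,
# so SUM and COUNT both run over the multiplied row set.
naive = conn.execute("""
    SELECT t1.date, t1.product, SUM(t1.total_sale), COUNT(t2.search_id)
    FROM purchases t1
    JOIN searches t2 ON t1.date = t2.date AND t1.product = t2.product
    GROUP BY t1.date, t1.product
    ORDER BY t1.date, t1.product
""").fetchall()

for row in naive:
    print(row)
```

On 2017-01-01, Nike has one purchase row and two search rows, so the join sees the single sale twice. Adidas has two purchase rows and one search row, so the single search is counted twice.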
    

    You don't need to join the tables to calculate the purchase rate:

    SELECT     
    (select sum(t1.Total_sale) from db.purchases t1 where t1.date = '2017-01-01' and t1.product = 'Nike')
    /
    (select count(t2.Search_ID) from db.searches t2 where t2.date = '2017-01-01' and t2.product = 'Nike')
    

    【Comments】:

      【Solution 2】:

      Aggregate before joining:

      select p.product, p.sales, s.searches
      from (select p.date, p.product, sum(p.Total_sale) as sales
            from db.purchases p
            group by p.date, p.product
           ) p join
           (select s.date, s.product, count(*) as searches
            from db.searches s
            group by s.date, s.product
           ) s
           on p.date = s.date and p.product = s.product
      where p.date = '2017-01-01' and p.product = 'Nike';
      

      Note: you can move the where into the subqueries to improve performance. This generalizes easily to more days and products.
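
As a rough illustration of that generalization, the sketch below drops the `where` filter and adds a rate column, so each (date, product) pair gets its own purchase rate. It uses the sample rows from the question in an in-memory SQLite database, which is an assumption made for runnability rather than the asker's Hive setup. The `1.0 *` factor is there only because SQLite would otherwise perform integer division.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Hive (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE searches  (date TEXT, product TEXT, search_id INTEGER);
    CREATE TABLE purchases (date TEXT, product TEXT, total_sale INTEGER);
    INSERT INTO searches VALUES
        ('2017-01-01', 'Nike',   101),
        ('2017-01-01', 'Reebok', 292),
        ('2017-01-01', 'Nike',   103),
        ('2017-01-01', 'Adidas', 385),
        ('2017-01-02', 'Nike',   284);
    INSERT INTO purchases VALUES
        ('2017-01-01', 'Adidas', 4),
        ('2017-01-01', 'Nike',   1),
        ('2017-01-01', 'Adidas', 2),
        ('2017-01-02', 'Nike',   3);
""")

# Aggregate each table to one row per (date, product) first, then join.
rates = conn.execute("""
    SELECT p.date, p.product, p.sales, s.searches,
           1.0 * p.sales / s.searches AS purchase_rate
    FROM (SELECT date, product, SUM(total_sale) AS sales
          FROM purchases
          GROUP BY date, product) p
    JOIN (SELECT date, product, COUNT(*) AS searches
          FROM searches
          GROUP BY date, product) s
      ON p.date = s.date AND p.product = s.product
    ORDER BY p.date, p.product
""").fetchall()

for row in rates:
    print(row)
```

Because this is an inner join, Reebok (searched but never purchased in the sample) does not appear in the output. If such products should show up with a rate of 0, a left join starting from the searches subquery is one option.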

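      【Comments】:

        【Solution 3】:

        The problem is that you are joining two unaggregated tables, so every "purchase" row is joined with every "search" row. That is why your result is 32994: it comes from 702 x 47.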
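
The arithmetic can be checked with a small reproduction. The sketch below assumes 702 Nike search rows and 47 Nike purchase rows with `total_sale = 1` each on 2017-01-01. That row layout is an assumption: it is consistent with the reported sum and count of 32994, but the question does not list the rows. SQLite is used in place of Hive for runnability.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Hive (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE searches (date TEXT, product TEXT, search_id INTEGER)")
conn.execute("CREATE TABLE purchases (date TEXT, product TEXT, total_sale INTEGER)")

# Assumed data: 702 searches, 47 purchase rows of one sale each.
conn.executemany(
    "INSERT INTO searches VALUES ('2017-01-01', 'Nike', ?)",
    [(i,) for i in range(702)],
)
conn.executemany(
    "INSERT INTO purchases VALUES ('2017-01-01', 'Nike', ?)",
    [(1,)] * 47,
)

# Unaggregated join: 47 purchase rows x 702 search rows = 32994 joined rows.
naive_sum, naive_count = conn.execute("""
    SELECT SUM(t1.total_sale), COUNT(t2.search_id)
    FROM purchases t1
    JOIN searches t2 ON t1.date = t2.date AND t1.product = t2.product
    WHERE t1.date = '2017-01-01' AND t1.product = 'Nike'
""").fetchone()

# Aggregating each table on its own gives the intended figures.
sales = conn.execute(
    "SELECT SUM(total_sale) FROM purchases "
    "WHERE date = '2017-01-01' AND product = 'Nike'"
).fetchone()[0]
searches = conn.execute(
    "SELECT COUNT(search_id) FROM searches "
    "WHERE date = '2017-01-01' AND product = 'Nike'"
).fetchone()[0]
rate = sales / searches  # 47 / 702

print(naive_sum, naive_count)
print(sales, searches, rate)
```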

        The correct way to get the expected result with a join is:

        select  t1.product, t1.total_sales, t2.search_count
        from    (
                  select date, product, sum(total_sale) as total_sales
                  from   db.purchases
                  group by date, product
                ) t1
        join    (
                  select  date, product, count(search_id) as search_count
                  from    db.searches
                  group by date, product
                ) t2
        on      t1.date = t2.date and t1.product = t2.product
        where   t1.date = '2017-01-01' and t1.product = 'Nike';
        

        【Comments】:
