【Question Title】: SQL query to calculate the cumulative number of trips for each date
【Posted】: 2020-05-06 16:42:24
【Question】:

I have a Hive table named bikeshare_trips with the following schema:

+---------------------+------------+----------------------------------------------------+--+
|      col_name       | data_type  |                      comment                       |
+---------------------+------------+----------------------------------------------------+--+
| trip_id             | int        | numeric id of bike trip                            |
| duration_sec        | int        | time of trip in seconds                            |
| start_date          | string     | start date of trip with date and time, in PST      |
| start_station_name  | string     | station name of start station                      |
| start_station_id    | int        | numeric reference for start station                |
| end_date            | string     | end date of trip with date and time, in PST        |
| end_station_name    | string     | station name for end station                       |
| end_station_id      | int        | numeric reference for end station                  |
| bike_number         | int        | id of bike used                                    |
| zip_code            | string     | Home zip code of subscriber (customers can choose to manually enter zip at kiosk however data is unreliable) |
| subscriber_type     | string     | Subscriber can be annual or 30-day member, Customer can be 24-hour or 3-day member |
+---------------------+------------+----------------------------------------------------+--+

Here is some sample data:

944732  2618    09/24/2015 17:22:00 Mezes   83  09/24/2015 18:06:00 Mezes   83  653 94063   Customer
984595  5957    09/24/2015 18:12:00 Mezes   83  10/25/2015 19:51:00 Mezes   83  52  nil Customer
984596  5913    09/24/2015 18:13:00 Mezes   83  10/25/2015 19:51:00 Mezes   83  121 nil Customer
1129385 6079    09/24/2015 10:33:00 Mezes   83  03/18/2016 12:14:00 Mezes   83  208 94070   Customer
1030383 5780    2015-09-30 10:52:00 Mezes   83  12/06/2015 12:28:00 Mezes   83  44  94064   Customer
1102641 801 02/23/2016 12:25:00 Mezes   83  02/23/2016 12:39:00 Mezes   83  174 93292   Customer
969490  255 2015-09-30 19:02:00 Mezes   83  10/13/2015 19:07:00 Mezes   83  650 94063   Subscriber
1129386 6032    03/18/2016 10:33:00 Mezes   83  03/18/2016 12:13:00 Mezes   83  155 94070   Customer
947105  1008    2015-09-30 12:57:00 Mezes   83  09/26/2015 13:13:00 Mezes   83  157 94063   Subscriber
1011650 60  11/16/2015 18:54:00 Mezes   83  11/16/2015 18:55:00 Mezes   83  35  94124   Subscriber

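Each row of the table corresponds to one bike trip. I want to calculate the cumulative number of trips for each date in 2015.

The expected output is: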

trip_date               num_trips                cumulative_trips  
2015-09-24              4                        4                
2015-09-30              3                        7                
2015-11-16              1                        8     

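I have been trying analytic functions and subqueries, but I cannot get it to work. Any help would be appreciated. Thanks in advance.

【Question Comments】:

    Tags: sql hive aggregate-functions hiveql analytic-functions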


    【Solution 1】:

    You can combine aggregation with a window function:

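    -- to_date() needs a timestamp string, so convert the epoch seconds back with from_unixtime()
    select to_date(from_unixtime(UNIX_TIMESTAMP(start_date,"MM/dd/yyyy HH:mm"))) as dte,
           count(*) as num_trips,
           -- running total of the daily counts, ordered by the parsed date rather than the raw string
           sum(count(*)) over (order by to_date(from_unixtime(UNIX_TIMESTAMP(start_date,"MM/dd/yyyy HH:mm")))) as cumulative_trips
    from bikeshare_trips
    where YEAR(FROM_UNIXTIME(UNIX_TIMESTAMP(start_date,"MM/dd/yyyy HH:mm"))) = 2015 
    group by to_date(from_unixtime(UNIX_TIMESTAMP(start_date,"MM/dd/yyyy HH:mm")))
    order by dte;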
    

    In Hive you may need a subquery instead:

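    select dte, cnt, sum(cnt) over (order by dte) as cumulative_trips
    from (-- inner query: one row per date with its trip count
          select to_date(from_unixtime(UNIX_TIMESTAMP(start_date,"MM/dd/yyyy HH:mm"))) as dte, count(*) as cnt
          from bikeshare_trips
          where YEAR(FROM_UNIXTIME(UNIX_TIMESTAMP(start_date,"MM/dd/yyyy HH:mm"))) = 2015 
          -- group by the same expression that is selected, not by to_date(start_date)
          group by to_date(from_unixtime(UNIX_TIMESTAMP(start_date,"MM/dd/yyyy HH:mm")))
         ) b
    order by dte;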
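
    The sample rows in the question use two different layouts for start_date (09/24/2015 17:22:00 and 2015-09-30 10:52:00), so a single format pattern will leave some rows unparsed. A minimal sketch of one way to handle both, assuming those are the only two layouts in the table and that unix_timestamp returns NULL when the pattern does not match (the full-seconds patterns and the aliases parsed, daily and trip_day are choices made for this example, not part of the original answer):

```sql
-- Sketch only: assumes start_date always uses one of the two layouts seen in the sample rows.
SELECT dte AS trip_date,
       cnt AS num_trips,
       SUM(cnt) OVER (ORDER BY dte) AS cumulative_trips
FROM (
    -- one row per calendar day in 2015 with its trip count
    SELECT trip_day AS dte, COUNT(*) AS cnt
    FROM (
        -- try the MM/dd/yyyy layout first, then fall back to yyyy-MM-dd
        SELECT to_date(from_unixtime(COALESCE(
                   unix_timestamp(start_date, 'MM/dd/yyyy HH:mm:ss'),
                   unix_timestamp(start_date, 'yyyy-MM-dd HH:mm:ss')))) AS trip_day
        FROM bikeshare_trips
    ) parsed
    WHERE year(trip_day) = 2015   -- rows that match neither layout have a NULL trip_day and drop out here
    GROUP BY trip_day
) daily
ORDER BY trip_date;
```

    Parsing once in the innermost query keeps the date expression in a single place, so the filter, the grouping and the window ordering cannot drift apart.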
    

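    【Comments】:

    • It didn't work; it gave me this error: FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies. Thanks for the effort, though. I think I have found a way.
    • @Chema . . . This should work fine. Perhaps Hive needs a subquery, so I added one.
    • The last query you wrote works. The problem is start_date: it is a timestamp stored as a string, and to_date(timestamp) does not work because the timestamp is not in the expected format. I tried TO_DATE(from_unixtime(UNIX_TIMESTAMP('10/25/2015 12:45:00',"MM/dd/yyyy HH:mm"))) and it seems to work.
    • You are missing the condition: WHERE YEAR(FROM_UNIXTIME(UNIX_TIMESTAMP(start_date,"MM/dd/yyyy HH:mm"))) = 2015
    • Please fix your code and I will accept your solution. Thanks.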
    【Solution 2】:

    A correlated subquery may be one option here:

    SELECT
        trip_date,
        num_trips,
        (SELECT SUM(t2.num_trips) FROM yourTable t2
         WHERE t2.trip_date <= t1.trip_date) AS cumulative_trips
    FROM yourTable t1
    ORDER BY
        trip_date;
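
    The query above reads from yourTable, which appears to stand for a table that already has one row per date with trip_date and num_trips. A rough sketch of how that could be built from bikeshare_trips and combined with a self-join that computes the same running total, reusing the format pattern from the comments under Solution 1 (the CTE name daily is an assumption made for this example; depending on the Hive version and settings, a cross join may be rejected in strict mode):

```sql
-- Sketch only: `daily` stands in for yourTable and is a name chosen for this example.
WITH daily AS (
    SELECT to_date(from_unixtime(unix_timestamp(start_date, 'MM/dd/yyyy HH:mm'))) AS trip_date,
           COUNT(*) AS num_trips
    FROM bikeshare_trips
    WHERE year(from_unixtime(unix_timestamp(start_date, 'MM/dd/yyyy HH:mm'))) = 2015
    GROUP BY to_date(from_unixtime(unix_timestamp(start_date, 'MM/dd/yyyy HH:mm')))
)
SELECT t1.trip_date,
       t1.num_trips,
       SUM(t2.num_trips) AS cumulative_trips   -- adds up every day up to and including t1's day
FROM daily t1
CROSS JOIN daily t2
WHERE t2.trip_date <= t1.trip_date
GROUP BY t1.trip_date, t1.num_trips
ORDER BY trip_date;
```

    This pairs every day with all earlier days, so it grows quadratically with the number of distinct dates; the window-function form in Solution 1 is usually the cheaper choice when it is available.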
    

    【Comments】:
