【Question title】: Plot horizontal duration with pandas
【Posted】: 2020-01-14 13:06:02
【Question】:

I'm trying to create a horizontal chart that shows the duration of processes. Here is my sample data:

Some code to put into a Jupyter Notebook:

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as dt
import seaborn as sns

df = pd.DataFrame(
    {
    'PROC_NAME': ['data_load', 'data_send', 'data_load', 'data_send', 'data_load', 'data_send', 'data_load', 'data_send'],
    'START_TS': ['2019-06-25 03:30', '2019-06-25 07:15', '2019-06-26 03:30', '2019-06-26 07:19', 
                 '2019-06-26 08:54', '2019-06-27 03:30', '2019-06-27 08:51', '2019-06-28 03:30'],
    'END_TS': ['2019-06-25 03:51', '2019-06-25 07:52', '2019-06-26 03:40', '2019-06-26 07:43', 
               '2019-06-26 09:21', '2019-06-27 04:16', '2019-06-27 09:32', '2019-06-28 04:02']    
    })

df.head()

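I want to create a horizontal bar chart that shows the run durations for each day, for example:

[Image: desired chart]

So it should be something like a Gantt chart, but with only one row per process and several bars on that row. A Gantt chart would put every run on its own row, which is not what I'm trying to achieve:

[Image: Gantt chart with one row per run (not wanted)]

Thanks for your help.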
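A hedged alternative for the chart described above, not taken from the answer below: Matplotlib's Axes.broken_barh draws several bars on a single row from (start, width) pairs, which matches the one-row-per-process layout directly. This is a minimal sketch assuming the sample df from the question; the `names` list, the 0.8 bar height and the headless Agg backend are illustrative choices, not part of the original thread.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside Jupyter
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "PROC_NAME": ["data_load", "data_send", "data_load", "data_send",
                  "data_load", "data_send", "data_load", "data_send"],
    "START_TS": ["2019-06-25 03:30", "2019-06-25 07:15", "2019-06-26 03:30", "2019-06-26 07:19",
                 "2019-06-26 08:54", "2019-06-27 03:30", "2019-06-27 08:51", "2019-06-28 03:30"],
    "END_TS": ["2019-06-25 03:51", "2019-06-25 07:52", "2019-06-26 03:40", "2019-06-26 07:43",
               "2019-06-26 09:21", "2019-06-27 04:16", "2019-06-27 09:32", "2019-06-28 04:02"],
})
df["START_TS"] = pd.to_datetime(df["START_TS"], format="%Y-%m-%d %H:%M")
df["END_TS"] = pd.to_datetime(df["END_TS"], format="%Y-%m-%d %H:%M")

fig, ax = plt.subplots(figsize=(15, 2))
names = list(df["PROC_NAME"].unique())  # one row per process, in order of appearance
for row, name in enumerate(names):
    runs = df[df["PROC_NAME"] == name]
    starts = mdates.date2num(runs["START_TS"])
    widths = mdates.date2num(runs["END_TS"]) - starts  # bar widths in days
    # every run of this process goes on the same row, centred on y = row
    ax.broken_barh(list(zip(starts, widths)), (row - 0.4, 0.8))

ax.set_yticks(range(len(names)))
ax.set_yticklabels(names)
ax.xaxis_date()  # interpret the float x values as dates
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %d %H:%M"))
```

Each call to broken_barh adds one collection holding all bars of one process, so the number of rows equals the number of distinct PROC_NAME values rather than the number of runs.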

【Question discussion】:

    Tags: python pandas matplotlib charts


    【Solution 1】:

    Got it! Many thanks to @jdhao for this answer. (Go take a look and upvote it!)

    Here is the source data; I added more rows to make the example more useful:

    Id  | PROC_NAME         | START_TS              | END_TS
    ---------------------------------------------------------------------
    0   | data_load         | 2019-06-25 03:30:00   | 2019-06-25 03:51:00
    1   | data_send         | 2019-06-25 07:15:00   | 2019-06-25 07:52:00
    2   | data_load         | 2019-06-26 03:30:00   | 2019-06-26 03:40:00
    3   | data_send         | 2019-06-26 07:19:00   | 2019-06-26 07:43:00
    4   | data_load         | 2019-06-26 08:54:00   | 2019-06-26 09:21:00
    5   | data_send         | 2019-06-27 03:30:00   | 2019-06-27 04:16:00
    6   | data_load         | 2019-06-27 08:51:00   | 2019-06-27 09:32:00
    7   | data_send         | 2019-06-28 03:30:00   | 2019-06-28 04:02:00
    8   | data_extraction   | 2019-06-25 03:21:00   | 2019-06-25 03:51:00
    9   | data_extraction   | 2019-06-25 06:45:00   | 2019-06-25 07:32:00
    10  | data_extraction   | 2019-06-26 03:30:00   | 2019-06-26 06:40:00
    11  | data_extraction   | 2019-06-26 07:19:00   | 2019-06-26 07:43:00
    12  | data_extraction   | 2019-06-26 10:54:00   | 2019-06-26 11:21:00
    13  | data_extraction   | 2019-06-27 05:30:00   | 2019-06-27 08:16:00
    14  | data_extraction   | 2019-06-27 09:51:00   | 2019-06-27 11:32:00
    15  | data_extraction   | 2019-06-28 02:30:00   | 2019-06-28 04:02:00
    

    Here is the code for Jupyter:

    import pandas as pd
    import matplotlib.pyplot as plt
    import matplotlib.dates as dt
    
    
    df = pd.DataFrame(
        {
        'PROC_NAME': ['data_load', 'data_send', 'data_load', 'data_send', 'data_load', 'data_send', 'data_load', 'data_send',
                      'data_extraction', 'data_extraction', 'data_extraction', 'data_extraction', 'data_extraction', 'data_extraction', 'data_extraction', 'data_extraction',],
        'START_TS': ['2019-06-25 03:30', '2019-06-25 07:15', '2019-06-26 03:30', '2019-06-26 07:19', 
                     '2019-06-26 08:54', '2019-06-27 03:30', '2019-06-27 08:51', '2019-06-28 03:30',
                     '2019-06-25 03:21', '2019-06-25 06:45', '2019-06-26 03:30', '2019-06-26 07:19', 
                     '2019-06-26 10:54', '2019-06-27 05:30', '2019-06-27 09:51', '2019-06-28 02:30'],
        'END_TS': ['2019-06-25 03:51', '2019-06-25 07:52', '2019-06-26 03:40', '2019-06-26 07:43', 
                   '2019-06-26 09:21', '2019-06-27 04:16', '2019-06-27 09:32', '2019-06-28 04:02',
                   '2019-06-25 03:51', '2019-06-25 07:32', '2019-06-26 06:40', '2019-06-26 07:43', 
                   '2019-06-26 11:21', '2019-06-27 08:16', '2019-06-27 11:32', '2019-06-28 04:02']  
        })
    
    #convert input to datetime:
    df.START_TS = pd.to_datetime(df.START_TS, format = '%Y-%m-%d %H:%M')
    df.END_TS = pd.to_datetime(df.END_TS, format = '%Y-%m-%d %H:%M')
    df.head()
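Once both columns are real datetimes, each run's duration is simply END_TS minus START_TS. A minimal sketch, not part of the original answer, using three rows from the table above; the DURATION_MIN column name is hypothetical:

```python
import pandas as pd

# three runs copied from the table above
df = pd.DataFrame({
    "PROC_NAME": ["data_load", "data_send", "data_extraction"],
    "START_TS": ["2019-06-25 03:30", "2019-06-25 07:15", "2019-06-26 03:30"],
    "END_TS": ["2019-06-25 03:51", "2019-06-25 07:52", "2019-06-26 06:40"],
})
df.START_TS = pd.to_datetime(df.START_TS, format="%Y-%m-%d %H:%M")
df.END_TS = pd.to_datetime(df.END_TS, format="%Y-%m-%d %H:%M")

# subtracting two datetime columns gives a timedelta column; convert it to minutes
df["DURATION_MIN"] = (df.END_TS - df.START_TS).dt.total_seconds() / 60
print(df[["PROC_NAME", "DURATION_MIN"]])
```

These are the same lengths the ribbons in the chart represent (21, 37 and 190 minutes for these three runs).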
    

    The solution to my problem, using pyplot.hlines:

    fig = plt.figure()
    fig.set_figheight(2)
    fig.set_figwidth(15)
    ax = fig.add_subplot(111)  # one Axes filling the whole figure
    
    # format dates on the x axis (dt is matplotlib.dates, imported above)
    ax.xaxis_date()
    ax.xaxis.set_major_formatter(dt.DateFormatter('%b %d %H:%M'))
    
    # one horizontal line per run, drawn on the row of its PROC_NAME
    ax.hlines(df.PROC_NAME,
              dt.date2num(df.START_TS),
              dt.date2num(df.END_TS),
              lw=10,     # make the lines wider so they look more like ribbons
              color='b'  # add some color
              )
    ax.tick_params(axis='x', labelrotation=25)  # rotate the date labels
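hlines receives its x positions as plain numbers: date2num converts datetimes to floating-point days since Matplotlib's date epoch, which is why the axis is then told to treat x as dates and given a DateFormatter. A small check of that conversion, not from the original answer (the module is imported here as mdates; the answer imports the same module as dt):

```python
import matplotlib.dates as mdates
import pandas as pd

# two runs taken from the table: data_load (03:30-03:51) and data_extraction (03:30-06:40)
start = pd.Series(pd.to_datetime(["2019-06-25 03:30", "2019-06-26 03:30"]))
end = pd.Series(pd.to_datetime(["2019-06-25 03:51", "2019-06-26 06:40"]))

# date2num turns datetimes into floats: days since Matplotlib's date epoch
x_start = mdates.date2num(start)
x_end = mdates.date2num(end)

# the horizontal length of each ribbon, converted back to minutes
minutes = (x_end - x_start) * 24 * 60
print(minutes.round(3))
```

The absolute float values depend on the Matplotlib version's epoch setting, but the differences (and therefore the ribbon lengths) do not.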
    

    In the end, the chart let me clearly see the run times and where they overlap.
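The chart shows overlaps visually; if you also want them as data, one hedged option (not from the original thread, and requiring pandas 1.2+ for how="cross") is a self cross join followed by the usual interval test start_a < end_b and start_b < end_a. The four rows below are a subset of the table; with all 16 rows the cross join has 256 pairs, which is fine at this size but grows quadratically.

```python
import pandas as pd

# a subset of the runs from the table above
df = pd.DataFrame({
    "PROC_NAME": ["data_load", "data_send", "data_extraction", "data_extraction"],
    "START_TS": pd.to_datetime(["2019-06-25 03:30", "2019-06-25 07:15",
                                "2019-06-25 03:21", "2019-06-25 06:45"]),
    "END_TS": pd.to_datetime(["2019-06-25 03:51", "2019-06-25 07:52",
                              "2019-06-25 03:51", "2019-06-25 07:32"]),
})

# pair every run with every other run (reset_index keeps the row id as "index")
pairs = df.reset_index().merge(df.reset_index(), how="cross", suffixes=("_a", "_b"))

# keep each unordered pair once, and only if the intervals intersect;
# strict "<" means runs that merely touch at an endpoint do not count
overlap = pairs[
    (pairs.index_a < pairs.index_b)
    & (pairs.START_TS_a < pairs.END_TS_b)
    & (pairs.START_TS_b < pairs.END_TS_a)
]
print(overlap[["PROC_NAME_a", "PROC_NAME_b"]])
```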

    【Discussion】:
