【Question title】:Sorting the y-axis of a seaborn lmplot based on duration
【Posted】:2021-09-06 08:25:32
【Question description】:

I have the following extract of a dataset, which I am trying to use to draw a seaborn lmplot.

Case_ID Activity    Timestamp   Cum_Duration
0   1   a   2016-04-15 08:41:28 0.0
1   1   b   2016-04-18 12:55:01 3.0
2   1   d   2016-04-19 07:22:59 4.0
3   1   e   2016-04-23 15:06:58 8.0
4   1   f   2016-04-24 19:18:32 9.0
5   1   g   2016-04-25 14:56:42 10.0
6   1   h   2016-04-26 10:00:36 11.0
7   2   a   2016-04-18 20:40:14 0.0
8   2   b   2016-04-21 22:42:39 3.0
9   2   d   2016-04-24 01:29:27 5.0
10  2   g   2016-04-25 22:36:27 7.0
11  2   e   2016-04-27 16:12:28 9.0
12  2   f   2016-04-28 15:00:35 10.0
13  2   h   2016-05-01 18:32:18 13.0
14  3   a   2016-04-27 01:45:07 0.0
15  3   b   2016-04-27 21:50:32 1.0
16  3   d   2016-04-29 00:12:15 2.0
17  3   g   2016-04-29 16:24:46 3.0
18  3   e   2016-04-30 22:57:03 4.0
19  3   f   2016-05-02 01:33:30 5.0
20  3   h   2016-05-02 11:06:53 5.0
21  4   a   2016-05-02 08:38:34 0.0
22  4   b   2016-05-06 00:50:31 4.0
23  4   d   2016-05-06 17:56:11 4.0
24  4   g   2016-05-13 10:34:23 11.0
25  4   e   2016-05-13 13:56:10 11.0
26  4   f   2016-05-14 23:42:03 13.0
27  4   h   2016-05-17 14:02:28 15.0
28  5   a   2016-05-09 07:17:12 0.0
29  5   b   2016-05-10 06:29:42 1.0
30  5   c   2016-05-11 05:04:34 2.0

I used the following code to draw the plot below.

sns.set_style('whitegrid')
sns.set_context('talk')
relactivity_plot = sns.lmplot(x='Cum_Duration',y='Case_ID', data=rdoa_plot, hue='Activity',height=10, aspect=1.5,fit_reg=False, scatter_kws={'s':150, 'alpha':1.0})
relactivity_plot.set(ylim=(max(rdoa_plot['Case_ID'])+1,0), yticks=(rdoa_plot['Case_ID']).unique(), xlim=(0, max(rdoa_plot['Cum_Duration'])+1))
relactivity_plot.fig.suptitle('Analyzing events timeline for the first 20 events')

[Image: Seaborn plot]

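However, I would like the y-axis to be sorted by cumulative duration, so that the case with the shortest duration is at the top and the cases with longer durations follow below it.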

[Image: Expected output]

Thanks for your help.

【Question comments】:

    Tags: python seaborn data-visualization


    【Solution 1】:

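    You can convert the 'Case_ID' column to strings, compute the order of the cases with a pandas groupby() (sorting by each case's maximum 'Cum_Duration'), and then use that order to make 'Case_ID' categorical.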
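To isolate just the ordering step, here is a minimal sketch on a tiny made-up frame. The three-case sample and the name `df` are assumptions for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical three-case sample that reuses the question's column names.
df = pd.DataFrame({
    "Case_ID": [1, 1, 2, 2, 3, 3],
    "Cum_Duration": [0.0, 11.0, 0.0, 13.0, 0.0, 5.0],
})

# Step 1: turn the numeric IDs into strings so they can serve as category labels.
df["Case_ID"] = df["Case_ID"].astype(str)

# Step 2: a case's total duration is its largest cumulative value; sort ascending.
case_id_order = (
    df.groupby("Case_ID")["Cum_Duration"].max().sort_values().index.tolist()
)

# Step 3: make the column categorical with that explicit category order.
df["Case_ID"] = pd.Categorical(df["Case_ID"], categories=case_id_order)

print(case_id_order)  # ['3', '1', '2']
```

Case 3 comes first because its maximum cumulative duration (5.0) is the smallest, followed by case 1 (11.0) and case 2 (13.0).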

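    Here is some sample code. (I renamed rdoa_plot to rdoa_df, because the original name confused me. I also used scatterplot directly, since in the example lmplot appears to be reduced to drawing only the scatter points.)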

    import matplotlib.pyplot as plt
    import seaborn as sns
    import pandas as pd
    import numpy as np
    from io import StringIO
    
    data_str = '''Case_ID Activity    Timestamp   Cum_Duration
    0   1   a   "2016-04-15 08:41:28" 0.0
    1   1   b   "2016-04-18 12:55:01" 3.0
    2   1   d   "2016-04-19 07:22:59" 4.0
    3   1   e   "2016-04-23 15:06:58" 8.0
    4   1   f   "2016-04-24 19:18:32" 9.0
    5   1   g   "2016-04-25 14:56:42" 10.0
    6   1   h   "2016-04-26 10:00:36" 11.0
    7   2   a   "2016-04-18 20:40:14" 0.0
    8   2   b   "2016-04-21 22:42:39" 3.0
    9   2   d   "2016-04-24 01:29:27" 5.0
    10  2   g   "2016-04-25 22:36:27" 7.0
    11  2   e   "2016-04-27 16:12:28" 9.0
    12  2   f   "2016-04-28 15:00:35" 10.0
    13  2   h   "2016-05-01 18:32:18" 13.0
    14  3   a   "2016-04-27 01:45:07" 0.0
    15  3   b   "2016-04-27 21:50:32" 1.0
    16  3   d   "2016-04-29 00:12:15" 2.0
    17  3   g   "2016-04-29 16:24:46" 3.0
    18  3   e   "2016-04-30 22:57:03" 4.0
    19  3   f   "2016-05-02 01:33:30" 5.0
    20  3   h   "2016-05-02 11:06:53" 5.0
    21  4   a   "2016-05-02 08:38:34" 0.0
    22  4   b   "2016-05-06 00:50:31" 4.0
    23  4   d   "2016-05-06 17:56:11" 4.0
    24  4   g   "2016-05-13 10:34:23" 11.0
    25  4   e   "2016-05-13 13:56:10" 11.0
    26  4   f   "2016-05-14 23:42:03" 13.0
    27  4   h   "2016-05-17 14:02:28" 15.0
    28  5   a   "2016-05-09 07:17:12" 0.0
    29  5   b   "2016-05-10 06:29:42" 1.0
    30  5   c   "2016-05-11 05:04:34" 2.0'''
    # Parse the whitespace-separated sample; the unnamed first column becomes the index.
    rdoa_df = pd.read_csv(StringIO(data_str), sep=r'\s+')
    rdoa_df['Case_ID'] = rdoa_df['Case_ID'].astype(str)
    # Order the cases by their maximum cumulative duration, shortest first.
    df_max_dur = rdoa_df.groupby('Case_ID')['Cum_Duration'].max().sort_values()
    case_id_order = df_max_dur.index.astype(str)
    rdoa_df['Case_ID'] = pd.Categorical(rdoa_df['Case_ID'], categories=case_id_order)
    
    sns.set_style('whitegrid')
    sns.set_context('talk')
    fig, ax = plt.subplots(figsize=(15, 10))
    sns.scatterplot(x='Cum_Duration', y='Case_ID', data=rdoa_df, hue='Activity', s=500, alpha=1, ax=ax)
    ax.set_xlim(-0.5, max(rdoa_df['Cum_Duration']) + 0.5)
    ax.set_ylim(len(case_id_order) - 0.5, -0.5)
    for s in ax.spines:
        ax.spines[s].set_visible(False)
    plt.tight_layout()
    plt.show()
    

    To list the activities in alphabetical order in the legend, you can add hue_order=np.unique(rdoa_df['Activity']) to the scatterplot call.
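If you would rather keep lmplot as in the question, one possible variant is sketched below. lmplot expects numeric y values, so this sketch plots each case's position in the sorted order and relabels the ticks with the case IDs. The `Case_Pos` column and the small sample frame are assumptions made for illustration; they are not part of the original data.

```python
import pandas as pd
import seaborn as sns

# Small hypothetical sample with the same column names as the question's data.
rdoa_df = pd.DataFrame({
    "Case_ID": [1, 1, 2, 2, 3, 3],
    "Activity": ["a", "h", "a", "h", "a", "h"],
    "Cum_Duration": [0.0, 11.0, 0.0, 13.0, 0.0, 5.0],
})

# Case IDs ordered by their maximum cumulative duration, shortest first.
case_id_order = (
    rdoa_df.groupby("Case_ID")["Cum_Duration"].max().sort_values().index.tolist()
)

# Map each case to its row position (0 = top of the plot); 'Case_Pos' is a helper column.
position = {case_id: pos for pos, case_id in enumerate(case_id_order)}
rdoa_df["Case_Pos"] = rdoa_df["Case_ID"].map(position)

grid = sns.lmplot(
    x="Cum_Duration", y="Case_Pos", data=rdoa_df, hue="Activity",
    hue_order=sorted(rdoa_df["Activity"].unique()),  # alphabetical legend
    height=6, aspect=1.5, fit_reg=False,
    scatter_kws={"s": 150, "alpha": 1.0},
)

# Label every position with its case ID and flip the axis so position 0 is on top.
for ax in grid.axes.flat:
    ax.set_yticks(range(len(case_id_order)))
    ax.set_yticklabels([str(case_id) for case_id in case_id_order])
    ax.set_ylim(len(case_id_order) - 0.5, -0.5)
    ax.set_ylabel("Case_ID")
```

Calling plt.show() (after importing matplotlib.pyplot as plt) displays the figure. The trade-off compared with the categorical approach is that the tick labels have to be kept in sync with the positions by hand.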

    【Comments】:
