【Question Title】:How to plot a seaborn boxplot for each month and year
【Posted】:2021-09-29 03:44:41
【Question】:

I have a time-series dataframe of precipitation values:

print(rain_df)
          date  precip
0   2017-01-10     0.0
1   2017-01-17     1.0
2   2017-01-24     1.0
3   2017-01-31     4.0
4   2017-02-07     1.0
..         ...     ...
218 2021-04-27     1.7
219 2021-05-03    22.7
220 2021-05-10     0.0
221 2021-05-17     2.0
222 2021-05-25     0.2
rain_df = rain_df.join(rain_df['date'].dt.month.astype(str).str.get_dummies())
rain_df = rain_df.join(rain_df['date'].dt.year.astype(str).str.get_dummies())
rain_df = rain_df[rain_df['precip']>0]
rain_df.reset_index(inplace=True,drop=True)

print(rain_df)
          date  precip  1  10  11  12  2  3  4  5  6  7  8  9  2017  2018  \
0   2017-01-17     1.0  1   0   0   0  0  0  0  0  0  0  0  0     1     0   
1   2017-01-24     1.0  1   0   0   0  0  0  0  0  0  0  0  0     1     0   
2   2017-01-31     4.0  1   0   0   0  0  0  0  0  0  0  0  0     1     0   
3   2017-02-07     1.0  0   0   0   0  1  0  0  0  0  0  0  0     1     0   
4   2017-02-14    22.9  0   0   0   0  1  0  0  0  0  0  0  0     1     0   
..         ...     ... ..  ..  ..  .. .. .. .. .. .. .. .. ..   ...   ...   
175 2021-03-31    18.3  0   0   0   0  0  1  0  0  0  0  0  0     0     0   
176 2021-04-27     1.7  0   0   0   0  0  0  1  0  0  0  0  0     0     0   
177 2021-05-03    22.7  0   0   0   0  0  0  0  1  0  0  0  0     0     0   
178 2021-05-17     2.0  0   0   0   0  0  0  0  1  0  0  0  0     0     0   
179 2021-05-25     0.2  0   0   0   0  0  0  0  1  0  0  0  0     0     0   

     2019  2020  2021  
0       0     0     0  
1       0     0     0  
2       0     0     0  
3       0     0     0  
4       0     0     0  
..    ...   ...   ...  
175     0     0     1  
176     0     0     1  
177     0     0     1  
178     0     0     1  
179     0     0     1 

How can I create a boxplot where the x-axis is month-year and the y-axis is the precip value?

Here is my attempt:

# reverse one-hot encoding
rain_df['month-year'] = (rain_df.iloc[:, 2:] == 1).idxmax(1)

rain_df = rain_df.melt(id_vars='month-year',value_vars='precip', value_name='precip')

print(rain_df)
    month-year variable  precip
0            1   precip     1.0
1            1   precip     1.0
2            1   precip     4.0
3            2   precip     1.0
4            2   precip    22.9
..         ...      ...     ...
175          3   precip    18.3
176          4   precip     1.7
177          5   precip    22.7
178          5   precip     2.0
179          5   precip     0.2
ax=sn.boxplot(x='month-year', y='precip', hue='variable', data=rain_df, palette="Set3", linewidth=1)
ax.set_title('Joliette')
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))

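The problem is that this only plots the month on the x-axis, with no information about the year. Did I get the melt call wrong, or is it something else?

【Comments】:

    Tags: python pandas datetime seaborn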


    【Solution 1】:

    Use dt.strftime to create your month-year labels. For example:

    >>> pd.to_datetime(pd.Series(['1918-11-11'])).dt.strftime('%b-%Y')
    0    Nov-1918
    dtype: object
    

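    Do this on your rain_df['date'] column and assign the result to month-year. If it does not work, your data is probably not in datetime64 format; fix that by calling pd.to_datetime before calling .dt.strftime. Then plot again using the new month-year column.

    【Comments】:

      【Solution 2】:

      Try this, though I could not test it myself. I am a little unsure of the column type of date. melt is not necessary.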
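A minimal sketch of this approach, assuming the date column may arrive as strings. The five sample rows are copied from the question's rain_df for illustration only, and the explicit `order` list is an addition (not part of the answer) that keeps the boxes in chronological order, since the labels are plain strings. The exact month abbreviations produced by `%b` depend on the locale.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import pandas as pd
import seaborn as sns

# a few rows in the shape of the question's rain_df (illustrative sample only)
rain_df = pd.DataFrame({
    "date": ["2017-01-17", "2017-01-24", "2017-02-07", "2021-05-03", "2021-05-17"],
    "precip": [1.0, 1.0, 1.0, 22.7, 2.0],
})

# make sure the column is datetime64 before using the .dt accessor
rain_df["date"] = pd.to_datetime(rain_df["date"])
rain_df["month-year"] = rain_df["date"].dt.strftime("%b-%Y")

# the labels are strings, so build a chronological order from the real dates
order = rain_df.sort_values("date")["month-year"].unique().tolist()

ax = sns.boxplot(x="month-year", y="precip", data=rain_df, order=order)
ax.tick_params(axis="x", labelrotation=90)  # many month-year labels get crowded
```

With several years of data there can be dozens of boxes, so rotating the tick labels or widening the figure is usually worthwhile.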

      rain_df['month_year'] = rain_df['date'].apply(lambda x: x.strftime('%b %Y')) # e.g. Jul 2021
      
      rain_df = rain_df[rain_df['precip'] > 0][['month_year', 'precip']] # df now consists of these two columns
      
      ax = sn.boxplot(x='month_year', y='precip', data=rain_df, palette="Set3", linewidth=1)
      
      ax.set_title('Joliette')
      # no hue is passed, so there is no legend to position
      

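      【Comments】:

        【Solution 3】:
        • I think the simplest solution is to use seaborn.boxplot and pass the appropriate .dt components to x and hue.
        • In the test df below, 'date' is a datetime dtype.
          • To convert a real 'date' column to a datetime dtype, use df.date = pd.to_datetime(df.date).

        Imports and test dataframe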
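As a small illustration of the conversion step, here is a sketch that assumes a hypothetical frame whose 'date' column was read in as strings (the two rows are borrowed from the question's data). After `pd.to_datetime`, the `.dt` components that get passed to `x` and `hue` become available.

```python
import pandas as pd

# hypothetical frame whose 'date' column holds strings rather than datetimes
df = pd.DataFrame({"date": ["2017-01-10", "2017-01-17"], "precip": [0.0, 1.0]})

# convert to a datetime dtype; the .dt accessor only works after this step
df["date"] = pd.to_datetime(df["date"])

# these are the components later passed to seaborn as x and hue
month_numbers = df["date"].dt.month.tolist()
year_numbers = df["date"].dt.year.tolist()
```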

        import pandas as pd
        import seaborn as sns
        import matplotlib.pyplot as plt  # needed for plt.subplots and plt.show
        from calendar import month_abbr as months  # optional
        import numpy as np  # for test data
        
        # test dataframe
        np.random.seed(365)
        rows = 250
        
        dates = pd.bdate_range('2017-01-01', '2021-07-21', freq='D')
        data = {'date': np.random.choice(dates, size=(rows)),
                'precip': np.random.randint(0, 31, size=(rows))}
        
        df = pd.DataFrame(data)
        
        # display(df.head())
                date  precip
        0 2017-01-10     0.0
        1 2017-01-17     1.0
        2 2017-01-24     1.0
        3 2017-01-31     4.0
        4 2017-02-07     1.0
        

        Plot with the month on the x-axis

        # get month names; optional step for renaming the xticklabels
        months = list(months)[1:]
        
        # now just plot the dataframe with seaborn
        fig, ax = plt.subplots(figsize=(15, 7))
        
        sns.boxplot(x=df.date.dt.month, y=df.precip, hue=df.date.dt.year, ax=ax)
        ax.legend(title='Year', bbox_to_anchor=(1, 1), loc='upper left')
        ax.set(xlabel='Month', xticklabels=months)  # setting the xticklabels is optional
        plt.show()
        

        Plot with the year on the x-axis

        fig, ax = plt.subplots(figsize=(20, 7))
        
        sns.boxplot(x=df.date.dt.year, y=df.precip, hue=df.date.dt.month, ax=ax)
        ax.legend(title='Month', bbox_to_anchor=(1, 1), loc='upper left')
        ax.set(xlabel='Year')
        plt.show()
        

        【Comments】:
