【问题标题】:How to get a grouped bar plot of categorical data如何获得分类数据的分组条形图
【发布时间】:2021-10-06 03:32:31
【问题描述】:

我有一个包含学生信息的大型数据集。而且我必须建立一个不同值之间的依赖关系图。例如,我有两列“年份”和“学校”,比如

它创建了一个类似的图形:

我的真实数据

.

我可以过滤我的值,但我不知道如何构建和组合图表。它不起作用...我发现示例看起来像这样,但我无法解决我的问题。

问题是我需要将一列的数据链接到第二列的过滤数据,而不更改数据集。我找不到类似的解决方案。

import pandas as pd


df = pd.read_excel('CREDITATION.xlsx')

plt.title('Depends schools in years')
plt.xlabel('Schools')
plt.ylabel('Counts')
plt.xticks(df['SCHOOL_YEAR_MOBILITY'])
plt.yticks(np.arange(0, 1000, step=100))

bar_width = 0.35



agr2016 = df.loc[(df['SCHOOL'] == 'Escola Superior Agrária de Bragança')&(df['SCHOOL_YEAR_MOBILITY'] == 2016)].count()[0] 
agr2017 = df.loc[(df['SCHOOL'] == 'Escola Superior Agrária de Bragança')&(df['SCHOOL_YEAR_MOBILITY'] == 2017)].count()[0] 
agr2018 = df.loc[(df['SCHOOL'] == 'Escola Superior Agrária de Bragança')&(df['SCHOOL_YEAR_MOBILITY'] == 2018)].count()[0] 
agr2019 = df.loc[(df['SCHOOL'] == 'Escola Superior Agrária de Bragança')&(df['SCHOOL_YEAR_MOBILITY'] == 2019)].count()[0] 
agr2020 = df.loc[(df['SCHOOL'] == 'Escola Superior Agrária de Bragança')&(df['SCHOOL_YEAR_MOBILITY'] == 2020)].count()[0] 

manag2016 = df.loc[(df['SCHOOL'] == 'Escola Superior de Tecnologia e Gestão de Bragança')&(df['SCHOOL_YEAR_MOBILITY'] == 2016)].count()[0]
manag2017 = df.loc[(df['SCHOOL'] == 'Escola Superior de Tecnologia e Gestão de Bragança')&(df['SCHOOL_YEAR_MOBILITY'] == 2017)].count()[0]
manag2018 = df.loc[(df['SCHOOL'] == 'Escola Superior de Tecnologia e Gestão de Bragança')&(df['SCHOOL_YEAR_MOBILITY'] == 2018)].count()[0]
manag2019 = df.loc[(df['SCHOOL'] == 'Escola Superior de Tecnologia e Gestão de Bragança')&(df['SCHOOL_YEAR_MOBILITY'] == 2019)].count()[0]
manag2020 = df.loc[(df['SCHOOL'] == 'Escola Superior de Tecnologia e Gestão de Bragança')&(df['SCHOOL_YEAR_MOBILITY'] == 2020)].count()[0]

com2016 = df.loc[(df['SCHOOL'] == 'Escola Superior de Comunicação, Administração e Turismo')&(df['SCHOOL_YEAR_MOBILITY'] == 2016)].count()[0]
com2017 = df.loc[(df['SCHOOL'] == 'Escola Superior de Comunicação, Administração e Turismo')&(df['SCHOOL_YEAR_MOBILITY'] == 2017)].count()[0]
com2018 = df.loc[(df['SCHOOL'] == 'Escola Superior de Comunicação, Administração e Turismo')&(df['SCHOOL_YEAR_MOBILITY'] == 2018)].count()[0]
com2019 = df.loc[(df['SCHOOL'] == 'Escola Superior de Comunicação, Administração e Turismo')&(df['SCHOOL_YEAR_MOBILITY'] == 2019)].count()[0]
com2020 = df.loc[(df['SCHOOL'] == 'Escola Superior de Comunicação, Administração e Turismo')&(df['SCHOOL_YEAR_MOBILITY'] == 2020)].count()[0]

health2016 = df.loc[(df['SCHOOL'] == 'Escola Superior de Saúde de Bragança')&(df['SCHOOL_YEAR_MOBILITY'] == 2016)].count()[0] 
health2017 = df.loc[(df['SCHOOL'] == 'Escola Superior de Saúde de Bragança')&(df['SCHOOL_YEAR_MOBILITY'] == 2017)].count()[0] 
health2018 = df.loc[(df['SCHOOL'] == 'Escola Superior de Saúde de Bragança')&(df['SCHOOL_YEAR_MOBILITY'] == 2018)].count()[0] 
health2019 = df.loc[(df['SCHOOL'] == 'Escola Superior de Saúde de Bragança')&(df['SCHOOL_YEAR_MOBILITY'] == 2019)].count()[0] 
health2020 = df.loc[(df['SCHOOL'] == 'Escola Superior de Saúde de Bragança')&(df['SCHOOL_YEAR_MOBILITY'] == 2020)].count()[0] 

education2016 = df.loc[(df['SCHOOL'] == 'Escola Superior de Educação de Bragança')&(df['SCHOOL_YEAR_MOBILITY'] == 2016)].count()[0]
education2017 = df.loc[(df['SCHOOL'] == 'Escola Superior de Educação de Bragança')&(df['SCHOOL_YEAR_MOBILITY'] == 2017)].count()[0]
education2018 = df.loc[(df['SCHOOL'] == 'Escola Superior de Educação de Bragança')&(df['SCHOOL_YEAR_MOBILITY'] == 2018)].count()[0]
education2019 = df.loc[(df['SCHOOL'] == 'Escola Superior de Educação de Bragança')&(df['SCHOOL_YEAR_MOBILITY'] == 2019)].count()[0]
education2020 = df.loc[(df['SCHOOL'] == 'Escola Superior de Educação de Bragança')&(df['SCHOOL_YEAR_MOBILITY'] == 2020)].count()[0]

plt.bar() #idk
plt.show()

【问题讨论】:

    标签: python pandas matplotlib seaborn bar-chart


    【解决方案1】:

    导入和示例数据

    import pandas as pd
    import seaborn as sns
    import numpy as np  # for test data only
    
    np.random.seed(365)
    rows = 100
    data = {'year': np.random.choice(range(2016, 2021), size=rows),
            'school': np.random.choice(['a', 'b', 'c', 'd', 'e'], size=rows)}
    df = pd.DataFrame(data)
    
    # display(df.head())
       year school
    0  2018      a
    1  2020      b
    2  2017      b
    3  2019      b
    4  2020      c
    

    seaborn.countplot

    # plot and add annotations
    p = sns.countplot(data=df, x='year', hue='school')
    p.legend(title='School', bbox_to_anchor=(1, 1), loc='upper left')
    
    for c in p.containers:
        # set the bar label
        p.bar_label(c, fmt='%.0f', label_type='edge')
    

    pandas.DataFrame.plot

    dfp = df.pivot_table(index='year', columns='school', values='school', aggfunc='size')
    
    ax = dfp.plot(kind='bar', rot=0)
    
    ax.legend(title='School', bbox_to_anchor=(1, 1), loc='upper left')
    
    for c in ax.containers:
        # set the bar label
        ax.bar_label(c, fmt='%.0f', label_type='edge')
    

    # groupby and pivot
    ax = df.groupby(['year']).school.value_counts().reset_index(name='counts').pivot(index='year', columns='school', values='counts').plot(kind='bar')
    
    # crosstab
    ax = pd.crosstab(df.year, df.school).plot(kind='bar')
    

    【讨论】:

      猜你喜欢
      • 2020-09-11
      • 1970-01-01
      • 2021-10-24
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 2023-03-20
      • 1970-01-01
      • 1970-01-01
      相关资源
      最近更新 更多