【Question title】: How to create a categorical bubble plot in Python?
【Posted】: 2018-12-18 17:12:38
【Question description】:

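I am looking for help creating a plot similar to the one in this link, using only Python libraries.
Catagorical Bubble Chart using ggplot2 in R: see the top-voted answer.

Here I have borrowed the data from the link: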

    import pandas as pd

    df = pd.DataFrame({'Var1':['Does.Not.apply',
                                'Not.specified',
                    'Active.Learning..general.',
                       'Problem.based.Learning',
                               'Project.Method',
                          'Case.based.Learning',
                                'Peer.Learning',
                                        'Other',
                               'Does.Not.apply',
                                'Not.specified',
                               'Does.Not.apply',
                    'Active.Learning..general.',
                               'Does.Not.apply',
                       'Problem.based.Learning',
                               'Does.Not.apply',
                               'Project.Method',
                               'Does.Not.apply',
                          'Case.based.Learning',
                               'Does.Not.apply',
                                'Peer.Learning',
                               'Does.Not.apply',
                                       'Other'],
                       'Var2':['Does.Not.apply',
                               'Does.Not.apply',
                               'Does.Not.apply',
                               'Does.Not.apply',
                               'Does.Not.apply',
                               'Does.Not.apply',
                               'Does.Not.apply',
                               'Does.Not.apply',
                                'Not.specified',
                                'Not.specified',
                    'Active.Learning..general.',
                    'Active.Learning..general.',
                       'Problem.based.Learning',
                       'Problem.based.Learning',
                               'Project.Method',
                               'Project.Method',
                          'Case.based.Learning',
                          'Case.based.Learning',
                                'Peer.Learning',
                                'Peer.Learning',
                                        'Other',
                                        'Other'],
                        'Count' : [53,15,1,2,4,22,6,1,15,15,1,1,2,2,4,4,22,22,6,6,1,1]})

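【Question comments】:

    Tags: python r matplotlib plot bubble-chart


    【Solution 1】:

    Plotnine is a Python implementation of the grammar of graphics, based on R's ggplot2.

    The code is almost identical to the code in your R link.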

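    import math
    import pandas as pd
    from plotnine import (aes, element_line, element_text, geom_point,
                          geom_text, ggplot, scale_size_identity, theme)

    # placeholder: build df from the data given in the question
    df = pd.DataFrame(<dataframe data here>)

    # dot size grows with the square root of Count (7.5 is a scale factor)
    df['dotsize'] = (df['Count'] / math.pi) ** 0.5 * 7.5

    plot = (
        ggplot(df, aes('Var1', 'Var2'))
        + geom_point(aes(size='dotsize'), fill='white')
        + geom_text(aes(label='Count'), size=8)
        # use the dotsize values directly as point sizes
        + scale_size_identity()
        + theme(panel_grid_major=element_line(linetype='dashed', color='black'),
                axis_text_x=element_text(angle=90, hjust=1, vjust=0))
    )
    plot.save('mygraph.png')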
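
To see what the `dotsize` formula does: it takes the square root of `Count / pi`, so, assuming the rendered marker's radius is proportional to `dotsize`, the bubble area grows linearly with `Count`. Here is a small standard-library sketch of that arithmetic. The helper name `dot_radius` is made up for illustration, and 7.5 is the scale factor from the answer.

```python
import math

def dot_radius(count, scale=7.5):
    """Radius of a circle whose area is proportional to `count`."""
    return math.sqrt(count / math.pi) * scale

# a few Count values from the question's data
counts = [1, 4, 53]
radii = [dot_radius(c) for c in counts]

# area of each circle, and area per unit of Count
areas = [math.pi * r ** 2 for r in radii]
ratios = [a / c for a, c in zip(areas, counts)]
# every ratio equals scale ** 2, i.e. area is proportional to Count
```

Quadrupling `Count` therefore doubles the radius rather than quadrupling it, which keeps large counts from swamping the plot.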
    

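    【Comments】:

      【Solution 2】:

      Python's native matplotlib can of course create this kind of plot. It is simply a categorical scatter plot with variable marker sizes. Using your toy dataset: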

      import numpy as np
      import matplotlib.pyplot as plt
      import pandas as pd
      
      #create markersize column from values to better see the difference
      #you probably want to edit this function depending on min, max, and range of values
      df["markersize"] = np.square(df.Count) + 10
      fig = plt.figure()
      #plot categorical scatter plot
      plt.scatter(df.Var1, df.Var2, s = df.markersize, edgecolors = "red", c = "white", zorder = 2)
      #plot grid behind markers
      plt.grid(ls = "--", zorder = 1)
      #take care of long labels
      fig.autofmt_xdate()
      plt.tight_layout()
      plt.show()
      

      Output:

      Regarding how the marker size of a scatter plot is defined, you might want to read this answer.
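
In matplotlib, the `s` argument of `scatter` is the marker area in points squared. With `np.square(df.Count) + 10`, a count of 1 gives an area of 11 while a count of 53 gives 2819, so small bubbles nearly vanish. One possible alternative, sketched below, maps the counts linearly onto a fixed area range. The function name `scale_marker_areas` and the default bounds of 50 and 3000 are assumptions chosen for illustration, not part of the answer.

```python
def scale_marker_areas(counts, min_area=50.0, max_area=3000.0):
    """Linearly map counts onto marker areas (points^2) for scatter's `s`."""
    lo, hi = min(counts), max(counts)
    if hi == lo:
        # all counts are equal: use the midpoint and avoid dividing by zero
        return [(min_area + max_area) / 2.0 for _ in counts]
    span = hi - lo
    return [min_area + (c - lo) / span * (max_area - min_area) for c in counts]

# the distinct rows of the first block of Count values in the question
counts = [53, 15, 1, 2, 4, 22, 6, 1]
areas = scale_marker_areas(counts)
```

The result could then be passed as `s = scale_marker_areas(df.Count.tolist())` in the `plt.scatter` call above; adjust the bounds to the figure size.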

      【Comments】:

        【Solution 3】:

        Another way to solve this is to plot an annotation at each categorical point, showing the value with a circle around it:

        import numpy as np
        import matplotlib.pyplot as plt
        import pandas as pd
        
        #create padding column from values for circles that are neither too small nor too large
        df["padd"] = 2.5 * (df.Count - df.Count.min()) / (df.Count.max() - df.Count.min()) + 0.5
        fig = plt.figure()
        #prepare the axes for the plot - you can also order your categories at this step
        s = plt.scatter(sorted(df.Var1.unique()), sorted(df.Var2.unique(), reverse = True), s = 0)
        #remove the helper scatter; it was only needed to set up the categorical axes
        s.remove()
        #plot data row-wise as text with circle radius according to Count
        for row in df.itertuples():
            bbox_props = dict(boxstyle = "circle, pad = {}".format(row.padd), fc = "w", ec = "r", lw = 2)
            plt.annotate(str(row.Count), xy = (row.Var1, row.Var2), bbox = bbox_props, ha="center", va="center", zorder = 2, clip_on = True)
        
        #plot grid behind markers
        plt.grid(ls = "--", zorder = 1)
        #take care of long labels
        fig.autofmt_xdate()
        plt.tight_layout()
        plt.show()
        

        Sample output:

        Thanks to DavidG, who showed me in this answer how to prevent annotations from being drawn outside the plot.
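
The comment in the code above mentions that the categories can be ordered when the axes are prepared. A minimal sketch of that idea, assuming matplotlib is installed: the invisible scatter receives the labels in the wanted order instead of the sorted order. The `order` list and the four rows are a small subset of the question's data picked for illustration, and the `Agg` backend is only used so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt

# hypothetical explicit category order (subset of the question's labels)
order = ["Does.Not.apply", "Not.specified", "Other"]
# a few (Var1, Var2, Count) rows taken from the question's data
rows = [("Does.Not.apply", "Does.Not.apply", 53),
        ("Not.specified", "Does.Not.apply", 15),
        ("Other", "Does.Not.apply", 1),
        ("Not.specified", "Not.specified", 15)]

fig, ax = plt.subplots()
# invisible scatter: registers the categories on both axes in the given order
# (y is reversed so the first category ends up at the top)
ax.scatter(order, order[::-1], s=0)
for var1, var2, count in rows:
    ax.annotate(str(count), xy=(var1, var2), ha="center", va="center")

# render once so the tick labels are populated, then read them back
fig.canvas.draw()
x_labels = [t.get_text() for t in ax.get_xticklabels()]
y_labels = [t.get_text() for t in ax.get_yticklabels()]
plt.close(fig)
```

Categories appear on each axis in the order in which they are first seen, so the order handed to the helper scatter is what fixes the layout.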

        【Comments】:
