【Question Title】: How to plot a scatter plot with values against a category and colored by a different category
【Posted】: 2021-11-06 06:20:46
【Question】:

I have a Python Pandas dataframe in the following format:

gender disease1 disease2
male 0.82 0.76
female 0.75 0.93
...... .... ....

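I want to plot it in Python (matplotlib, plotly express, etc.) so that it looks like this:

How can I restructure my dataframe and/or use a Python visualization library to achieve this result?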

【Question Discussion】:

    Tags: python pandas matplotlib plotly seaborn


    【Solution 1】:

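    You can create a scatter plot in Plotly with disease1 at x=0, disease2 at x=1, and so on for any further diseases, then rename the tickmarks and set the marker color and a horizontal offset depending on gender.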
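    The positioning idea on its own can be sketched without any plotting library. This is a minimal sketch, assuming the same four-row DataFrame used in the code below; the points frame and the ±0.1 offsets are illustrative choices, not part of Plotly's API. Each point's x coordinate is its disease's integer slot plus a per-gender offset, and the tick labels are then placed at the integer slots.

```python
import pandas as pd

df = pd.DataFrame({'gender': ['male', 'female', 'male', 'female'],
                   'disease1': [0.82, 0.75, 0.60, 0.24],
                   'disease2': [0.76, 0.93, 0.51, 0.44]})

diseases = ['disease1', 'disease2']     # each disease gets an integer slot 0, 1, ...
offset = {'male': -0.1, 'female': 0.1}  # illustrative horizontal nudge per gender

# Build one row per plotted point: x = disease slot + gender offset, y = the value.
frames = []
for i, disease in enumerate(diseases):
    frames.append(pd.DataFrame({
        'x': i + df['gender'].map(offset),
        'y': df[disease],
        'gender': df['gender'],
    }))
points = pd.concat(frames, ignore_index=True)
print(points)
```

    Tick labels then go at x = 0, 1, ... with the disease names as text, which is what tickvals and ticktext do in the Plotly code.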

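    The most dynamic way to make this plot is to add the data while slicing the DataFrame by disease and gender (I added more points to the DataFrame to show that you can keep it in the same format and still get the desired plot):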

    import pandas as pd
    import plotly.graph_objects as go
    
    df = pd.DataFrame({'gender':['male','female','male','female'],'disease1':[0.82,0.75,0.60,0.24],'disease2':[0.76,0.93,0.51,0.44]})
    
    
    fig = go.Figure()
    offset = {'male': -0.1, 'female': 0.1}
    marker_color_dict = {'male': 'teal', 'female':'pink'}
    
    ## set yaxis range
    values = df[['disease1','disease2']].values.reshape(-1)
    padding = 0.1
    fig.update_yaxes(range=[min(values) - padding, 1.0])
    
    for gender in ['male','female']:
        for i, disease in enumerate(['disease1','disease2']):
            ## show each gender in the legend only once (on its first trace)
            showlegend = (i == 0)
            y = df.loc[df['gender'] == gender, disease].values
            fig.add_trace(go.Scatter(
                x=[i + offset[gender]] * len(y),
                y=y,
                mode='markers',
                marker=dict(color=marker_color_dict[gender], size=20),
                legendgroup=gender,
                name=gender,
                showlegend=showlegend
            ))
    fig.update_layout(
        xaxis = dict(
            tickmode = 'array',
            tickvals = [0.0,1.0],
            ticktext = ['disease1','disease2']
        )
    )
    fig.show()
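    The same offset-and-relabel technique carries over to matplotlib, which the question also mentions. This is a minimal sketch, assuming the non-interactive Agg backend so it runs headless (drop that line and call plt.show() to view it interactively); the colors and marker size s=100 are arbitrary choices.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; remove to display interactively with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'gender': ['male', 'female', 'male', 'female'],
                   'disease1': [0.82, 0.75, 0.60, 0.24],
                   'disease2': [0.76, 0.93, 0.51, 0.44]})

diseases = ['disease1', 'disease2']
offset = {'male': -0.1, 'female': 0.1}
colors = {'male': 'teal', 'female': 'pink'}

fig, ax = plt.subplots()
for gender, group in df.groupby('gender'):
    for i, disease in enumerate(diseases):
        ax.scatter([i + offset[gender]] * len(group), group[disease],
                   color=colors[gender], s=100,
                   # label only the first trace per gender so the legend has one entry each
                   label=gender if i == 0 else '_nolegend_')

# One tick per integer slot, named after the disease.
ax.set_xticks(range(len(diseases)))
ax.set_xticklabels(diseases)
ax.set_ylabel('value')
ax.legend()
```

    Because groupby sorts its keys, female is drawn before male; the legend order follows that.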
    

    【Discussion】:

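      【Solution 2】:
      • The simplest option is to use seaborn.catplot with kind='swarm' or kind='strip'.
      • Use pandas.DataFrame.melt to convert the dataframe from wide form to long form, then plot it.
        • Incidentally, this is only two lines of code: (1) melt and (2) plot.
      • Tested in python 3.8.11, pandas 1.3.2, matplotlib 3.4.3, seaborn 0.11.2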
      import pandas as pd
      import numpy as np  # only for sample data
      import seaborn as sns
      
      np.random.seed(365)
      rows = 200
      data = {'Gender': np.random.choice(['Male', 'Female'], size=rows),
              'Cancer': np.random.rand(rows).round(2),
              'Covid-19': np.random.rand(rows).round(2)}
      df = pd.DataFrame(data)
      
      # display(df.head())
      #    Gender  Cancer  Covid-19
      # 0    Male    0.82      0.88
      # 1    Male    0.02      0.95
      # 2  Female    0.28      0.92
      # 3  Female    0.55      0.28
      # 4    Male    0.15      0.46
      
      # convert to long form
      data = df.melt(id_vars='Gender', var_name='Disease')
      
      # display(data.head())
      #    Gender Disease  value
      # 0    Male  Cancer   0.82
      # 1    Male  Cancer   0.02
      # 2  Female  Cancer   0.28
      # 3  Female  Cancer   0.55
      # 4    Male  Cancer   0.15
      
      # plot
      sns.catplot(data=data, x='Disease', y='value', hue='Gender', kind='swarm', palette=['blue', 'pink'], s=4)
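      Applied to the exact columns from the question, the melt step looks like this. It is a sketch that assumes the lowercase gender/disease1/disease2 column names from the question rather than the sample data above; the names disease and value for the new columns are just choices.

```python
import pandas as pd

# The two rows shown in the question (wide form: one column per disease).
df = pd.DataFrame({'gender': ['male', 'female'],
                   'disease1': [0.82, 0.75],
                   'disease2': [0.76, 0.93]})

# gender stays as an identifier column; the remaining columns are stacked into
# a 'disease' column (the old column names) and a 'value' column (the numbers).
long_df = df.melt(id_vars='gender', var_name='disease', value_name='value')
print(long_df)
```

      The result has one row per (gender, disease) pair, all disease1 rows first and then disease2, which is the long form that sns.catplot(x='disease', y='value', hue='gender', ...) expects.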
      

      【Discussion】:
