【Question title】: Represent intervals within the x axis of histogram in Python
【Posted】: 2020-11-25 18:31:29
【Question】:

I am trying to plot the percT column as a histogram in Python. Here is my input file:

programName,reqMethID,countT,countN,countU,totalcount,percT,percN,percU
chess,1-9,0,1,0,1,0.0,100.0,0.0
chess,1-16,1,1,0,2,50.0,50.0,0.0
chess,1-4,1,2,0,3,33.33,66.67,0.0
chess,2-9,1,3,0,4,25.0,75.0,0.0
chess,2-16,1,4,0,5,20.0,80.0,0.0
chess,2-4,1,5,0,6,16.67,83.33,0.0
chess,3-9,1,6,0,7,14.29,85.71,0.0
chess,3-16,1,7,0,8,12.5,87.5,0.0
chess,3-4,1,8,0,9,11.11,88.89,0.0
chess,4-9,1,9,0,10,10.0,90.0,0.0
chess,4-16,1,10,0,11,9.09,90.91,0.0
chess,4-4,2,10,0,12,16.67,83.33,0.0
chess,5-9,2,11,0,13,15.38,84.62,0.0
chess,5-16,2,12,0,14,14.29,85.71,0.0
chess,5-4,2,13,0,15,13.33,86.67,0.0
chess,6-9,3,13,0,16,18.75,81.25,0.0
chess,6-16,3,14,0,17,17.65,82.35,0.0
chess,6-4,3,15,0,18,16.67,83.33,0.0
chess,7-9,4,15,0,19,21.05,78.95,0.0
chess,7-16,4,16,0,20,20.0,80.0,0.0
chess,7-4,4,17,0,21,19.05,80.95,0.0
chess,8-9,4,18,0,22,18.18,81.82,0.0
chess,8-16,4,19,0,23,17.39,82.61,0.0
chess,8-4,4,20,0,24,16.67,83.33,0.0
chess,1-10,0,1,0,1,0.0,100.0,0.0
chess,1-17,1,1,0,2,50.0,50.0,0.0
chess,2-10,1,2,0,3,33.33,66.67,0.0
chess,2-17,1,3,0,4,25.0,75.0,0.0
chess,3-10,1,4,0,5,20.0,80.0,0.0
chess,3-17,1,5,0,6,16.67,83.33,0.0
chess,4-10,1,6,0,7,14.29,85.71,0.0
chess,4-17,1,7,0,8,12.5,87.5,0.0
chess,5-10,1,8,0,9,11.11,88.89,0.0
chess,5-17,1,9,0,10,10.0,90.0,0.0
chess,6-10,2,9,0,11,18.18,81.82,0.0

This is the Python code I use to plot the data above as a histogram:

    import matplotlib.pyplot as plt
    import pandas as pd

    dataset = pd.read_csv('TNUPercentages.txt', sep=',', index_col=False)
    X_ticks_array = list(range(0, 100, 10))
    plt.xticks(X_ticks_array)

    Tdata = dataset['percT']
    print(Tdata.head())
    plt.hist(Tdata)
    plt.xlabel('Percentages of T')
    plt.ylabel('Frequency')
    plt.show()

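The problem is the plot I get. The x axis shows the values in the percT column, and the y axis shows how often those values occur. It is hard to tell the frequency of data at exactly 0 on the x axis apart from the frequency of data at 5 or at 10. I want the x axis to have 11 bins, each representing one of the following intervals: 0, (0, 10], (10, 20], (20, 30], (30, 40], (40, 50], (50, 60], (60, 70], (70, 80], (80, 90], (90, 100]. These intervals refer to values in the percT column, and the y axis should show how often values in each interval occur in the dataset. How can I do this?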

【Comments】:

    Tags: python matplotlib histogram


    【Solution 1】:

    The pandas cut and value_counts methods help here:

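    import numpy
    import pandas
    from matplotlib import pyplot

    # data is the DataFrame read from the CSV (called dataset in the question)
    fig, ax = pyplot.subplots(figsize=(6, 3.5))
    (
        # arange(0, 101, 10) includes the right edge 100, so (90, 100] exists
        pandas.cut(data['percT'], bins=numpy.arange(0, 101, 10))
            .value_counts()
            .sort_index()
            .plot.bar(ax=ax)
    )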
    

    【Discussion】:

    • 0 needs its own bin.
    • @user3406764 You can specify that in the bins argument by changing the call to numpy.arange.
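
The suggestion in the comment above can be sketched as follows. This is a minimal sketch, not the answerer's code: the helper name bin_with_zero and the extra left edge -1 are assumptions made for illustration, and it assumes the percentages never go below 0. Because pandas.cut uses right-closed intervals by default, the bin (-1, 0] catches exact zeros and (0, 10] starts strictly above 0.

```python
import numpy as np
import pandas as pd


def bin_with_zero(values):
    """Count values in the bins 0, (0, 10], (10, 20], ..., (90, 100]."""
    # Hypothetical helper: the extra left edge -1 makes (-1, 0] catch exact
    # zeros, assuming the data never goes below 0.
    edges = np.concatenate(([-1], np.arange(0, 101, 10)))
    labels = ["0"] + [f"({lo}, {hi}]" for lo, hi in zip(edges[1:-1], edges[2:])]
    binned = pd.cut(pd.Series(values), bins=edges, labels=labels)
    # value_counts on a categorical keeps empty bins; sort_index restores
    # the bin order.
    return binned.value_counts().sort_index()


# A few percT values taken from the question's data.
counts = bin_with_zero([0.0, 50.0, 33.33, 25.0, 20.0, 10.0, 9.09, 0.0])
print(counts)
```

Calling counts.plot.bar() on the result should then draw one bar per bin, with the zero bin first.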
    【Solution 2】:

    Do you mean:

    import numpy as np
    import matplotlib.pyplot as plt

    bins = np.arange(0, 101, 10)  # edges 0, 10, ..., 100
    plt.hist(dataset['percT'], bins=bins, edgecolor='w')
    plt.xticks(bins)
    

    Output:


    Update, based on the comments:

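    import pandas as pd

    bins = np.arange(-10, 101, 10)  # the extra edge -10 gives 0 its own bin (-10, 0]

    # value_counts on the categorical result keeps empty bins, so bar i
    # always sits between tick i and tick i + 1
    (pd.cut(dataset['percT'], bins=bins, labels=bins[1:])
       .value_counts()
       .sort_index()
       .plot.bar(align='edge', width=1, edgecolor='w')
    )
    plt.xticks(np.arange(len(bins)), bins)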
    

    Output:

    【Discussion】:

    • In this plot we cannot tell the amount of data at exactly 0 apart from the data in the interval ]0,10].
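
One reason for the concern in the comment above: plt.hist treats every bin except the last as half-open, [a, b), so a value of exactly 0 is counted in the same bar as values just above 0. The sketch below counts the zeros separately and uses right-closed bins for the rest. The function name right_closed_counts is hypothetical, and the sketch assumes all values lie in [0, 100].

```python
import numpy as np
import matplotlib.pyplot as plt


def right_closed_counts(values, edges):
    """Count exact zeros, then values in (edges[i], edges[i + 1]] for each i."""
    values = np.asarray(values, dtype=float)
    zeros = int(np.sum(values == 0))
    positive = values[values > 0]
    # side="left" returns i with edges[i - 1] < v <= edges[i], which is
    # exactly the right-closed bin the question asks for.
    idx = np.searchsorted(edges, positive, side="left")
    counts = np.bincount(idx, minlength=len(edges))[1:]
    return np.concatenate(([zeros], counts))


edges = np.arange(0, 101, 10)
# A few percT values taken from the question's data.
perc_t = [0.0, 50.0, 33.33, 25.0, 20.0, 16.67, 14.29, 12.5, 11.11, 10.0, 9.09]
counts = right_closed_counts(perc_t, edges)
labels = ["0"] + [f"({lo}, {hi}]" for lo, hi in zip(edges[:-1], edges[1:])]

fig, ax = plt.subplots(figsize=(8, 3.5))
positions = np.arange(len(counts))
ax.bar(positions, counts, width=1, edgecolor="w")
ax.set_xticks(positions)
ax.set_xticklabels(labels, rotation=45, ha="right")
ax.set_xlabel("Percentages of T")
ax.set_ylabel("Frequency")
# plt.show()  # uncomment when running interactively
```

Drawing the pre-computed counts with ax.bar instead of plt.hist keeps full control over which interval each bar stands for, at the cost of building the tick labels by hand.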
    【Solution 3】:

    You can try seaborn for a nicer visualization. It is a plotting library built on top of matplotlib.

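    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    import seaborn as sns

    sns.set_theme()
    dataset = pd.read_csv('TNUPercentages.txt', sep=',', index_col=False)
    bins = np.arange(0, 101, 10)  # edges 0, 10, ..., 100

    Tdata = dataset['percT']
    print(Tdata.head())
    # distplot is deprecated since seaborn 0.11; histplot with kde=True draws
    # the histogram together with the density curve
    sns.histplot(Tdata, bins=bins, kde=True)
    plt.xticks(bins)
    plt.xlabel('Percentages of T')
    plt.ylabel('Frequency')
    plt.show()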
    

    If you do not want the density curve, you can still get the seaborn background grid by plotting with plt.hist:

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    import seaborn as sns

    sns.set_theme()  # only used for the background grid styling
    dataset = pd.read_csv('TNUPercentages.txt', sep=',', index_col=False)
    bins = np.arange(0, 101, 10)  # edges 0, 10, ..., 100

    Tdata = dataset['percT']
    print(Tdata.head())
    plt.hist(Tdata, bins=bins)
    plt.xticks(bins)
    plt.xlabel('Percentages of T')
    plt.ylabel('Frequency')
    plt.show()
    

    【Discussion】:
