【Question title】: Summarize and plot list of ndarrays according to chosen values
【Posted】: 2020-03-04 12:09:53
【Question】:

I have a list of ndarrays:

list1 = [t1, t2, t3, t4, t5]

Each t looks like this:

t1 = np.array([[10,0.1],[30,0.05],[30,0.1],[20,0.1],[10,0.05],[10,0.05],[0,0.5],[20,0.05],[10,0.0]], np.float64)

t2 = np.array([[0,0.05],[0,0.05],[30,0],[10,0.25],[10,0.2],[10,0.25],[20,0.1],[20,0.05],[10,0.05]], np.float64)

...

Now, for every t in the list, I want the mean of the second-element values, grouped by the value of the first element:

t1out = [[0,0.5],[10,(0.1+0.05+0.05+0)/4],[20,(0.1+0.05)/2],[30,0.075]]

t2out = [[0,0.05],[10,0.1875],[20,0.075],[30,0]]

....

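After generating t_1 ... t_n, I want to plot the class probabilities for each t, where the first element denotes the class (0, 10, 20, 30) and the second element gives the probability with which that class occurs (0.1, 0.7, 0.15, 0). Something like a probability distribution in the form of a histogram or bar chart, for example: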

plt.bar([classes],[probabilities])

plt.bar([item[0] for item in t1out],[item[1] for item in t1out])

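【Comments on the question】:

  • I don't understand how you are generating t1out etc. Could you explain it better? Also, do the t arrays all have the same shape?
  • The question is how to generate the t1out's =) See the good answers below on how to generate them, and yes, all t arrays have the same shape.

Tags: python numpy matplotlib histogram probability-distribution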


【Solution 1】:

Here is a way to compute this with NumPy:

import numpy as np

def mean_by_class(t, classes=None):
    # Classes should be passed if you want to ensure
    # that all classes are in the output even if they
    # are not in the current t vector
    if classes is None:
        classes = np.unique(t[:, 0])
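    # Bin edges: one edge per class plus a closing right edge
    bins = np.r_[classes, classes[-1] + 1]
    # h[i] is the number of rows whose class is classes[i]
    h, _ = np.histogram(t[:, 0], bins)
    # d[j] is the index into classes of row j (right=True maps x == bins[i] to i)
    d = np.digitize(t[:, 0], bins, right=True)
    out = np.zeros(len(classes), t.dtype)
    # Unbuffered add, so repeated indices in d are all accumulated
    np.add.at(out, d, t[:, 1])
    # Divide sums by counts; clip avoids division by zero for empty classes
    out /= h.clip(min=1)
    return np.c_[classes, out]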

t1 = np.array([[10, 0.1 ], [30, 0.05], [30, 0.1 ],
               [20, 0.1 ], [10, 0.05], [10, 0.05],
               [ 0, 0.5 ], [20, 0.05], [10, 0.0 ]],
              dtype=np.float64)
print(mean_by_class(t1))
# [[ 0.     0.5  ]
#  [10.     0.05 ]
#  [20.     0.075]
#  [30.     0.075]]
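
For comparison, the same grouped mean can be sketched with `np.unique` and `np.bincount` instead of `np.histogram` and `np.digitize`. This is an alternative technique, not the method used above: the helper name `grouped_mean` is hypothetical, and unlike `mean_by_class` it only reports classes that actually occur in `t`.

```python
import numpy as np

def grouped_mean(t):
    """Mean of column 1 grouped by the value in column 0 (hypothetical helper)."""
    # classes: sorted unique class values; inv: index of each row's class in classes
    classes, inv = np.unique(t[:, 0], return_inverse=True)
    sums = np.bincount(inv, weights=t[:, 1])  # per-class sum of the second column
    counts = np.bincount(inv)                 # per-class number of rows
    return np.column_stack([classes, sums / counts])

t1 = np.array([[10, 0.1], [30, 0.05], [30, 0.1],
               [20, 0.1], [10, 0.05], [10, 0.05],
               [0, 0.5], [20, 0.05], [10, 0.0]], dtype=np.float64)
t2 = np.array([[0, 0.05], [0, 0.05], [30, 0],
               [10, 0.25], [10, 0.2], [10, 0.25],
               [20, 0.1], [20, 0.05], [10, 0.05]], dtype=np.float64)

list1 = [t1, t2]  # only t1 and t2 are spelled out in the question
outs = [grouped_mean(t) for t in list1]
for out in outs:
    print(out)
```

Because every class is guaranteed to occur at least once here, no guard against division by zero is needed; if a fixed set of classes must appear in every output, the `classes` argument of `mean_by_class` above covers that case.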

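As a side note, storing the class values (integers) in a float array may not be the best choice. You could consider using a structured array instead, for example: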

import numpy as np

def mean_by_class(t, classes=None):
    if classes is None:
        classes = np.unique(t['class'])
    bins = np.r_[classes, classes[-1] + 1]
    h, _ = np.histogram(t['class'], bins)
    d = np.digitize(t['class'], bins, right=True)
    out = np.zeros(len(classes), t.dtype)
    out['class'] = classes
    np.add.at(out['p'], d, t['p'])
    out['p'] /= h.clip(min=1)
    return out

t1 = np.array([(10, 0.1 ), (30, 0.05), (30, 0.1 ),
               (20, 0.1 ), (10, 0.05), (10, 0.05),
               ( 0, 0.5 ), (20, 0.05), (10, 0.0 )],
              dtype=[('class', np.int32), ('p', np.float64)])
print(mean_by_class(t1))
# [( 0, 0.5  ) (10, 0.05 ) (20, 0.075) (30, 0.075)]
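
If the data already exists as two-column float arrays, as in the question, one possible way to convert it to that structured layout is sketched below. The helper name `to_structured` is hypothetical, and the sketch assumes the class values are whole numbers, since the cast to `np.int32` truncates any fractional part.

```python
import numpy as np

def to_structured(t):
    """Convert an (n, 2) float array into a structured array (hypothetical helper)."""
    out = np.empty(len(t), dtype=[('class', np.int32), ('p', np.float64)])
    out['class'] = t[:, 0]  # float -> int32 cast; assumes whole-number class values
    out['p'] = t[:, 1]
    return out

t1 = np.array([[10, 0.1], [30, 0.05], [30, 0.1],
               [20, 0.1], [10, 0.05], [10, 0.05],
               [0, 0.5], [20, 0.05], [10, 0.0]], dtype=np.float64)

s1 = to_structured(t1)
print(s1.dtype.names)
print(s1['class'])
```

The result can then be passed to the structured-array version of `mean_by_class` shown above.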


    【Solution 2】:

    Here is one approach using itertools.groupby:

    from statistics import mean
    from itertools import groupby
    
    def fun(t):
        s = sorted(t, key=lambda x:x[0])
        return [[k, mean(i[1] for i in v)] for k,v in groupby(s, key=lambda x: x[0])]
    
    fun(t1)
    
    [[0.0, 0.5],
     [10.0, 0.05],
     [20.0, 0.07500000000000001],
     [30.0, 0.07500000000000001]]
    

    And applied to all the arrays:

    [fun(t) for t in [t1,t2]]
    
    [[[0.0, 0.5],
      [10.0, 0.05],
      [20.0, 0.07500000000000001],
      [30.0, 0.07500000000000001]],
     [[0.0, 0.05], [10.0, 0.1875], [20.0, 0.07500000000000001], [30.0, 0.0]]]
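
The question also asks how to plot the result for each t. Neither answer covers that part, so the following is only a sketch of one way to do it with `plt.bar`, drawing one subplot per array. The grouped means are copied from the outputs shown above; the bar width, figure size, and axis labels are arbitrary choices made for illustration.

```python
import matplotlib.pyplot as plt

# Grouped means per array, copied from the outputs shown in the answers
t1out = [[0, 0.5], [10, 0.05], [20, 0.075], [30, 0.075]]
t2out = [[0, 0.05], [10, 0.1875], [20, 0.075], [30, 0.0]]
results = [t1out, t2out]

# squeeze=False keeps axes two-dimensional even when there is a single result
fig, axes = plt.subplots(1, len(results), sharey=True,
                         squeeze=False, figsize=(8, 3))
axes = axes[0]

for i, (ax, res) in enumerate(zip(axes, results), start=1):
    classes = [row[0] for row in res]
    probs = [row[1] for row in res]
    ax.bar(classes, probs, width=8)  # width chosen to suit a class spacing of 10
    ax.set_xticks(classes)
    ax.set_title(f"t{i}")
    ax.set_xlabel("class")

axes[0].set_ylabel("probability")
fig.tight_layout()
```

Call `plt.show()` to display the figure interactively, or `fig.savefig(...)` with a file name of your choice to write it to disk.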
    

    【Discussion】:

    • Works great. Except that for the list it would be [fun(t) for t in list1]