【Question title】: How to set bin coordinates for a scatter plot
【Posted】: 2018-11-21 11:46:54
【Question】:

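I want to return the number of scatter points that fall within a specific area. Normally I would do this with a 2D histogram and pcolormesh.

But how would I do it if I want to set bin coordinates for irregularly sized bins that don't form a grid?

Below is an example of my dataset.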

import matplotlib.pyplot as plt
import matplotlib as mpl
import math
import numpy as np

x1 = np.random.randint(80, size=(400, 10))
y1 = np.random.randint(80, size=(400, 10))

x2 = np.random.randint(80, size=(400, 10))
y2 = np.random.randint(80, size=(400, 10))

fig, ax = plt.subplots()
ax.grid(False)

plt.scatter(x1[0],y1[0], c = 'r', zorder = 2)
plt.scatter(x2[0],y2[0], c = 'b', zorder = 2)

ang1 = 0, 50
ang2 = 100, 50
angle = math.degrees(math.acos(5.5/9.15))
xy = 50, 50

Halfway = mpl.lines.Line2D((50,50), (0,100), c = 'white')
arc1 = mpl.patches.Arc(ang1, 65, 100, angle = 0, theta2 = angle, theta1 = 360-angle, lw = 2)
arc2 = mpl.patches.Arc(ang2, 65, 100, angle = 0, theta2 = 180+angle, theta1 = 180-angle, lw = 2)
Oval = mpl.patches.Ellipse(xy, 100, 100, lw = 3, alpha = 0.1)

ax.add_line(Halfway)
ax.add_patch(arc1)
ax.add_patch(arc2)
ax.add_patch(Oval)

plt.text(15, 75, '1', fontsize = 8)
plt.text(35, 90, '2', fontsize = 8)
plt.text(65, 90, '3', fontsize = 8)
plt.text(85, 75, '4', fontsize = 8)

ax.autoscale()

plt.draw()

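The bins I want to set are labelled 1-4. Is it possible to set coordinates that define these bins?

If I can set these coordinates, I'd then like to return the bin that each scatter point falls in. Output:

Update:

If I wanted an export showing the xy coordinates in each bin for every row of the scatter data, I would write out (x1[0], y1[0]) and transpose the data to return: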

          1             2            3             4   
0  [(x,y),(x,y)] [(x,y),(x,y)] [(x,y),(x,y)] [(x,y),(x,y)]

Then I'd change (x1[0], y1[0]) to (x1[1], y1[1]) to get the second row of data.

          1             2            3             4   
1  [(x,y),(x,y)] [(x,y),(x,y)] [(x,y),(x,y)] [(x,y),(x,y)]

Then I'd combine these to create:

          1             2            3             4   
0  [(x,y),(x,y)] [(x,y),(x,y)] [(x,y),(x,y)] [(x,y),(x,y)]  
1  [(x,y),(x,y)] [(x,y),(x,y)] [(x,y),(x,y)] [(x,y),(x,y)]

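I have 1000 rows, so I'm trying to create a method that takes the whole of (x1, y1) and generates the coordinates in each bin for every row of data.

Expected output: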

          1             2            3             4   
0  [(x,y),(x,y)] [(x,y),(x,y)] [(x,y),(x,y)] [(x,y),(x,y)]
1  [(x,y),(x,y)] [(x,y),(x,y)] [(x,y),(x,y)] [(x,y),(x,y)]
2  [(x,y),(x,y)] [(x,y),(x,y)] [(x,y),(x,y)] [(x,y),(x,y)]    
3  [(x,y),(x,y)] [(x,y),(x,y)] [(x,y),(x,y)] [(x,y),(x,y)]
4  [(x,y),(x,y)] [(x,y),(x,y)] [(x,y),(x,y)] [(x,y),(x,y)]
5....
6....

If I try (x1, y1), I get this error:

err = (arc_vertices[:,0] - x)**2 + (arc_vertices[:,1] - y)**2
ValueError: operands could not be broadcast together with shapes (70,) (10,)

【Question comments】:

    Tags: python pandas matplotlib histogram scatter


    【Solution 1】:

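    I'm not really happy with this approach. Using the x coordinate of your data to calculate where a point's y coordinate would fall on the curve seems better.

    This approach works similarly, but uses the arc's finite set of vertices: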

    arc1v = ax.transData.inverted().transform(arc1.get_verts())
    arc2v = ax.transData.inverted().transform(arc2.get_verts())
    
    for (x,y) in zip(x1[0], y1[0]):
        err = (arc1v[:,0] - x)**2 + (arc1v[:,1] - y)**2
        nearest = (arc1v[err == min(err)])[0]
        line_x = (x, nearest[0])
        line_y = (y, nearest[1])
        ax.add_line(mpl.lines.Line2D(line_x, line_y))
    
        if x > nearest[0]:
            ax.scatter(x, y, marker='^', s=100, c='k', zorder=1)
        else:
            ax.scatter(x, y, marker='v', s=100, c='k', zorder=1)
    

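    This "labels" points to the left of the (left) curve with a downward-pointing triangle and points to the right of it with an upward-pointing triangle. The lines on the graph point to the nearest defined vertex on the curve and are there for illustration only.

    You can do the same for the other curve, and the bin 2/3 division is straightforward.

    [Figure: example output plot]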
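Going back to the idea at the top of this answer, the position of a point relative to each curve can also be computed directly instead of searching the arc's vertices. The following is only a sketch under a stated assumption: bins 1 and 4 are treated as the interiors of the full ellipses that the two arcs lie on (width 65, height 100, centred at (0, 50) and (100, 50), as in the question's `Arc` calls). The names `inside_ellipse` and `bin_of` are hypothetical, and near the ends of the arcs the result may differ from the nearest-vertex method.

```python
# Geometry copied from the question's Arc patches (assumption: each bin
# boundary is the full ellipse, not just the drawn arc segment).
SEMI_X = 65 / 2          # horizontal semi-axis of both ellipses
SEMI_Y = 100 / 2         # vertical semi-axis of both ellipses
LEFT_CENTRE = (0, 50)    # centre of the ellipse carrying arc1
RIGHT_CENTRE = (100, 50) # centre of the ellipse carrying arc2
BIN_23_X = 50            # vertical separator between bins 2 and 3


def inside_ellipse(x, y, centre):
    """Return True if (x, y) lies strictly inside the ellipse at `centre`."""
    cx, cy = centre
    return ((x - cx) / SEMI_X) ** 2 + ((y - cy) / SEMI_Y) ** 2 < 1


def bin_of(x, y):
    """Hypothetical helper: return the bin number (1-4) of a single point."""
    if inside_ellipse(x, y, LEFT_CENTRE):
        return 1
    if inside_ellipse(x, y, RIGHT_CENTRE):
        return 4
    # Points exactly on the separator are placed in bin 3.
    return 2 if x < BIN_23_X else 3


points = [(14, 30), (49, 41), (62, 79), (79, 73)]
print([bin_of(x, y) for x, y in points])  # [1, 2, 3, 4]
```

Because `bin_of` works on one point at a time, it can also be used to look up the bin of any individual scatter point.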

    Update:

    Here's a more complete answer:

    import matplotlib.pyplot as plt
    import matplotlib as mpl
    import math
    import numpy as np
    
    
    BIN_23_X = 50               # The separator between bin 2 and 3
    
    x1 = np.random.randint(80, size=(400, 10))
    y1 = np.random.randint(80, size=(400, 10))
    
    x2 = np.random.randint(80, size=(400, 10))
    y2 = np.random.randint(80, size=(400, 10))
    
    fig, ax = plt.subplots()
    ax.grid(False)
    
    plt.scatter(x1[0],y1[0], c = 'r', zorder = 2)
    plt.scatter(x2[0],y2[0], c = 'b', zorder = 2)
    
    ang1 = 0, 50
    ang2 = 100, 50
    angle = math.degrees(math.acos(5.5/9.15))
    xy = 50, 50
    
    Halfway = mpl.lines.Line2D((BIN_23_X,BIN_23_X), (0,100), c = 'white')
    arc1 = mpl.patches.Arc(ang1, 65, 100, angle = 0, theta2 = angle, theta1 = 360-angle, lw = 2)
    arc2 = mpl.patches.Arc(ang2, 65, 100, angle = 0, theta2 = 180+angle, theta1 = 180-angle, lw = 2)
    Oval = mpl.patches.Ellipse(xy, 100, 100, lw = 3, alpha = 0.1)
    
    ax.add_line(Halfway)
    ax.add_patch(arc1)
    ax.add_patch(arc2)
    ax.add_patch(Oval)
    
    plt.text(15, 75, '1', fontsize = 8)
    plt.text(35, 90, '2', fontsize = 8)
    plt.text(65, 90, '3', fontsize = 8)
    plt.text(85, 75, '4', fontsize = 8)
    
    # Classification helpers
    def get_nearest_arc_vert(x, y, arc_vertices):
        err = (arc_vertices[:,0] - x)**2 + (arc_vertices[:,1] - y)**2
        nearest = (arc_vertices[err == min(err)])[0]
        return nearest
    
    arc1v = ax.transData.inverted().transform(arc1.get_verts())
    arc2v = ax.transData.inverted().transform(arc2.get_verts())
    
    def classify_pointset(vx, vy):
        bins = {(k+1):[] for k in range(4)}
        for (x,y) in zip(vx, vy):
            nx1, ny1 = get_nearest_arc_vert(x, y, arc1v)
            nx2, ny2 = get_nearest_arc_vert(x, y, arc2v)
    
            if x < nx1:                         # Is this point in bin 1?  To the left of arc1?
                bins[1].append((x,y))
            elif x > nx2:                       # Is this point in bin 4?  To the right of arc2?
                bins[4].append((x,y))
            else:
                # If we get here, the point is in either bin 2 or 3.  We'll consider points
                #   that fall on the line to be in bin 3.
                if x < BIN_23_X:                # Is this point to the left BIN_23_X? => Bin 2
                    bins[2].append((x,y))
                else:                           # Otherwise, the point is in Bin 3
                    bins[3].append((x,y))
    
        return bins
    
    # Classify points
    bins_red  = classify_pointset(x1[0], y1[0])
    bins_blue = classify_pointset(x2[0], y2[0])
    
    # Display classifications
    print("Red:")
    for x in bins_red.items():
        print(" ", x)
    
    print("Blue:")
    for x in bins_blue.items():
        print(" ", x)
    
    # "Annotate" classifications
    for (x,y) in (bins_red[1] + bins_blue[1]):
        ax.scatter(x, y, marker='^', s=100, c='k', zorder=1)
    
    for (x,y) in (bins_red[2] + bins_blue[2]):
        ax.scatter(x, y, marker='v', s=100, c='k', zorder=1)
    
    for (x,y) in (bins_red[3] + bins_blue[3]):
        ax.scatter(x, y, marker='^', s=100, c='y', zorder=1)
    
    for (x,y) in (bins_red[4] + bins_blue[4]):
        ax.scatter(x, y, marker='v', s=100, c='y', zorder=1)
    
    
    ax.autoscale()
    
    plt.draw()
    plt.show()
    

    Produces:

    Here, the points are "annotated": the shape behind each point corresponds to the bin it was classified into:

    Bin     Anno. Color   Triangle Pointing
    ---------------------------------------
    Bin 1   Black         Up
    Bin 2   Black         Down
    Bin 3   Yellow        Up
    Bin 4   Yellow        Down

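    The code also displays the classification results (the output of classify_pointset is a dictionary keyed by bin number (1-4), whose values are the coordinates of the points found in that bin):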

    Red:
      (1, [(14, 30), (4, 18), (12, 48)])
      (2, [(49, 41)])
      (3, [(62, 79), (50, 7), (68, 19), (71, 1), (59, 27), (77, 0)])
      (4, [])
    Blue:
      (1, [(20, 74), (11, 17), (12, 75)])
      (2, [(41, 19), (30, 15)])
      (3, [(61, 75)])
      (4, [(79, 73), (69, 58), (76, 34), (78, 65)])

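    You don't have to annotate the plot graphically, that is only for illustration. You can simply use the dictionaries returned by classify_pointset (bins_red and bins_blue).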
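As a small illustration of working with such a dictionary directly, the number of points per bin (the count the question originally asks for) can be read off with `len`. This sketch reuses the "Red" sample output shown above as data; `count_per_bin` is a hypothetical helper name, not part of the answer's code.

```python
# Sample classification copied from the "Red" output shown above.
bins_red = {
    1: [(14, 30), (4, 18), (12, 48)],
    2: [(49, 41)],
    3: [(62, 79), (50, 7), (68, 19), (71, 1), (59, 27), (77, 0)],
    4: [],
}


def count_per_bin(bins):
    """Hypothetical helper: map each bin number to how many points it holds."""
    return {bin_key: len(points) for bin_key, points in bins.items()}


print(count_per_bin(bins_red))  # {1: 3, 2: 1, 3: 6, 4: 0}
```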

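    Update 2

    The following code produces a list of lists (still 1-indexed), so you can find all points (red and blue) in bin 1 by accessing all_points[1]. The first element (index 0) of the all_points list is None, because we keep the list 1-indexed.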

    # Generate a list of lists, the outer index corresponds to the bin number (1-indexed)
    all_points = [None] * 5
    for bin_key in [1,2,3,4]:
        all_points[bin_key] = bins_red[bin_key] + bins_blue[bin_key]
    
    # Just for display.
    for bin_key, bin_points in enumerate(all_points):
        print(bin_key, bin_points)
    

    Output:

    0 None
    1 [(1, 8), (16, 72), (23, 67), (12, 19), (24, 51), (24, 47), (15, 23), (18, 51)]
    2 [(39, 75), (35, 27), (48, 55), (45, 53), (45, 22)]
    3 [(66, 58), (55, 64), (70, 1), (71, 15), (73, 3), (71, 75)]
    4 [(74, 62)]
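Regarding the update to the question: passing the whole 2-D arrays to classify_pointset makes `zip(vx, vy)` yield entire rows of 10 values, which cannot be broadcast against the 70 arc vertices, hence the ValueError. One possible way around it is to classify one row at a time and collect the results. The sketch below is illustrative only: `classify_rows` is a hypothetical helper, and `split_at_50` is a simplified stand-in classifier (two bins split at x = 50) used just to keep the example self-contained; it is not the answer's classify_pointset.

```python
def classify_rows(xs, ys, classify):
    """Apply `classify` to each row and collect one dict of bins per row."""
    return [classify(row_x, row_y) for row_x, row_y in zip(xs, ys)]


def split_at_50(vx, vy):
    """Stand-in classifier (NOT the answer's): two bins split at x = 50."""
    bins = {1: [], 2: []}
    for x, y in zip(vx, vy):
        bins[1 if x < 50 else 2].append((x, y))
    return bins


# Two small rows of made-up data in place of x1 and y1.
xs = [[10, 60, 70], [55, 5, 20]]
ys = [[1, 2, 3], [4, 5, 6]]

table = classify_rows(xs, ys, split_at_50)
for row_index, row_bins in enumerate(table):
    print(row_index, row_bins)
# 0 {1: [(10, 1)], 2: [(60, 2), (70, 3)]}
# 1 {1: [(5, 5), (20, 6)], 2: [(55, 4)]}
```

With the answer's code, the equivalent call would presumably be `classify_rows(x1, y1, classify_pointset)`. Since the result is a list of dictionaries, `pandas.DataFrame(table)` should give one row per data row and one column per bin, which is close to the layout shown in the question.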

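    【Comments】:

    • This is great @jedwards. If I include the bin 2-3 division and copy the code for bin 4, could I technically return the bin number of any specific scatter point? Or would that need extra coding?
    • You can, just repeat the logic in the for loop for your (x,y) point. Inside the loop, each data point is defined by x and y, so it should be pretty straightforward.
    • I can get the second line to display for bin 4, but I'm not sure how to split 2/3 and actually return which bin each point is in, @jedwards. Thanks for your help though.
    • @Maxibon Splitting 2/3 is pretty simple. See the updated post for a more complete answer.
    • Thanks @jedwards. Sorry, I have to wait 18 hours before I can award the bounty.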