Plotly.py Sankey Diagrams - Controlling Node Destination: Answers

[Question title]: Plotly.py Sankey Diagrams - Controlling Node Destination
[Posted]: 2025-11-22 09:45:01
[Question description]:

I have a question similar to one posted previously:

Plotly: How to set node positions in a Sankey Diagram?

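In my case I need all values ending in the same character to line up in the same vertical column of my Sankey diagram (there are three vertical columns in total, and I want (A) in the first, (B) in the second, and (C) in the third). The previous post has an answer with a custom function that assigns nodes ending in the same character to the same destination, which I have modified to fit my dataset as follows: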

# Extract list of nodes and list of Source / Target links from my_df DataFrame 

all_nodes = my_df.Source.values.tolist() + my_df.Target.values.tolist()
values = my_df.Value.values.tolist()
source_indices = [all_nodes.index(source) for source in my_df.Source]
target_indices = [all_nodes.index(target) for target in my_df.Target] 
label_names = all_nodes + my_df.Value.values.tolist()
print (label_names)

# Function to assign identical x-positions to label names that have a common ending ((A),(B),(C))

def nodify (node_names):
    node_names = all_nodes 
    # unique name endings 
    ends = sorted(list(set([e[-2] for e in node_names])))
    #intervals 
    steps = 0.5
    # x-values for each unique name ending for input as node position 
    nodes_x = {}
    xVal = 0.5
    for e in ends: 
        nodes_x[str(e)] = xVal
        xVal += steps 
        
    #x and y values in list form
    x_values = [nodes_x[n[-2]] for n in node_names]
    y_values = []
    y_val = 0
    for n in node_names:
        y_values.append(y_val)
        y_val+=.001
    return x_values, y_values 

nodified = nodify(node_names=all_nodes)

# Plot the Sankey Diagram from my_df with node destination control 

fig = go.Figure(data=[go.Sankey(
      arrangement='snap',
      node = dict(
      pad = 8,
      thickness = 10,
      line = dict(color = "black", width = 0.5),
      label = all_nodes,
      color = "blue",
     x=nodified[0],
     y=nodified[1]
    ),

    # Add links
    link = dict(
      source =  source_indices,
      target =  target_indices,
      value =  my_df.Value,
))])

fig.update_layout(title_text= "My Title",
                  font_size=10,
                  autosize=True,
                  height = 2000,
                  width = 2000
                 )
fig.show()

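The destination assignment did not work for me at all until I found an open GitHub issue (#3002) indicating that Plotly does not like x and y coordinates set to 0. I therefore changed 'xVal' to start at 0.5 rather than 0, which snapped most node destinations into place, with the exception of four (B) values that still end up in the (C) column.

I realise that my 'y_val' still starts at 0, but when I try swapping in 1e-09 everything is thrown into chaos.
I have tried extending the height/width and bucketing my nodes so that there are fewer of them (in case it was a fit issue); in both cases I still ended up with a few (B) values in the (C) vertical column.

Is there anything I am missing about the Plotly coordinate system or node destinations that would help me understand why Plotly keeps overriding my node destination assignment for a handful of nodes?

Any help is appreciated!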

[Question discussion]:

Tags: python nodes plotly-python sankey-diagram cartesian-coordinates

[Solution 1]:

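You have not provided sample data, so I have built a generator that matches what you describe.
Normalised x and y values need to be > 0 and < 1.
I have used the same approach to generating the Sankey from a dataframe as in this answer: plotly sankey graph data formatting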
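To make the normalisation point concrete, here is a minimal sketch in plain Python. `stepped_positions` mimics the spacing used in the question (start at 0.5, add 0.5 per column); `normalised_positions` is a hypothetical alternative, not taken from the thread, that keeps every column inside the open interval (0, 1). Whether the out-of-range values are what moved the four (B) nodes cannot be confirmed from the post alone.

```python
def stepped_positions(endings, start=0.5, step=0.5):
    """Mimic the question's spacing: start at `start`, add `step` per column."""
    return {e: start + i * step for i, e in enumerate(sorted(set(endings)))}


def normalised_positions(endings, margin=0.01):
    """Hypothetical helper: spread unique endings evenly inside (0, 1)."""
    unique = sorted(set(endings))
    if len(unique) == 1:
        return {unique[0]: 0.5}
    span = 1 - 2 * margin
    return {e: margin + span * i / (len(unique) - 1) for i, e in enumerate(unique)}


endings = ["A", "B", "C", "A", "C"]
stepped = stepped_positions(endings)        # A -> 0.5, B -> 1.0, C -> 1.5
normalised = normalised_positions(endings)  # every value strictly between 0 and 1
print(stepped)
print(normalised)
```

With three endings, the question's spacing places the last two columns at 1.0 and 1.5, which is at or beyond the upper end of the normalised range.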

import pandas as pd
import numpy as np
import plotly.graph_objects as go
import itertools

S = 40
labels = [str(p + 1) + s for s, p in itertools.product(list("ABC"), range(5))]
df = pd.DataFrame(
    {
        "source": np.random.choice(labels, S),
        "target": np.random.choice(labels, S),
        "value": np.random.randint(1, 10, S),
    }
)
# make sure paths are valid...
df = df.loc[df["source"].str[-1].apply(ord) < df["target"].str[-1].apply(ord)]
df = df.groupby(["source", "target"], as_index=False).sum()


def factorize(s):
    # rank the values, then scale the ranks into the open interval (0, 1)
    a = pd.factorize(s, sort=True)[0]
    return (a + 0.01) / (max(a) + 0.1)


# unique nodes
nodes = np.unique(df[["source", "target"]], axis=None)
nodes = pd.Series(index=nodes, data=range(len(nodes)))
# work out positioning of nodes
nodes = (
    nodes.to_frame("id")
    .assign(
        x=lambda d: factorize(d.index.str[-1]),
        y=lambda d: factorize(d.index.str[:-1]),
    )
)

# now simple job of building sankey
fig = go.Figure(
    go.Sankey(
        arrangement="snap",
        node={"label": nodes.index, "x": nodes["x"], "y": nodes["y"]},
        link={
            "source": nodes.loc[df["source"], "id"],
            "target": nodes.loc[df["target"], "id"],
            "value": df["value"],
        },
    )
)

fig.show()

Generated data

source	target	value
1A	3C	7
1B	1C	5
1B	3C	6
2A	4B	12
2B	2C	8
3A	3C	1
3B	1C	8
3B	3C	10
4A	1B	5
4B	2C	9
4B	3C	8
4B	4C	3
5A	1B	1
5A	2C	9
5A	5B	4
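The answer builds its node list with np.unique, so each label appears exactly once, whereas the question's all_nodes concatenates Source and Target without removing repeats. A minimal sketch of the same de-duplication in plain Python follows; `index_links` is a hypothetical helper name, and the sample rows are the first four rows of the generated data.

```python
def index_links(sources, targets):
    """Return unique node labels plus source/target indices into that list."""
    # dict.fromkeys keeps first-seen order and drops repeated labels
    nodes = list(dict.fromkeys(list(sources) + list(targets)))
    position = {label: i for i, label in enumerate(nodes)}
    return nodes, [position[s] for s in sources], [position[t] for t in targets]


sources = ["1A", "1B", "1B", "2A"]
targets = ["3C", "1C", "3C", "4B"]
nodes, source_indices, target_indices = index_links(sources, targets)
print(nodes)
print(source_indices, target_indices)
```

With one entry per node, the label, x and y lists passed to go.Sankey all have the same length as the set of nodes that the links actually reference.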

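[Discussion]:

Hi @Rob, thanks for your reply! I tried applying this to my df and everything was plotted as one long vertical column. I noticed that in the 'nodes' df, y is assigned a range of values whereas x is assigned the same value every time. That explains why my Sankey shows everything at the same x position, yet it seems to work fine on your test df. I wonder whether there is some limit on the number of values Plotly can display at a single x or y position? Some of the values in my df are much larger than those in the generated data...
Can you add some sample data from your data frame to the question? I have applied this concept with dates defining x and categories defining y. So longer values all work, but they do need tokenising rather than just considering the last character of the string.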
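A minimal sketch of the tokenising idea, assuming the labels end in a parenthesised group such as "(A)" as described in the question. The label strings and the helper names `column_token` and `spread` are hypothetical; `spread` mirrors the answer's formula in plain Python. If every label ends with ")", taking only the last character gives one token for all nodes, which would be consistent with the single vertical column reported above.

```python
import re


def column_token(label):
    """Return the text inside a trailing "(...)" group, else the last character."""
    match = re.search(r"\(([^()]+)\)\s*$", label)
    return match.group(1) if match else label[-1]


def spread(tokens):
    """Mirror the answer's formula: (rank + 0.01) / (max_rank + 0.1)."""
    order = {t: i for i, t in enumerate(sorted(set(tokens)))}
    top = max(order.values())
    return [(order[t] + 0.01) / (top + 0.1) for t in tokens]


labels = ["Intake (A)", "Review (B)", "Closed (C)", "Archive (C)"]  # hypothetical

naive_x = spread([label[-1] for label in labels])            # all tokens are ")"
token_x = spread([column_token(label) for label in labels])  # tokens A, B, C, C
print(naive_x)
print(token_x)
```

The naive version yields one shared x value for every node, while the tokenised version yields three distinct columns with both (C) labels sharing the last one.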
