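[Posted]: 2019-08-13 13:00:57
[Question]:
This may sound like a very broad question, but if you let me describe some details, I can assure you it is very specific. It is also discouraging, frustrating, and rage-inducing.
The following plot describes a Scottish election and is based on code from plot.ly:
Plot 1:
Dataset 1: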
data = [['Source','Target','Value','Color','Node, Label','Link Color'],
[0,5,20,'#F27420','Remain+No – 28','rgba(253, 227, 212, 0.5)'],
[0,6,3,'#4994CE','Leave+No – 16','rgba(242, 116, 32, 1)'],
[0,7,5,'#FABC13','Remain+Yes – 21','rgba(253, 227, 212, 0.5)'],
[1,5,14,'#7FC241','Leave+Yes – 14','rgba(219, 233, 246, 0.5)'],
[1,6,1,'#D3D3D3','Didn’t vote in at least one referendum – 21','rgba(73, 148, 206, 1)'],
[1,7,1,'#8A5988','46 – No','rgba(219, 233, 246,0.5)'],
[2,5,3,'#449E9E','39 – Yes','rgba(250, 188, 19, 1)'],
[2,6,17,'#D3D3D3','14 – Don’t know / would not vote','rgba(250, 188, 19, 0.5)'],
[2,7,2,'','','rgba(250, 188, 19, 0.5)'],
[3,5,3,'','','rgba(127, 194, 65, 1)'],
[3,6,9,'','','rgba(127, 194, 65, 0.5)'],
[3,7,2,'','','rgba(127, 194, 65, 0.5)'],
[4,5,5,'','','rgba(211, 211, 211, 0.5)'],
[4,6,9,'','','rgba(211, 211, 211, 0.5)'],
[4,7,8,'','','rgba(211, 211, 211, 0.5)']
]
How the plot is built:
I have picked up some important details about the behavior of Sankey charts from various sources, for example:
Sankey automatically orders the categories to minimize the amount of overlap
Links are assigned in the order they appear in dataset (row_wise)
For the nodes, colors are assigned in the order the plot is built.
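To make the second rule concrete, here is a minimal sketch in plain Python (no Plotly involved). It only restates the first five rows of Dataset 1 as (Source, Target, Value) triples; the names `rows`, `links` and `node_ids` are made up for illustration.

```python
# First five (Source, Target, Value) triples from Dataset 1.
rows = [(0, 5, 20), (0, 6, 3), (0, 7, 5), (1, 5, 14), (1, 6, 1)]

# "Links are assigned row-wise": link i is simply row i of the dataset.
links = [
    {"link": i, "source": s, "target": t, "value": v}
    for i, (s, t, v) in enumerate(rows)
]

# The node indices in use are whatever integers appear in Source or Target.
node_ids = sorted({s for s, _, _ in rows} | {t for _, t, _ in rows})

for link in links:
    print(link)
print("node indices in use:", node_ids)
```

Note that five rows give five links but touch only the node indices 0, 1, 5, 6 and 7, so the number of rows and the number of nodes are not the same thing.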
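The challenge:
As you will see in the details below, nodes, labels, and colors are not applied to the chart in the same order as the source dataframe is structured. Some of that makes perfect sense, since several elements describe the same node: color, targets, values, and link colors. One node, 'Remain+No – 28', looks like this:
The accompanying part of the dataset looks like this: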
[0,5,20,'#F27420','Remain+No – 28','rgba(253, 227, 212, 0.5)'],
[0,6,3,'#4994CE','Leave+No – 16','rgba(242, 116, 32, 1)'],
[0,7,5,'#FABC13','Remain+Yes – 21','rgba(253, 227, 212, 0.5)'],
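So this part of the source describes one node [0] with three corresponding targets [5, 6, 7] and three links with the values [20, 3, 5]. '#F27420' is the orange(ish) color of the node, and the colors 'rgba(253, 227, 212, 0.5)', 'rgba(242, 116, 32, 1)' and 'rgba(253, 227, 212, 0.5)' describe the colors of the links from the node to its targets. So far, the information from the sample above that has not been used is: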
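The reading above can be checked with a small plain-Python sketch that groups the three rows by their Source value. The variable names (`outgoing`, `unused`) are invented for this illustration and are not part of Plotly.

```python
from collections import defaultdict

# The three rows for source node 0, copied from the dataset.
rows = [
    [0, 5, 20, '#F27420', 'Remain+No – 28', 'rgba(253, 227, 212, 0.5)'],
    [0, 6, 3, '#4994CE', 'Leave+No – 16', 'rgba(242, 116, 32, 1)'],
    [0, 7, 5, '#FABC13', 'Remain+Yes – 21', 'rgba(253, 227, 212, 0.5)'],
]

# Group the link information (target, value, link color) by source node.
outgoing = defaultdict(list)
for source, target, value, _node_color, _label, link_color in rows:
    outgoing[source].append((target, value, link_color))

targets = [t for t, _, _ in outgoing[0]]
values = [v for _, v, _ in outgoing[0]]

# Node color and label in rows 2 and 3 are not needed to draw node 0.
unused = [(color, label) for _, _, _, color, label, _ in rows[1:]]

print("targets:", targets)
print("values:", values, "sum:", sum(values))
print("not used for node 0:", unused)
```

The three values add up to 28, which matches the number in the label 'Remain+No – 28'.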
Data sample 2 (partial)
[-,-,-,'-------','---------------','-------------------'],
[-,-,-,'#4994CE','Leave+No – 16','-------------------'],
[-,-,-,'#FABC13','Remain+Yes – 21','-------------------'],
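And that information is used as the remaining elements of the chart are introduced.
So, what is the question? In the details below, you will see that everything makes sense as long as a new row in the dataset inserts a new link and makes other changes to other elements (colors, labels) only if that information has not yet been used. I will be more specific using two screenshots from a setup I have made, with the plot on the left and the code on the right:
The following data sample produces the plot below, following the logic described above:
Data sample 3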
data = [['Source','Target','Value','Color','Node, Label','Link Color'],
[0,5,20,'#F27420','Remain+No – 28','rgba(253, 227, 212, 0.5)'],
[0,6,3,'#4994CE','Leave+No – 16','rgba(242, 116, 32, 1)'],
[0,7,5,'#FABC13','Remain+Yes – 21','rgba(253, 227, 212, 0.5)'],
[1,5,14,'#7FC241','Leave+Yes – 14','rgba(219, 233, 246, 0.5)'],
[1,6,1,'#D3D3D3','Didn’t vote in at least one referendum – 21','rgba(73, 148, 206, 1)']]
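Screenshot 1 - Partial plot with data sample 3
The question:
Adding the row [1,7,1,'#8A5988','46 – No','rgba(219, 233, 246,0.5)'] to the dataset produces a new link between source [5] and target [7], but at the same time applies the color and label to target 5. I would have thought that the next label to be applied to the chart would be 'Remain+Yes – 21', since it has not been used yet. But what happens here is that the label '46 – No' is applied to target 5. Why?
Screenshot 2 - Partial plot with data sample 3 + [1,7,1,'#8A5988','46 – No','rgba(219, 233, 246,0.5)'] :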
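One way to probe this behavior without plotting anything is sketched below. It assumes, as the Plotly Sankey trace reference describes, that the Source and Target numbers are integer indices into the node label and node color lists, so the label for node n is whatever sits at position n of the 'Node, Label' column. The helper `label_for` is hypothetical and only mimics that lookup.

```python
# 'Node, Label' column for data sample 3 (five rows) ...
labels_5_rows = [
    'Remain+No – 28',
    'Leave+No – 16',
    'Remain+Yes – 21',
    'Leave+Yes – 14',
    'Didn’t vote in at least one referendum – 21',
]
# ... and after adding the sixth row [1,7,1,'#8A5988','46 – No',...].
labels_6_rows = labels_5_rows + ['46 – No']


def label_for(node_index, labels):
    """Return the label at position node_index, or None if the list is too short."""
    return labels[node_index] if node_index < len(labels) else None


print(label_for(5, labels_5_rows))  # node 5 has no label with five rows
print(label_for(5, labels_6_rows))  # the sixth row's label lands on node 5
print(label_for(2, labels_6_rows))  # 'Remain+Yes – 21' belongs to node 2
```

Under that assumption, 'Remain+Yes – 21' sits at position 2 and therefore belongs to node 2, which has no links in the partial plot, while '46 – No' sits at position 5 and is picked up by node 5.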
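How do you discern what is a source and what is a target based on that dataframe?
I know the question is both strange and hard to answer, but I hope someone has a suggestion. I also know that a dataframe may not be the best source for a Sankey chart. Perhaps JSON instead?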
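As a rough sketch of what a JSON-style source could look like, the snippet below splits the first eight rows of the dataset into one list of nodes and one list of links. The key names ("nodes", "links", "id") are invented for this example and are not a Plotly format; the node id is assumed to be the row position of the label, which is the same index assumption as for the Source and Target columns.

```python
import json

# First eight data rows of the dataset (the ones that carry a label and color).
rows = [
    [0, 5, 20, '#F27420', 'Remain+No – 28', 'rgba(253, 227, 212, 0.5)'],
    [0, 6, 3, '#4994CE', 'Leave+No – 16', 'rgba(242, 116, 32, 1)'],
    [0, 7, 5, '#FABC13', 'Remain+Yes – 21', 'rgba(253, 227, 212, 0.5)'],
    [1, 5, 14, '#7FC241', 'Leave+Yes – 14', 'rgba(219, 233, 246, 0.5)'],
    [1, 6, 1, '#D3D3D3', 'Didn’t vote in at least one referendum – 21', 'rgba(73, 148, 206, 1)'],
    [1, 7, 1, '#8A5988', '46 – No', 'rgba(219, 233, 246,0.5)'],
    [2, 5, 3, '#449E9E', '39 – Yes', 'rgba(250, 188, 19, 1)'],
    [2, 6, 17, '#D3D3D3', '14 – Don’t know / would not vote', 'rgba(250, 188, 19, 0.5)'],
]

# One entry per node: id is the row position of the label (assumption).
nodes = [
    {"id": i, "label": r[4], "color": r[3]}
    for i, r in enumerate(rows) if r[4]
]
# One entry per link: every row contributes exactly one link.
links = [
    {"source": r[0], "target": r[1], "value": r[2], "color": r[5]}
    for r in rows
]

doc = json.dumps({"nodes": nodes, "links": links}, ensure_ascii=False, indent=2)
print(doc)
```

Keeping nodes and links in separate lists makes it explicit which color and label belong to which node, instead of having them share rows with unrelated links.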
Complete code and data sample for easy copy and paste into a Jupyter Notebook:
import pandas as pd
import numpy as np
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)
# Original data
data = [['Source','Target','Value','Color','Node, Label','Link Color'],
[0,5,20,'#F27420','Remain+No – 28','rgba(253, 227, 212, 0.5)'],
[0,6,3,'#4994CE','Leave+No – 16','rgba(242, 116, 32, 1)'],
[0,7,5,'#FABC13','Remain+Yes – 21','rgba(253, 227, 212, 0.5)'],
[1,5,14,'#7FC241','Leave+Yes – 14','rgba(219, 233, 246, 0.5)'],
[1,6,1,'#D3D3D3','Didn’t vote in at least one referendum – 21','rgba(73, 148, 206, 1)'],
[1,7,1,'#8A5988','46 – No','rgba(219, 233, 246,0.5)'],
[2,5,3,'#449E9E','39 – Yes','rgba(250, 188, 19, 1)'],
[2,6,17,'#D3D3D3','14 – Don’t know / would not vote','rgba(250, 188, 19, 0.5)'],
[2,7,2,'','','rgba(250, 188, 19, 0.5)'],
[3,5,3,'','','rgba(127, 194, 65, 1)'],
[3,6,9,'','','rgba(127, 194, 65, 0.5)'],
[3,7,2,'','','rgba(127, 194, 65, 0.5)'],
[4,5,5,'','','rgba(211, 211, 211, 0.5)'],
[4,6,9,'','','rgba(211, 211, 211, 0.5)'],
[4,7,8,'','','rgba(211, 211, 211, 0.5)']
]
headers = data.pop(0)
df = pd.DataFrame(data, columns = headers)
scottish_df = df
data_trace = dict(
    type='sankey',
    domain = dict(
        x = [0,1],
        y = [0,1]
    ),
    orientation = "h",
    valueformat = ".0f",
    node = dict(
        pad = 10,
        thickness = 30,
        line = dict(
            color = "black",
            width = 0
        ),
        label = scottish_df['Node, Label'].dropna(axis=0, how='any'),
        color = scottish_df['Color']
    ),
    link = dict(
        source = scottish_df['Source'].dropna(axis=0, how='any'),
        target = scottish_df['Target'].dropna(axis=0, how='any'),
        value = scottish_df['Value'].dropna(axis=0, how='any'),
        color = scottish_df['Link Color'].dropna(axis=0, how='any'),
    )
)
layout = dict(
    title = "Scottish Referendum Voters who now want Independence",
    height = 772,
    font = dict(
        size = 10
    ),
)
fig = dict(data=[data_trace], layout=layout)
iplot(fig, validate=False)
[Discussion]:
Tags: python pandas jupyter-notebook plotly sankey-diagram