Authors
The Hong Kong University of Science and Technology
- Quan Li
- Kristanto Sean Njotoprawiro
- Hammad Haleem
- Xiaojuan Ma
WeChat, Tencent
- Qiaoan Chen
- Chris Yi
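Abstract
Constructing latent vectors for the nodes of a network through embedding models has proven practical in many graph analysis applications, such as node classification, clustering, and link prediction. However, despite the high efficiency and accuracy of embedding learning models, people have little idea what information about the original network is preserved in the embedding vectors. The abstractness of low-dimensional vector representations, the stochastic nature of the construction process, and non-transparent hyper-parameters all obscure the understanding of network embedding results. Embeddings are usually inspected by introducing visualization techniques that project the embedding space onto a two-dimensional plane. Although existing visualization methods allow simple examination of the structure of the embedding space, they do not support in-depth exploration of the embedding vectors. In this paper, we design an exploratory visual analytics system that supports comparative visual interpretation of embedding vectors at the cluster, instance, and structural levels. More specifically, it facilitates comparing which node metrics are preserved, and how, across different embedding models, and investigating the relationships between node metrics and selected embedding vectors. Several case studies confirm the effectiveness of our system. Experts' feedback indicates that our approach indeed helps them better understand network embedding models.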
Introduction
Categories of network embedding techniques:
- matrix factorization-based methods
- deep learning (DL)-based methods with or without random walk
- edge reconstruction-based methods, including maximizing edge reconstruct probability or minimizing distance-based loss or margin-based ranking loss
- graph kernel-based methods
- generative model-based methods that incorporate latent semantics
Limitations of current methods:
- Abstract Representation
- Inefficient Exploration
- Shallow-Level Analysis
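Contributions of this paper:
- Identifies the correlations of node metrics between the graph space and the embedding space, and proposes the “average distance vector” to describe structural characteristics
- Develops appropriate interactive visualizations that support fine-grained analysis
- Reports the experience of working with ML practitioners, and validates the effectiveness of the system through several case studies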
Related work
- Evaluation of Embedding Models
- Applying embeddings to different ML tasks
- Explanation of Embedding Vector Space
- Dimensionality reduction
- Comparative Visualization
- juxtaposition (side-by-side)
- superposition
- explicit encoding (visual display of differences or correlations)
Background and observational study
About Network Embedding
- DeepWalk
- node2vec
- struc2vec
Experts’ Conventional Practice and Bottlenecks
- social influence
- pairwise features (common neighbors, path length)
- structural features (number of communities, connected components)
- issues
- Understanding Obstacles
- Inconvenient Comparison
- Limited Analysis (no general method for measuring node similarity)
Experts’ Needs and Expectations of Embedding
- Identifying node metrics preserved by different embedding models or by versions with different hyper-parameters
- Analyzing the capability of embedding vectors to retain assorted structural characteristics
- Exploring cluster geometry of nodes in embedding space
- Exploring the neighbors of a particular node
- Analyzing pairwise node similarity
- Highlighting nodes simultaneously in different spaces
EmbeddingVis
Regression-based Pairwise Node Metric Analysis
Regression analysis
Node metrics
- Degree
- Eccentricity
- Closeness
- Betweenness
- Eigenvector
- PageRank
- Clustering Coefficient
- Average Nearest Neighbors Degree (knn)
- Within Module Degree
- Participation Coefficient
- Leverage Centrality
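To make a few of the node metrics listed above concrete, here is a minimal pure-Python sketch that computes degree, eccentricity, and closeness for one node. It assumes an undirected, unweighted, connected graph stored as an adjacency dictionary; the helper names `bfs_distances` and `node_metrics` and the toy graph are illustrative and do not come from the paper.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def node_metrics(adj, node):
    """Degree, eccentricity, and closeness of `node` (assumes a connected graph)."""
    dist = bfs_distances(adj, node)
    others = [d for n, d in dist.items() if n != node]
    return {
        "degree": len(adj[node]),
        # Eccentricity: the largest hop distance to any other node.
        "eccentricity": max(others) if others else 0,
        # Closeness: number of other nodes divided by the sum of distances to them.
        "closeness": len(others) / sum(others) if others else 0.0,
    }


# Toy path graph: a - b - c - d (hypothetical data).
toy_graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
metrics_b = node_metrics(toy_graph, "b")
```

The remaining metrics in the list (for example PageRank or betweenness) need more machinery and are usually taken from a graph library rather than written by hand.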
Identifying preserved node metrics
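For each pair of nodes, compare the correlation between the embedding space and the node metric space, to determine which node metrics the different models preserve and how they preserve them.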
Dataset
- csphd
- citeseer
- wiki
Parameter settings
- window size 10
- embedding dimension 128
- number of random walks for each node 10
- walk length 80
- node2vec p, q: 0.004, 1, 256 (9 combinations)
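The parameter settings above can be written down as a small configuration grid. This sketch assumes that the nine node2vec settings are the Cartesian product of the three listed values for p and q (3 × 3 = 9), which the notes imply but do not state outright; the dictionary keys are hypothetical names, not the argument names of any particular node2vec implementation.

```python
from itertools import product

# Shared settings taken from the list above (key names are illustrative).
base_params = {
    "window_size": 10,
    "dimensions": 128,
    "num_walks": 10,
    "walk_length": 80,
}

# Candidate values for node2vec's return parameter p and in-out parameter q.
pq_values = [0.004, 1, 256]

# Assumption: every (p, q) pair is tried, giving 3 x 3 = 9 configurations.
node2vec_grid = [dict(base_params, p=p, q=q) for p, q in product(pq_values, repeat=2)]
```

Each entry of `node2vec_grid` can then be passed to whatever training routine is in use.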
Experiments
Regression models:
- Decision Tree Regression
- LinearRegression
- Bayesian Ridge
- Lasso
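The pairwise regression idea can be sketched without any external library. The example below is an assumption-laden illustration, not the paper's pipeline: for every node pair it takes the absolute difference of one node metric as the predictor and the Euclidean distance between the two embedding vectors as the response, then fits an ordinary least-squares line and reports R². The regressors listed above would take the place of `fit_line`; the function names and toy data are hypothetical.

```python
import math
from itertools import combinations


def pairwise_samples(embeddings, metric):
    """Build (metric difference, embedding distance) samples over all node pairs."""
    xs, ys = [], []
    for u, v in combinations(sorted(embeddings), 2):
        xs.append(abs(metric[u] - metric[v]))
        ys.append(math.dist(embeddings[u], embeddings[v]))
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; returns (slope, intercept, r2).

    Assumes xs and ys each contain at least two distinct values.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Toy data: one-dimensional embeddings placed at twice the metric value.
toy_embeddings = {"a": (0.0,), "b": (2.0,), "c": (6.0,)}
toy_metric = {"a": 0, "b": 1, "c": 3}
xs, ys = pairwise_samples(toy_embeddings, toy_metric)
slope, intercept, r2 = fit_line(xs, ys)
```

An R² close to 1 would suggest that the embedding distances track the chosen metric closely, while a value near 0 would suggest that the metric is not reflected in the embedding.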
Structural Feature Measurement
metrics
- number of alters of the focal node (degree)
- number of edges among neighbors (edges num)
- density of the neighborhood (density)
- number of focal node’s 2-step neighborhood (twoalter num)
- average number of neighbors of neighbors’ networks (average alter alter num)
- average degree of neighborhood (average degree)
- clustering coefficient
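Most of the structural metrics above can be read directly off a focal node's ego network. The following is a minimal sketch under stated assumptions: the graph is undirected and stored as an adjacency dictionary of sets, "density" is taken to be the density of the ego network including the focal node, and "average alter alter num" is left out because the notes do not define it precisely. The function name `ego_features` and the toy graph are illustrative.

```python
def ego_features(adj, focal):
    """Structural features of the 'focal node'-'neighborhood' structure."""
    alters = adj[focal]
    degree = len(alters)
    # Each undirected edge among alters is seen from both endpoints, so halve the count.
    edges_num = sum(len(adj[u] & alters) for u in alters) // 2
    # Assumption: density of the ego network, focal node included.
    ego_size = degree + 1
    possible_ego_edges = ego_size * (ego_size - 1) / 2
    density = (degree + edges_num) / possible_ego_edges if possible_ego_edges else 0.0
    # Nodes exactly two steps away: neighbors of alters that are neither alters nor the focal node.
    two_step = set().union(*(adj[u] for u in alters)) - alters - {focal}
    average_degree = sum(len(adj[u]) for u in alters) / degree if degree else 0.0
    # Local clustering coefficient: fraction of alter pairs that are connected.
    possible_alter_pairs = degree * (degree - 1) / 2
    clustering = edges_num / possible_alter_pairs if possible_alter_pairs else 0.0
    return {
        "degree": degree,
        "edges_num": edges_num,
        "density": density,
        "twoalter_num": len(two_step),
        "average_degree": average_degree,
        "clustering_coefficient": clustering,
    }


# Toy graph (hypothetical): f is linked to a, b, c; a-b and c-d are also linked.
toy_adj = {
    "f": {"a", "b", "c"},
    "a": {"f", "b"},
    "b": {"f", "a"},
    "c": {"f", "d"},
    "d": {"c"},
}
features_f = ego_features(toy_adj, "f")
```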
Principle:
a graph’s structure is decided by node position, which can be described by the distribution of distances between the links’ nodes
The ‘focal node’-‘neighborhood’ structures are clustered with k-means, using the Canberra distance.
The average distance vector of each cluster is then computed.
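As a rough illustration of clustering with the Canberra distance, the sketch below implements the distance itself plus one assignment-and-averaging step of a k-means-style loop. The notes do not say how centroids are updated or exactly how the per-cluster average vector is defined, so the arithmetic mean used here is an assumption, as are the function names and the toy vectors.

```python
def canberra(u, v):
    """Canberra distance: sum of |a - b| / (|a| + |b|), skipping terms where both are zero."""
    total = 0.0
    for a, b in zip(u, v):
        denom = abs(a) + abs(b)
        if denom:
            total += abs(a - b) / denom
    return total


def assign_clusters(vectors, centroids):
    """Label each vector with the index of its nearest centroid under Canberra distance."""
    return [
        min(range(len(centroids)), key=lambda k: canberra(vec, centroids[k]))
        for vec in vectors
    ]


def cluster_means(vectors, labels, k):
    """Arithmetic mean vector of each cluster (assumed update rule); None for empty clusters."""
    means = []
    for cluster in range(k):
        members = [vec for vec, label in zip(vectors, labels) if label == cluster]
        if not members:
            means.append(None)
            continue
        means.append(tuple(sum(column) / len(members) for column in zip(*members)))
    return means


# Toy feature vectors and starting centroids (hypothetical data).
toy_vectors = [(1, 2), (1, 3), (10, 20), (12, 18)]
toy_centroids = [(1, 2), (10, 20)]
labels = assign_clusters(toy_vectors, toy_centroids)
means = cluster_means(toy_vectors, labels, k=2)
```

Repeating the two steps until the labels stop changing gives a k-means-style procedure; note that the arithmetic mean is not guaranteed to minimize Canberra distance within a cluster.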
Visualization
overview first, zoom and filter, then details-on-demand
Cluster-Level as Cluster Transition View:
- parallel coordinate plot
- t-SNE embedded transition diagram
Instance-Level as Pairwise Ranking View
- Ranking Measurement
- Discounted Cumulative Gain (DCG)
- Design Criteria
- Encode Ranking Similarity
- Encode Ranking Causes
- Compare Rankings Between Models
- Interactive Operations
- Visual Encoding and Interaction
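The Discounted Cumulative Gain named under Ranking Measurement can be computed with the standard formula, DCG = Σ rel_i / log2(i + 1) over ranks i starting at 1. The sketch below is a generic implementation; how relevance scores are derived from node metrics or embedding neighbors is not specified in these notes, so the input list is a plain illustrative example.

```python
import math


def dcg(relevances, k=None):
    """Discounted Cumulative Gain of a ranked list of relevance scores.

    `relevances[0]` is the score of the top-ranked item; `k` optionally
    truncates the list to its first k items.
    """
    items = relevances if k is None else relevances[:k]
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(items, start=1))


# Illustrative relevance scores for a ranking of six neighbors.
example_scores = [3, 2, 3, 0, 1, 2]
example_dcg = dcg(example_scores)
```

Because the discount grows with rank, placing highly relevant items lower in the ranking reduces the score, which makes DCG a natural way to compare two rankings of the same items.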
Structure-Level as Structural View
- Visual Encoding and Interaction
- ‘focal node’-‘neighborhood’ structure
Interactions Among the Views
- Filtering and Highlighting
- Linking
- Animation
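Reflections
Critical Thinking:
Differences between datasets, and differences in their quality, have a considerable effect on the results
Creative Thinking:
The relationship between the embedding and the model's parameters and number of iterations could be visualized
How to apply it to our work:
The interface design and presentation can serve as a reference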