【Question title】:Cluster unseen points using Spectral Clustering
【Posted】:2015-09-13 11:06:24
【Question】:

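I am using the Spectral Clustering method to cluster my data. The implementation seems to work properly. However, I have one problem: I have a set of unseen points (not present in the training set) and would like to cluster them based on the centroids derived by k-means (step 5 in the paper). But k-means is computed on the k eigenvectors, so the centroids are low-dimensional.

Does anyone know a method that can be used to map an unseen point into the low-dimensional space and compute the distance between the projected point and the centroids derived by k-means in step 5?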

【Question comments】:

    Tags: cluster-analysis k-means


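    【Solution 1】:

    Assign the new data points using the same assignment rule as the clustering method in step 5. For example, k-means uses some distance metric d to assign the data points of the original training set to a cluster. Simply use the same metric to assign the unseen points to one of your final clusters. So, add a new step 7:

    7. Assign an unseen point P to cluster j if d(P, xj) is the minimum over all possible clusters xi.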
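The rule in step 7 can be sketched in base R as follows. This is only a minimal sketch, not code from the answer: `assign_to_nearest` is a hypothetical helper name, d is assumed to be the Euclidean distance, and the unseen points are assumed to have already been mapped into the same low-dimensional space as the k-means centroids.

```r
# Hypothetical helper: assign each row of `points` to its nearest centroid.
# Assumes Euclidean distance and that `points` and `centers` live in the
# same (already embedded) space.
assign_to_nearest <- function(points, centers) {
  points  <- as.matrix(points)
  centers <- as.matrix(centers)
  # Squared Euclidean distance from every point (rows) to every center (columns)
  d2 <- outer(rowSums(points^2), rowSums(centers^2), "+") -
    2 * tcrossprod(points, centers)
  # Index of the closest center for each point
  apply(d2, 1, which.min)
}

# Toy example with two centroids in a 2-D embedded space
centers <- rbind(c(0, 0), c(1, 1))
new_points <- rbind(c(0.1, -0.2), c(0.9, 1.3), c(0.4, 0.4))
labels <- assign_to_nearest(new_points, centers)
```

With a fitted `stats::kmeans` object, the centroids are available in its `centers` component, so they can be passed directly as the second argument.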

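    【Comments】:

      【Solution 2】:

      A late answer, but here is how to do it in R. I had been looking for this myself, and finally managed to write the code on my own.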

      ##Let's use kernlab for all kernel stuff
      library(kernlab)
      
      ##Let's generate two concentric circles to cluster
      r1 = 1 + .1*rnorm(500) #inner radii (one per angle, so nothing is silently recycled)
      r2 = 2 + .1*rnorm(500) #outer radii
      q1 = 2*pi*runif(500) #random angle distribution
      q2 = 2*pi*runif(500) #random angle distribution
      
      ##This is our data now
      data = cbind(x = c(r1*cos(q1),r2*cos(q2)), y = c(r1*sin(q1),r2*sin(q2)))
      
      ##Let's take a sample to define train and test data
      idx = sample(1:nrow(data), 0.95*nrow(data)) #named idx so it does not mask base::t()
      train = data[idx,]
      test = data[-idx,]
      
      ##Plot the train and test data
      plot(train, pch = 1, col = adjustcolor("black", alpha = .5))
      points(test, pch = 16)
      legend("topleft", c("train data","test data"), pch = c(1,16), bg = "white")
      
      
      ##The paper gives great instructions on how to perform spectral clustering
      ##so I'll be following the steps
      ##Ng, A. Y., Jordan, M. I., & Weiss, Y. (2002). On spectral clustering: Analysis and an algorithm. Advances in neural information processing systems, 2, 849-856.
      ##Pg.2 http://papers.nips.cc/paper/2092-on-spectral-clustering-analysis-and-an-algorithm.pdf
      #1. Form the affinity matrix
      k = 2L #This is the number of clusters we will train
      K = rbfdot(sigma = 300) #Our kernel
      A = kernelMatrix(K, train) #Be careful when choosing the kernel function; some are more prone to numerical imprecision
      diag(A) = 0
      #2. Define the diagonal matrix D and the Laplacian matrix L
      D = diag(rowSums(A))
      L = diag(1/sqrt(diag(D))) %*% A %*% diag(1/sqrt(diag(D)))
      #3. Find the eigenvectors of L
      X = eigen(L, symmetric = TRUE)$vectors[,1:k]
      #4. Form Y from X
      Y = X/sqrt(rowSums(X^2))
      #5. Cluster (k-means)
      kM = kmeans(Y, centers = k, iter.max = 100L, nstart = 1000L)
      #6. This is the cluster assignment of the original data
      cl = fitted(kM, "classes")
      ##Projection onto the eigenvectors; note the ranges, which show that there is a single preferential direction
      plot(jitter(Y, .1), ylab = "2nd eigenfunction", xlab = "1st eigenfunction", col = adjustcolor(rainbow(3)[2*cl-1], alpha = .5))
      
      ##LET'S TRY TEST DATA NOW
      B = kernelMatrix(K, test, train) #The kernel product between train and test data
      
      ##We project on the learned eigenfunctions
      f = tcrossprod(B, t(Y))
      #This part is described in Bengio, Y., Vincent, P., Paiement, J. F., Delalleau, O., Ouimet, M., & Le Roux, N. (2003). Spectral clustering and kernel PCA are learning eigenfunctions (Vol. 1239). CIRANO.
      #Pg.12 http://www.cirano.qc.ca/pdf/publication/2003s-19.pdf
      
      ##And assign clusters based on the centers in that space
      ##Each projected point goes to the k-means center with the largest inner product (a similarity score, not a distance)
      new.cl = apply(as.matrix(f), 1, function(x) { which.max(tcrossprod(x,kM$centers)) } )
      
      ##And here's our result
      plot(train, pch = 1, col = adjustcolor(rainbow(3)[2*cl-1], alpha = .5))
      points(test, pch = 16, col = rainbow(3)[2*new.cl-1])
      legend("topleft", c("train data","test data"), pch = c(1,16), bg = "white")
      

      Output image
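For comparison, here is a minimal base-R sketch of a degree-normalised, eigenvalue-rescaled variant of the projection step, along the lines of the Nyström-style out-of-sample formula discussed by Bengio et al. It is not the code from the answer: `rbf_affinity` and `embed_new` are hypothetical helper names, the RBF kernel exp(-sigma * ||x - y||^2), the value of `sigma` and the toy two-blob data are assumptions for illustration, and the final assignment uses the Euclidean distance to the k-means centers.

```r
set.seed(1)

# Assumed RBF affinity between rows of x and rows of y: exp(-sigma * ||x - y||^2)
rbf_affinity <- function(x, y, sigma) {
  d2 <- outer(rowSums(x^2), rowSums(y^2), "+") - 2 * tcrossprod(x, y)
  exp(-sigma * pmax(d2, 0))
}

# Toy training data: two well-separated blobs (an assumption for illustration)
train <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
               matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2))
k <- 2L
sigma <- 1

# Steps 1-5 of Ng, Jordan & Weiss on the training data
A <- rbf_affinity(train, train, sigma)
diag(A) <- 0
deg <- rowSums(A)
L <- A / sqrt(outer(deg, deg))      # D^(-1/2) A D^(-1/2)
eig <- eigen(L, symmetric = TRUE)
X <- eig$vectors[, 1:k]
lambda <- eig$values[1:k]
Y <- X / sqrt(rowSums(X^2))
km <- kmeans(Y, centers = k, nstart = 20L)

# Hypothetical helper: embed new points from their affinity rows B
# (rows = new points, columns = training points)
embed_new <- function(B, deg, X, lambda) {
  B_norm <- B / sqrt(outer(rowSums(B), deg))  # same normalisation as L
  f <- sweep(B_norm %*% X, 2, lambda, "/")    # rescale by the eigenvalues
  f / sqrt(rowSums(f^2))                      # row-normalise like Y
}

test <- rbind(c(0.1, -0.1), c(2.9, 3.2))
f <- embed_new(rbf_affinity(test, train, sigma), deg, X, lambda)

# Assign each embedded point to the nearest k-means center (Euclidean)
new_cl <- apply(f, 1, function(p) which.min(colSums((t(km$centers) - p)^2)))
```

One sanity check for this variant: feeding the training affinity matrix `A` itself into `embed_new` reproduces `Y` up to numerical error, because `L %*% X` equals `X` with each column scaled by its eigenvalue.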

      【Comments】:

      • Awesome, thanks for posting, I was looking for this!