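【Posted】: 2018-09-17 14:11:52
【Question】:
I'm writing a DBSCAN script, but I've run into some strange behaviour (two problems).
Here is my code:
The first part is the problem. If I add X = StandardScaler().fit_transform(X), the resulting coordinates are wrong! But if I leave that line out, everything always ends up in a single cluster (although the coordinates in the result are correct!). I tried adjusting eps and min_samples, but nothing changed.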
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Read tab-separated (x, y) pairs from the file 'data'
dataSet = []
with open('data') as fileIn:
    for line in fileIn:
        lineArr = line.strip().split('\t')
        dataSet.append([float(lineArr[0]), float(lineArr[1])])
numSamples = len(dataSet)
X = np.array(dataSet)

# This is the line I toggle: with it (added) / without it (none)
X = StandardScaler().fit_transform(X)

db = DBSCAN(eps=0.5, min_samples=10).fit(X)
# Boolean mask marking which points are core samples
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
labels = db.labels_
print(labels)

# Count how many points carry each label (-1 = noise)
counters = {}
for item in labels:
    if item in counters:
        counters[item] += 1
    else:
        counters[item] = 1
print("Count of different cluster:(#r,g,b,a)")
print(counters)
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
print('Estimated number of clusters: %d' % n_clusters_)
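A likely explanation for the first problem: StandardScaler().fit_transform(X) rescales each column to zero mean and unit variance, so after that line X, and any centre computed from it, lives in the scaled coordinate system rather than the original one. It also suggests why changing eps seemed to have no effect: eps is measured in whatever units X is in, so 0.5 means something very different before and after scaling. Below is a minimal sketch, assuming synthetic data from make_blobs in place of the 'data' file: cluster on a scaled copy, then get centres in original units either by averaging the original points or by mapping scaled centres back with inverse_transform. The helper name dbscan_centres is hypothetical.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler


def dbscan_centres(X_raw, eps=0.5, min_samples=10):
    """Hypothetical helper: run DBSCAN on a standardized copy of X_raw and
    return the labels plus the cluster centres in the ORIGINAL units."""
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X_raw)  # used only for clustering
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit(X_scaled).labels_
    cluster_ids = sorted(set(labels) - {-1})  # -1 marks noise, not a cluster
    # Route 1: average the original coordinates of each cluster's members
    centres_raw = np.array([X_raw[labels == k].mean(axis=0) for k in cluster_ids])
    # Route 2: average in scaled space, then undo the scaling
    centres_scaled = np.array([X_scaled[labels == k].mean(axis=0) for k in cluster_ids])
    centres_back = scaler.inverse_transform(centres_scaled)
    return labels, centres_raw, centres_back


# Synthetic stand-in for the question's 'data' file (assumption)
X_raw, _ = make_blobs(n_samples=300, centers=[[0, 0], [10, 10]],
                      cluster_std=1.0, random_state=0)
labels, centres_raw, centres_back = dbscan_centres(X_raw)
print(centres_raw)                             # centres in original units
print(np.allclose(centres_raw, centres_back))  # both routes agree
```

Because standardization is an affine map per column, averaging then un-scaling gives the same centres as averaging the raw points, so either route works.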
The second problem: I tried to plot the coordinates I computed, but I don't know why the result looks so wrong!
# Cluster centres: mean of the points in each cluster
clusters = [np.mean(X[labels == i], axis=0) for i in range(n_clusters_)]
# DBSCAN labels noise points -1
outliers = X[labels == -1]
print(clusters)
for i in range(n_clusters_):
    plt.plot(clusters[i], '*', markersize=20)

unique_labels = set(labels)
colors = [plt.cm.Spectral(each)
          for each in np.linspace(0, 1, len(unique_labels))]
for k, col in zip(unique_labels, colors):
    if k == -1:
        # Black used for noise.
        col = [0, 0, 0, 1]
    class_member_mask = (labels == k)
    # Core samples: large markers
    xy = X[class_member_mask & core_samples_mask]
    plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
             markeredgecolor='k', markersize=14)
    # Non-core samples (border points and noise): small markers
    xy = X[class_member_mask & ~core_samples_mask]
    plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
             markeredgecolor='k', markersize=6)
plt.title('Estimated number of clusters: %d' % n_clusters_)
plt.show()
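For the second problem, a likely culprit is plt.plot(clusters[i], '*', markersize=20). Each clusters[i] is a length-2 array [x, y]; given a single 1-D array, plt.plot treats it as y-values and uses 0, 1 as the x-values, so every centre is drawn as two stars at x = 0 and x = 1 instead of one star at (x, y). A minimal sketch contrasting the two calls, with hypothetical centre values (the Agg backend is only there so it runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical cluster centres in (x, y) form
clusters = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]

fig, ax = plt.subplots()

# What the loop in the question does: one 1-D argument is read as y-values,
# x defaults to 0, 1, so a single centre becomes two stars.
(wrong,) = ax.plot(clusters[0], '*', markersize=20)

# Intended: pass x and y separately so each centre is ONE star at (x, y).
right = [ax.plot(c[0], c[1], '*', markersize=20)[0] for c in clusters]

print(wrong.get_xydata().tolist())     # points actually drawn by the wrong call
print(right[0].get_xydata().tolist())  # the single point drawn for centre 0
```

In the question's loop this would become plt.plot(clusters[i][0], clusters[i][1], '*', markersize=20). The centres only line up with the plotted points when both come from the same X (scaled or unscaled).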
Please help, thank you!
【Discussion】:
-
What coordinate system are your points in?
-
Did you recognize Luo Zihan?
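The question about the coordinate system matters. If the points happen to be latitude/longitude pairs (an assumption; the question does not say), neither raw degrees nor StandardScaler output measures real distance well. scikit-learn's DBSCAN accepts metric='haversine', which expects [latitude, longitude] in radians and returns great-circle distances on the unit sphere, so eps becomes a distance in km divided by the Earth's radius. A sketch with hypothetical points and a hypothetical 1 km radius:

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

# Hypothetical points as [latitude, longitude] in degrees (assumption)
coords_deg = np.array([
    [31.2304, 121.4737], [31.2310, 121.4740], [31.2298, 121.4730],  # group A
    [39.9042, 116.4074], [39.9050, 116.4080], [39.9035, 116.4068],  # group B
    [22.5431, 114.0579],                                            # isolated point
])

eps_km = 1.0  # neighbourhood radius in kilometres (hypothetical)
db = DBSCAN(eps=eps_km / EARTH_RADIUS_KM, min_samples=3,
            metric='haversine', algorithm='ball_tree')
labels = db.fit(np.radians(coords_deg)).labels_  # haversine needs radians
print(labels)
```

With this setup eps keeps a physical meaning (1 km), so tuning it is no longer tied to the numeric range of the columns.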