[Posted]: 2020-03-18 22:04:45
[Question]:
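I need to apply the DBSCAN clustering algorithm (https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html) to some columns of my DataFrame DF. How can I do this? Thanks.
I wrote the code below, but when I run it on the full dataset it fails with a MemoryError.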
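A minimal sketch of the general recipe: select the columns, make them numeric, scale them, then fit DBSCAN. It assumes `pickup_dt` is a datetime column, which `StandardScaler` cannot handle directly, so it is converted to seconds since the epoch. The toy DataFrame is hypothetical, and `eps=0.5` and `min_samples=2` are chosen only to suit this toy data.

```python
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Hypothetical toy stand-in for the question's DataFrame (two groups of pickups).
df = pd.DataFrame({
    "pickup_dt": pd.to_datetime([
        "2015-01-01 00:00", "2015-01-01 00:05", "2015-01-01 00:10",
        "2015-01-01 12:00", "2015-01-01 12:05", "2015-01-01 12:10",
    ]),
    "pickup_lat": [40.750, 40.751, 40.752, 40.650, 40.651, 40.652],
    "pickup_lon": [-73.990, -73.991, -73.992, -73.780, -73.781, -73.782],
})

features = df[["pickup_dt", "pickup_lat", "pickup_lon"]].copy()
# Assumption: pickup_dt is a datetime column; convert it to seconds since the epoch.
features["pickup_dt"] = (features["pickup_dt"] - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1)
# Drop incomplete rows rather than replacing NaN with 0 (0 is a valid latitude/longitude).
features = features.dropna()

# Scale every column to zero mean and unit variance, then cluster.
X = StandardScaler().fit_transform(features.to_numpy(dtype=float))
db = DBSCAN(eps=0.5, min_samples=2).fit(X)
labels = db.labels_
print(labels)  # -1 would mark noise points
```

The labels line up row by row with `features`, so they can be written back with `features["cluster"] = labels`.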
import numpy
from sklearn.cluster import DBSCAN
from sklearn import metrics
import sklearn.utils
from sklearn.preprocessing import StandardScaler
#sklearn.utils.check_random_state(1000)
Clus_dataSet = df[['pickup_dt','pickup_lat', 'pickup_lon']]
Clus_dataSet = numpy.nan_to_num(Clus_dataSet)
Clus_dataSet = StandardScaler().fit_transform(Clus_dataSet)
# Compute DBSCAN
db = DBSCAN(eps=2, min_samples=2, metric='euclidean').fit(Clus_dataSet)
core_samples_mask = numpy.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
labels = db.labels_
# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
n_noise_ = list(labels).count(-1)
print('Estimated number of clusters: %d' % n_clusters_)
print('Estimated number of noise points: %d' % n_noise_)
# Plot result
import matplotlib.pyplot as plt
# Black removed and is used for noise instead.
unique_labels = set(labels)
colors = [plt.cm.Spectral(each)
          for each in numpy.linspace(0, 1, len(unique_labels))]
for k, col in zip(unique_labels, colors):
    if k == -1:
        # Black used for noise.
        col = [0, 0, 0, 1]
    class_member_mask = (labels == k)
    xy = Clus_dataSet[class_member_mask & core_samples_mask]
    plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
             markeredgecolor='k', markersize=14)
    xy = Clus_dataSet[class_member_mask & ~core_samples_mask]
    plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
             markeredgecolor='k', markersize=6)
plt.title('Estimated number of clusters: %d' % n_clusters_)
plt.show()
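A likely cause of the MemoryError: the scikit-learn DBSCAN documentation notes that its implementation bulk-computes all neighbourhood queries, so memory grows as O(n·d), where d is the average number of points within `eps`. On standardized features, `eps=2` probably covers a large share of the whole dataset, which pushes memory toward O(n²). The sketch below first estimates d for candidate `eps` values on a random sample. It then precomputes the sparse radius-neighbours graph with `NearestNeighbors.radius_neighbors_graph(mode="distance")` and passes it to DBSCAN with `metric="precomputed"`, which is the approach the docs suggest. The `make_blobs` data stands in for the real pickup features, `mean_neighbors` is a hypothetical helper, and `eps=0.2` and `min_samples=5` are illustrative values only.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the three scaled pickup columns (assumption).
X, _ = make_blobs(n_samples=3000, centers=4, n_features=3,
                  cluster_std=0.5, random_state=0)
X = StandardScaler().fit_transform(X)


def mean_neighbors(X, eps, sample_size=500, seed=0):
    """Hypothetical helper: average number of eps-neighbours over a random sample."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(sample_size, len(X)), replace=False)
    nn = NearestNeighbors(radius=eps).fit(X)
    neigh = nn.radius_neighbors(X[idx], return_distance=False)
    return float(np.mean([len(n) for n in neigh]))


# A large eps means each point has many neighbours, and memory grows accordingly.
print("eps=2.0:", mean_neighbors(X, eps=2.0))
print("eps=0.2:", mean_neighbors(X, eps=0.2))

eps = 0.2
nn = NearestNeighbors(radius=eps).fit(X)
# Sparse CSR matrix that stores only pairwise distances <= eps.
graph = nn.radius_neighbors_graph(X, mode="distance")
print("stored neighbour pairs:", graph.nnz)

db = DBSCAN(eps=eps, min_samples=5, metric="precomputed").fit(graph)
labels = db.labels_
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters:", n_clusters)
```

The precomputed graph stores the same neighbours DBSCAN would find on its own. What brings memory down is mainly a smaller `eps` (or clustering a subsample). The graph's advantage is that you can check `graph.nnz` before clustering starts.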
[Discussion]:
-
What have you tried? If you get stuck, we can help. Show us some code.
-
I've added the code to the question.
Tags: python-3.x dbscan