【Question title】: Cluster centres with wrong dimensions in skfuzzy C-means clustering
【Posted】: 2018-10-07 22:51:51
【Question description】:

Hello, I wrote the simple code below to explore fuzzy C-means clustering.

import pandas as pd
import numpy as np
from os import listdir
from sklearn.model_selection import train_test_split
from skfuzzy.cluster import cmeans, cmeans_predict
from sklearn.metrics import classification_report,confusion_matrix

def find_csv_filenames( path_to_dir, suffix=".csv" ):
    filenames = listdir(path_to_dir)
    return [ path_to_dir+filename for filename in filenames if filename.endswith( suffix ) ]

listFiles = find_csv_filenames('<Path to folder with csv files>')
for files in listFiles:
    df = pd.read_csv(files)
    df.loc[df['bug']>1,'bug']=1
    df2 =df.iloc[:,3:]
    #Above are some pre processing steps
    #Below splitting data for test and train
    X_train, X_test = train_test_split(df2, test_size=0.30)
    #dropping bug column for unsupervised learning
    X_train2 = X_train.drop('bug',axis=1) 
    X_test2  = X_test.drop('bug',axis=1) 
    print (X_train2.shape)
    #Shape is 163,20 for 163 training data with 20 features
    cntr, u, u0, d, jm, p, fpc = cmeans(X_train2,2,2,0.25,500,init=None, seed=None)
    print(cntr.shape)
    #above shape is coming 2,163

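The centres returned by the cmeans call above have shape (2, 163), but since my training data has only 20 features, cntr should have shape (2, 20). I can't understand where I'm going wrong.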

【Question comments】:

    Tags: python-3.x scikit-learn cluster-analysis fuzzy-c-means


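    【Solution 1】:

    From the skfuzzy documentation:

    data : 2d array, size (S, N)

    Data to be clustered. N is the number of data sets; S is the number of features within each sample vector.

    So cmeans expects features along the rows and samples along the columns, which is the reverse of your (163, 20) array. You need to transpose your input (untested):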

    cmeans(X_train2.T, ...)
    

    That should work.

    【Comments】:
