[Posted]: 2021-05-20 11:43:31
[Question]:
I wrote the code below to predict possible diseases with k-Means, using a dataset that has 3 parameters. Is it correct? It does not give me the accurate results I want.
import pandas as pd # importing library for reading the dataset
from sklearn.cluster import KMeans # using the ML library in Python for k-means
## reading the dataset from the CSV file and storing it in a variable called data
data = pd.read_csv(r"C:\Users\Hassan Tariq\Disease Prediction\DataSet.csv")
## selecting data columns from the dataset
X_Data = data.iloc[:,[1]] # first column as part of the first variable
Y_Data = data.iloc[:,[2,3]] ## next two columns as part of the second variable
## I used two columns in the second variable because we cannot train k-means
## on three parameters.
# initializing the model with 3 clusters
model1 = KMeans(n_clusters=3, random_state=3)
# training the model on the selected data
prediction = model1.fit_predict(X_Data,Y_Data)
# printing the cluster predictions from the model
print("Clustered Dataset: \n",prediction)
# printing the centroids, which show the data behavior in each cluster
print("Centroids of the clusters formed: \n",model1.cluster_centers_)
centeroids_collection = model1.cluster_centers_
# specifying the diseases that are possible
disease1 = ['Muscle Twitching','Nausea']
disease2 = ['Eye Irritation', 'Lung Irritation']
disease3 = ['Eye Irritation','Diarrhea']
# loop for iterating over all the data in the dataset to predict the disease..
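For comparison, here is a minimal sketch showing that KMeans can cluster on all three parameters at once: every column passed as X is used, and no second "y" variable is needed. The column names `Ph`, `Turbidity` and `Tds` and the random values are hypothetical stand-ins for DataSet.csv, not the asker's real data. Scaling with `StandardScaler` is an added assumption, motivated by the very different ranges of the three measurements.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for DataSet.csv: column names and values are
# assumptions for illustration only.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "Ph": rng.uniform(5, 13, 60),
    "Turbidity": rng.uniform(100, 600, 60),
    "Tds": rng.uniform(50, 400, 60),
})

# KMeans clusters on every column in X, so all three features go in together.
X = data[["Ph", "Turbidity", "Tds"]]

# The features have very different ranges; standardise them so the
# largest-range column does not dominate the Euclidean distances.
X_scaled = StandardScaler().fit_transform(X)

model = KMeans(n_clusters=3, n_init=10, random_state=3)
labels = model.fit_predict(X_scaled)  # no y argument: KMeans is unsupervised

print("Cluster label per row:", labels)
print("Centroids (standardised units):\n", model.cluster_centers_)
```

The resulting cluster IDs 0, 1 and 2 carry no disease meaning by themselves; linking a cluster to `disease1`, `disease2` or `disease3` would still require inspecting the centroids by hand.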
[Discussion]:
-
K-Means is an unsupervised learning algorithm, so there is no "y" here (fit_predict accepts it only for consistency of the API; it is ignored). IMHO, you should study K-Means more thoroughly first.
-
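The point that y is ignored can be checked directly. This minimal sketch uses random toy data (an assumption, not the asker's dataset) and fits the same KMeans configuration twice, once with an arbitrary y and once without, then compares the labels.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))      # toy feature matrix with 3 columns
y = rng.integers(0, 3, size=50)   # arbitrary "labels" that KMeans should ignore

# Identical settings, with and without the y argument.
labels_with_y = KMeans(n_clusters=3, n_init=10, random_state=3).fit_predict(X, y)
labels_without_y = KMeans(n_clusters=3, n_init=10, random_state=3).fit_predict(X)

# If y is ignored, both runs give the same clustering (expected: True).
print(np.array_equal(labels_with_y, labels_without_y))
```

In the question's code this means `Y_Data` contributes nothing, so the model was effectively clustering on the single column in `X_Data`.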
So should I use a supervised learning algorithm instead?
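A supervised model only makes sense if each row in the dataset already carries a known disease label. Assuming such a label column exists (the `Disease` column, its values `healthy` and `disease_x`, and all the numbers below are hypothetical), a minimal sketch with a decision tree looks like this; unlike KMeans, the classifier actually uses y.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical labelled samples: features plus a known outcome per row.
train = pd.DataFrame({
    "Ph":        [7.0, 11.5, 12.0, 6.5, 11.8, 7.2],
    "Turbidity": [50, 450, 480, 60, 420, 55],
    "Tds":       [100, 250, 280, 120, 230, 110],
    "Disease":   ["healthy", "disease_x", "disease_x",
                  "healthy", "disease_x", "healthy"],
})

features = ["Ph", "Turbidity", "Tds"]
clf = DecisionTreeClassifier(random_state=0)
clf.fit(train[features], train["Disease"])  # here y IS used for training

# Predict the label of a new, unseen sample with the same three features.
new_sample = pd.DataFrame({"Ph": [11.6], "Turbidity": [460], "Tds": [260]})
print(clf.predict(new_sample))
```

Without real labelled data, a classifier like this has nothing to learn from, which is why the requirements in the following comment matter.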
-
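I mean, my dataset has 3 parameters: pH, turbidity and TDS. What I want is: if pH > 11, TDS is 200-300 and turbidity is 400-500, then a certain disease should be predicted. That is what I actually want to develop.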
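Fixed thresholds like these describe a rule, not something a model has to learn. A minimal rule-based sketch follows; the thresholds come from the comment above, but treating the ranges as inclusive, the column names, the sample rows and mapping the rule to `"disease1"` are all assumptions for illustration.

```python
import pandas as pd

def predict_disease(ph: float, tds: float, turbidity: float) -> str:
    """Return a disease name if the sample matches a rule (hypothetical mapping)."""
    # Rule from the comment: pH > 11, TDS 200-300, turbidity 400-500 (inclusive assumed).
    if ph > 11 and 200 <= tds <= 300 and 400 <= turbidity <= 500:
        return "disease1"  # assumed: which disease list this rule maps to is not stated
    return "no prediction"

# Hypothetical sample rows standing in for DataSet.csv.
data = pd.DataFrame({
    "Ph": [11.5, 7.0, 12.0],
    "Tds": [250, 250, 350],
    "Turbidity": [450, 450, 450],
})

# Apply the rule to every row.
data["Prediction"] = [
    predict_disease(ph, tds, turb)
    for ph, tds, turb in zip(data["Ph"], data["Tds"], data["Turbidity"])
]
print(data)
```

Additional rules for `disease2` and `disease3` would be further `if` branches with their own thresholds.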
-
Without additional information about your requirements and the dataset involved, your question cannot be answered, whatever approach you take.
Tags: python machine-learning scikit-learn k-means