【Posted】:2020-05-09 19:25:27
【Question】:
I am fitting a decision tree on the following dataset:
https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data
Here is my code:
import pandas as pd
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the data; the file has no header row
balance_data = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data",
    sep=",", header=None)

# Encode every column (features and class label) as integers
le = preprocessing.LabelEncoder()
balance_data = balance_data.apply(le.fit_transform)

# Note: the slice 0:5 selects columns 0-4 only; column 5 (the sixth attribute) is left out
X = balance_data.values[:, 0:5]
Y = balance_data.values[:, 6]
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=100)

# Using the Gini index
clf_gini = DecisionTreeClassifier(criterion="gini", random_state=100,
                                  max_depth=3, min_samples_leaf=5)
clf_gini.fit(X_train, y_train)

# Using information gain (entropy)
clf_entropy = DecisionTreeClassifier(criterion="entropy", random_state=100,
                                     max_depth=3, min_samples_leaf=5)
clf_entropy.fit(X_train, y_train)

# Gini prediction
y_pred = clf_gini.predict(X_test)
y_pred

# Information-gain prediction
y_pred_en = clf_entropy.predict(X_test)
y_pred_en
With both the Gini index and information gain, the output is the following:
array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,])
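Is something wrong with the training? Also, how can I convert these numeric values back into string values?
Edit 1: I computed the accuracy and it comes out as 71. Could the only problem be in how the output is displayed?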
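One way to tell a display problem from a genuine one is to count the distinct predicted values and compare the accuracy with the share of the most frequent class. The sketch below uses small made-up arrays in place of `y_test` and `y_pred` (the numbers are illustrative only, not taken from the actual run). Because `LabelEncoder` assigns codes in sorted order, code 2 would correspond to `unacc` for this dataset's class column.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical stand-ins for y_test and y_pred from the question.
y_test = np.array([2, 2, 0, 2, 1, 2, 0, 2, 3, 2])
y_pred = np.array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

# How many times does each class code appear among the predictions?
values, counts = np.unique(y_pred, return_counts=True)
pred_counts = dict(zip(values.tolist(), counts.tolist()))
print("prediction counts:", pred_counts)

# Accuracy of the model versus the share of class 2 in the true labels.
acc = accuracy_score(y_test, y_pred)
majority_share = float(np.mean(y_test == 2))
print("accuracy:", acc, "share of class 2:", majority_share)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2, 3])
print(cm)
```

If the accuracy is about equal to the majority-class share and the confusion matrix has a single non-zero column, the model really is predicting one class almost everywhere, which can happen with a shallow tree (`max_depth=3`) on an imbalanced target; it would not be a display issue.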
【Discussion】:
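On converting the numeric predictions back to strings: a possible approach is to keep a separate `LabelEncoder` for the class column and call `inverse_transform` on the predictions. In the question's code a single `le` is refit on every column by `apply`, so it is safer not to rely on it for decoding. The sketch below is a minimal illustration under assumptions: `raw_labels` and `y_pred` are made-up arrays, and `target_encoder` is a name introduced here, not something from the original code.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical class labels in the style of the last column of car.data.
raw_labels = np.array(["unacc", "acc", "unacc", "good", "vgood", "unacc"])

# Fit a dedicated encoder on the target column only.
target_encoder = LabelEncoder()
encoded = target_encoder.fit_transform(raw_labels)

# classes_ is sorted, so each integer code is an index into this array.
print(target_encoder.classes_)
print(encoded)

# Hypothetical predictions, standing in for clf_gini.predict(X_test).
y_pred = np.array([2, 2, 0, 2])

# Map the integer codes back to the original string labels.
decoded = target_encoder.inverse_transform(y_pred)
print(decoded)
```

With this encoding, a prediction of 2 decodes to `unacc`, so an array full of 2s means the tree is predicting `unacc` for those rows.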
Tags: python machine-learning scikit-learn