【发布时间】:2018-07-21 10:32:20
【问题描述】:
我有以下数据,对于每一列,带有数字的行是输入,字母是输出。
A,A,A,B,B,B
-0.979090189,0.338819904,-0.253746508,0.213454999,-0.580601104,-0.441683968
-0.48395313,0.436456904,-1.427424032,-0.107093825,0.320813402,0.060866105
-1.098818173,-0.999161692,-1.371721698,-1.057324962,-1.161752652,-0.854872591
-1.53191442,-1.465454248,-1.350414216,-1.732518018,-1.674040715,-1.561568496
2.522796162,2.498153298,3.11756171,2.125738509,3.003929536,2.514411247
-0.060161596,-0.487513844,-1.083513761,-0.908023322,-1.047536921,-0.48276759
0.241962669,0.181365373,0.174042637,-0.048013217,-0.177434916,0.42738621
-0.603856395,-1.020531402,-1.091134021,-0.863008165,-0.683233589,-0.849059931
-0.626159165,-0.348144322,-0.518640038,-0.394482485,-0.249935646,-0.543947259
-1.407263942,-1.387660115,-1.612988118,-1.141282747,-0.944745366,-1.030944216
-0.682567673,-0.043613473,-0.105679403,0.135431139,0.059104888,-0.132060832
-1.10107164,-1.030047313,-1.239075022,-0.651818656,-1.043589073,-0.765992541
我正在尝试执行 KNN LOOCV 以获得准确度分数和混淆矩阵。
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import LeaveOneOut
import pandas as pd
def main():
csv = 'data.csv'
df = pd.read_csv(csv)
X = df.values.T
y = df.columns.values
clf = KNeighborsClassifier()
loo = LeaveOneOut()
for train_index, test_index in loo.split(X):
X_train, X_test = X[train_index], X[test_index]
y_train, y_test = y[train_index], y[test_index]
clf.fit(X_train, y_train)
y_true = y_test
y_pred = clf.predict(X_test)
ac = accuracy_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)
print ac
print cm
if __name__ == '__main__':
main()
但是我的结果都是 0。我哪里错了?
【问题讨论】:
-
你可能想阅读this
-
我认为第一个问题是 pandas 会破坏重复项,并且关闭它的选项不起作用......所以
A A变成了A A.1。我认为第二个问题是y_true和y_pred应该是训练中所有值的列表,而不是单个值。 -
我不确定您为什么要转置数据以便它可以匹配列标签。 y 应该是数据的最后一列,并且该列的每一行都应该有标签 A 或 B。
标签: python pandas machine-learning scikit-learn cross-validation