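【Posted】:2017-08-20 21:42:08
【Question】:
I have started a machine learning project that uses the K-Nearest-Neighbors method with the Python tensorflow library. I have no experience with the tensorflow tools, so I found some code on GitHub and modified it for my data.
My dataset looks like this: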
2,2,2,2,0,0,3
2,2,2,2,0,1,0
2,2,2,4,2,2,1
...
2,2,2,4,2,0,0
Here is the code, which actually runs fine:
import tensorflow as tf
import numpy as np
# Whole dataset => 1428 samples
dataset = 'car-eval-data-1.csv'
# samples for train, remaining for test
samples = 1300
reader = np.loadtxt(open(dataset, "rb"), delimiter=",", skiprows=1, dtype=np.int32)
train_x, train_y = reader[:samples,:5], reader[:samples,6]
test_x, test_y = reader[samples:, :5], reader[samples:, 6]
# A placeholder is a graph input whose value is supplied later through feed_dict.
# Shape [None, n] accepts any number of rows with n features each.
training_values = tf.placeholder("float",[None,len(train_x[0])])
test_values = tf.placeholder("float",[len(train_x[0])])
# Squared Euclidean distance from the test sample to every training sample
distance = tf.reduce_sum(tf.square(tf.subtract(training_values, test_values)), axis=1)
# Index of the single closest training sample
prediction = tf.argmin(distance, 0)
init = tf.global_variables_initializer()
accuracy = 0.0
with tf.Session() as sess:
    sess.run(init)
    # Looping through the test set to compare against the training set
    for i in range(len(test_x)):
        # Tensor flow method to get the prediction near to the test parameters in the training set.
        index_in_trainingset = sess.run(prediction, feed_dict={training_values:train_x,test_values:test_x[i]})
        print("Test %d, and the prediction is %s, the real value is %s"%(i,train_y[index_in_trainingset],test_y[i]))
        if train_y[index_in_trainingset] == test_y[i]:
            # if prediction is right so accuracy increases.
            accuracy += 1. / len(test_x)
    print('Accuracy -> ', accuracy * 100, ' %')
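As a side check on the distance step, the expression in the code sums squared differences, which is the squared Euclidean distance; a Manhattan distance would sum absolute differences instead. The sketch below compares the two in plain Python on the first five columns of two of the sample rows shown earlier. The helper names `squared_euclidean` and `manhattan` are made up for this illustration and are not part of tensorflow.

```python
def squared_euclidean(a, b):
    """Sum of squared per-feature differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def manhattan(a, b):
    """Sum of absolute per-feature differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


row_a = [2, 2, 2, 2, 0]  # first five columns of the first sample row
row_b = [2, 2, 2, 4, 2]  # first five columns of the third sample row

sq = squared_euclidean(row_a, row_b)
l1 = manhattan(row_a, row_b)
print(sq, l1)  # prints: 8 4
```

Both measures are zero only for identical rows, but they can rank neighbors differently, so the choice affects which training sample counts as nearest.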
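The only thing I don't understand is this: if this is the KNN method, there must be some K parameter that defines the number of neighbors used to predict the label of each test sample.
How can we assign a K parameter to tune the number of nearest neighbors the code uses?
Is there a way to modify this code to use a K parameter?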
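To make the role of K concrete: the code above keeps only the single closest training sample, which corresponds to K = 1. A minimal sketch of the general idea, written in plain Python rather than tensorflow, is to take the K smallest distances and let their labels vote. The function names `squared_distance` and `knn_predict` and the toy data are assumptions made up for this illustration; they are not taken from the original code or from the real CSV file.

```python
from collections import Counter


def squared_distance(a, b):
    """Sum of squared feature differences between two samples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train_x, train_y, sample, k):
    """Return the most common label among the k training rows closest to sample."""
    # Rank training indices by their distance to the query sample.
    order = sorted(range(len(train_x)),
                   key=lambda i: squared_distance(train_x[i], sample))
    # Count the labels of the k nearest rows and take the majority.
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy data, invented for illustration only.
toy_x = [[2, 2, 2, 2, 0],
         [2, 2, 2, 2, 1],
         [2, 2, 2, 4, 2],
         [2, 2, 2, 4, 1],
         [2, 2, 2, 3, 2]]
toy_y = [3, 0, 1, 1, 1]

pred_k1 = knn_predict(toy_x, toy_y, [2, 2, 2, 2, 0], k=1)
pred_k3 = knn_predict(toy_x, toy_y, [2, 2, 2, 4, 2], k=3)
print(pred_k1, pred_k3)  # prints: 3 1
```

Inside the tensorflow graph itself, one possible counterpart is `tf.nn.top_k` applied to the negated distances, which returns the indices of the K smallest distances; the vote over the corresponding labels could then be done in Python after `sess.run`. Tie handling in this sketch is deliberately simple and would need a deliberate rule in real use.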
【Discussion】:
-
I have also added a link to the dataset I used for this code, for public use: uploadkadeh.com/5u3zc3jdqqei
Tags: machine-learning tensorflow python-3.6 knn