The linear multi-class classifier to implement:
pred = w1 * (x1 + x2 + x3) + w2 * (x4 + x5 + x6) + w3 * (x7 + x8 + x9)
where all variables are scalars.
In this model, because pred is a single scalar, you cannot train the classifier with a cross-entropy loss (pred is not a distribution over the classes). You have to treat it as a regression problem.
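To make the formula concrete, here is a minimal NumPy sketch of the forward pass for a single sample; the sample values and the three scalar weights below are made up purely for illustration:

import numpy as np

# hypothetical sample: x1..x9 laid out as one 9-dimensional vector
x = np.array([1., 1., 1., 2., 2., 2., 3., 3., 3.])
w1, w2, w3 = 0.5, -0.2, 0.1  # made-up scalar weights

# pred = w1*(x1 + x2 + x3) + w2*(x4 + x5 + x6) + w3*(x7 + x8 + x9)
pred = w1 * x[0:3].sum() + w2 * x[3:6].sum() + w3 * x[6:9].sum()
print(pred)  # a single scalar, hence the regression treatment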
Example dataset:
import numpy as np
x1 = np.ones((100, 3)) # for w1
x2 = np.ones((100, 3)) * 2 # for w2
x3 = np.ones((100, 3)) * 3 # for w3
# set(y) is {0, 1, 2, 3}, corresponds to the four class labels
y = np.random.randint(0, 4, 100).reshape(-1, 1)
Example TensorFlow (1.x) code:
import tensorflow as tf
tf.reset_default_graph()
f1 = tf.placeholder('float32', shape=[None, 3], name='f1')
f2 = tf.placeholder('float32', shape=[None, 3], name='f2')
f3 = tf.placeholder('float32', shape=[None, 3], name='f3')
target = tf.placeholder('float32', shape=[None, 1], name='target')
# the three scalars
w1 = tf.get_variable('w1', shape=[1], initializer=tf.random_normal_initializer())
w2 = tf.get_variable('w2', shape=[1], initializer=tf.random_normal_initializer())
w3 = tf.get_variable('w3', shape=[1], initializer=tf.random_normal_initializer())
pred_1 = tf.reduce_sum(tf.multiply(f1, w1), axis=1)
pred_2 = tf.reduce_sum(tf.multiply(f2, w2), axis=1)
pred_3 = tf.reduce_sum(tf.multiply(f3, w3), axis=1)
# up to this point the linear model has been constructed:
# pred = w1(x1 + x2 + x3) + w2(x4 + x5 + x6) + w3(x7 + x8 + x9)
pred = tf.add_n([pred_1, pred_2, pred_3])
pred = tf.expand_dims(pred, axis=1)  # reshape to [None, 1] so it matches target
# treat it as a regression problem
loss = tf.reduce_mean(tf.square(pred - target))
optimizer = tf.train.GradientDescentOptimizer(1e-5)
updates = optimizer.minimize(loss)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for t in range(50):
        loss_val, _ = sess.run([loss, updates],
                               feed_dict={f1: x1, f2: x2, f3: x3, target: y})
        print(t, loss_val)
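After training you can read the learned scalars back out and inspect the fitted predictions. This is just a usage sketch and assumes it is placed inside the same with tf.Session() block, after the training loop:

    # usage sketch (assumes it runs inside the session above, after training)
    w1_val, w2_val, w3_val = sess.run([w1, w2, w3])
    print('learned scalars:', w1_val, w2_val, w3_val)
    pred_val = sess.run(pred, feed_dict={f1: x1, f2: x2, f3: x3})  # shape (100, 1)
    print(pred_val[:5])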
Below is a simple example that trains a multi-class classifier with a cross-entropy loss. As you can see, this model is a small neural network:
import numpy as np
import tensorflow as tf
x1 = np.ones((100, 3)) # for w1
x2 = np.ones((100, 3)) * 2 # for w2
x3 = np.ones((100, 3)) * 3 # for w3
# one-hot targets for the four class labels, shape (100, 4)
y = np.eye(4)[np.random.randint(0, 4, 100)]
tf.reset_default_graph()
f1 = tf.placeholder('float32', shape=[None, 3], name='f1')
f2 = tf.placeholder('float32', shape=[None, 3], name='f2')
f3 = tf.placeholder('float32', shape=[None, 3], name='f3')
target = tf.placeholder('float32', shape=[None, 4], name='target')
# the three scalars
w1 = tf.get_variable('w1', shape=[1], initializer=tf.random_normal_initializer())
w2 = tf.get_variable('w2', shape=[1], initializer=tf.random_normal_initializer())
w3 = tf.get_variable('w3', shape=[1], initializer=tf.random_normal_initializer())
w = tf.get_variable('w', shape=[3, 4], initializer=tf.random_normal_initializer())
pred_1 = tf.reduce_sum(tf.multiply(f1, w1), axis=1)
pred_2 = tf.reduce_sum(tf.multiply(f2, w2), axis=1)
pred_3 = tf.reduce_sum(tf.multiply(f3, w3), axis=1)
# stack the three scalar projections into a 3-dim feature, then map it to 4 class logits
pred = tf.stack([pred_1, pred_2, pred_3], axis=1)  # shape [None, 3]
pred = tf.matmul(pred, w)                          # logits, shape [None, 4]
loss = tf.losses.softmax_cross_entropy(onehot_labels=target, logits=pred)
optimizer = tf.train.GradientDescentOptimizer(1e-5)
updates = optimizer.minimize(loss)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for t in range(50):
        loss_val, _ = sess.run([loss, updates],
                               feed_dict={f1: x1, f2: x2, f3: x3, target: y})
        print(t, loss_val)
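To turn the classifier's logits into actual class predictions, you can add an argmax op. A minimal sketch, with the op names being my own; build these ops before creating the session and evaluate them after training:

probs = tf.nn.softmax(pred)                # class probabilities, shape [None, 4]
predicted_class = tf.argmax(pred, axis=1)  # index of the most likely class per sample

# inside the session, after the training loop:
#     classes = sess.run(predicted_class, feed_dict={f1: x1, f2: x2, f3: x3})
#     print(classes[:10])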