CS231n - Lecture 2: Image Classification

What

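Image classification is one of the core problems in computer vision and has a wide variety of practical applications.

Take the image below as an example: an image classification model reads the image and outputs the probability that it belongs to each label in the set {cat, dog, hat, mug}.

Note that a computer stores an RGB image as a three-dimensional array of numbers (height x width x 3 color channels). Each number is an integer from 0 to 255, where 0 is the darkest value (black) and 255 is the brightest (white).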
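To make the array representation concrete, here is a minimal sketch using NumPy. The 2 x 2 image and its pixel values are made up for illustration.

```python
import numpy as np

# A hypothetical 2 x 2 RGB image: height x width x 3 color channels.
# Each entry is an 8-bit integer between 0 and 255.
image = np.array(
    [[[0, 0, 0], [255, 255, 255]],   # black pixel, white pixel
     [[255, 0, 0], [0, 0, 255]]],    # pure red pixel, pure blue pixel
    dtype=np.uint8,
)

print(image.shape)   # (2, 2, 3)
print(image.size)    # 12 numbers in total
print(image[0, 1])   # the white pixel: [255 255 255]
```

A CIFAR-10 image has the same layout, only larger: 32 x 32 x 3, or 3072 numbers.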

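The image classification task can therefore be stated as follows: given an image, predict the single label it belongs to (or give a distribution over several labels that indicates how likely each one is).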

Challenges

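Viewpoint variation: a single object can be captured by the camera from many different angles.
Illumination conditions: lighting has a very large effect at the pixel level.
Deformation: many objects are not rigid, and their shape can change considerably.
Scale variation: the visible size of an object often varies (both its extent in the image and its size in the real world).
Occlusion: the object of interest may be partially hidden. Sometimes only a small portion of it (as little as a few pixels) is visible.
Background clutter: the object may blend into the background, which makes it hard to identify.
Intra-class variation: individuals within one class can look very different from each other. Chairs are an example: the class contains many different kinds of objects, each with its own appearance.

A good image classification model must therefore keep its predictions stable under all of these variations and their combinations, while staying sensitive enough to the differences between classes.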
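To see why pixel-level changes matter, the sketch below brightens a made-up image and measures how far the raw pixel values move. The random image and the brightness offset of 50 are illustrative assumptions, not data from the course.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 32 x 32 RGB image with values kept in [0, 200] so that
# adding 50 never exceeds 255.
image = rng.integers(0, 201, size=(32, 32, 3)).astype(np.int64)
brighter = image + 50  # same content, uniformly brighter

# Sum of absolute pixel differences between the two versions.
pixel_difference = np.abs(image - brighter).sum()
print(pixel_difference)  # 50 * 32 * 32 * 3 = 153600
```

Nothing about the depicted content changed, yet every one of the 3072 values did. A classifier that works on raw pixels has to cope with that.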

How

1. Data-Driven Approach

Collect a dataset of images and their labels
Use machine learning to train a classifier
Evaluate the classifier on new images
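The three steps map naturally onto a two-function interface. The sketch below is a hypothetical illustration: the names `train` and `predict` and the trivial "most common label" model are placeholders, not a real classifier.

```python
from collections import Counter

def train(train_images, train_labels):
    """Build a model from labeled examples (placeholder: remember the most common label)."""
    most_common_label = Counter(train_labels).most_common(1)[0][0]
    return {"default_label": most_common_label}

def predict(model, test_images):
    """Use the model to predict a label for each new image."""
    return [model["default_label"] for _ in test_images]

# Toy stand-ins for images; a real pipeline would pass pixel arrays.
model = train(["img_a", "img_b", "img_c"], ["cat", "dog", "cat"])
predictions = predict(model, ["img_x", "img_y"])
print(predictions)  # ['cat', 'cat']
```

Any classifier, from Nearest Neighbor to a neural network, can sit behind this kind of interface; only the bodies of the two functions change.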

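2. Nearest Neighbor Classifier

Suppose we use the 50,000 CIFAR-10 training images (5,000 per class) as the training set, and we want to assign labels to the remaining 10,000 images, which serve as the test set.

The Nearest Neighbor algorithm compares a test image with every image in the training set and assigns the test image the label of the training image it finds most similar.

So how exactly do we compare two images? The simplest approach is to compare them pixel by pixel and add up all the differences. In other words, first flatten the two images into vectors $I_1$ and $I_2$, then compute the L1 distance between them, where $p$ ranges over all pixel values: $$d_1(I_1,I_2)=\sum_{p}|I_1^p-I_2^p|$$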
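As a small worked example of the L1 distance, the sketch below compares two hypothetical 2 x 2 grayscale images. The pixel values are made up for illustration.

```python
import numpy as np

# Two hypothetical 2 x 2 grayscale images (values made up).
image_1 = np.array([[56, 32], [90, 23]], dtype=np.int64)
image_2 = np.array([[10, 20], [100, 33]], dtype=np.int64)

# Flatten to vectors, take absolute differences value by value, and add them up.
differences = np.abs(image_1.ravel() - image_2.ravel())
l1_distance = differences.sum()

print(differences)   # [46 12 10 10]
print(l1_distance)   # 78
```

Two identical images give a distance of 0; the more the pixel values differ, the larger the distance.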

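Advantages: easy to understand and simple to implement. Training only requires storing the training data, so it takes very little time.
Disadvantages: classification at test time is computationally expensive, because every test image must be compared with all stored training images. Convolutional neural networks have the opposite profile: training takes a long time, but once training is done, classifying a new test example is very fast. That profile is the one that fits practical use, where speed at test time matters more than speed at training time.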
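A rough count makes the test-time cost concrete. The sketch below only multiplies the dataset sizes given above by the 32 x 32 x 3 size of a CIFAR-10 image; it is an operation count, not a timing measurement.

```python
num_train = 50_000               # CIFAR-10 training images
num_test = 10_000                # CIFAR-10 test images
values_per_image = 32 * 32 * 3   # 3072 numbers per image

# Every test image is compared against every training image.
comparisons = num_train * num_test
# Each comparison takes one absolute difference per pixel value.
pixel_differences = comparisons * values_per_image

print(comparisons)        # 500000000 image-to-image comparisons
print(pixel_differences)  # 1536000000000 absolute differences
```

Training, by contrast, does no arithmetic at all: it only keeps a reference to the data.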

The implementation is as follows:

import numpy as np


class NearestNeighbor(object):
    """Nearest Neighbor classifier based on the L1 distance."""

    def __init__(self):
        pass

    def train(self, X, y):
        """X is N x D where each row is an example. y is 1-dimensional of size N."""
        # The nearest neighbor classifier simply remembers all the training data.
        self.Xtr = X
        self.ytr = y

    def predict(self, X):
        """X is N x D where each row is an example we wish to predict a label for."""
        num_test = X.shape[0]
        # Make sure that the output type matches the label type.
        Ypred = np.zeros(num_test, dtype=self.ytr.dtype)

        # Loop over all test rows.
        for i in range(num_test):
            # Find the nearest training image to the i-th test image
            # using the L1 distance (sum of absolute value differences).
            distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
            min_index = np.argmin(distances)  # index of the smallest distance
            Ypred[i] = self.ytr[min_index]  # predict the label of the nearest example

        return Ypred


# Load the CIFAR-10 data into memory as 4 arrays:
# training data and labels, test data and labels.
# load_CIFAR10 is a helper function provided separately; it is not defined here.
Xtr, Ytr, Xte, Yte = load_CIFAR10('data/cifar10/')
# Flatten every image into a one-dimensional row.
Xtr_rows = Xtr.reshape(Xtr.shape[0], 32 * 32 * 3)  # Xtr_rows becomes 50000 x 3072
Xte_rows = Xte.reshape(Xte.shape[0], 32 * 32 * 3)  # Xte_rows becomes 10000 x 3072

nn = NearestNeighbor()  # create a Nearest Neighbor classifier
nn.train(Xtr_rows, Ytr)  # train the classifier on the training images and labels
Yte_predict = nn.predict(Xte_rows)  # predict labels for the test images
# Print the classification accuracy: the fraction of test examples
# whose predicted label matches the true label.
print('accuracy: %f' % (np.mean(Yte_predict == Yte)))
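One practical caveat, stated as an assumption about the data rather than about the loader used above: if the image arrays are stored as unsigned 8-bit integers (`uint8`), subtracting them wraps around modulo 256 and the L1 distance comes out wrong. The sketch below uses made-up pixel values to show the effect and one way to avoid it, by casting to a wider signed type before subtracting.

```python
import numpy as np

# Hypothetical pixel rows stored as uint8 (values made up for illustration).
train_row = np.array([10, 200], dtype=np.uint8)
test_row = np.array([20, 100], dtype=np.uint8)

# uint8 arithmetic wraps around: 10 - 20 becomes 246 instead of -10.
wrapped = np.sum(np.abs(train_row - test_row))

# Casting to a signed type before subtracting gives the intended distance.
correct = np.sum(np.abs(train_row.astype(np.int32) - test_row.astype(np.int32)))

print(wrapped)  # 346
print(correct)  # 110
```

If the arrays are already floating point, no cast is needed.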
