The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones. --> deep learning
A computer can reason automatically about statements written in formal languages using logical inference rules. --> knowledge base
AI systems need the ability to acquire their own knowledge by extracting patterns from raw data. --> machine learning --> the representation of the data ++(features)++ has an enormous effect on the performance of ML
eg1.logistic regression
eg2.naive Bayes
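To illustrate how much the representation matters for simple models like these, here is a minimal sketch. The toy dataset (two concentric circles) and the helper names are assumptions made up for this example, and a one-feature threshold rule stands in for a simple classifier. The same points are hard to separate by thresholding a Cartesian coordinate but easy to separate by thresholding the radius.

```python
import math

def make_rings(n_per_class=50):
    """Toy data (assumed for illustration): class 0 on a circle of radius 1,
    class 1 on a circle of radius 3."""
    points, labels = [], []
    for k in range(n_per_class):
        angle = 2 * math.pi * k / n_per_class
        for label, radius in ((0, 1.0), (1, 3.0)):
            points.append((radius * math.cos(angle), radius * math.sin(angle)))
            labels.append(label)
    return points, labels

def to_polar(point):
    """Change of representation: (x, y) -> (radius, angle)."""
    x, y = point
    return (math.hypot(x, y), math.atan2(y, x))

def best_threshold_accuracy(values, labels):
    """Best accuracy of the rule 'predict 1 if value > t' (or its negation),
    trying every observed value as the threshold t."""
    best = 0.0
    for t in sorted(set(values)):
        correct = sum((v > t) == bool(lab) for v, lab in zip(values, labels))
        best = max(best, correct / len(labels), 1 - correct / len(labels))
    return best

points, labels = make_rings()
acc_x = best_threshold_accuracy([p[0] for p in points], labels)
acc_y = best_threshold_accuracy([p[1] for p in points], labels)
acc_r = best_threshold_accuracy([to_polar(p)[0] for p in points], labels)
print(f"threshold on x: {acc_x:.2f}, on y: {acc_y:.2f}, on radius: {acc_r:.2f}")
```

Only the features changed between the runs, not the rule being fitted.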
representation learning (by ML) --> separate the factors of variation that explain the observed data --> solved by DL
eg1.autoencoder: the combination of an encoder function (converts the input data into a different representation) and a decoder function (converts the new representation back into the original format)
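The encoder/decoder pair can be sketched as a tiny linear autoencoder. This is a minimal illustration under stated assumptions, not a reference implementation: the 2-D toy data lying on the line y = 2x, the 1-D code, the starting weights, the learning rate and all function names are choices made for this example.

```python
def encode(point, w):
    """Encoder: map a 2-D input to a 1-D code (the new representation)."""
    return w[0] * point[0] + w[1] * point[1]

def decode(code, v):
    """Decoder: map the 1-D code back to the original 2-D format."""
    return (v[0] * code, v[1] * code)

def reconstruction_error(data, w, v):
    """Mean squared distance between each input and its reconstruction."""
    total = 0.0
    for p in data:
        q = decode(encode(p, w), v)
        total += (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2
    return total / len(data)

def train_autoencoder(data, lr=0.03, epochs=200):
    """Fit encoder weights w and decoder weights v by plain SGD on the
    squared reconstruction error (hyperparameters are assumed values)."""
    w, v = [0.5, 0.5], [0.5, 0.5]
    for _ in range(epochs):
        for p in data:
            h = encode(p, w)
            q = decode(h, v)
            ex, ey = q[0] - p[0], q[1] - p[1]
            # Gradient of the loss with respect to the code h,
            # computed before the decoder weights are changed.
            grad_h = 2 * (ex * v[0] + ey * v[1])
            v[0] -= lr * 2 * ex * h
            v[1] -= lr * 2 * ey * h
            w[0] -= lr * grad_h * p[0]
            w[1] -= lr * grad_h * p[1]
    return w, v

# Toy data: 21 points on the line y = 2x, so one number per point is enough.
data = [(k / 10, 2 * k / 10) for k in range(-10, 11)]
before = reconstruction_error(data, [0.5, 0.5], [0.5, 0.5])
w, v = train_autoencoder(data)
after = reconstruction_error(data, w, v)
print(f"reconstruction error before: {before:.3f}, after: {after:.2e}")
```

Because the data has a single factor of variation (position along the line), a 1-D code is enough to reconstruct it.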
Deep learning
eg1.feedforward deep network, also called multilayer perceptron (MLP)
two perspectives:
learning the right representation for data
depth allows the computer to learn a multi-step computer program
measuring the depth of a model
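the number of sequential instructions that must be executed (depth of the computational graph)
not the depth of the computational graph but the depth of the graph describing how concepts relate to each other (usually used in deep probabilistic models)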
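The first measure depends on which operations count as a single step. A minimal sketch, assuming a hypothetical dictionary encoding of a computational graph (node name --> list of input nodes) for a two-input logistic regression:

```python
def graph_depth(graph):
    """Longest path, counted in computation steps, from any input to any node.
    `graph` maps a node name to the list of nodes it reads from;
    input nodes have an empty list and depth 0."""
    cache = {}

    def depth(node):
        if node not in cache:
            parents = graph[node]
            cache[node] = 1 + max(depth(p) for p in parents) if parents else 0
        return cache[node]

    return max(depth(node) for node in graph)

# Same model, sigmoid(w1*x1 + w2*x2), described at two granularities.
fine_grained = {
    "x1": [], "w1": [], "x2": [], "w2": [],
    "mul1": ["x1", "w1"],
    "mul2": ["x2", "w2"],
    "add": ["mul1", "mul2"],
    "sigmoid": ["add"],
}
coarse = {
    "x1": [], "w1": [], "x2": [], "w2": [],
    "logistic_regression": ["x1", "w1", "x2", "w2"],
}

print(graph_depth(fine_grained))  # 3: multiply, add, sigmoid
print(graph_depth(coarse))        # 1: the whole model is one step
```

So there is no single correct depth for a model; the number depends on the chosen set of elementary steps.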
summarize
Challenges
how to get informal knowledge (knowledge about the world) into a computer
many of the factors of variation influence every single piece of data we observe
Organization of the book
Historical Trends in Deep Learning
Deep learning dates back to the 1940s; it only appears to be new.
Known as cybernetics in the 1940s-1960s.
Known as connectionism in the 1980s-1990s.
Known as deep learning since 2006.
The neural perspective on DL:
the brain provides a proof by example that intelligent behavior is possible, and a conceptually straightforward path to building intelligence is to reverse engineer the computational principles behind the brain and duplicate its functionality.
it would be deeply interesting to understand the brain and the principles that underlie human intelligence.
a more general principle of learning multiple levels of composition.
the earliest predecessors were simple linear models motivated from a neuroscientific perspective
weights set by hand for the classifier
In the 1950s, the perceptron became the first model that could learn the weights defining the categories given examples of inputs from each category.
adaptive linear element (ADALINE) ++proposed around the same time++
the training algorithm for ADALINE is a special case of stochastic gradient descent (SGD)
perceptron and ADALINE are linear models. They cannot learn the XOR function
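The XOR limitation can be checked with a short sketch. The perceptron update rule below uses a step activation, zero initial weights and learning rate 1, all of which are illustrative choices. The small two-layer network at the end uses hand-picked weights with max(0, z) hidden units; it is an assumption added for contrast, not something stated in these notes.

```python
def predict(weights, bias, x):
    """Linear threshold unit: output 1 if w.x + b > 0, else 0."""
    return 1 if weights[0] * x[0] + weights[1] * x[1] + bias > 0 else 0

def train_perceptron(samples, epochs=100, lr=1.0):
    """Perceptron rule: after each mistake, move the weights toward
    (or away from) the misclassified input."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in samples:
            delta = target - predict(weights, bias, x)
            if delta != 0:
                mistakes += 1
                weights[0] += lr * delta * x[0]
                weights[1] += lr * delta * x[1]
                bias += lr * delta
        if mistakes == 0:  # every sample classified correctly
            break
    return weights, bias

def accuracy(weights, bias, samples):
    hits = sum(predict(weights, bias, x) == t for x, t in samples)
    return hits / len(samples)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print("AND accuracy:", accuracy(*train_perceptron(AND), AND))  # 1.0
print("XOR accuracy:", accuracy(*train_perceptron(XOR), XOR))  # stays below 1.0

def xor_network(x):
    """Two hidden units with max(0, z) activation and hand-picked weights."""
    s = x[0] + x[1]
    h1 = max(0, s)      # active when at least one input is 1
    h2 = max(0, s - 1)  # active only when both inputs are 1
    return h1 - 2 * h2

print([xor_network(x) for x, _ in XOR])  # [0, 1, 1, 0]
```

No single line separates the XOR classes, so the linear unit cannot get all four points right whatever the training rule; one extra layer of nonlinear units is enough.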
Diminished role of neuroscience --> we do not have enough information about the brain to use it as a guide.
Neocognitron (1980) is the basis of the modern convolutional network (1998). Most NNs today are based on a model neuron called the rectified linear unit.
Cognitron (1975)
viewpoint
Nair and Hinton (2010) and Glorot (2011a) --> neuroscience
Jarrett (2009) --> engineering-oriented
connectionism or parallel distributed processing (1986 and 1995)
the central idea: a large number of simple computational units can achieve intelligent behavior when networked together.
distributed representation (1986)
successful use of back-propagation to train deep neural networks with internal representations and the popularization of the back-propagation algorithm (1986a and 1987)
some of the fundamental mathematical difficulties in modeling long sequences were identified (1991 and 1994)
the long short-term memory (LSTM) network was introduced to resolve some of these difficulties (1997)
Kernel machines (1992, 1995 and 1999) and graphical models (1998) became popular
NNs still performed well on some tasks (1998b and 2001); the Canadian Institute for Advanced Research (CIFAR) helped keep NN research alive.
In 2006, it was shown that a deep belief network can be trained efficiently using a strategy called greedy layer-wise pretraining.
greedy layer-wise pretraining was then used to train many other kinds of deep networks (2007)
deep learning focuses on depth (2007, 2011, 2014a and 2014)
Increasing Dataset Sizes
1950s, first experiments with ANNs conducted; 1990s, used in commercial applications
Increasing Model Sizes
Increasing Accuracy, Complexity and Real-World Impact
1986a, earliest deep models recognized individual objects in tightly cropped, extremely small images.
2012, modern object recognition networks process high-resolution photographs and do not need the photo cropped near the object. --> top-5 error from 26.1% to 15.3% --> later down to 3.6%
2010, 2010b, 2011 and 2012a, error rates of speech recognition dropped suddenly with DL
2013, DL has had successes in pedestrian detection and image segmentation
2012, DL achieved superhuman performance in traffic sign classification.
2014d, NNs can output an entire sequence of characters transcribed from an image.
2013, previously this was believed to require labeling of the individual elements of the sequence.