C1-Introduction

Some Concepts

The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones. --> deep learning
Knowledge about the world is hard-coded in formal languages, and a computer can reason about statements in these formal languages automatically using logical inference rules. --> knowledge base
AI systems need the ability to acquire their own knowledge by extracting patterns from raw data. --> machine learning --> the representation of the data (its features) has an enormous effect on the performance of ML
- eg1.logistic regression
- eg2.naive Bayes
- representation learning (by ML) --> separate the factors of variation that explain the observed data --> solved by DL
  - eg1.autoencoder: the combination of an encoder function (converts the input data into a different representation) and a decoder function (converts the new representation back into the original format)
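The encoder/decoder round trip can be sketched with two hand-written functions. This is only an illustration of the structure: nothing is learned here, whereas a real autoencoder learns both functions from data. The choice of a Cartesian-to-polar mapping and the names `encode`, `decode` and `reconstruction_error` are assumptions made for the example.

```python
import math

def encode(point):
    """Encoder: map Cartesian (x, y) to a different representation (r, theta)."""
    x, y = point
    return (math.hypot(x, y), math.atan2(y, x))

def decode(code):
    """Decoder: map the (r, theta) representation back to Cartesian (x, y)."""
    r, theta = code
    return (r * math.cos(theta), r * math.sin(theta))

def reconstruction_error(point):
    """Squared distance between a point and decode(encode(point))."""
    x, y = point
    x_hat, y_hat = decode(encode(point))
    return (x - x_hat) ** 2 + (y - y_hat) ** 2

# Hypothetical sample points; the round trip should reproduce each one
# up to floating-point rounding.
points = [(1.0, 0.0), (0.0, 2.0), (-3.0, 4.0)]
errors = [reconstruction_error(p) for p in points]
```

A learned autoencoder is trained so that this reconstruction error stays small while the intermediate representation has useful properties.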
Deep learning
- eg1.feedforward deep network
- eg2.multilayer perceptron
- two perspectives:
  - learning the right representation for data
  - depth allows the computer to learn a multi-step computer program
- measuring the depth of a model:
  - the number of sequential instructions that must be executed (the longest path through the computational graph)
  - not the depth of the computational graph but the depth of the graph describing how concepts are related to each other (usually used in deep probabilistic models)
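The first way of measuring depth can be sketched as a longest-path computation over a computational graph. The example below is a minimal sketch, assuming a graph stored as a plain dictionary; the function name `graph_depth` and the node names are hypothetical. It also shows that the answer depends on which operations count as a single step.

```python
def graph_depth(graph, output):
    """Length of the longest path from any input node to `output`.

    `graph` maps each operation node to the list of nodes it reads from.
    Nodes that do not appear as keys are treated as inputs (depth 0).
    """
    cache = {}

    def depth(node):
        if node not in graph:  # an input or a parameter
            return 0
        if node not in cache:
            cache[node] = 1 + max(depth(parent) for parent in graph[node])
        return cache[node]

    return depth(output)

# sigmoid(w1*x1 + w2*x2) with multiply, add and sigmoid as the primitive steps
fine_grained = {
    "mul1": ["w1", "x1"],
    "mul2": ["w2", "x2"],
    "add": ["mul1", "mul2"],
    "sigmoid": ["add"],
}

# The same model with logistic regression treated as one primitive step
coarse = {"logistic_regression": ["w1", "x1", "w2", "x2"]}

depth_fine = graph_depth(fine_grained, "sigmoid")
depth_coarse = graph_depth(coarse, "logistic_regression")
```

With these two graphs the same model has depth 3 or depth 1, so there is no single correct value for the depth of an architecture.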

Summary

Challenges

How to get informal knowledge (knowledge about the world) into a computer.
Many of the factors of variation influence every single piece of data we observe.

Organization of the Book

Historical Trends in Deep Learning

Deep learning dates back to the 1940s; it only appears to be new.
Known as cybernetics in the 1940s-1960s.
Known as connectionism in the 1980s-1990s.
Known as deep learning beginning in 2006.
The neural perspective on DL:
- the brain provides a proof by example that intelligent behavior is possible, and a conceptually straightforward path to building intelligence is to reverse engineer the computational principles behind the brain and duplicate its functionality.
- it would be deeply interesting to understand the brain and the principles that underlie human intelligence.
The modern term "deep learning" goes beyond the neuroscientific perspective: it appeals to a more general principle of learning multiple levels of composition.
The earliest predecessors of modern deep learning were simple linear models motivated from a neuroscientific perspective.
In the earliest of these models, the weights of the classifier had to be set by hand by a human operator.
In the 1950s, the perceptron became the first model that could learn the weights defining the categories given examples of inputs from each category.
The adaptive linear element (ADALINE) was proposed at about the same time.
The training algorithm for ADALINE is a special case of stochastic gradient descent (SGD).
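An ADALINE-style update can be sketched as follows. This is a minimal sketch, not the historical implementation: the function name `train_adaline`, the learning rate, the epoch count and the toy data are all assumptions. Each step moves the weights along the negative gradient of the squared error on a single example, which is what makes it stochastic gradient descent.

```python
import random

def train_adaline(samples, lr=0.05, epochs=200, seed=0):
    """Fit y = w.x + b with one stochastic gradient step per example."""
    rng = random.Random(seed)
    w = [0.0] * len(samples[0][0])
    b = 0.0
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the examples in a random order
        for x, target in data:
            prediction = sum(wi * xi for wi, xi in zip(w, x)) + b
            error = target - prediction
            # Gradient of 0.5 * error**2 w.r.t. w is -error * x, w.r.t. b is -error
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
    return w, b

# Hypothetical noiseless toy data generated from y = 2*x1 - 1*x2 + 0.5
samples = [((x1, x2), 2.0 * x1 - 1.0 * x2 + 0.5)
           for x1 in (-1.0, 0.0, 1.0) for x2 in (-1.0, 0.0, 1.0)]
w, b = train_adaline(samples)
```

On this noiseless data the learned `w` and `b` approach the generating coefficients; with noisy data a constant learning rate would only bring them close.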
The perceptron and ADALINE are linear models, so they cannot learn the XOR function.
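The XOR limitation can be checked with a small perceptron. This is a minimal sketch, assuming the classic perceptron learning rule with a fixed number of passes; the function names and the epoch limit are choices made for the example.

```python
def predict(w, b, x):
    """Linear threshold unit: output 1 if w.x + b > 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train_perceptron(samples, epochs=100):
    """Perceptron rule: adjust the weights only on misclassified examples."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in samples:
            update = target - predict(w, b, x)  # -1, 0 or +1
            if update != 0:
                mistakes += 1
                w = [wi + update * xi for wi, xi in zip(w, x)]
                b += update
        if mistakes == 0:  # every example is classified correctly
            break
    return w, b

def accuracy(w, b, samples):
    return sum(predict(w, b, x) == t for x, t in samples) / len(samples)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_data = list(zip(inputs, [0, 0, 0, 1]))  # linearly separable
xor_data = list(zip(inputs, [0, 1, 1, 0]))  # not linearly separable

and_accuracy = accuracy(*train_perceptron(and_data), and_data)
xor_accuracy = accuracy(*train_perceptron(xor_data), xor_data)
```

The perceptron classifies all four AND cases correctly, but no setting of `w` and `b` can classify all four XOR cases, so `xor_accuracy` stays below 1.0 however long training runs.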
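Diminished role of neuroscience --> we do not have enough information about the brain to use it as a guide.
Neocognitron (1980) is the basis of the modern convolutional network (1998). Most neural networks today are based on a model neuron called the rectified linear unit.
The original Cognitron (1975) introduced a more complicated version of this unit, strongly inspired by knowledge of brain function.
Influences cited for the simplified modern rectified linear unit:
- Nair and Hinton (2010) and Glorot et al. (2011a) --> neuroscience
- Jarrett et al. (2009) --> engineering-oriented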
Connectionism, or parallel distributed processing (1986 and 1995)
- the central idea: a large number of simple computational units can achieve intelligent behavior when networked together.
- distributed representation (1986)
- successful use of back-propagation to train deep neural networks with internal representations, and the popularization of the back-propagation algorithm (1986a and 1987)
- some of the fundamental mathematical difficulties in modeling long sequences were identified (1991 and 1994)
- the long short-term memory (LSTM) network was introduced to resolve some of these difficulties (1997)
- kernel machines (1992, 1995 and 1999) and graphical models (1998) became popular
- neural networks continued to perform well on some tasks (1998b and 2001), and the Canadian Institute for Advanced Research (CIFAR) helped keep neural network research alive.
In 2006, it was shown that a deep belief network can be trained efficiently using a strategy called greedy layer-wise pretraining.
- greedy layer-wise pretraining can also be used to train many other kinds of deep networks (2007)
- the term "deep learning" emphasizes the importance of depth (2007, 2011, 2014a and 2014)

Increasing Dataset Sizes

1950s, first experiments with artificial neural networks (ANN) were conducted; since the 1990s, deep learning has been used in commercial applications

Increasing Model Sizes

Increasing Accuracy, Complexity and Real-World Impact

1986a, the earliest deep models recognized individual objects in tightly cropped, extremely small images.
2012, modern object recognition networks process high-resolution photographs and do not need cropped photos. --> top-5 error on ImageNet fell from 26.1% to 15.3% --> later down to 3.6%
2010, 2010b, 2011 and 2012a, error rates in speech recognition dropped suddenly with DL.
2013, DL had successes in pedestrian detection and image segmentation.
2012, DL reached superhuman performance in traffic sign classification.
2014d, NN can output an entire sequence of characters transcribed from an image.
Previously (2013), this kind of learning was believed to need labeling of the individual elements of the sequence.
2014 and 2015, RNN --> machine translation
2015, DL was extended to reinforcement learning.
Other applications, such as medicine (2014)…