Artificial neural network
"Neural network" redirects here. For networks of living neurons, see Biological neural network. For the journal, see Neural Networks (journal).

An artificial neural network is an interconnected group of nodes, akin to the vast network of neurons in a brain. Here, each circular node represents an artificial neuron and an arrow represents a connection from the output of one neuron to the input of another.

In machine learning and related fields, artificial neural networks (ANNs) are computational models inspired by an animal's central nervous system (in particular the brain), and are used to estimate or approximate functions that can depend on a large number of inputs and are generally unknown. Artificial neural networks are generally presented as systems of interconnected neurons which can compute values from inputs, and are capable of machine learning as well as pattern recognition thanks to their adaptive nature.

For example, a neural network for handwriting recognition is defined by a set of input neurons which may be activated by the pixels of an input image. After being weighted and transformed by a function (determined by the network's designer), the activations of these neurons are then passed on to other neurons. This process is repeated until finally, an output neuron is activated. This determines which character was read.

Like other machine learning methods (systems that learn from data), neural networks have been used to solve a wide variety of tasks that are hard to solve using ordinary rule-based programming, including computer vision and speech recognition.
1 Background
Examinations of the human central nervous system inspired the concept of neural networks. In an artificial neural network, simple artificial nodes, known as "neurons", "neurodes", "processing elements" or "units", are connected together to form a network which mimics a biological neural network.

There is no single formal definition of what an artificial neural network is. However, a class of statistical models may commonly be called "neural" if they possess the following characteristics:

1. consist of sets of adaptive weights, i.e. numerical parameters that are tuned by a learning algorithm, and
2. are capable of approximating non-linear functions of their inputs.

The adaptive weights are conceptually connection strengths between neurons, which are activated during training and prediction.

Neural networks are similar to biological neural networks in performing functions collectively and in parallel by the units, rather than there being a clear delineation of subtasks to which various units are assigned. The term "neural network" usually refers to models employed in statistics, cognitive psychology and artificial intelligence. Neural network models which emulate the central nervous system are part of theoretical neuroscience and computational neuroscience.

In modern software implementations of artificial neural networks, the approach inspired by biology has been largely abandoned for a more practical approach based on statistics and signal processing. In some of these systems, neural networks or parts of neural networks (like artificial neurons) form components in larger systems that combine both adaptive and non-adaptive elements. While the more general approach of such systems is more suitable for real-world problem solving, it has little to do with the traditional artificial intelligence connectionist models.
What they do have in common, however, is the principle of non-linear, distributed, parallel and local processing and adaptation. Historically, the use of neural network models marked a paradigm shift in the late eighties from high-level (symbolic) artificial intelligence, characterized by expert systems with knowledge embodied in if-then rules, to low-level (sub-symbolic) machine learning, characterized by knowledge embodied in the parameters of a dynamical system.
2 History
Warren McCulloch and Walter Pitts[1] (1943) created a computational model for neural networks based on mathematics and algorithms. They called this model threshold logic. The model paved the way for neural network research to split into two distinct approaches. One approach focused on biological processes in the brain and the other focused on the application of neural networks to artificial intelligence.

In the late 1940s psychologist Donald Hebb[2] created a hypothesis of learning based on the mechanism of neural plasticity that is now known as Hebbian learning. Hebbian learning is considered to be a 'typical' unsupervised learning rule and its later variants were early models for long term potentiation. These ideas started being applied to computational models in 1948 with Turing's B-type machines.

Farley and Wesley A. Clark[3] (1954) first used computational machines, then called calculators, to simulate a Hebbian network at MIT. Other neural network computational machines were created by Rochester, Holland, Habit, and Duda[4] (1956).

Frank Rosenblatt[5] (1958) created the perceptron, an algorithm for pattern recognition based on a two-layer learning computer network using simple addition and subtraction. With mathematical notation, Rosenblatt also described circuitry not in the basic perceptron, such as the exclusive-or circuit, a circuit whose mathematical computation could not be processed until after the backpropagation algorithm was created by Paul Werbos[6] (1975).

Neural network research stagnated after the publication of machine learning research by Marvin Minsky and Seymour Papert[7] (1969). They discovered two key issues with the computational machines that processed neural networks. The first issue was that single-layer neural networks were incapable of processing the exclusive-or circuit. The second significant issue was that computers were not sophisticated enough to effectively handle the long run time required by large neural networks. Neural network research slowed until computers achieved greater processing power. A key later advance was the backpropagation algorithm, which effectively solved the exclusive-or problem (Werbos 1975).[6]

The parallel distributed processing of the mid-1980s became popular under the name connectionism. The text by David E. Rumelhart and James McClelland[8] (1986) provided a full exposition on the use of connectionism in computers to simulate neural processes.

Neural networks, as used in artificial intelligence, have traditionally been viewed as simplified models of neural processing in the brain, even though the relation between this model and brain biological architecture is debated, as it is not clear to what degree artificial neural networks mirror brain function.[9]

In the 1990s, neural networks were overtaken in popularity in machine learning by support vector machines and other, much simpler methods such as linear classifiers. Renewed interest in neural nets was sparked in the 2000s by the advent of deep learning.
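Rosenblatt's perceptron, mentioned above, learns "using simple addition and subtraction": whenever a prediction is wrong, each weight is nudged by the corresponding input. A minimal sketch, with an illustrative training loop and dataset that are not from the article:

```python
# Hypothetical sketch of Rosenblatt-style perceptron learning: weights are
# adjusted by simple addition and subtraction whenever a prediction is wrong.
def train_perceptron(samples, epochs=20):
    """samples: list of (inputs, label) pairs with label in {0, 1}."""
    n = len(samples[0][0])
    w = [0.0] * n          # one weight per input
    b = 0.0                # bias term
    for _ in range(epochs):
        for x, y in samples:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = y - pred  # -1, 0, or +1
            w = [wi + err * xi for wi, xi in zip(w, x)]
            b += err
    return w, b

# AND is linearly separable, so a single-layer perceptron learns it.
# XOR (the exclusive-or circuit discussed above) is not, and this loop
# never converges on it - the limitation Minsky and Papert identified.
data_and = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(data_and)
```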
2.1 Recent improvements
Computational devices have been created in CMOS, for both biophysical simulation and neuromorphic computing. More recent efforts show promise for creating nanodevices[10] for very large scale principal components analyses and convolution. If successful, these efforts could usher in a new era of neural computing[11] that is a step beyond digital computing, because it depends on learning rather than programming and because it is fundamentally analog rather than digital even though the first instantiations may in fact be with CMOS digital devices.

Between 2009 and 2012, the recurrent neural networks and deep feedforward neural networks developed in the research group of Jürgen Schmidhuber at the Swiss AI Lab IDSIA have won eight international competitions in pattern recognition and machine learning.[12] For example, multi-dimensional long short term memory (LSTM)[13][14] won three competitions in connected handwriting recognition at the 2009 International Conference on Document Analysis and Recognition (ICDAR), without any prior knowledge about the three different languages to be learned.

Variants of the back-propagation algorithm as well as unsupervised methods by Geoff Hinton and colleagues at the University of Toronto[15][16] can be used to train deep, highly nonlinear neural architectures similar to the 1980 Neocognitron by Kunihiko Fukushima,[17] and the "standard architecture of vision",[18] inspired by the simple and complex cells identified by David H. Hubel and Torsten Wiesel in the primary visual cortex.
Deep learning feedforward networks, such as convolutional neural networks, alternate convolutional layers and max-pooling layers, topped by several pure classification layers. Fast GPU-based implementations of this approach have won several pattern recognition contests, including the IJCNN 2011 Traffic Sign Recognition Competition[19] and the ISBI 2012 Segmentation of Neuronal Structures in Electron Microscopy Stacks challenge.[20] Such neural networks also were the first artificial pattern recognizers to achieve human-competitive or even superhuman performance[21] on benchmarks such as traffic sign recognition (IJCNN 2012), or the MNIST handwritten digits problem of Yann LeCun and colleagues at NYU.
2.2 Successes in pattern recognition contests since 2009
Between 2009 and 2012, the recurrent neural networks and deep feedforward neural networks developed in the research group of Jürgen Schmidhuber at the Swiss AI Lab IDSIA have won eight international competitions in pattern recognition and machine learning.[22] For example, the bi-directional and multi-dimensional long short term memory (LSTM)[23][24] of Alex Graves et al. won three competitions in connected handwriting recognition at the 2009 International Conference on Document Analysis and Recognition (ICDAR), without any prior knowledge about the three different languages to be learned.

Fast GPU-based implementations of this approach by Dan Ciresan and colleagues at IDSIA have won several pattern recognition contests, including the IJCNN 2011 Traffic Sign Recognition Competition,[25] the ISBI 2012 Segmentation of Neuronal Structures in Electron Microscopy Stacks challenge,[20] and others. Their neural networks also were the first artificial pattern recognizers to achieve human-competitive or even superhuman performance[21] on important benchmarks such as traffic sign recognition (IJCNN 2012), or the MNIST handwritten digits problem of Yann LeCun at NYU.

Deep, highly nonlinear neural architectures similar to the 1980 neocognitron by Kunihiko Fukushima[17] and the "standard architecture of vision"[18] can also be pre-trained by unsupervised methods[26][27] of Geoff Hinton's lab at University of Toronto. A team from this lab won a 2012 contest sponsored by Merck to design software to help find molecules that might lead to new drugs.[28]
3 Models
Neural network models in artificial intelligence are usually referred to as artificial neural networks (ANNs); these are essentially simple mathematical models defining a function f : X → Y or a distribution over X or both X and Y, but sometimes models are also intimately associated with a particular learning algorithm or learning rule. A common use of the phrase ANN model really means the definition of a class of such functions (where members of the class are obtained by varying parameters, connection weights, or specifics of the architecture such as the number of neurons or their connectivity).
3.1 Network function
See also: Graphical models

The word network in the term 'artificial neural network' refers to the interconnections between the neurons in the different layers of each system. An example system has three layers. The first layer has input neurons which send data via synapses to the second layer of neurons, and then via more synapses to the third layer of output neurons. More complex systems will have more layers of neurons, with some having increased layers of input neurons and output neurons. The synapses store parameters called "weights" that manipulate the data in the calculations.

An ANN is typically defined by three types of parameters:

1. The interconnection pattern between the different layers of neurons
2. The learning process for updating the weights of the interconnections
3. The activation function that converts a neuron's weighted input to its output activation.

Mathematically, a neuron's network function f(x) is defined as a composition of other functions g_i(x), which can further be defined as a composition of other functions. This can be conveniently represented as a network structure, with arrows depicting the dependencies between variables. A widely used type of composition is the nonlinear weighted sum, where f(x) = K(∑_i w_i g_i(x)), where K (commonly referred to as the activation function[29]) is some predefined function, such as the hyperbolic tangent. It will be convenient for the following to refer to a collection of functions g_i as simply a vector g = (g_1, g_2, ..., g_n).
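The nonlinear weighted sum above can be sketched in a few lines. The particular g_i and weights below are illustrative choices, not taken from the article; only the form f(x) = K(∑_i w_i g_i(x)) with K = tanh comes from the text:

```python
import math

# A minimal sketch of the nonlinear weighted sum:
# f(x) = K(sum_i w_i * g_i(x)), with K = tanh as the activation function.
def make_neuron(weights, gs, K=math.tanh):
    def f(x):
        return K(sum(w * g(x) for w, g in zip(weights, gs)))
    return f

# Illustrative inner functions: g_1(x) = x and g_2(x) = x**2.
f = make_neuron([0.5, -0.25], [lambda x: x, lambda x: x * x])
y = f(1.0)  # tanh(0.5*1.0 - 0.25*1.0) = tanh(0.25)
```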
ANN dependency graph
This figure depicts such a decomposition of f, with dependencies between variables indicated by arrows. These can be interpreted in two ways.
The first view is the functional view: the input x is transformed into a 3-dimensional vector h, which is then transformed into a 2-dimensional vector g, which is finally transformed into f. This view is most commonly encountered in the context of optimization.

The second view is the probabilistic view: the random variable F = f(G) depends upon the random variable G = g(H), which depends upon H = h(X), which depends upon the random variable X. This view is most commonly encountered in the context of graphical models.

The two views are largely equivalent. In either case, for this particular network architecture, the components of individual layers are independent of each other (e.g., the components of g are independent of each other given their input h). This naturally enables a degree of parallelism in the implementation.
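The functional view, x transformed into a 3-dimensional h, then a 2-dimensional g, then a scalar f, can be sketched as below. The weight values and the choice of tanh at every layer are illustrative assumptions; note that each component of a layer is computed from the previous layer's output alone, which is the independence property just described:

```python
import math

# Sketch of the functional view: x -> h (3-dim) -> g (2-dim) -> f (scalar).
# Each layer applies tanh to weighted sums of the previous layer's values;
# components within a layer depend only on that previous layer's output.
def layer(weight_rows, inputs):
    return [math.tanh(sum(w * v for w, v in zip(row, inputs)))
            for row in weight_rows]

W1 = [[0.1], [0.2], [-0.3]]                   # scalar input -> 3-dim h
W2 = [[0.5, -0.5, 0.25], [1.0, 0.0, -1.0]]    # h -> 2-dim g
W3 = [[0.7, -0.2]]                            # g -> scalar f

def network(x):
    h = layer(W1, [x])   # the components of h could be computed in parallel
    g = layer(W2, h)     # likewise for g, given h
    return layer(W3, g)[0]

out = network(0.8)
```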
Two separate depictions of the recurrent ANN dependency graph
Networks such as the previous one are commonly calledfeedforward, because their graph is a directed acyclicgraph. Networks with cycles are commonly called
recurrent. Such networks are commonly depicted in themannershownatthetopoftheﬁgure,where
f
isshownasbeing dependent upon itself. However, an implied tem-poral dependence is not shown.
3.2 Learning
What has attracted the most interest in neural networks is the possibility of learning. Given a specific task to solve, and a class of functions F, learning means using a set of observations to find f* ∈ F which solves the task in some optimal sense.

This entails defining a cost function C : F → R such that, for the optimal solution f*, C(f*) ≤ C(f) ∀ f ∈ F – i.e., no solution has a cost less than the cost of the optimal solution (see Mathematical optimization).

The cost function C is an important concept in learning, as it is a measure of how far away a particular solution is from an optimal solution to the problem to be solved. Learning algorithms search through the solution space to find a function that has the smallest possible cost.

For applications where the solution is dependent on some data, the cost must necessarily be a function of the observations, otherwise we would not be modelling anything related to the data. It is frequently defined as a statistic to which only approximations can be made. As a simple example, consider the problem of finding the model f which minimizes C = E[(f(x) − y)²], for data pairs (x, y) drawn from some distribution D. In practical situations we would only have N samples from D and thus, for the above example, we would only minimize Ĉ = (1/N) ∑_{i=1}^{N} (f(x_i) − y_i)². Thus, the cost is minimized over a sample of the data rather than the entire data set.

When N → ∞ some form of online machine learning must be used, where the cost is partially minimized as each new example is seen. While online machine learning is often used when D is fixed, it is most useful in the case where the distribution changes slowly over time. In neural network methods, some form of online machine learning is frequently used for finite datasets.

See also: Mathematical optimization, Estimation theory and Machine learning
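The empirical cost Ĉ above is just the sample average of the squared error. A short sketch, where the candidate model f and the data pairs are illustrative, not from the article:

```python
# Empirical cost from the text: C_hat = (1/N) * sum_i (f(x_i) - y_i)^2,
# i.e. the mean squared error over the N observed (x, y) pairs.
def empirical_cost(f, samples):
    return sum((f(x) - y) ** 2 for x, y in samples) / len(samples)

f = lambda x: 2.0 * x                          # an illustrative candidate model
data = [(0.0, 0.1), (1.0, 2.2), (2.0, 3.9)]    # illustrative samples from D
cost = empirical_cost(f, data)
# errors are -0.1, -0.2, +0.1, so C_hat = (0.01 + 0.04 + 0.01) / 3 = 0.02
```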
3.2.1 Choosing a cost function
While it is possible to define some arbitrary ad hoc cost function, frequently a particular cost will be used, either because it has desirable properties (such as convexity) or because it arises naturally from a particular formulation of the problem (e.g., in a probabilistic formulation the posterior probability of the model can be used as an inverse cost). Ultimately, the cost function will depend on the desired task. An overview of the three main categories of learning tasks is provided below:
3.3 Learning paradigms
There are three major learning paradigms, each corresponding to a particular abstract learning task. These are supervised learning, unsupervised learning and reinforcement learning.
3.3.1 Supervised learning
In supervised learning, we are given a set of example pairs (x, y), x ∈ X, y ∈ Y, and the aim is to find a function f : X → Y in the allowed class of functions that matches the examples. In other words, we wish to infer the mapping implied by the data.
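Supervised learning in the sense above can be sketched concretely: search a class of functions for the member that best matches the example pairs under the squared-error cost from the previous section. The class f_a(x) = a·x and the data below are illustrative assumptions; for this one-parameter class the minimizer has a closed form:

```python
# Illustrative supervised learning: given example pairs (x, y), find the
# member of the class f_a(x) = a * x that minimizes sum_i (a*x_i - y_i)^2.
# Setting the derivative with respect to a to zero gives the closed form
# a = (sum_i x_i*y_i) / (sum_i x_i**2), i.e. one-parameter least squares.
def fit_linear(samples):
    num = sum(x * y for x, y in samples)
    den = sum(x * x for x, y in samples)
    return num / den

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.0)]   # illustrative example pairs
a = fit_linear(data)
f = lambda x: a * x                            # the inferred mapping
```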
