Artificial neural network

"Neural network" redirects here. For networks of living neurons, see Biological neural network. For the journal, see Neural Networks (journal).

[Figure: An artificial neural network is an interconnected group of nodes, akin to the vast network of neurons in a brain. Here, each circular node represents an artificial neuron and an arrow represents a connection from the output of one neuron to the input of another.]

In machine learning and related fields, artificial neural networks (ANNs) are computational models inspired by an animal's central nervous system (in particular the brain), and are used to estimate or approximate functions that can depend on a large number of inputs and are generally unknown. Artificial neural networks are generally presented as systems of interconnected neurons which can compute values from inputs, and are capable of machine learning as well as pattern recognition thanks to their adaptive nature.

For example, a neural network for handwriting recognition is defined by a set of input neurons which may be activated by the pixels of an input image. After being weighted and transformed by a function (determined by the network's designer), the activations of these neurons are passed on to other neurons. This process is repeated until, finally, an output neuron is activated. This determines which character was read.

Like other machine learning methods (systems that learn from data), neural networks have been used to solve a wide variety of tasks that are hard to solve using ordinary rule-based programming, including computer vision and speech recognition.

1 Background

Examinations of the human central nervous system inspired the concept of neural networks. In an artificial neural network, simple artificial nodes, known as "neurons", "neurodes", "processing elements" or "units", are connected together to form a network which mimics a biological neural network.

There is no single formal definition of what an artificial neural network is. However, a class of statistical models may commonly be called "neural" if the models:

1. consist of sets of adaptive weights, i.e. numerical parameters that are tuned by a learning algorithm, and
2. are capable of approximating non-linear functions of their inputs.

The adaptive weights are conceptually connection strengths between neurons, which are activated during training and prediction.

Neural networks are similar to biological neural networks in performing functions collectively and in parallel by the units, rather than there being a clear delineation of subtasks to which various units are assigned. The term "neural network" usually refers to models employed in statistics, cognitive psychology and artificial intelligence. Neural network models which emulate the central nervous system are part of theoretical neuroscience and computational neuroscience.

In modern software implementations of artificial neural networks, the approach inspired by biology has been largely abandoned for a more practical approach based on statistics and signal processing. In some of these systems, neural networks or parts of neural networks (like artificial neurons) form components in larger systems that combine both adaptive and non-adaptive elements.
While the more general approach of such systems is more suitable for real-world problem solving, it has little to do with the traditional artificial intelligence connectionist models. What they do have in common, however, is the principle of non-linear, distributed, parallel and local processing and adaptation. Historically, the use of neural network models marked a paradigm shift in the late eighties from high-level (symbolic) artificial intelligence, characterized by expert systems with knowledge embodied in if-then rules, to low-level (sub-symbolic) machine learning, characterized by knowledge embodied in the parameters of a dynamical system.

2 History

Warren McCulloch and Walter Pitts [1] (1943) created a computational model for neural networks based on mathematics and algorithms. They called this model threshold logic. The model paved the way for neural network research to split into two distinct approaches. One approach focused on biological processes in the brain, and the other focused on the application of neural networks to artificial intelligence.

In the late 1940s, psychologist Donald Hebb [2] created a hypothesis of learning based on the mechanism of neural plasticity that is now known as Hebbian learning. Hebbian learning is considered to be a 'typical' unsupervised learning rule, and its later variants were early models for long-term potentiation. These ideas started being applied to computational models in 1948 with Turing's B-type machines.

Farley and Wesley A. Clark [3] (1954) first used computational machines, then called calculators, to simulate a Hebbian network at MIT. Other neural network computational machines were created by Rochester, Holland, Habit, and Duda [4] (1956).

Frank Rosenblatt [5] (1958) created the perceptron, an algorithm for pattern recognition based on a two-layer learning computer network using simple addition and subtraction. With mathematical notation, Rosenblatt also described circuitry not in the basic perceptron, such as the exclusive-or circuit, a circuit whose mathematical computation could not be processed until after the backpropagation algorithm was created by Paul Werbos [6] (1975).

Neural network research stagnated after the publication of machine learning research by Marvin Minsky and Seymour Papert [7] (1969). They discovered two key issues with the computational machines that processed neural networks. The first issue was that single-layer neural networks were incapable of processing the exclusive-or circuit. The second significant issue was that computers were not sophisticated enough to effectively handle the long run time required by large neural networks. Neural network research slowed until computers achieved greater processing power. A key later advance was the backpropagation algorithm, which effectively solved the exclusive-or problem (Werbos 1975). [6]

The parallel distributed processing of the mid-1980s became popular under the name connectionism. The text by David E. Rumelhart and James McClelland [8] (1986) provided a full exposition on the use of connectionism in computers to simulate neural processes.

Neural networks, as used in artificial intelligence, have traditionally been viewed as simplified models of neural processing in the brain, even though the relation between this model and the brain's biological architecture is debated, as it is not clear to what degree artificial neural networks mirror brain function. [9]
In the 1990s, neural networks were overtaken in popularity in machine learning by support vector machines and other, much simpler methods such as linear classifiers. Renewed interest in neural nets was sparked in the 2000s by the advent of deep learning.

2.1 Recent improvements

Computational devices have been created in CMOS for both biophysical simulation and neuromorphic computing. More recent efforts show promise for creating nanodevices [10] for very large scale principal components analyses and convolution. If successful, these efforts could usher in a new era of neural computing [11] that is a step beyond digital computing, because it depends on learning rather than programming and because it is fundamentally analog rather than digital, even though the first instantiations may in fact be with CMOS digital devices.

Variants of the back-propagation algorithm, as well as unsupervised methods by Geoff Hinton and colleagues at the University of Toronto, [15][16] can be used to train deep, highly nonlinear neural architectures similar to the 1980 Neocognitron by Kunihiko Fukushima [17] and the "standard architecture of vision", [18] inspired by the simple and complex cells identified by David H. Hubel and Torsten Wiesel in the primary visual cortex. Deep learning feedforward networks, such as convolutional neural networks, alternate convolutional layers and max-pooling layers, topped by several pure classification layers.

2.2 Successes in pattern recognition contests since 2009

Between 2009 and 2012, the recurrent neural networks and deep feedforward neural networks developed in the research group of Jürgen Schmidhuber at the Swiss AI Lab IDSIA have won eight international competitions in pattern recognition and machine learning. [22] For example, the bi-directional and multi-dimensional long short-term memory (LSTM) [23][24] of Alex Graves et al. won three competitions in connected handwriting recognition at the 2009 International Conference on Document Analysis and Recognition (ICDAR), without any prior knowledge about the three different languages to be learned.

Fast GPU-based implementations of this approach by Dan Ciresan and colleagues at IDSIA have won several pattern recognition contests, including the IJCNN 2011 Traffic Sign Recognition Competition, [25] the ISBI 2012 Segmentation of Neuronal Structures in Electron Microscopy Stacks challenge, [20] and others.
Their neural networks also were the first artificial pattern recognizers to achieve human-competitive or even superhuman performance [21] on important benchmarks such as traffic sign recognition (IJCNN 2012) or the MNIST handwritten digits problem of Yann LeCun at NYU. Deep, highly nonlinear neural architectures similar to the 1980 neocognitron by Kunihiko Fukushima [17] and the "standard architecture of vision" [18] can also be pre-trained by unsupervised methods [26][27] of Geoff Hinton's lab at the University of Toronto. A team from this lab won a 2012 contest sponsored by Merck to design software to help find molecules that might lead to new drugs. [28]

3 Models

Neural network models in artificial intelligence are usually referred to as artificial neural networks (ANNs); these are essentially simple mathematical models defining a function $f : X \to Y$ or a distribution over $X$, or both $X$ and $Y$, but sometimes models are also intimately associated with a particular learning algorithm or learning rule. A common use of the phrase "ANN model" really means the definition of a class of such functions (where members of the class are obtained by varying parameters, connection weights, or specifics of the architecture such as the number of neurons or their connectivity).

3.1 Network function

See also: Graphical models

The word network in the term "artificial neural network" refers to the interconnections between the neurons in the different layers of each system. An example system has three layers. The first layer has input neurons which send data via synapses to the second layer of neurons, and then via more synapses to the third layer of output neurons. More complex systems have more layers of neurons, some with additional layers of input and output neurons. The synapses store parameters called "weights" that manipulate the data in the calculations.

An ANN is typically defined by three types of parameters:

1. the interconnection pattern between the different layers of neurons;
2. the learning process for updating the weights of the interconnections;
3. the activation function that converts a neuron's weighted input to its output activation.

Mathematically, a neuron's network function $f(x)$ is defined as a composition of other functions $g_i(x)$, which can further be defined as a composition of other functions. This can be conveniently represented as a network structure, with arrows depicting the dependencies between variables. A widely used type of composition is the nonlinear weighted sum, $f(x) = K\left(\sum_i w_i g_i(x)\right)$, where $K$ (commonly referred to as the activation function [29]) is some predefined function, such as the hyperbolic tangent. It will be convenient for the following to refer to a collection of functions $g_i$ as simply a vector $g = (g_1, g_2, \ldots, g_n)$.

[Figure: ANN dependency graph]

This figure depicts such a decomposition of $f$, with dependencies between variables indicated by arrows. These can be interpreted in two ways.

The first view is the functional view: the input $x$ is transformed into a 3-dimensional vector $h$, which is then transformed into a 2-dimensional vector $g$, which is finally transformed into $f$. This view is most commonly encountered in the context of optimization.

The second view is the probabilistic view: the random variable $F = f(G)$ depends upon the random variable $G = g(H)$, which depends upon $H = h(X)$, which depends upon the random variable $X$. This view is most commonly encountered in the context of graphical models.
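To make the functional view concrete, here is a minimal sketch in Python/NumPy of the three-stage composition above, with $K$ taken to be the hyperbolic tangent. The input dimension and the randomly initialized weights are illustrative assumptions, not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: a 4-dimensional input x (assumed), a 3-dimensional
# hidden vector h and a 2-dimensional vector g, matching the functional
# view described above; f is a scalar output.
W_h = rng.normal(size=(3, 4))  # weights producing h from x
W_g = rng.normal(size=(2, 3))  # weights producing g from h
w_f = rng.normal(size=2)       # the weights w_i in f(x) = K(sum_i w_i g_i(x))

def K(z):
    """The activation function; here the hyperbolic tangent, as in the text."""
    return np.tanh(z)

def f(x):
    h = K(W_h @ x)     # h = K(W_h x)
    g = K(W_g @ h)     # g = K(W_g h)
    return K(w_f @ g)  # f(x) = K(sum_i w_i g_i(x))

x = rng.normal(size=4)
print(f(x))  # a single activation in (-1, 1)
```

Each stage is itself a nonlinear weighted sum, so the network as a whole is a nested composition of such sums.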
The two views are largely equivalent. In either case, for this particular network architecture, the components of the individual layers are independent of each other (e.g., the components of $g$ are independent of each other given their input $h$). This naturally enables a degree of parallelism in the implementation.

[Figure: Two separate depictions of the recurrent ANN dependency graph]

Networks such as the previous one are commonly called feedforward, because their graph is a directed acyclic graph. Networks with cycles are commonly called recurrent. Such networks are commonly depicted in the manner shown at the top of the figure, where $f$ is shown as being dependent upon itself. However, an implied temporal dependence is not shown.

3.2 Learning

What has attracted the most interest in neural networks is the possibility of learning. Given a specific task to solve, and a class of functions $F$, learning means using a set of observations to find $f^* \in F$ which solves the task in some optimal sense.

This entails defining a cost function $C : F \to \mathbb{R}$ such that, for the optimal solution $f^*$, $C(f^*) \leq C(f)$ for all $f \in F$; i.e., no solution has a cost less than the cost of the optimal solution (see Mathematical optimization).

The cost function $C$ is an important concept in learning, as it is a measure of how far away a particular solution is from an optimal solution to the problem to be solved. Learning algorithms search through the solution space to find a function that has the smallest possible cost.

For applications where the solution depends on some data, the cost must necessarily be a function of the observations; otherwise we would not be modelling anything related to the data. It is frequently defined as a statistic to which only approximations can be made. As a simple example, consider the problem of finding the model $f$ which minimizes $C = E\left[(f(x) - y)^2\right]$ for data pairs $(x, y)$ drawn from some distribution $D$. In practical situations we would only have $N$ samples from $D$, and thus, for the above example, we would only minimize $\hat{C} = \frac{1}{N} \sum_{i=1}^{N} (f(x_i) - y_i)^2$. Thus, the cost is minimized over a sample of the data rather than over the entire data set.
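As a concrete illustration of this sampled cost, the sketch below evaluates $\hat{C}$ for a candidate model on a finite sample. The candidate model and the synthetic distribution $D$ are assumptions made purely for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# An assumed candidate model f, fixed for the illustration.
def f(x):
    return 2.0 * x + 0.5

# N sample pairs (x_i, y_i) drawn from an assumed distribution D,
# here a noisy linear relationship.
N = 1000
x = rng.normal(size=N)
y = 2.0 * x + 0.5 + rng.normal(scale=0.1, size=N)

# The sampled cost C_hat = (1/N) * sum_i (f(x_i) - y_i)^2, which
# approximates C = E[(f(x) - y)^2] over the full distribution D.
C_hat = np.mean((f(x) - y) ** 2)
print(C_hat)  # close to the noise variance 0.01 for this candidate
```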
When $N \to \infty$, some form of online machine learning must be used, where the cost is partially minimized as each new example is seen. While online machine learning is often used when $D$ is fixed, it is most useful in the case where the distribution changes slowly over time. In neural network methods, some form of online machine learning is frequently used for finite datasets.

See also: Mathematical optimization, Estimation theory and Machine learning

3.2.1 Choosing a cost function

While it is possible to define some arbitrary ad hoc cost function, frequently a particular cost will be used, either because it has desirable properties (such as convexity) or because it arises naturally from a particular formulation of the problem (e.g., in a probabilistic formulation the posterior probability of the model can be used as an inverse cost). Ultimately, the cost function will depend on the desired task. An overview of the three main categories of learning tasks is provided below.

3.3 Learning paradigms

There are three major learning paradigms, each corresponding to a particular abstract learning task. These are supervised learning, unsupervised learning and reinforcement learning.

3.3.1 Supervised learning

In supervised learning, we are given a set of example pairs $(x, y)$, $x \in X$, $y \in Y$, and the aim is to find a function $f : X \to Y$ in the allowed class of functions that matches the examples. In other words, we wish to infer the mapping implied by the data.
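A minimal sketch of this setup, assuming an allowed class of affine functions $f(x) = ax + b$ and the squared-error cost from the previous section, minimized by gradient descent; the synthetic example pairs, learning rate and iteration count are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

# Example pairs (x_i, y_i); the underlying mapping is assumed for the demo.
N = 200
x = rng.uniform(-1.0, 1.0, size=N)
y = 3.0 * x - 1.0 + rng.normal(scale=0.05, size=N)

# Allowed class of functions: f(x) = a*x + b, parameterized by (a, b).
a, b = 0.0, 0.0
lr = 0.1  # assumed learning rate

for _ in range(500):
    err = (a * x + b) - y  # residuals f(x_i) - y_i
    # gradients of C_hat = (1/N) * sum_i (f(x_i) - y_i)^2
    a -= lr * 2.0 * np.mean(err * x)
    b -= lr * 2.0 * np.mean(err)

print(a, b)  # approaches (3.0, -1.0), the mapping implied by the data
```

For a real neural network, $f$ would be the nested composition from section 3.1 and the gradients would be computed by backpropagation.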
