Brill Academic Publishers, P.O. Box 9000, 2300 PA Leiden, The Netherlands
Lecture Series on Computer and Computational Sciences, Volume 4, 2006, pp. 358-361

A General Purpose Parallel Neural Network Architecture

A. Margaris (corresponding author, e-mail: amarg@uom.gr) and M. Roumeliotis
Department of Applied Informatics, University of Macedonia, GR-540 06 Thessaloniki, Greece

Received 5 August 2005; accepted in revised form 15 August 2005

Abstract: The objective of this work-in-progress paper is the presentation of the design of a general purpose parallel neural network simulator that can be used in a distributed computing system as well as in a cluster environment. The design of this simulator follows the object-oriented approach, while the adopted parallel programming paradigm is the message passing interface.

Keywords: Parallel programming; neural networks; message passing interface

Mathematics Subject Classification: 68W10; 68T01; 65Y05

1 Introduction

As is well known from the literature, the main drawbacks of serial neural networks are the time loss and the high computational cost associated with their learning phase [1]. This fact, in combination with the weak performance of single sequential machines and the large amount of natural parallelism that characterizes neural networks, makes the parallelization of the operation of these structures imperative. In recent years, many parallel neural network simulators have been developed, such as the PAANS parallel backpropagation simulator [2] and the ParSOM parallel self-organizing map [3], which extend the features of the various sequential simulators, such as PDP++ [4], SNNS [5] and PlaNet [6], to the domain of parallel processing.

The most important and difficult task of the parallelization process is the division of the network structure and the mapping of this structure to system processes. The requirements for an efficient implementation of a generic network model on a multicomputer are discussed by Ghosh and Hwang [7], while a special case, the optimal mapping of the learning process of multilayer feedforward neural networks onto message passing systems, is discussed by Chu and Wah [8]. This mapping defines the level of parallelism of the network structure, which according to Nordström and Svensson [9] can be one of the following types: (a) training session parallelism, (b) training set parallelism, (c) layer parallelism, (d) node parallelism, and (e) weight parallelism. This categorization applies to backpropagation feedforward networks, but it can easily be extended to other network types.

The assignment of the various structures to the elements of the distributed system depends on the system type and the selected programming model. In most cases the parallel neural network simulators follow the SPMD programming model, which is based on the SIMD architecture type.

2 The structure of the parallel simulator

The main design aspect of the proposed parallel simulator is the partitioning of the network structure and the assignment of the resulting network segments to the system processes. The application has been designed to support three different partitioning types: (a) horizontal partitioning, in which each process is associated with one or more network layers, (b) vertical partitioning, in which each process gets a subset of the neurons of all layers (in general), and (c) custom partitioning, in which the network segmentation is arbitrary, according to the user's needs. A simple sketch of what such rank assignments could look like is given below.
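To make the first two schemes concrete, the following C sketch (an illustration added here, not part of the simulator itself) assigns a hypothetical three-layer topology to four processes; the round-robin rule for whole layers and the equal-slice rule for neurons are only examples of possible assignment policies.

    #include <stdio.h>

    #define NUM_LAYERS 3
    #define NUM_PROCS  4

    /* Horizontal partitioning: every neuron of a layer goes to the same rank
       (here a simple round-robin assignment of whole layers to processes). */
    static int horizontal_rank(int layer)
    {
        return layer % NUM_PROCS;
    }

    /* Vertical partitioning: every layer is cut into NUM_PROCS slices, so each
       process receives a subset of the neurons of all layers. */
    static int vertical_rank(int neuron, int neurons_in_layer)
    {
        int slice = (neurons_in_layer + NUM_PROCS - 1) / NUM_PROCS;
        return neuron / slice;
    }

    int main(void)
    {
        int neurons[NUM_LAYERS] = { 8, 16, 4 };   /* hypothetical topology */

        for (int l = 0; l < NUM_LAYERS; ++l)
            for (int n = 0; n < neurons[l]; ++n)
                printf("layer %d, neuron %2d -> horizontal rank %d, vertical rank %d\n",
                       l, n, horizontal_rank(l), vertical_rank(n, neurons[l]));
        return 0;
    }

Custom partitioning would simply replace these rules with an arbitrary user-supplied table mapping every neuron to a rank.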
These partitioning types are shown in Figure 1.

Figure 1: Partitioning types of a parallel neural network

The segmentation of the network structure is performed by a root process, which also performs additional administrative tasks such as the distribution of the training set data and the calculation of the global error for each training epoch. A local-to-global mapping mechanism allows neuron communication even when the source and the target neuron of a synaptic connection belong to different processes; in that case the information flow is based on blocking message passing SEND and RECV operations, while in cases where the source and the target neuron of a synapse belong to the same process, the weight adaptation is based on simple COPY procedures. The partitioning process and the network initialization algorithm are different for the root and the other processes and in general include the following steps (a sketch of the pack-and-distribute step is given after the lists):

• Operations performed only by the root process
  1. The whole network structure is created in the process memory.
  2. For each layer and neuron, according to the partitioning schema, the rank of the target process is identified.
  3. For each process rank the following actions are performed:
     – A memory buffer of appropriate size is allocated for the packing operation.
     – The network structure is scanned; if the rank of an element is the same as the current rank value, this element is packed into the buffer.
     – The packed buffer is sent to the target process; an additional message with the buffer size is sent prior to it.
     – After the completion of the transfer operation the packed buffer is freed.
  4. A network object is created in the memory of the root process.
  5. The serial network is scanned; if the rank of an element is equal to the root process rank, the element is copied to the appropriate position of the new network.
  6. The created network segment is scanned. For each neuron, the source and the target neurons of each input and output link are identified and the local identifiers are determined. If these neurons belong to the root process too, they are 'physically' joined with the current neuron; otherwise the associated pointers are set to the NULL value.
  7. The initial serial network is deleted.

• Operations performed by all processes except the root
  1. The message that contains the packed buffer size is received.
  2. The packed buffer is allocated in the local memory of the process.
  3. The packed buffer is received and a network object is created.
  4. The network objects are unpacked one after the other and inserted at the correct positions in the local network segment.
  5. After the creation of the local segment, it goes through a two-step procedure:
     – For each neuron, its input and output links are examined and the local identifiers of the source and target neurons of each link are determined. If the source neuron of an input link does not exist, since it belongs to another process, the corresponding identifier is set to the 'NOT AVAILABLE' constant. The same action is performed for the target neurons of the output links.
     – The network segment is scanned again and the pointers to the source and the target neurons are set to those neurons if they belong to the same process, or to the NULL value if they are not available.
  6. The packed buffer is freed.
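The C/MPI sketch below illustrates the pack-and-distribute step in a simplified form; it is only an outline under assumptions introduced here (a flat NeuronRec record instead of the simulator's full network objects, and a total neuron count known to every process), not the actual implementation. It follows the steps above: the buffer size travels in a separate message before the packed buffer, and MPI_Pack/MPI_Unpack are used for the packing and unpacking.

    #include <mpi.h>
    #include <stdlib.h>

    /* Hypothetical flat record standing in for the simulator's neuron objects. */
    typedef struct { int global_id; int layer; double bias; } NeuronRec;

    /* Root side: pack the elements owned by 'dest' and send size + buffer. */
    static void send_segment(const NeuronRec *net, int n_total,
                             const int *owner_rank, int dest)
    {
        int int_sz, dbl_sz, pos = 0;
        MPI_Pack_size(1, MPI_INT, MPI_COMM_WORLD, &int_sz);
        MPI_Pack_size(1, MPI_DOUBLE, MPI_COMM_WORLD, &dbl_sz);
        int bufsize = n_total * (2 * int_sz + dbl_sz);
        char *buf = malloc(bufsize);

        for (int i = 0; i < n_total; ++i) {
            if (owner_rank[i] != dest)        /* pack only this rank's elements */
                continue;
            MPI_Pack(&net[i].global_id, 1, MPI_INT,    buf, bufsize, &pos, MPI_COMM_WORLD);
            MPI_Pack(&net[i].layer,     1, MPI_INT,    buf, bufsize, &pos, MPI_COMM_WORLD);
            MPI_Pack(&net[i].bias,      1, MPI_DOUBLE, buf, bufsize, &pos, MPI_COMM_WORLD);
        }
        /* the buffer size is sent in a separate message before the buffer itself */
        MPI_Send(&pos, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
        MPI_Send(buf, pos, MPI_PACKED, dest, 1, MPI_COMM_WORLD);
        free(buf);
    }

    /* Non-root side: receive the size, allocate the buffer, receive and unpack. */
    static NeuronRec *recv_segment(int root, int n_total, int *n_local)
    {
        int size, pos = 0, n = 0;
        MPI_Recv(&size, 1, MPI_INT, root, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        char *buf = malloc(size);
        MPI_Recv(buf, size, MPI_PACKED, root, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        NeuronRec *seg = malloc(sizeof(NeuronRec) * n_total);   /* upper bound */
        while (pos < size) {
            MPI_Unpack(buf, size, &pos, &seg[n].global_id, 1, MPI_INT,    MPI_COMM_WORLD);
            MPI_Unpack(buf, size, &pos, &seg[n].layer,     1, MPI_INT,    MPI_COMM_WORLD);
            MPI_Unpack(buf, size, &pos, &seg[n].bias,      1, MPI_DOUBLE, MPI_COMM_WORLD);
            ++n;
        }
        free(buf);
        *n_local = n;
        return seg;
    }

The re-linking of the local segment (resolving local identifiers and marking cross-process links as 'NOT AVAILABLE' or NULL) would follow the unpacking, exactly as described in the two lists above.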
3 Training set manipulation

The manipulation of the training set used during the learning phase depends on the partitioning type selected by the user and, in general, is performed in the following way:

• In the case of vertical partitioning, the input and the output layers belong to a single process. In this case, the input patterns are packed and sent to the process that holds the input layer, while the desired output patterns are packed and sent to the process that holds the output layer.

• In the case of horizontal and custom partitioning, the layers (in general) are distributed to different processes. If the input and the output layer have been distributed in this way, the training set is divided accordingly and its packed version is sent to the appropriate process. In typical situations there are input and output neurons held by the same process; in this case the packed buffer contains input as well as desired output patterns.

The partitioning of the training set is performed by the root process and follows the network initialization stage. The root process loads the training data into its local memory and then packs and sends the appropriate buffer to the other processes. In the next step it creates its own training set segment and destroys the initial TSet object. A sketch of this distribution step, for the simplest case in which whole input and output layers reside on single processes, is shown below.
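The C/MPI sketch below illustrates that simple case, in which one process holds the whole input layer and another (or the same) process holds the whole output layer. The ranks, pattern counts and the use of plain MPI_Send calls on flat double arrays, rather than the simulator's packed TSet buffers, are assumptions made only for this example.

    #include <mpi.h>

    /* Root process: forward the patterns to the owners of the input and
       output layers; the two ranks may coincide. */
    void distribute_training_set(const double *inputs,  int n_patterns, int in_dim,
                                 const double *targets, int out_dim,
                                 int input_rank, int output_rank)
    {
        /* input patterns go to the owner of the input layer ... */
        MPI_Send(inputs,  n_patterns * in_dim,  MPI_DOUBLE, input_rank,  10, MPI_COMM_WORLD);
        /* ... and the desired output patterns to the owner of the output layer */
        MPI_Send(targets, n_patterns * out_dim, MPI_DOUBLE, output_rank, 11, MPI_COMM_WORLD);
    }

Under horizontal or custom partitioning the same idea applies per segment: the root would slice the pattern arrays according to which process owns which input and output neurons before sending.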
4 Conclusions

This work-in-progress paper presents the main design aspects of a general purpose parallel neural network simulator that implements the most important parallelization approaches, such as layer and training set parallelization. Up to this point the application supports only horizontal parallelization, and a considerable amount of programming work has been devoted to the implementation of all the features described in the previous sections. Future work includes the design of a logging mechanism to record system events, a traffic monitoring system to examine the network traffic between processes that run on different machines, and an X Window interface that allows the design of the network structure and its partitioning to the system processes in a graphical way.

Acknowledgments

This work was supported by the "EPEAEK Archimedes II" Programme, funded in part by the European Union (75%) and in part by the Greek Ministry of National Education and Religious Affairs (25%).

References

[1] Y. Boniface, F. Alexandre, and S. Vialle, A Bridge between two Paradigms for Parallelism: Neural Networks and General Purpose MIMD Computers, in Proceedings of the International Joint Conference on Neural Networks (IJCNN'99), Washington, D.C., 1999.

[2] T. Fuerle and E. Schikuta, PAANS: A Parallelized Artificial Neural Network Simulator, in Proceedings of the 4th International Conference on Neural Information Processing (ICONIP'97), Dunedin, New Zealand, Springer Verlag, November 1997.

[3] P. Tomsich, A. Rauber, and D. Merkl, Optimizing the parSOM Neural Network Implementation for Data Mining with Distributed Memory Systems and Cluster Computing, in Proceedings of the 11th International Workshop on Database and Expert Systems Applications, Greenwich, London, UK, September 2000, pp. 661-666.

[4] Chadley K. Dawson, Randall C. O'Reilly, and James McClelland, The PDP++ Software Users Manual, Carnegie Mellon University, 2003.

[5] Andreas Zell et al., SNNS Version 4.2 User Manual, University of Stuttgart, Institute for Parallel and Distributed High Performance Systems (IPVR), Stuttgart, 1995.

[6] Yoshiro Miyata, A User Guide to PlaNet Version 5.6, University of Colorado, Boulder, Computer Science Department, 1989.

[7] J. Ghosh and K. Hwang, Mapping Neural Networks onto Message Passing Multicomputers, Journal of Parallel and Distributed Computing, 6:291-330, 1989.

[8] L. C. Chu and B. W. Wah, Optimal Mapping of Neural-Network Learning on Message-Passing Multicomputers, Journal of Parallel and Distributed Computing, 14:319-339, 1992.

[9] T. Nordström and B. Svensson, Using and Designing Massively Parallel Computers for Artificial Neural Networks, Journal of Parallel and Distributed Computing, 14(3):260-285, 1992.