Nomadic Migration: A New Tool for Dynamic Grid Computing

Gerd Lanfermann, Gabrielle Allen, Thomas Radke, Edward Seidel

(G. Lanfermann, G. Allen, T. Radke and E. Seidel are with the Max-Planck-Institut für Gravitationsphysik (Albert-Einstein-Institut), Golm (AEI). E. Seidel is also with the National Center for Supercomputing Applications (NCSA), Champaign, IL.)

Abstract

We describe the design and implementation of a technology which provides an application with the ability to seek out and exploit remote computing resources by migrating tasks from site to site, dynamically adapting the application to a changing Grid environment. The motivation for this migration framework, dubbed "the Worm", originated from the experience of having an abundance of computing time for simulations, distributed over multiple sites and split into time chunks by queuing systems. We describe the architecture of the Worm, explaining how new or more suitable resources are located and how the payload simulation is migrated to these resources following a trigger event. The migration technology presented here is designed to be used for any application, including large-scale HPC simulations.

I. INTRODUCTION

Grid computing involves utilizing computational resources, connected by networks, as needed to solve problems. Recent advances in Grid computing are such that applications are now in a position to begin to exploit a wide range of available computer resources, simultaneously, sequentially, or both, enabling many different new and innovative Grid usage scenarios.

Adding up the theoretically available computing time across a pool of standard computers, such as idle workstations, or summing the total computing time granted to a research group by several independent supercomputing sites, will typically yield an impressive capacity of processing power. But these resources are neither available on a homogeneous architecture base, nor are they all continuously accessible over a long period of time.

Here we focus on a new type of Grid computing suited to this dynamic character: self-determined migration of a simulation from one site, or collection of sites [4], to any other. We present a migration technology, dubbed "the Worm", designed for parallel applications with high I/O and memory requirements, driven by the need to perform large-scale simulations on HPC machines [5]. The technology is also applicable to the efficient use of idle cycles from small machine pools.

II. PROTOTYPE EXPERIENCES

A prototype implementation of the Worm was demonstrated at Supercomputing 2000, running across the machines of the EGrid Testbed [9]. The Worm was implemented in the Cactus programming framework [1], [2].

Fig. 1. The main components of a resource-aware, self-replicating Worm. The user payload is encapsulated by the Worm Kernel, which acts as a contact point to the resource detector and various Application Information Servers (AIS). The transfer units stage executables to machines and provide storage for checkpoint files during hibernation.

The Worm's payload, which describes the simulation code provided by scientists, was a simulation of a wave equation, although any real application written in the Cactus framework can trivially be incorporated as a payload. The participating EGrid sites [7] published characteristic system profiles and load information to a central Resource Manager. While the prototype Worm simulation was running on machines in the Grid, it was able to query this central resource service, seeking available machines of a certain configuration. Using this information it could then migrate to another site according to some predefined criteria.
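To make this query-and-migrate cycle concrete, the following minimal sketch shows one way such a loop might be structured. It is written in Python for brevity rather than in the Cactus framework used by the actual Worm, and the Resource Manager endpoint, the profile fields and the selection criterion are illustrative assumptions; the paper does not specify the EGrid interfaces at this level of detail.

    import json
    import time
    import urllib.request

    # Hypothetical endpoint; the real EGrid Resource Manager interface differed.
    RESOURCE_MANAGER = "http://rm.example.org"

    def fetch_site_profiles():
        # Query the central service for the system profiles and load
        # information that the participating sites publish.
        with urllib.request.urlopen(RESOURCE_MANAGER + "/sites") as resp:
            # Assumed payload: a list of dicts such as
            # {"name": "origin", "cpus": 128, "load": 0.9}.
            return json.load(resp)

    def pick_better_site(current, sites, min_cpus=32):
        # One predefined criterion: the least-loaded site that is large enough.
        candidates = [s for s in sites
                      if s["cpus"] >= min_cpus and s["name"] != current["name"]]
        if not candidates:
            return None  # nothing suitable: keep running where we are
        best = min(candidates, key=lambda s: s["load"])
        # Migrate only on a substantial improvement; relocation is expensive.
        return best if best["load"] < 0.5 * current["load"] else None

    def migration_loop(current_site, checkpoint_and_restart_on):
        # Periodically look for a better resource and trigger a migration event.
        while True:
            target = pick_better_site(current_site, fetch_site_profiles())
            if target is not None:
                checkpoint_and_restart_on(target)  # checkpoint, stage, restart
                current_site = target
            time.sleep(300)  # periodic lookup, here every five minutes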
III. WORM TECHNOLOGY IN A DYNAMIC GRID ENVIRONMENT

Based on the early prototype experiences [9], a more sophisticated Worm framework was designed to provide the user with a range of migration policies and to overcome the numerous technical challenges that arise in heterogeneous Grid environments. With the new Worm technology, application migration can be initiated in a variety of ways, ranging from the user's manual triggering of a migration event to fully automatic application relocation: by monitoring the simulation performance and profiling the current hardware with small benchmarking programs, a detailed profile of the current execution environment is generated. Periodic lookups are performed to see whether "better" resources for this individual profile have become available, in which case a migration to the more suitable resource may be initiated. Note that in this context "better" does not necessarily mean "faster"; it can also mean "cheaper", "more storage", "less queue waiting" or "better network".

Since the Worm migrates between machines in a non-predictable fashion, reacting to the dynamic nature of a Grid, it requires a mechanism for tracking its current and past locations. This is handled with different degrees of finesse, for example by publishing the information to a centralized Application Information Service (AIS) or to an email/SMS notification server.

Before migrating, the Worm must have located the next resource by querying a remote Resource Broker (RB), which tracks available computing resources, obtaining load and other information from registered sites. Different RB formats, developed e.g. by GrADS [8] and by groups within the EGrid, are understood. If a suitable resource cannot be found, the simulation hibernates by writing a checkpoint, which is stored until appropriate machines become available to host the restarted simulation.

Fig. 2. The timeline of migration events between three sites. The first two relocations involve hibernation of the application. In the third case an advanced reservation scheduler is used to request overlapping resources, which allows checkpoints to be streamed directly between the sites.

The Worm must provide the capability to access the different sites without user interaction, in order to copy checkpoint and parameter files, start processes and handle output data. It is essential to provide a secure but easy way to interface to these resources. Our Worm supports methods such as Globus GSI technology [6] or, more simply, secure shell and secure copy.

The Worm application (including the user-provided payload simulation) must be available on the different heterogeneous machines of the user's virtual grid. We support repositories of pre-built binaries as well as automated compilation "on the fly" before execution. To restore the simulation state in a heterogeneous machine environment, the checkpoint files are coded in an architecture-independent format. We use HDF5 [11] and the Cactus Code framework [1] to meet these requirements.
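The following minimal sketch illustrates this architecture-independent checkpointing, using HDF5 through the h5py Python binding as a stand-in for the Cactus checkpoint machinery; the dataset name, the metadata attribute and the file layout are assumptions made for illustration. Because HDF5 files are portable and self-describing, a checkpoint written on one architecture can be restored on another, which is exactly what hibernation and migration require.

    import numpy as np
    import h5py

    def write_checkpoint(path, iteration, phi):
        # Hibernate: dump the simulation state into a portable HDF5 file.
        with h5py.File(path, "w") as f:
            f.attrs["iteration"] = iteration   # restart metadata
            f.create_dataset("phi", data=phi)  # e.g. a wave-equation field

    def read_checkpoint(path):
        # Restore the state on the target machine after migration; HDF5
        # handles byte-order and layout differences between architectures.
        with h5py.File(path, "r") as f:
            return int(f.attrs["iteration"]), f["phi"][...]

    # Usage: checkpoint before hibernation, restore after staging to a new site.
    phi = np.zeros((64, 64, 64))
    write_checkpoint("worm_checkpoint.h5", 1000, phi)
    iteration, phi = read_checkpoint("worm_checkpoint.h5")

A real Worm checkpoint would of course capture the complete simulation state (all grid functions, parameters and I/O status) rather than a single field, but the portability argument is the same.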
IV. CONCLUSIONS

Automating and optimizing the usage of multiple resources is an essential challenge in Grid-enabling application software. We have described a technology that enables an application, on its own, to seek out and exploit computational resources on the Grid. This "Worm" approach not only provides applications with the ability to make decisions about resource usage and to self-migrate to new machines if necessary, it also takes into account the heterogeneous nature of resources as well as their dynamic availability in time.

Note that although we have spoken in terms of migrating an entire application from site to site, future Grid applications will be able to take advantage of workflow parallelism: e.g. analysis tasks may be spawned off to other Grid resources. The Worm technology paves the way for these advanced and intelligent Grid applications. There are many possible uses, which will be discussed elsewhere [10].

ACKNOWLEDGMENTS

The development of the EGrid Worm is a highly collaborative effort, and we are indebted to a great many experts at different institutions, especially on the EGrid, for their advice and support. It is a pleasure for us to thank, above all, Tom Goodale and John Shalf, as well as Ian Foster, Sridhar Gullapalli, Steve Fitzgerald and the Globus team at ANL for their Globus and Data Grid work, and Mike Folk and his HDF5 development group at NCSA. Computing resources and technical support have been provided by the EGrid, AEI, NCSA, ANL, and ZIB. We have also benefitted from close association with, and partial support from, the ASC project, NSF PHY-9979985.

REFERENCES

[1] Cactus Code: http://www.cactuscode.org
[2] Allen, G., Goodale, T., Lanfermann, G., Seidel, E., Benger, W., Hege, H.-C., Merzky, A., Massó, J., Radke, T. and Shalf, J., "Solving Einstein's Equation on Supercomputers", IEEE Computer, pp. 52-59, December 1999. http://www.computer.org/computer/articles/einstein_1299_1.htm
[3] Seidel, E. and Suen, W.-M., J. Comp. Appl. Math., 109 (1999), 493-525.
[4] G. Allen, T. Dramlitsch, G. Lanfermann, E. Seidel, "Efficient Techniques for Distributed Computing", submitted to HPDC-10.
[5] G. Allen, W. Benger, T. Goodale, H. Hege, G. Lanfermann, A. Merzky, T. Radke, E. Seidel, J. Shalf, "Cactus Tools for Grid Applications", to appear in Cluster Computing, 2001.
[6] Globus Metacomputing Toolkit: http://www.globus.org
[7] The European Grid-Forum: http://www.egrid.org
[8] Grid Application Development Software Project: http://www.isi.edu/grads
[9] G. Allen, T. Dramlitsch, T. Goodale, G. Lanfermann, T. Radke, E. Seidel, T. Kielmann, K. Verstoep, Z. Balaton, P. Kacsuk, F. Szalai, J. Gehring, A. Keller, A. Streit, L. Matyska, M. Ruda, A. Krenek, H. Frese, H. Knipp, A. Merzky, A. Reinefeld, F. Schintke, B. Ludwiczak, J. Nabrzyski, J. Pukacki, H.-P. Kersken, and M. Russell, "Early experiences with the EGrid testbed", in IEEE International Symposium on Cluster Computing and the Grid, 2001.
[10] G. Allen, I. Foster, T. Goodale, G. Lanfermann, T. Radke, M. Russell, E. Seidel, J. Shalf, "Grid Computing: An Applications Perspective" (in preparation).
[11] Hierarchical Data Format Version 5: http://hdf.ncsa.uiuc.edu/HDF5