A Survey on Large-Scale Decentralized Storage Systems to be used by Volunteer Computing Systems

Umit Cavus Buyuksahin, Maria Stylianou, Nicos Demetriou, Muhammad Adnan Khan
All authors are with Universitat Politecnica de Catalunya (UPC).

Abstract—Over the last decades, distributed systems have been promoted for extended computations and presented as the ideal storage space for large amounts of data. Distributed storage systems have moved from a centralized architecture to a more decentralized approach. This change allows such systems to be used by volunteer computing systems, where the exploitation of any available storage and resources is essential and greatly needed. This survey explores the characteristics of scalable decentralized storage systems that can be used by volunteer computing systems and discusses the various existing systems in terms of the specified characteristics. For each surveyed system we give a brief description and state whether the required properties are ensured.

Index Terms—decentralized storage systems, volunteer computing systems

I. INTRODUCTION

Storage is one of the fundamental parts of computing [1]. Although it is slower than RAM, it offers great persistence at low cost. Thus, central storage systems were constructed with a focus on reliability, stability, and efficiency. However, nowadays computation is not limited to a central storage space; it is executed in a global environment such as the Internet. As the Internet becomes part of this computation, it produces huge amounts of information that need to be gathered and stored. To address this challenge, distributed storage systems were introduced. In this design, the data stored by hosts becomes geographically distributed. Because of this distribution and the appearance of huge demands, new challenges arise, such as fault tolerance, availability, security, robustness, survivability, scalability and anonymity.

With the growth of the Internet, distributed storage systems are able to scale to larger numbers of users. This growth has exposed the difficulty of having one central point for administering the system. Therefore, other surveys observe that these systems are moving from a centralized architecture to a more decentralized approach [1].

Meanwhile, supercomputers are situated among us, executing big computations which require huge storage, power and computational resources, and lead to a rapid decrease of their capacity. Due to this demand, researchers turn to unused storage resources. Globally, there are many personal computers whose resources are not fully used by their owners. Volunteer computing systems aim to use this storage for enormous computations by treating these computers as if they were parts of a huge supercomputer. This is a powerful way to utilize distributed resources in order to complete large-scale tasks.

Volunteer computing systems have two main bases [7]. The first is the computational base, in which large computation tasks are split into smaller tasks that are assigned to volunteer participants' computers. The second is the participative base, which deploys a large number of volunteer participants who offer their resources.

One of the well-known volunteer computing systems is SETI@home, launched by BOINC projects [8]. Nowadays, SETI@home works with about one million computers, which provide a processing rate of approximately 70 TeraFLOPs [8]. Of course, this resource usage can be increased when we look at the potential resources in the world. However, this is unnecessary since the network is growing rapidly.

These volunteer computing systems produce huge amounts of computational data that should be stored. This data may be used for later processing or shared with other scientific organizations that may contribute to science. However, today's volunteer computing systems use centralized storage systems [9] to distribute data to participants. They therefore suffer from the limitations of centralized storage systems in fault tolerance, availability and scalability.

To overcome these limitations, new storage systems are being developed which are decentralized and can be used by volunteer computing systems efficiently. As previously mentioned, there are many kinds of decentralized storage systems. However, not all of them are suitable for use in volunteer computing systems. In this survey we study several storage systems, discuss their characteristics and challenges, and propose the most suitable one to be used in volunteer computing systems.

The rest of the paper is organized as follows. In Section II, we present related work done by other researchers in the field. In Section III, design issues of decentralized storage systems that can be used in volunteer computing systems are examined by extracting characteristics. In Section IV we briefly present some of the existing decentralized storage systems. Later on, in Section V we compare them regarding their [...] and benefits and propose the most suitable one to be used by volunteer computing systems. Finally, in Section VI we conclude
the survey with our final remarks about the systems studied.

II. RELATED WORKS

In this section we present the different surveys related to the subject that we are focused on. [3] discusses the different properties of peer-to-peer based distributed file systems. It shows the various benefits of using P2P systems, the design issues and properties. In addition, it presents the major distributed file systems, comparing the advantages and disadvantages of each one in detail. Likewise, [4] provides an insight into existing storage systems, giving a good overview of each and describing the important characteristics they should have. In [1], a variety of distributed storage systems is covered in depth, presenting their functionalities and introducing the reader to the problems that these systems face and the solutions proposed to overcome them. A quite short but rich paper, [2], discusses the evolving area of distributed storage systems and gives a brief summary of some related systems in order to provide a broader view of the subject.

III. PRINCIPAL CHARACTERISTICS OF DECENTRALIZED STORAGE SYSTEMS

Several decentralized storage systems have been proposed over the last years. However, not all of them are suitable for volunteer computing. Specific characteristics should be examined, and we should ensure their existence in the intended storage systems, in order to meet the requirements of volunteer computing systems. Below, we analyze the most important ones, their specifications and effects.

1) Symmetry: Symmetry is a desired characteristic as much for decentralized storage systems as for volunteer computing systems. In the case of storage systems, and more precisely in pure peer-to-peer systems, symmetry exists when all peers are on the same level with equivalent functionality [3]. Similarly, in the case of volunteer computing systems, no volunteer participant has priority or special treatment compared to others. Also, volunteers do not need permission from an administrator to execute a task or to save data. This is done, by definition, independently and automatically.

2) Availability: In volunteer computing systems, it is expected that participants cannot be forced to enter or leave the system at specific moments. Data should be reachable independently of the peers' status, their location and the time of the request. Therefore, availability is an essential property for decentralized storage systems in order to be used in volunteer computing systems.

3) Scalability: Another important issue that has to be considered in both storage and volunteer computing systems is the system's scalability. In decentralized systems, it is mandatory that they can scale sufficiently with the number of nodes. Scalability is an essential property for these systems, in order to ensure that their functionality is preserved as the system's size increases.

4) Anonymity: In volunteer computing systems, it is highly desirable for volunteers to keep their identity secret while offering their resources. People are less willing to help when they are required to share personal information. Therefore, anonymity in volunteering can increase the number of participants, which is highly appreciated and encouraged. What is more, anonymity can be a way to prevent the denial of access for special groups of people, which is possible when personal information is [...].

5) Robustness: Both types of systems, storage and volunteer computing, are prone to failures, as machines may crash, reboot, or change location to one with different network characteristics and capabilities. In order to efficiently associate decentralized storage systems with volunteer ones, the former systems should be robust enough to handle these changes and repair themselves in the case of failures, in order to preserve this advantage in volunteer computing systems as well.

IV. DECENTRALIZED STORAGE SYSTEMS

In the following section, we present a short summary of the storage systems studied, referring to the previously explained characteristics.

A. FreeHaven

FreeHaven [10] first came with a solution for anonymity, whose implementation is not commonly handled by distributed storage systems. This means that it allows peers to distribute and share data anonymously by protecting their identity. The other goals of FreeHaven are: (a) persistence, for determining the lifetime of documents; (b) flexibility, for changing system functions; and (c) accountability, for limiting damage to the system.

Since there is no hierarchy and all nodes are on the same level, it is a pure peer-to-peer system that is symmetric and balanced. Although nodes have no special capabilities, unlike in client-server systems, they have special roles: the author, who initially creates documents; the publisher, who puts the documents into the FreeHaven system; the reader, who retrieves documents from the system; and servers, which provide storage. All these nodes have a pseudonym, and nodes know each other by their pseudonyms. Thus, locating peers is difficult. Tracing routes is difficult as well, since FreeHaven uses onion routing for broadcasting queries. The difficulty of both locating peers and tracing routes protects user identity, that is, it supplies anonymous communication. Server nodes periodically trade parts of documents, called shares, with each other. This trading gives flexibility to the system in the sense that servers can join and leave easily and without special treatment. For trading, nodes are chosen from a node list that is ordered by reputation. While a successful trade increases a node's reputation, malicious behavior decreases it [1]. In order to deter malicious behavior and limit damage to the system, each node notifies its buddies about share movements. This buddy mechanism supplies accountability. Moreover, FreeHaven is also robust, since it can keep a document even if a high threshold of its shares is lost.

Because of its pursuit of anonymity, persistence, flexibility and accountability, efficiency and convenience are ignored. In order to supply availability it uses a trading mechanism instead of a replication mechanism, thus the system is not highly
available [2]. Finally, inefficient broadcasts for communication make FreeHaven less efficient.

B. FreeNet

FreeNet [11] is an adaptive pure peer-to-peer storage system for publication, replication and anonymity of authors and readers while retrieving data. Like FreeHaven, the first goal of FreeNet is anonymity and privacy. However, FreeNet's anonymity does not cover the whole network; it applies only to file transactions, because FreeNet provides anonymity at the application layer instead of the transport layer. Thus, discovering source and destination is infeasible. The other goals of FreeNet are deniability, resistance, efficiency and decentralization.

The nodes in the peer-to-peer FreeNet network query a file that is represented by a location-independent key obtained from hash functions for anonymity. Each node maintains its own local store, which is accessible for others to read and write, and has a dynamic routing table that includes other peers' addresses with their keys. Whenever a node receives a request, it first checks its local store. If the data exists there, it returns the data; otherwise it forwards the request to the node that has the nearest key in the routing table. Furthermore, if the request is successful, the intended data returns along the same path as the request. While data is retrieved, each node on the way also caches this data and inserts the new key into its own routing table. This mechanism provides transparent replication and increases connectivity in the system. In order to cope efficiently with limited storage capacity, node storage is managed by LRU (Least Recently Used), which means data items are sorted by the time of their most recent request. Therefore, the least recently requested data will be at the end of the queue. This mechanism does not ensure long-term survivability for files that attract little interest.

The FreeNet protocol is packet-oriented and uses self-contained messages. Each message contains a hops-to-live limit, a depth counter and a randomly generated transactionID. It makes the corresponding file traceable by nodes. Hops-to-live is set by the sender of the message and prevents indefinite message forwarding. The depth counter is used for setting a sufficient number of hops-to-live to ensure that the request will reach its destination. Thus, it is incremented at each node. These three values are used for inserting, retrieving and requesting operations. In order to supply anonymity, it uses probabilistic routing that does not direct communication towards specific receivers.

Since probabilistic routing is used for providing anonymity, performance and reliability are not addressed. As in FreeHaven, performance is sacrificed in order to supply anonymous communication. However, because of dynamic storage and routing, the FreeNet network is highly scalable [3]. Moreover, it is robust against big failures.

C. Ivy

Ivy [12] is another peer-to-peer storage system with a file-system-like interface. There is no centralized or dedicated component, thus each user is on the same level. Although many other peer-to-peer storage systems support either read or write operations for one owner only, Ivy supports both read and write operations. However, the number of users that can use Ivy is limited. Thus, it is designed to be utilized by small groups of cooperative users.

All peers are identical and are able to work either as a client or as a server. Because of its symmetric architecture, it is called pure peer-to-peer. Each node has two main components: Chord/DHash for reliable P2P distributed storage and the Ivy Server for transferring data between peers. This architecture is log based. Each peer has its own log that includes user information and changes in the file system. Thus, for each NFS operation a log is created that is stored by Chord/DHash. Since logs are immutable and are kept indefinitely, peers can withdraw any changes. This flexibility is one of the best properties of Ivy. All users can read any log, though with some file permission attributes.

When a file system is created, a set of logs is created and a group of peers is set upon these logs. An entry pointing to a file's log is put in a view array. This array is traversed by all peers in order to create a snapshot. The logs are ordered in the array and peers use them for records. Thus some users can use one of the logs concurrently. This causes conflicts, since Ivy permits concurrent write operations. For this purpose, Ivy uses close-to-open consistency in a group of peers. In this consistency model, the Ivy server waits for DHash to confirm receipt of new log entries before committing a modify operation. Then that modification is announced. For each NFS operation, peers take the latest view array from DHash. Then peers check concurrent view vectors that affect the same file by traversing logs. In any conflict condition, differences are analyzed and merged. For file modification an optimistic approach is used, whereas for file creation a locking approach is used. Thus, when the number of users increases, performance decreases. Because of limited scalability [1], Ivy is suited for a small group of users.

Every user stores a log of their modifications and, at a specified time interval, generates a snapshot, a process which requires them to retrieve logs from all participants. Although retrieving the logs of all peers causes a bottleneck in performance, peers can freely change a file system regardless of other peers' state. The immutable and indefinitely stored logs can be used for withdrawing changes, but this operation is very costly. As a result, Ivy distributes its storage but only supports a limited write-once/read-many interface [1].

D. Frangipani

Frangipani [13] is a high-performance distributed storage system that is utilized by a cooperative group of users. It is not a pure peer-to-peer system, since there is an administrator. It aims to minimize the administrator's operations, which means Frangipani keeps administration simple even as many nodes join [1]. Moreover, it is designed to be used in an institution that has a secure and private network. Thus, it is not so scalable. However, it provides good performance to users, since it stripes data across servers, so that performance increases with the number of active servers. Frangipani can also be configured to replicate data [1]. Therefore, it offers redundancy and resilience to failures. This is a crucial property for volunteer computing systems.
Frangipani has three main components. The first is the Petal Server, which provides a virtual disk interface to distributed storage. It looks like local storage, and thus offers a transparent interface to users, since the distributed storage is hidden. The second component is the Distributed Locking Service. It supports consistency through a multiple-readers/single-writer locking philosophy. There are two types of locks, read and write. When there are multiple changes to a file, this service serializes them using these locks to keep consistency. Since Frangipani keeps all files in a consistent state through this locking mechanism, its performance is fairly degraded. The third component is the Frangipani File Server Module, which provides a file-system-like interface. It communicates with other components to be in a con

[...] its simplicity by providing correct read-write and shared-write semantics between clients via synchronous I/O, and extending the application interface to relax consistency for performance-conscious distributed applications. File and directory metadata in Ceph is very small, consisting almost entirely of directory entries (file names) and inodes (80 bytes); in contrast to conventional file systems, no file allocation metadata is necessary. In Ceph, object names are constructed using the inode number and distributed to OSDs using CRUSH. In order for Ceph to distribute a large amount of data, a strategy is adopted that distributes new data randomly, migrates a random subsample of existing data to new devices and uniformly redistributes data from removed devices. To maintain system availability