
Blog.techcello.com-Database Sharding Scaling Data in a Multi Tenant Environment

Database Sharding – Scaling data in a multi tenant environment
blog.techcello.com/2012/07/database-sharding-scaling-data-in-a-multi-tenant-environment/

A multi tenant SaaS product should scale seamlessly without compromising on Reliability, Availability and Performance. Large scale applications built to handle thousands of concurrent users must be architected to serve anything from a medium sized customer with a few hundred users to a large customer with a very large number of active users. As the number of tenants and users grows, the number of IOs hitting the database also increases, which makes the system susceptible to performance degradation.

When a new SaaS product is introduced, the application typically stores tenant data in separate databases, plus an extra database for the list of tenants and the data shared by every tenant [generally called the Meta Database]. For example, in financial applications such as Accounts Management, Expense Management and Payroll, the shared information would include items like currency exchange rates and the VAT rates for different states, along with other information such as colour themes and language preferences that are identical for each tenant. Everything fits into a single DB server and life is good. But over time, as the business picks up and more tenants buy the product, the application provider will be pushed to choose a different database or a different schema, or else to split the single server into multiple servers for the shared data and the application databases.

At times, customers might want to run complex analytical reports, which tend to consume a lot of the resources of the tenant database servers. Again, the application provider has to set up separate scaled-up servers with optimized hardware to load data dynamically from the tenant databases.
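The Meta Database described above, holding the list of tenants plus the data shared by all of them, can be pictured with a small sketch. This is only an illustration under stated assumptions: the class `MetaDatabase`, its methods, and the tenant, database and VAT values are all hypothetical, and a real system would keep this information in an actual database rather than in memory.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MetaDatabase:
    """Hypothetical in-memory stand-in for the meta database."""

    # tenant id -> name of the database that holds this tenant's data
    tenant_databases: Dict[str, str] = field(default_factory=dict)
    # data shared by every tenant, here VAT rates keyed by state (made-up values)
    vat_rates: Dict[str, float] = field(default_factory=dict)

    def register_tenant(self, tenant_id: str, database: str) -> None:
        """Record which database stores the given tenant's data."""
        self.tenant_databases[tenant_id] = database

    def database_for(self, tenant_id: str) -> str:
        """Look up the tenant's database, failing loudly for unknown tenants."""
        try:
            return self.tenant_databases[tenant_id]
        except KeyError:
            raise LookupError(f"unknown tenant: {tenant_id}") from None


meta = MetaDatabase(vat_rates={"state_a": 0.05, "state_b": 0.125})
meta.register_tenant("acme", "tenant_acme_db")
meta.register_tenant("globex", "tenant_globex_db")
```

Every request first asks the meta database where the tenant lives (`meta.database_for("acme")`), while shared values such as `meta.vat_rates` are read from one place instead of being copied into each tenant database.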
These reporting servers are just the beginning of additional servers as the SaaS adds more customers and new services. It is not unusual for successful SaaS application owners to run 100 or more DBMS servers.

One effective way of scaling a database is partitioning: dividing the data into smaller divisions in order to improve the efficiency of queries and updates. In the long run, the application provider has to come up with data partitioning strategies that determine the best way to partition the customers' data. The partition decision can be based on different aspects such as:

• Geography-based data separation
• Usage patterns
• Customer growth
• Data security and policy
• Data isolation
• Cost efficiency

Data Sharding

An important quote about databases is "Small databases are always faster, large databases are not, so keep your database as small as possible". Vertical scaling of the database server is useful only up to a certain point. Beyond this, the database itself has to be partitioned. The following are the three key techniques that could be considered to ensure scalability in a Shared DB-Shared Schema [SDB-SS] model:

• Multiple Database Instances
• Partitioned Tables and Indexes
• File Group Allocation

In most situations, it is likely that the database size will keep growing. Therefore, it is also important to have dynamic repartitioning strategies in place, so that already-partitioned data can be repartitioned to keep up with performance and scale metrics.

Table Partitioning and File Group Allocation

When tables and indexes grow large, partitioning helps split the data into smaller, more manageable sections. When the data in a table is partitioned, SQL Server can often skip the irrelevant partitions and read directly from the partitions that contain the requested data; this is called partition elimination. Not only SELECT queries but also DML statements such as UPDATE and DELETE get a considerable performance improvement when modifying data in partitioned tables.

Note: Partition elimination will work only if the partitioning column is present in the WHERE clause of the query.

Although the partitioned data can reside in a single file group, it makes sense to spread it over multiple file groups to achieve better performance, intelligent partition elimination and boosted IO performance. Data can then be backed up separately using file group backups. Another important benefit of storing the partitions in multiple file groups is that if the data in any one partition is old and is kept only for auditing, the file group holding that partition can be marked as read-only. If the data has to be pulled from multiple partitions, SQL Server might use multiple processes to read the data from those partitions.

Important: SQL Server 2012 supports up to 15,000 partitions by default. In earlier versions, the number of partitions was limited to 1,000 by default. On x86-based systems, creating a table or index with more than 1,000 partitions is possible, but is not supported. Refer: http://msdn.microsoft.com/en-us/library/ms190787.aspx

Approach 1

All the tenants' metadata is stored in DB1 and the application data in DB2. This option is ideal for many applications with lighter user loads and fewer IO operations. Maintenance is simpler and cost efficient. For start-ups, it is advisable to begin with this approach and scale out using the other approaches as the business grows.

Approach 2

When a tenant insists on complete isolation of its data in a separate database, a separate instance can be created for that tenant. The metadata of multiple tenants can reside in a common database, but the transaction data will be in a separate database instance.

Approach 3

A separate schema for each tenant can be considered for some applications.
It gives a greater level of isolation and independence between tenants, yet it remains easy to maintain because all tenants share the same database server instance.

Approach 4

Usage patterns and volume might vary across tenants. In such a scenario, sharing a schema / database between two or three tenants while having a separate schema / database for large tenants could be considered.

Approach 5

A separate database is used for each module. All the tenants share the same database with respect to that module. This is also called Vertical Partitioning.

Approach 6

Database Partitioning based on Tenant ID

Various Types of Data Sharding

This approach is a combination of Vertical and Horizontal Partitioning. Data from different application modules is grouped and stored in different databases. The data is also partitioned based on Tenant ID.

Data Connection Abstraction and Connection String management

In a multi tenant environment where almost every kind of data scaling possibility is available, there has to be a strong, comprehensive and centralized approach to managing the multiple connections across the application for each tenant. The three important aspects of connection management are:

1. Security
2. Administration
3. Control

Storing connection strings in plain text may not be advisable in all situations, so provision should be made to encrypt them when necessary. Administration of connection strings might not be an easy task when there are hundreds of tenants connecting to multiple databases for each of their
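
The combination of vertical partitioning (one group of databases per module) and horizontal partitioning (by Tenant ID), together with the centralized connection lookup discussed above, might be sketched as follows. This is a minimal illustration, not the article's implementation: the `SHARDS` map, the function `connection_string_for` and all server and database names are hypothetical, and hashing the tenant ID is just one possible way to assign tenants to shards.

```python
import zlib
from typing import Dict, List

# Hypothetical shard map: module name -> connection strings of its shards.
# In practice these would be stored centrally (and encrypted when necessary),
# not hard-coded in the application.
SHARDS: Dict[str, List[str]] = {
    "payroll": [
        "Server=db1;Database=payroll_0",
        "Server=db2;Database=payroll_1",
    ],
    "expenses": [
        "Server=db3;Database=expenses_0",
    ],
}


def connection_string_for(module: str, tenant_id: str) -> str:
    """Resolve the connection string for one tenant within one module."""
    # Vertical partitioning: each module has its own set of databases.
    shards = SHARDS[module]
    # Horizontal partitioning: a stable hash of the tenant ID picks the shard.
    # crc32 is used because Python's built-in hash() of a string can differ
    # between interpreter runs.
    index = zlib.crc32(tenant_id.encode("utf-8")) % len(shards)
    return shards[index]
```

Application code would call `connection_string_for("payroll", "tenant-42")` instead of handling connection strings directly, which keeps administration in one place. Note that with a plain hash-modulo scheme, adding a shard changes where existing tenants map to; an explicit tenant-to-shard table is one alternative when data needs to be repartitioned later.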