Data Warehousing and Business Intelligence

In this section of techtiks, you will find detailed information and must-read articles related to Data Warehousing and Business Intelligence. The target audience ranges from beginners to experts. These articles are written by highly qualified Data Warehouse Engineers.

Whenever you see or hear the words "Data Warehouse," you might picture a large building with bits of information stored on shelves, waiting for someone to retrieve them. Think of a traditional warehouse: what does it contain? Goods, stored in such a way that they are easy to identify and quick to retrieve. A data warehouse works in a similar way. How, then, is a data warehouse different from a traditional relational database? The two are similar, but there are certain defining differences, which the following articles describe:

- Data Warehouse Defined
- Elements of Data Warehouse
- History of Data Warehousing
- Dimensional Modeling

Data Warehouse Defined

Consider a business analyst who uses systems containing operational data (the data that runs the daily transactions of your business). Analysts can use information about which products were sold in which regions at what time of the year to look for anomalies or to project future sales. However, there are several problems if an analyst accesses operational data directly:

- The analyst might not have the expertise to query the operational database. For example, querying IMS databases requires an application program that uses a specialized type of data manipulation language. In general, the programmers who do have the expertise to query the operational database have a full-time job maintaining the database and its applications.
- Performance is critical for many operational databases, such as a bank's databases. Such a system cannot absorb the extra load of users running ad hoc queries against it.
- The operational data generally is not in the best format for reporting queries.

Data warehousing solves these problems. In data warehousing, you create stores of informational data: data that is extracted from the operational data and then transformed for reporting and decision making. For example, a data warehousing tool might copy all the sales data from the operational database, perform calculations to summarize the data, and write the result to a new database. End users can query the new database (the warehouse) without impacting the operational databases.

To summarize:

- The purpose of a data warehouse is to store data consistently across the organization and to make the organization's information accessible.
- It is an adaptive and resilient source of information. When new data is added to the data warehouse, the existing data and technologies are not disrupted. The design of the separate data marts that make up the data warehouse must be distributed and incremental; anything else is a compromise.
- The data warehouse not only controls access to the data but also gives its owners great visibility into the uses and abuses of the data, even after it has left the data warehouse.
- The data warehouse is the foundation for decision making.

Elements of Data Warehouse

Source Systems

Typically, an organization stores its data in various databases, usually divided up by system: there may be data for marketing, sales, payroll, engineering, and so on. These source systems might be legacy/mainframe systems or relational database systems.

Staging Area

Data coming from the various source systems is first kept in a staging area. The staging area is used to clean, transform, combine, de-duplicate, household, archive, and otherwise prepare source data for use in the data warehouse. Data arrives in this area exactly as it was in the source system. The staging area need not be relational. Sometimes the managers of the data are comfortable with a normalized set of data.
In these cases, a normalized structure for the staging storage is certainly acceptable. The staging area also does not provide querying or presentation services.

Presentation Server

Once the data is in the staging area, it is cleansed, transformed, and then sent to the data warehouse. You may or may not have an operational data store (ODS) before transferring data to the data warehouse.

OLAP

The data in the data warehouse has to be easy to manipulate in order to answer business questions from management and other users. This is accomplished by connecting the data to fast, easy-to-use tools known as Online Analytical Processing (OLAP) tools. OLAP tools can be thought of as super-high-speed forklifts with built-in knowledge of both the warehouse and how to operate them, so that ordinary people off the street can jump in and quickly find products by asking English-like questions. Within the OLAP server, data is reorganized to meet the reporting and analysis requirements of the business, including:

- Exception reporting
- Ad hoc analysis
- Actual vs. budget reporting
- Data mining (looking for trends or anomalies in the data)

To process business queries at high speed, some OLAP servers preprocess the answers to common questions. This yields exceptionally fast query responses, at the cost of an OLAP database that may be several times bigger than the data warehouse itself.

Data Mart

A data mart is a logical subset of the complete data warehouse. It is often viewed as the restriction of the data warehouse to a single business process, or to a group of related business processes targeted toward a particular business group. For example, an organization may have a data mart for Sales or for Inventory.

History of Data Warehousing

Ralph Kimball vs. Bill Inmon: Paradigms of the Data Warehouse

In the data warehousing field, we often hear discussions about whether a person's or organization's philosophy falls into Bill Inmon's camp or Ralph Kimball's camp.
The two philosophies differ as follows.

Bill Inmon's paradigm: The data warehouse is one part of the overall business intelligence system. An enterprise has one data warehouse, and data marts source their information from the data warehouse. In the data warehouse, information is stored in third normal form (3NF).

Ralph Kimball's paradigm: The data warehouse is the conglomerate of all data marts within the enterprise. Information is always stored in the dimensional model.

There is no right or wrong between these two ideas, as they represent different data warehousing philosophies. In reality, the data warehouse in most enterprises is closer to Ralph Kimball's idea. This is because most data warehouses started out as a departmental effort and hence originated as a data mart. Only when more data marts are built later do they evolve into a data warehouse.

Dimensional Modeling

Quick Reference Guide to Dimensional Modeling

Dimensional modeling is the design concept used by many data warehouse designers to build their data warehouses. The dimensional model is the underlying data model used by many of the commercial OLAP products on the market today.

Designing a data warehouse is very different from designing an online transaction processing (OLTP) system. The purpose of an OLTP system is to capture high rates of data changes and additions. The purpose of a data warehouse, by contrast, is to organize large amounts of stable data for ease of analysis and retrieval. Because of these differing purposes, many considerations in data warehouse design differ from those in OLTP database design.

In a dimensional model, all data is contained in two types of tables: fact tables and dimension tables.

Fact Table

Each data warehouse or data mart includes one or more fact tables. The fact table captures the data that measures the organization's business operations.
A fact table might contain business sales events such as cash register transactions, or the contributions and expenditures of a nonprofit organization. Fact tables usually contain large numbers of rows, sometimes hundreds of millions of records when they hold one or more years of history for a large organization.

A key characteristic of a fact table is that it contains numerical data (facts) that can be summarized to provide information about the history of the organization's operations. Each fact table also includes a multipart index that contains, as foreign keys, the primary keys of related dimension tables, which contain the attributes of the fact records. Fact tables should not contain descriptive information or any data other than the numerical measurement fields and the index fields that relate the facts to corresponding entries in the dimension tables. An example of a fact table is a Sales_Fact table that contains measures such as sale_amount, unit_price, and discount.

Dimension Table

Dimension tables contain attributes that describe the fact records in the fact table. Some of these attributes provide descriptive information; others specify how fact table data should be summarized to provide useful information to the analyst. Dimension tables contain hierarchies of attributes that aid in summarization. For example, a dimension containing product information would often contain a hierarchy that separates products into categories such as food, drink, and non-consumable items. Each of these categories is subdivided further a number of times until the individual product is reached at the lowest level.

Dimensional modeling produces dimension tables in which each table contains fact attributes that are independent of those in other dimensions.
For example, a customer dimension table contains data about customers, a product dimension table contains information about products, and a store dimension table contains information about stores. Queries use attributes in dimensions to specify a view into the fact information. A query might, for instance, use the product, store, and time dimensions to ask, "What was the cost of non-consumable goods sold in the northeast region in 1999?" Subsequent queries might drill down along one or more dimensions to examine more detailed data, such as "What was the cost of kitchen products in New York City in the third quarter of 1999?" In these examples, the dimension tables specify how a measure in the fact table (sale_amount) is to be summarized.

Consider the example of a Sales_Fact table whose facts are described by the attributes Store, Product, Time, and, say, Sales Person. In this case there are four dimension tables: Store_Dimension, Product_Dimension, Time_Dimension, and Sales_Person_Dimension.

[Figure 1: the Sales_Fact table and its four dimension tables]

In Figure 1, all of these dimensions contain a Key field. This field is called a surrogate key: a substitute for the natural key of the dimension (for example, the natural key of Sales_Person_Dimension is ID). In a data warehouse, a surrogate key is a generalization of the natural production key and is one of the basic elements of data warehouse design.
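The Sales_Fact example can be sketched as a small star schema using Python's built-in sqlite3 module. Only the table names and the sale_amount, unit_price, and discount measures come from the text. The other column names (store_key, region, category, quarter, and so on), the helper functions build_demo_warehouse and total_sale_amount, and all sample rows are hypothetical. Each dimension carries an integer surrogate key next to its natural key. The two example questions are answered by summing sale_amount, with dimension attributes used as filters.

```python
import sqlite3

# Hypothetical star schema for the Sales_Fact example. Table names and the
# sale_amount/unit_price/discount measures follow the text; other columns are assumed.
SCHEMA = """
-- Each dimension: integer surrogate key, natural key from the source, attributes.
CREATE TABLE Store_Dimension (store_key INTEGER PRIMARY KEY, store_id TEXT,
                              city TEXT, region TEXT);
CREATE TABLE Product_Dimension (product_key INTEGER PRIMARY KEY, product_id TEXT,
                                name TEXT, subcategory TEXT, category TEXT);
CREATE TABLE Time_Dimension (time_key INTEGER PRIMARY KEY, full_date TEXT,
                             quarter INTEGER, year INTEGER);
CREATE TABLE Sales_Person_Dimension (sales_person_key INTEGER PRIMARY KEY,
                                     id TEXT, name TEXT);
-- Fact table: numeric measures plus foreign keys that together form a multipart key.
CREATE TABLE Sales_Fact (
    store_key        INTEGER REFERENCES Store_Dimension (store_key),
    product_key      INTEGER REFERENCES Product_Dimension (product_key),
    time_key         INTEGER REFERENCES Time_Dimension (time_key),
    sales_person_key INTEGER REFERENCES Sales_Person_Dimension (sales_person_key),
    sale_amount REAL, unit_price REAL, discount REAL,
    PRIMARY KEY (store_key, product_key, time_key, sales_person_key)
);
"""

def build_demo_warehouse():
    """Create an in-memory warehouse and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Store_Dimension VALUES (?, ?, ?, ?)", [
        (1, "S-01", "New York City", "Northeast"),
        (2, "S-02", "Boston", "Northeast"),
        (3, "S-03", "Dallas", "South"),
    ])
    # Product hierarchy: category > subcategory > individual product.
    conn.executemany("INSERT INTO Product_Dimension VALUES (?, ?, ?, ?, ?)", [
        (1, "P-100", "Skillet", "Kitchen", "Non-consumable"),
        (2, "P-200", "Spatula", "Kitchen", "Non-consumable"),
        (3, "P-300", "Bread", "Bakery", "Food"),
    ])
    conn.executemany("INSERT INTO Time_Dimension VALUES (?, ?, ?, ?)", [
        (1, "1999-02-10", 1, 1999),
        (2, "1999-08-15", 3, 1999),
        (3, "2000-01-05", 1, 2000),
    ])
    conn.executemany("INSERT INTO Sales_Person_Dimension VALUES (?, ?, ?)", [
        (1, "E-7", "Alice"),
        (2, "E-9", "Bob"),
    ])
    conn.executemany("INSERT INTO Sales_Fact VALUES (?, ?, ?, ?, ?, ?, ?)", [
        (1, 1, 2, 1, 50.0, 25.0, 0.0),  # NYC, skillet, Q3 1999
        (2, 2, 1, 2, 20.0, 10.0, 0.0),  # Boston, spatula, Q1 1999
        (1, 3, 2, 1, 6.0, 3.0, 0.0),    # NYC, bread (food), Q3 1999
        (3, 1, 1, 2, 40.0, 20.0, 0.0),  # Dallas (South), skillet, Q1 1999
        (1, 2, 3, 1, 15.0, 7.5, 0.0),   # NYC, spatula, Q1 2000
        (1, 2, 2, 2, 30.0, 10.0, 0.0),  # NYC, spatula, Q3 1999
    ])
    return conn

# Dimension attributes a query may constrain, mapped to qualified column names.
# Whitelisting them keeps caller-supplied names out of the SQL text.
ATTRIBUTES = {
    "region": "s.region", "city": "s.city",
    "category": "p.category", "subcategory": "p.subcategory",
    "year": "t.year", "quarter": "t.quarter",
}

def total_sale_amount(conn, **filters):
    """Sum the sale_amount measure, using dimension attributes as filters."""
    where = " AND ".join(f"{ATTRIBUTES[name]} = ?" for name in filters) or "1 = 1"
    sql = f"""
        SELECT COALESCE(SUM(f.sale_amount), 0)
        FROM Sales_Fact f
        JOIN Store_Dimension   s ON f.store_key   = s.store_key
        JOIN Product_Dimension p ON f.product_key = p.product_key
        JOIN Time_Dimension    t ON f.time_key    = t.time_key
        WHERE {where}
    """
    return conn.execute(sql, list(filters.values())).fetchone()[0]

if __name__ == "__main__":
    conn = build_demo_warehouse()
    # Non-consumable goods sold in the Northeast region in 1999.
    print(total_sale_amount(conn, category="Non-consumable", region="Northeast", year=1999))  # 100.0
    # Drill down: kitchen products in New York City in the third quarter of 1999.
    print(total_sale_amount(conn, subcategory="Kitchen", city="New York City",
                            year=1999, quarter=3))  # 80.0
```

Drilling down amounts to swapping in finer attributes from each dimension's hierarchy. Category becomes subcategory, region becomes city, and year gains a quarter. The fact table itself is never changed. Note that SQLite does not enforce the declared foreign keys unless they are switched on, so this sketch relies on the sample data being consistent.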