DATA WAREHOUSE AND ITS APPLICATIONS IN AGRICULTURE

Anil Rai
I.A.S.R.I., Library Avenue, New Delhi

Introduction

A data warehouse is a repository of integrated information, available for queries and analysis. Data and information are extracted from heterogeneous sources as they are generated. This makes it much easier and more efficient to run queries over data that originally came from different sources. In other words, a data warehouse is a database that is used to hold data for reporting and analysis.

Goals of data warehousing

- To facilitate reporting as well as analysis
- To maintain an organization's historical information
- To be an adaptive and resilient source of information
- To be the foundation for decision making

Data warehouse architecture

Data warehouse architecture comprises:

- Operational source systems
- A data staging area
- One or more conformed data marts
- A data warehouse database

Operational source systems

Operational source systems are developed to capture and process original business transactions. These systems are designed for data entry, not for reporting, but it is from here that the data warehouse is populated.

Data staging area

The data staging area is where the raw operational data is extracted, cleaned, transformed and combined so that it can be reported on and queried by users. This area lies between the operational source systems and the user database and is typically not accessible to users. Data staging is a major process that includes the following sub-procedures.

Extraction

The extract step is the first step of getting data into the data warehouse environment. Extracting means reading and understanding the source data, and copying the parts that are needed to the data staging area for further work.

Transformation

Once the data is extracted into the data staging area, there are many transformation steps, including:

1. Cleaning the data by correcting misspellings, resolving domain conflicts, dealing with missing data elements, and parsing into standard formats.
2. Purging selected fields from the legacy data that are not useful for the data warehouse.
3. Combining data sources by matching exactly on key values or by performing fuzzy matches on non-key attributes.
4. Creating surrogate keys for each dimension record in order to avoid dependency on legacy-defined keys, where the surrogate key generation process enforces referential integrity between the dimension tables and fact tables.
5. Building the aggregates for boosting the performance of common queries.

Loading and indexing

At the end of the transformation process, the data is in the form of load record images. Loading in the data warehouse environment usually takes the form of replicating the dimension tables and fact tables and presenting these tables to the bulk loading facilities of each recipient data mart. Bulk loading is a very important capability that is to be contrasted with record-at-a-time loading, which is far slower. The target data mart must then index the newly arrived data for query performance.

Data mart

A data mart is a logical subset of an enterprise-wide data warehouse. For example, a data warehouse for a retail chain is constructed incrementally from individual, conformed data marts dealing with separate subject areas such as product sales. Dimensional data marts are organized by subject area (such as sales, finance, and marketing) and coordinated by data category (such as customer, product, and location). These flexible information stores allow data structures to respond to business changes such as product line additions, new staff responsibilities, mergers, consolidations, and acquisitions.

Data warehouse database

A data warehouse database contains the data that is organized and stored specifically for direct user queries. It differs from an OLTP database in that it is designed primarily for reads, not writes.

An OLAP application is a system designed for few but complex (read-only) requests.
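The transformation and loading steps of the data staging area described above can be illustrated with a small sketch. It is not taken from any particular warehouse product: the table names, column names and sample rows are hypothetical, SQLite stands in for the data mart, and a batched `executemany` call stands in for a real bulk-load facility. Missing quantities are stored as NULL, which is only one of several possible treatments.

```python
import sqlite3

# Hypothetical raw rows from an operational source system:
# (legacy product code, district name, quantity sold as text).
RAW_ROWS = [
    ("P-01", " new delhi ", "10"),
    ("P-01", "New Delhi", "5"),
    ("P-02", "NAGPUR", ""),      # missing quantity
    ("P-02", "Nagpur", "7"),
]


def clean(row):
    """Cleaning step: parse into standard formats, handle missing data."""
    code, district, qty = row
    district = " ".join(district.split()).title()
    quantity = int(qty) if qty.strip() else None  # assumption: missing -> NULL
    return code, district, quantity


def assign_surrogate_keys(values):
    """Map each distinct legacy value to a warehouse-generated integer key."""
    return {v: k for k, v in enumerate(sorted(set(values)), start=1)}


def run_etl(raw_rows):
    cleaned = [clean(r) for r in raw_rows]
    product_keys = assign_surrogate_keys(c for c, _, _ in cleaned)
    district_keys = assign_surrogate_keys(d for _, d, _ in cleaned)

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, legacy_code TEXT);
        CREATE TABLE dim_district (district_key INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE fact_sales (
            product_key  INTEGER REFERENCES dim_product(product_key),
            district_key INTEGER REFERENCES dim_district(district_key),
            quantity     INTEGER
        );
    """)
    # Batched load of dimension and fact tables (stand-in for bulk loading).
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(k, v) for v, k in product_keys.items()])
    conn.executemany("INSERT INTO dim_district VALUES (?, ?)",
                     [(k, v) for v, k in district_keys.items()])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [(product_keys[c], district_keys[d], q) for c, d, q in cleaned])
    # Index the newly arrived data for query performance.
    conn.execute("CREATE INDEX idx_sales_district ON fact_sales(district_key)")
    # Aggregate built in advance for a common query: total per district.
    conn.execute("""
        CREATE TABLE agg_sales_by_district AS
        SELECT district_key, SUM(quantity) AS total_quantity
        FROM fact_sales GROUP BY district_key
    """)
    conn.commit()
    return conn
```

Calling `run_etl(RAW_ROWS)` returns a connection in which the fact table refers only to surrogate keys, so later changes to the legacy product codes would not disturb the stored facts.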
An OLTP application is a system designed for many but simple concurrent (and updating) requests.

OLAP vs OLTP

OLTP (Online Transaction Processing)

- OLTP servers handle mission-critical production data accessed through simple queries.
- Usually handles queries of an automated nature.
- OLTP applications consist of a large number of relatively simple transactions.
- Most often contains data organised on the basis of logical relations between normalised tables.

OLAP (Online Analytical Processing)

- OLAP servers handle management-critical data accessed through an iterative analytical investigation.
- Usually handles queries of an ad hoc nature.
- Supports more complex and demanding transactions.
- Contains logically organised data in multiple dimensions.

Differences between data warehouse and data mart

Data warehouse                                Data mart
1. It is a multi-subject information store.   1. It is a single-subject data warehouse.
2. It is hundreds of gigabytes in size.       2. Its size is less than 100 gigabytes.
3. It is difficult to build.                  3. It is easy to build.

Differences between operational systems and data warehousing

Feature                                     Operational systems   Data warehousing
Query                                       Predefined            Ad hoc
Amount of information involved in queries   Little                Little to much
Time horizon of required information        Up-to-date            Historical and up-to-date
Information level                           Detailed              Detailed and summarized
Multidimensional data                       No                    Yes
CPU use                                     All day long          At maximum or not used

Warehouse schema design

Dimensional modeling is a term used to refer to a set of data modeling techniques that have gained popularity and acceptance for data warehouse implementation. It is one of the key techniques in data warehousing. Two types of tables are used in dimensional modeling: fact tables and dimensional tables.

Fact tables: These are used to record actual facts and measures in the business. Facts are numeric data items that are of interest to the business. Examples (telecommunication): length of call in minutes, average number of calls.
Dimensional tables: Dimensional tables establish the context of the facts. They store fields that describe the facts. Examples (telecommunication): call origin, call destination.

A schema is a fact table plus its related dimensional tables.

Star schema

- One fact table
- De-normalized dimension tables
- One column per level/attribute
- Simple and easy overview, hence ease of use
- Relatively flexible
- Fact table is normalized
- Dimension tables often relatively small
- Recognized by many RDBMSes, hence good performance
- Hierarchies are hidden in the columns

Snowflake schemas

- Dimensions are normalized
- One dimension table per level
- Each dimension table has an integer key, a level name, and one column per attribute
- Hierarchies are made explicit/visible
- Very flexible
- Dimension tables use less space
- Harder to use due to many joins
- Worse performance

How a Data Warehouse Is Different from Other IT Projects

A data warehouse project is not a package implementation project. It requires a number of tools and software utilities that are available from multiple vendors. At present there is still no single suite of tools that can automate the entire data warehouse effort.

A data warehouse never stops evolving; it changes with the business. Unlike OLTP systems, a data warehouse is subject to changes in the decisional information requirements of decision makers, i.e. it is subject to any changes in the business context of the enterprise.

Data warehouses are huge. A pilot data warehouse can easily be more than 10 gigabytes in size. A data warehouse in production for more than a year can easily reach 1 terabyte, depending on the granularity and volume of data. Databases of this size require different database optimization and tuning techniques.

Data warehouse implementation

The data warehouse implementation team builds or extends an existing warehouse schema based on the final logical schema design produced during planning.
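As an illustration of the kind of warehouse schema the team sets up, the following is a minimal sketch of the star schema described earlier, built around the telecommunication example (call length as the fact, call origin and destination as dimensions). The table names, column names and sample rows are hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- De-normalized dimension: the city -> state hierarchy stays in columns.
    CREATE TABLE dim_location (
        location_key INTEGER PRIMARY KEY,   -- surrogate key
        city         TEXT,
        state        TEXT
    );
    -- Fact table: a numeric measure plus foreign keys to the dimension.
    CREATE TABLE fact_call (
        origin_key      INTEGER REFERENCES dim_location(location_key),
        destination_key INTEGER REFERENCES dim_location(location_key),
        call_minutes    REAL
    );
""")
conn.executemany("INSERT INTO dim_location VALUES (?, ?, ?)", [
    (1, "New Delhi", "Delhi"),
    (2, "Nagpur", "Maharashtra"),
    (3, "Bhopal", "Madhya Pradesh"),
])
conn.executemany("INSERT INTO fact_call VALUES (?, ?, ?)", [
    (1, 2, 4.0),
    (1, 3, 6.0),
    (2, 1, 2.5),
])


def minutes_by_origin_state(connection):
    """Total call minutes per state of call origin, using a single join."""
    return connection.execute("""
        SELECT o.state, SUM(f.call_minutes)
        FROM fact_call AS f
        JOIN dim_location AS o ON o.location_key = f.origin_key
        GROUP BY o.state
        ORDER BY o.state
    """).fetchall()
```

In a snowflake version of the same design, `state` would move into its own table referenced from `dim_location`, which makes the hierarchy explicit but adds one more join to this query.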
The team also builds the warehouse subsystems that ensure a steady, regular flow of clean data from the operational systems into the data warehouse. Other team members install and configure the selected front-end tools to provide users with access to warehouse data. An implementation project should be scoped to last between three and six months.

Once the warehouse has been deployed, the day-to-day warehouse management, maintenance, and optimization tasks begin. Some members of the implementation team may be asked to stay on and assist with the maintenance activities to ensure continuity. The other members of the project team may be asked to start planning the next warehouse rollout or may be released to work on other projects.

The tasks performed during a warehouse implementation include:

- Acquire and set up development environment.
- Obtain copies of operational tables.
- Finalize physical warehouse schema design.
- Build or configure extraction and transformation subsystems.
- Build or configure data quality subsystems.
- Build warehouse load subsystem.
- Set up data warehouse schema.
- Set up data warehouse metadata.
- Set up data access and retrieval tools.
- Conduct user training, testing and acceptance.

Integrated National Agricultural Resources Information System

This project was taken up as a sub-project under the National Agricultural Technology Project (NATP). The mission set for this project was to design and develop a flexible Central Data Warehouse (CDW) of agricultural resources of the country at IASRI, New Delhi (lead center) and databases on different subjects at the respective co-operating centers. The target users of the information systems and decision support system developed under this project are (i) Research Managers, (ii) Research Scientists and (iii) General Users. In this project a state-of-the-art Central Data Warehouse (CDW) of agricultural resources of the country has been developed at IASRI, New Delhi.
This is probably the first attempt at data warehousing of agricultural resources in the world. It provides systematic and periodic information to research scientists, planners, decision makers and developmental agencies in the form of an On-line Analytical Processing (OLAP) decision support system.

The project has been implemented with active collaboration and support from 13 other ICAR institutions, namely:

- NBSSLUP Nagpur (for soil resources)
- CRIDA Hyderabad (for agro-meteorology)
- PDCSR Modipuram (for crops and cropping systems)
- NBAGR Karnal (for livestock resources)
- NBFGR Lucknow (for fish resources)
- NBPGR New Delhi (for plant genetic resources)
- NCAP New Delhi (for socio-economic resources)
- CIAE Bhopal (for agricultural implements and machinery)
- CPCRI Kasargod (for plantation crops)
- IISR Calicut (for spices crops)
- ICAR Research Complex for Eastern Region Patna (for water resources)
- NRC-AF Jhansi (for agro forestry)
- IIHR Bangalore (for horticultural crops)

In all, 59 databases on agricultural technologies generated by the Council, research projects in operation, and related agricultural statistics from published official sources, at the district level from at least the year 1990 onwards, were integrated into this information system. Subject-wise data marts were created; multi-dimensional data cubes have been developed and published on the Internet/Intranet. Validation checks have been implemented wherever possible.

The information in this data warehouse is available to users in the form of a decision support system in which full flexibility in the presentation of the information and its on-line analysis, including graphics, is built into the system. The system also provides a facility for spatial analysis of the data through the web using the functionalities of a Geographic Information System (GIS). Apart from this, subject-wise information systems have been developed for the general users.
The user of this system has access to subject-wise dynamic reports through the web. Facilities for data mining and ad hoc querying were also extended to a limited set of users. Information from this data warehouse is thus disseminated to the different categories of users through a web browser, with proper authentication of the users. The web site of the project has already been launched, and the multidimensional cubes, dynamic reports, GIS maps and information systems are already available to the users.

This project is expected to strengthen the information system conceptualized by ICAR. Other agencies, in particular the planning portfolio, are eagerly waiting for such a decision support system. Based on the interaction among the basic resources like soil, water, climate, animal and vegetation that form the prime components of the production system, this data warehouse will help in determining the carrying capacity of a region.

The project aims to provide opportunities for multi-disciplinary work through enhanced linkages among research institutes and other development agencies by providing first-hand information on problems and potential in production systems. This data warehouse may be intensively used with the ultimate aim of improving the quality of life of the farming community and society at large.