Testing Database Applications

MTech Seminar Report
by Rishi Raj Gupta
Roll No:
under the guidance of Prof. S. Sudarshan, Computer Science and Engineering, IIT-Bombay

Department of Computer Science and Engineering
Indian Institute of Technology, Bombay
Mumbai

Acknowledgments

I would like to thank my guide, Prof. S. Sudarshan, for the consistent motivation and direction he has given to my work.

Rishi Raj Gupta
MTech-I, CSE, IIT Bombay

Abstract

Database applications play an important role in nearly every organization, yet little work has been done on testing them. They are becoming increasingly complex, are subject to constant change, and are often designed to be executed concurrently by many clients. Testing database applications is therefore of utmost importance: a single fault in a database application can result in unrecoverable data loss, which may affect the working of the organization. Many tools and frameworks for testing database applications, such as AGENDA, have been proposed to populate the test database and generate test cases that check the correctness of the application. They check database applications for consistency constraints and transaction concurrency. In this report we first present various such procedures and strategies proposed in the literature for testing database applications. Strategies for testing SQL queries embedded within an imperative language are presented next. Finally, we present strategies for performing efficient regression tests of database applications and for parallel execution of test runs.
Contents

1 Introduction
2 Database Testing Tool Set-AGENDA
    Overview of Architecture
    Test Database Generation
    Testing Database Transactions and Concurrency
3 Database generation
    Automatic Generation of DB instances
    Data Generation Language (DGL)
    Overview of Data Types and Iterators
    DGL programs and Expressions
    Annotated Schemas Using DGL
4 Testing SQL Queries
    Static Checking of Generated Queries
    Testing of SQL Evaluation Engine by RAGS
5 Efficient Regression Test of Database
    Database Applications Regression Tests
    Regression Test framework and Overview
    Centralized Scheduling Strategies
    Discussion and Results
    Parallel Execution of Test Runs
6 Conclusions and Future Work

Chapter 1 Introduction

Database application programs play a central role in the operation of almost every modern organization. It is essential that they are thoroughly tested under all conditions. Little work has been done on testing database applications, however. Many testing tools, such as JUnit, exist for testing programs, but thus far only a limited amount of work has addressed database applications. Current regression testing tools are not optimized for database testing because they were designed for stateless applications. Regression testing seeks to uncover regression bugs: its purpose is to detect changes in the behavior of an application after the application or its configuration has been changed.

Testing database applications requires several tasks:

1. Parsing the database schema to extract information.
2. Generating the test database.
3. Generating test cases.
4. Validating the state and output of test cases.

Using the report generated by the test tool, the tester can check the consistency and correctness of the database application. Various tools for database testing have been proposed so far. The test database can also be generated in different ways.
Depending on the technique and tool used, the generated test database provides a different amount of coverage. Since real (live) data is hard to gather and may not cover all the test cases, different strategies for generating synthetic data have been proposed.

David Chays has proposed the AGENDA tool set [2], designed for testing relational database applications. First, this report covers the prototype of the tool and its extensions to test transactions [11] and concurrency [3]. The tool set uses a parser to extract schema information from the database, populates the test database with either live or synthetic data, generates test cases and executes them, observes the database state before and after test execution, and validates the state and output of the system to check the database application program. The AGENDA tool finds and generates interesting and legal schedules to populate the database. AGENDA has also been extended to test transactions with multiple queries. In reality, transactions do not execute serially; they execute concurrently. To be practical, the tool must be capable of identifying concurrent transactions and checking for concurrency faults. To reveal concurrency faults, a dataflow analysis technique for identifying schedules of transaction execution was proposed in [3], which is covered in this report.

Second, we look at techniques for testing SQL queries. Database application programs typically contain statements written in an imperative programming language with embedded data manipulation commands, such as SQL. Database systems need good test coverage. Typical SQL test libraries contain a large number of statements, which take time to compose manually. These test libraries cover an important, but tiny, fraction of the SQL input domain.
A system called RAGS (Random Generation of SQL) [9] was built to explore automated testing. It generates SQL statements stochastically (randomly), which provides both speed and wider coverage of the input domain.

Third, we consider a fault-based approach to generating database instances to support white-box testing of embedded SQL programs. SQL embedded in an imperative language has to be taken into account in order to generate good database test cases or instances. The approach generates database instances that respect the semantics of the SQL statements by generating a set of constraints. These constraints collectively represent a property against which the program is tested. Database instances for program testing can be derived by solving the set of constraints using existing constraint solvers. Data-intensive applications often dynamically construct database query strings and execute them. The underlying SQL queries also have to be tested so that the application program does not encounter runtime errors caused by SQL. A technique for static checking of such dynamically generated queries in database applications, proposed by Carl Gould [1], is also covered.

Finally, we present heuristics for centralized scheduling strategies to perform efficient regression tests [4] and parallel execution of test runs [5] for database application systems, such as Shared Nothing (SN) and Shared Database (SDB). Database applications are becoming increasingly complex, and most databases are subject to constant change. Because of this, the application and its configuration must be changed frequently. Unfortunately, such changes are costly, mostly because of the tests that must be carried out to ensure the integrity of the application. Testing database applications requires a great deal of manual work. Common methods of regression testing include re-running previously run tests and checking whether previously fixed faults have re-emerged.
Regression test tools can store requests and responses, and heuristics can be applied to schedule the test runs for efficient regression testing [4]. By controlling the state of the database during testing and by ordering the test runs efficiently, the time needed for regression testing can be reduced.

This report discusses issues that arise in testing database applications. Chapter 2 presents an overview of the AGENDA tool set and its handling of data generation and concurrency issues. Chapter 3 discusses two different techniques for database generation. Chapter 4 focuses on the role SQL queries can play in testing database applications. Chapter 5 presents a framework for efficient regression testing and parallel execution of test runs using some heuristics. Chapter 6 concludes with a summary of the report and related future work.

Chapter 2 Database Testing Tool Set-AGENDA

The AGENDA system [2] is a comprehensive tool set that semi-automatically generates a database and test cases, and assists the tester in checking the results. AGENDA was devised to satisfy various integrity constraints while generating the test database, and it has been extended to handle transaction and concurrency issues. It addresses the question of whether the application program behaves as specified. In this chapter we look at the components of the AGENDA tool and how it handles these issues.

2.1 Overview of Architecture

AGENDA, as shown in Figure 2.1, takes as input the database schema of the database on which the application runs, the application source code, and a sample-values file. The sample-values file contains suggested values for attributes, partitioned into groups called data groups. The user interactively selects test heuristics and provides information about the expected behavior of test cases.
Using this information, AGENDA populates the database, generates inputs to the application, executes the application on those inputs, and checks some aspects of the correctness of the resulting database state and the application output. Live data generally do not reflect a sufficiently wide variety of possible situations, so synthetic data must be generated for database testing.

Figure 2.1: Architecture of AGENDA Tool Set.

AGENDA consists of five interacting components that operate with guidance from the tester.

1. Agenda Parser: The core of this component is an SQL parser. Given a schema, the PostgreSQL parser creates an abstract syntax tree that contains the relevant information about tables, attributes, and constraints. However, different SQL DDL syntactic constructs can express the same underlying information about a table. The Agenda Parser extracts information from the database schema, application queries, and tester-supplied sample-values files, and makes it available to the other four components, as shown in Figure 2.1. It does this by creating an internal database, referred to as the AgendaDB, to store the extracted information. The AgendaDB is used and/or modified by the remaining four components.

2. State Generator: Uses the database schema, along with information from the tester's sample-values files indicating useful values for attributes, and populates the database tables with data satisfying the constraints. It retrieves the information from the AgendaDB and generates an initial database state for the application, referred to as the ApplicationDB.

3. Input Generator: Generates input data to be supplied to the application. The data are created using information generated by the Agenda Parser and State Generator components, along with information derived from parsing the SQL statements in the application program and information that is useful for checking the test results.
Using the AgendaDB, along with the tester's choice of heuristics, the Input Generator instantiates the input parameters of the application with actual values, thus generating test inputs.

4. State Validator: Investigates changes in the state of the application database during execution of a test. It logs the changes made to the application tables and checks the state change.

5. Output Validator: Captures the application's outputs and checks them against the query preconditions and postconditions that have been generated by the tool or supplied by the tester. The precondition could be a join condition supplied by the user, and the postcondition could be the condition satisfying the constraint.

Some of the information stored in the Agenda database is also stored in the DBMS's internal catalog tables. However, building and then querying a separate Agenda database decouples the remaining components from DBMS-specific details. This allows AGENDA to be ported to a different DBMS by changing only the Agenda Parser.

2.2 Test Database Generation

A relational database schema is a set of relation schemas along with a set of integrity constraints. There are several types of constraints, such as domain constraints, uniqueness constraints, referential integrity constraints, semantic integrity constraints, and not-null constraints. AGENDA takes these into account while generating the test database and test cases. The AGENDA tool has also been extended to test transactions with multiple queries, and concurrency-related faults can be handled as presented in [3]. AGENDA extracts information about uniqueness constraints, referential integrity constraints, and not-null constraints. It also extracts limited information from semantic constraints, namely boundary values from sufficiently simple boolean expressions.
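As an illustration of generating a database state that respects such constraints, the following is a minimal sketch and not AGENDA's actual implementation. The SCHEMA dictionary, its keys, and the functions table_order and generate_state are hypothetical stand-ins for information a schema parser might extract. Referenced tables are populated first, unique columns never reuse a value, and NULL is offered only for columns without a not-null constraint.

```python
import random
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical schema description (an assumption, not AGENDA's format).
SCHEMA = {
    "dept": {"deptno": {"unique": True, "not_null": True, "samples": [10, 20, 30]}},
    "emp": {
        "empno": {"unique": True, "not_null": True, "samples": [1, 2, 3, 4]},
        "deptno": {"references": ("dept", "deptno"), "not_null": True},
        "bonus": {"samples": [0, 500]},  # no not-null constraint: NULL allowed
    },
}


def table_order(schema):
    """Order tables so that every referenced table precedes its referrers."""
    sorter = TopologicalSorter()
    for table, cols in schema.items():
        deps = {c["references"][0] for c in cols.values() if "references" in c}
        sorter.add(table, *deps)
    return list(sorter.static_order())


def generate_state(schema, rows_per_table, rng):
    """Generate rows per table that satisfy the declared constraints.

    Assumes each unique column has at least rows_per_table candidate values.
    """
    state = {}
    for table in table_order(schema):
        used = {col: set() for col in schema[table]}
        rows = []
        for _ in range(rows_per_table):
            row = {}
            for col, spec in schema[table].items():
                if "references" in spec:
                    # Referential integrity: reuse a value already generated.
                    ref_table, ref_col = spec["references"]
                    candidates = [r[ref_col] for r in state[ref_table]]
                else:
                    candidates = list(spec["samples"])
                if not spec.get("not_null"):
                    candidates.append(None)  # NULL is a legal choice
                if spec.get("unique"):
                    candidates = [v for v in candidates if v not in used[col]]
                value = rng.choice(candidates)
                used[col].add(value)
                row[col] = value
            rows.append(row)
        state[table] = rows
    return state


if __name__ == "__main__":
    for table, rows in generate_state(SCHEMA, 3, random.Random(0)).items():
        print(table, rows)
```

Passing a seeded random.Random keeps the generated state reproducible between test runs, which is convenient when a failing test has to be replayed.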
Chays [2] initially considers transactions that consist of a single parameterized query, and later extends the approach to transactions consisting of multiple queries. A host variable in an SQL query represents either an input parameter or an output parameter. Generating a test case involves instantiating each input parameter with an appropriate value. Validating a test case involves examining the output of a select query and/or examining the resulting database state to check that it changed, or did not change, appropriately. In general, an SQL query can retrieve many tuples. Typically, the host language program processes the retrieved tuples one at a time via a cursor, which can be thought of as a pointer to a single tuple (row) in the result of a query. The goal is to select data that is consistent (does not violate integrity constraints) and comprehensive (covers different situations, to increase the likelihood of exposing faults).

Consider the following example:

    SELECT ename, bonus INTO :out_name, :out_bonus
    FROM emp
    WHERE ( (emp.deptno = :in_deptno) AND (salary AND salary ) );

Here the host variables in the INTO clause are output parameters, whereas those in the WHERE clause are input parameters. When the above SELECT query is parsed, the Agenda Parser stores in the Agenda database the host variables and the corresponding attributes of the relation associated with them. Depending on whether the association between them is direct or indirect, the Input Generator initializes the values of the corresponding host variables. Boundary values can also be identified easily: the Agenda Parser extracts each boundary value along with its attribute and stores these in the Agenda database as well. If the tester also selects boundary values as a heuristic to guide data generation, these stored values are used.
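The instantiation of input parameters can be sketched as follows. This is an illustrative example under stated assumptions, not AGENDA code: it uses Python's sqlite3 module in place of embedded SQL, so the host variable :in_deptno becomes a named parameter and the output parameters become the columns of the rows read through the cursor. The salary bounds 1000 and 5000, the table contents, the DATA_GROUPS dictionary, and the helper names are all hypothetical.

```python
import sqlite3

# Hypothetical query modelled on the example above; the salary bounds
# 1000 and 5000 are illustrative assumptions, not values from the report.
QUERY = """
SELECT ename, bonus FROM emp
WHERE deptno = :in_deptno AND salary >= 1000 AND salary <= 5000
ORDER BY ename
"""

# Hypothetical data groups of suggested values for the input parameter.
DATA_GROUPS = {"existing": [10, 20], "missing": [99], "null": [None]}


def make_db():
    """Create a small database state with rows on and around the boundaries."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE emp (ename TEXT, bonus INTEGER, deptno INTEGER, salary INTEGER)"
    )
    conn.executemany(
        "INSERT INTO emp VALUES (?, ?, ?, ?)",
        [
            ("ann", 100, 10, 1000),  # on the lower boundary
            ("bob", 200, 10, 5000),  # on the upper boundary
            ("cat", 300, 10, 5001),  # just outside the upper boundary
            ("dan", 400, 20, 3000),
        ],
    )
    return conn


def generate_inputs(data_groups):
    """Yield one binding of the input parameter per suggested value."""
    for group, values in data_groups.items():
        for value in values:
            yield group, {"in_deptno": value}


def run_test(conn, binding):
    """Execute the query with the binding and read the rows via the cursor."""
    cursor = conn.execute(QUERY, binding)
    return [tuple(row) for row in cursor]


if __name__ == "__main__":
    conn = make_db()
    for group, binding in generate_inputs(DATA_GROUPS):
        print(group, binding, run_test(conn, binding))
```

Rows placed exactly on and just outside the bounds make it visible whether a comparison operator in the query is off by one.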
Constraints are handled as follows:

a) Uniqueness constraints: These are handled by looking at the frequency field for an attribute in the AgendaDB table, to avoid selecting the same value more than once.

b) Referential integrity constraints: When selecting a value for attribute A in table T, where this attribute references attribute A' in table T', the State Generator refers to the value records associated with attribute A' in table T' and selects a value that has already been used. The implementation uses a topological sort to impose an ordering on the application table names stored in the AgendaDB.

c) Not-null constraints: Each attribute that does not have a not-null constraint is considered a candidate for NULL by the State Generator. Thus, a NULL group is implicitly added for this attribute in the AgendaDB. The State Generator then knows that it can choose NULL when generating a value for such an attribute, and the Input Generator knows that it can instantiate an input parameter with a NULL value for this attribute.

2.3 Testing Database Transactions and Concurrency

Deng and Chays [11] focus on testing database transactions to check whether, when run in isolation, they are consistent with their requirements. AGENDA's approach to generating and executing tests is based on integrity constraints, specifically state constraints (predicates over database states) and transition constraints (which involve the database states before and after execution of a transaction). Transaction consistency has two aspects: when run in isolation, the transaction should preserve the consistency of the database, and the relation between the old and new states should satisfy the requirements of the transaction's specification. To check a state constraint that is not enforced by the DBMS, AGENDA creates temporary tables to store the relevant data. It translates such constraints into constraints on the temporary tables, which can be enforced by the DBMS.
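The temporary-table idea can be sketched as follows, as an illustration under stated assumptions rather than AGENDA's implementation. The state constraint used here, that the total salary of a department must not exceed its budget, and the tables dept and emp, the temporary table dept_check, and its column salary_sum are all hypothetical. The constraint spans two tables and an aggregate, so it is rewritten as a CHECK constraint on a temporary table whose rows hold the joined attributes and the aggregate value; SQLite then enforces it on each insertion.

```python
import sqlite3


def make_db():
    """Create a hypothetical application database with two related tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dept (deptno INTEGER PRIMARY KEY, budget INTEGER);
        CREATE TABLE emp (ename TEXT, deptno INTEGER, salary INTEGER);
        INSERT INTO dept VALUES (10, 5000);
        INSERT INTO dept VALUES (20, 3000);
        INSERT INTO emp VALUES ('ann', 10, 2000);
        INSERT INTO emp VALUES ('bob', 10, 2500);
        INSERT INTO emp VALUES ('cat', 20, 3000);
        """
    )
    return conn


def check_state_constraint(conn):
    """Return the rows that violate SUM(salary) <= budget per department."""
    conn.execute("DROP TABLE IF EXISTS temp.dept_check")
    # The aggregate is represented by an ordinary column, so the constraint
    # becomes a plain CHECK that the DBMS can enforce.
    conn.execute(
        """
        CREATE TEMP TABLE dept_check (
            deptno INTEGER,
            salary_sum INTEGER,
            budget INTEGER,
            CHECK (salary_sum <= budget)
        )
        """
    )
    rows = conn.execute(
        """
        SELECT d.deptno, COALESCE(SUM(e.salary), 0), d.budget
        FROM dept d LEFT JOIN emp e ON e.deptno = d.deptno
        GROUP BY d.deptno, d.budget
        ORDER BY d.deptno
        """
    ).fetchall()
    violations = []
    for row in rows:
        try:
            # The CHECK constraint is evaluated on each insertion.
            conn.execute("INSERT INTO dept_check VALUES (?, ?, ?)", row)
        except sqlite3.IntegrityError:
            violations.append(row)
    return violations


if __name__ == "__main__":
    conn = make_db()
    print(check_state_constraint(conn))  # initial state
    conn.execute("UPDATE emp SET salary = 4000 WHERE ename = 'cat'")
    print(check_state_constraint(conn))  # state after a faulty update
```

Copying rows one at a time, rather than with a single INSERT ... SELECT, lets the check report every violating row instead of stopping at the first failure.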
The tester can specify preconditions and postconditions to test a global consistency constraint. Temporary tables are created to join relevant attributes from different tables and to replace calls to aggregate functions with single attributes representing the aggregate returned. For example, a reference to SUM(X) in a constraint gives rise to a temporary attribute X_SUM. The constraint to be checked is translated into a constraint on a temporary table. To populate the temporary table, the constraints are added to it and the contents are copied into it, with the constraint checked automatically after each insertion. Constraint violations are reported to AGENDA, indicating that the transaction violated a global state constraint.

The current version, AGENDA-0.1, can check relatively simple transition constraints that involve a single table. The tool modifies the schema so that each table has an additional log table that records all modifications made to the table during execution of the application program. The appropriate log tables are filled in response to each insert, modify, or delete operation using a trigger. The tester supplies simple constraints involving old and new values in a row, and AGENDA translates these into constraints on the log tables. Logging is based on temporary tables with attributes representing old and new values of attributes and aggregates from all the tables involved in the constraint. The check constraint on these tables comes from the postcondition of the transition constraint. A SELECT statement selects the relevant attributes from the application and log tables and stores its result in a cursor, which then fills the temporary table, with the constraints checked automatically for each row.

Initially, all global consistency constraints are validated for the initial database. After a transaction commits, AGENDA checks the log tables for all the application tables associated with each global constraint.
If they are all empty, nothing relevant to that constraint has changed; otherwise the global constraint is checked. A faulty update of tables by a transaction may not be exposed by the transition check alone, but if this violation of a transition consistency constraint causes a violation of a state constraint, AGENDA will find the problem. State checking and transition checking complement each other to check consistency efficiently.

Testing Database Concurrency

Deng, Frankl and Chen [3] present a dataflow analysis technique for identifying schedules of transaction execution that reveal concurrency faults potentially caused by offline concurrency problems. Although the DBMS employs sophisticated mechanisms to ensure that transactions satisfy the ACID properties (Atomicity, Consistency, Isolation, and Durability), an application can still have concurrency-related faults if the application programmer erroneously placed queries in separate transactions when they should have been executed as a unit. In their work [3], auth