Heuristic query optimization technique is used to modify the internal representation of a query by using heuristic rules and transformation rules. Heuristic rules are used in the form of a query tree or query graph structure. Optimiser starts with initial query tree and transform it into an equivalent and efficient query tree using transformation rules.
Heuristic Optimization Algorithm : DBMS use heuristic optimization algorithms to improve the efficiency of query by converting initial query tree into an equivalent and optimized query tree. Optimizers utilize transformation rules to optimize the structure of query tree. Following are the steps of heuristic optimization algorithm.
- Step 1. Perform Selection operation as early as possible : By using selection operation at early stages, you can reduce the unwanted number of record or data, to transfer from database to primary memory.
Optimizer use transformation rule 1 to divide selection operations with conjunctive conditions into a cascade of selection operations.
- Step 2. Perform commutativity of selection operation with other operations as early as possible : Optimizer use transformation rule 2, 4, 6, and 9 to move selection operation as far down the tree as possible and keep selection predicates on the same relation together. By keeping selection operation down at tree reduces the unwanted data transfer and by keeping selection predicates together on same relations reduces the number of times of database manipulation to retrieve records from same database table.
- Step 3. Combine the Cartesian Product with subsequent selection operation whose predicates represents a join condition into a JOIN operation : Optimizer uses transformation rule 13 to convert a selection and cartesian product sequence into join. It reduces data transfer. It is always better to transfer only required data from database instead of transferring whole data and then refine it. (Cartesian product combines all data of all the tables mention in query while join operation retrieves only those records from database that satisfy the join condition).
- Step 4. Use Commutativity and Associativity of Binary operations : Optimizer use transformation rules 5, 11, and 12 to execute the most restrictive selection operations first.
It rearranges the leaf nodes of query tree. By using the most restrictive selection operations, the number of records fetched from database reduces and also subsequent operations can be performed on less number of records.
- Step 5. Perform projection operations as early as possible : After performing selection operations, optimizer use transformation rules 3, 4, 7 and 10 to reduce the number of columns of a relation by moving projection operations as far down the tree as possible and keeping projection predicates on the same relation together.
- Step 6. Compute common expressions only once: It is used to identify sub-trees that represent groups of operations that can be executed by a single algorithm.
Consider the query below in SQL and transformation of its initial query tree into an optimal query tree.
SELECT Emp_Name
FROM Employee e, Department d, Project p
WHERE p.Name = ‘LUXMI PUB.’
AND d.Proj_ID = p.Proj_ID
AND e.Dept_ID= d.Dept_ID
AND e.Age > 35
This query needs to display names of all employees working for project “LUXMI PUB.” and having age more than 35 years.
Figure 11.10 shows the initial query tree for the given SQL query. If the tree is executed directly then it results in the Cartesian product of entire Employee, Department, and Project table but in reality, the query needed only one record from relation Project and only the employee records for those whose age is greater than 35 years.
- We can improve the performance by first applying selection operations to reduce the number of records that appear in Casterian Figure 11.11 shows the improved query tree.
- The query tree can be further improved by applying more restrictive selection operation. So, switch the positions of relations Project and Employee as you know that in a single project it may be more than one Figure 11.12 shows the improved query tree.
- A further improvement can be done by replacing Cartesian product operations by Join operations with a join condition as shown in Figure 13.
- Further improvement can be done in query tree by keeping only required attributes (columns) of relations by applying projection operations as early as possible in the query Optimizer keep the attributes required to display, and the attributes needed by the subsequent operation in the intermediate relations. Modified query tree is shown in Figure 11.14.
Note: The SELECT clause in SQL is equivalent to projection operation and WHERE clause in SQL is equivalent to selection operation in relational algebra.
Source: Gupta Satinder Bal, Mittal Aditya (2017), Introduction to Basic Database Management System, 2nd Edition-University Science Press (2017)