Basics of Query Processing

Query Processing : Query Processing is a procedure of converting a query written in high- level language (Ex. SQL, QBE) into a correct and efficient execution plan expressed in low- level language, which is used for data manipulation.

Query Processor : Query processor is responsible for generating execution plan.

Execution Plan : Query processing is a stepwise process. Before retrieving or updating data in database, a query goes through a series of query compilation steps. These steps are known as execution plan.

The success of a query language also depends upon its query processor i.e., how much efficient execution plan it can create? The better execution plan leads to low time and cost.

In query processing, the first phase is transformation in which parser first checks the syntax of query and also checks the relations and attributes used in the query that are defined in the database. After checking the syntax and verifying the relations, query is transformed into equivalent expression that are more efficient to execute. Transformation, depends upon various factors like existence of certain database structures, presence of different indexes, file is sorted or not, cost of transformation, physical characteristics of data etc. After transformation of query, transformed query is evaluated by using number of strategies known as access plans. While generating access plans, factors like physical properties of data and storage are taken into account and the optimal access plan is executed. The next step is to validate the user privileges and ensure that the query does not disobey the relevant integrity constraints. Finally, execution plan is executed to generate the result.

1. General Strategy for Query Processing

The general  strategy for  query processing  is as  follows:

  • Representation of query : Query written by user cannot be processed directly by Query processor first checks the syntax and existence of relations and their attributes in database. After validations, query processor transform it into equivalent and more efficient expression for example query will be converted into a standard internal format that parser can manipulate. Parser also adds some additional predicates to the query to enforce security. Internal form may be relational algebra, relational calculus, any low-level language, operator graphs etc.
  • Operator graphs : Operator graphs are used to represent It gives the sequence of operations that can be performed. It is easy to understand the query represented by operator graphs. It is useful to determine redundancy in query expressions, result of transformation, simplify the view etc.
  • Response time and Data characteristics consideration : Data characteristics like length of records, expected sizes of both intermediate and final results, size of relations , are also considered for optimizing the query. In addition to this overall response time is also determined.

2. Steps in Query Processing

Various steps in query processing are shown in Figure 11.1. Suppose that user inputs a query in general query language say QBE, then it is first converted into high-level query language say SQL etc.  Other steps  in  query processing  are  discussed  below in  detail:

  • Syntax Analysis : Query in high-level language is parsed into tokens and tokens are analyzed for any syntax Order of tokens are also maintained to make sure that all the rules of language grammars are followed. In case of any error, query is rejected and an error code with explanation for rejection is returned to the user. (Only syntax is checked in this step).
  • Query Decomposition : In this step, query is decomposed into query blocks which are the low-level It starts with the high-level query that is transformed into low- level operations and checks whether that query is syntactically and semantically correct. For example, a SQL query is decomposed into blocks like Select block, From block, Where block etc. Various stages in query decomposition are shown in Figure 11.2.

  • Query Analysis : In the query analysis stage, programming language compiler checks that the query is lexically and syntactically A syntactically correct query is analyzed using system catalogues to verify the existence of relations and attributes used in query. After analysis a correct query is converted into some internal representation, which is more  efficient  for processing.

The type specification of the query qualifier and result is also checked at this stage. The internal  representation may  be, query  tree or query  graph.

Query tree notation : A Typical internal representation of query is query tree. It is also known as relational algebra tree. A query tree is constructed using tree data structure that corresponds to the relational algebra expression. Main components of query tree  are:

    • Root of tree–represents result of query.
    • Leaf nodes–represent input relations of the query.
    • Internal nodes–represent intermediate relation that is the output of applying an operation in the algebra.
    • The sequence of operations is directed from leaves to the root node.

For example,

Query graph notation : Graph data structure is also used for internal representation of query.  In graphs:

    • Relation nodes–represent relations by single circle.
    • Constant nodes–represent constant values by double circle.
    • Edges–represent relation and join conditions.
    • Square brackets–represent attributes retrieved from each relation.

  • Query normalization : After query analysis, it is normalized to remove any redundancy. In this phase query is converted into normalized form that can be easily A set of equivalency rules are applied to query to simplify the projection and selection operations to avoid redundancy. Query can be converted into one of the following two normal forms :
    • Conjunctive normal form : It is a sequence of conjuncts that are connected with ‘AND’ A conjunct consists of one or more terms connected with ‘OR’ operator. A conjuctive selection consists only those tuples that satisfy all conjuncts.

Example.(Emp_Job = ‘Analyst’ ∨ salary < 50000) ∧ (Hire_Date > 1–1–2000)

    • Disjunctive normal forms : It is a sequence of disjuncts that are connected with ‘OR’ A disjunct consists of one or more terms connected with ‘AND’ operator. A disjunctive selection contains those tuples that satisfy anyone of the disjunct.

Example.(Emp_Job= ‘Analyst’ ∧ salary < 5000) ∨ (Hire_Date > 1–1–2000)

Disjunctive normal form is more useful as it allows the query to break into a series of independent sub-queries linked by union.

  • Semantic analyzer : The semantic analyzer performs the following tasks :
    • It helps in reducing the number of predicates.
    • It rejects contradictory normalized forms.
    • In case of missing join specification, components of query do not contribute to generation of It identifies these  queries and  rejects them.
    • It makes sure that each object in query is referenced correctly according to its data type.
  • Query simplifler : The major tasks of query simplifier are as follows :
    • It eliminates common sub-expressions.
    • It eliminates redundant qualification.
    • It introduces integrity constraints, view definitions into the query graph representation.
    • It eliminates query that voids any integrity constraint without accessing the database.
    • It transforms sub-graphs into semantically equivalent and more efficient form.
    • It deals with user access rights.

Idempotence rules of Boolean Algebra are applied to get final form of simplification.

  • Query Restructuring : At the final stage of query decomposition, transformation rules are applied to restructure the query to give a more efficient
    • Query Optimization : The aim of the query optimization step is to choose the best possible query execution plan with minimum resources required to execute that plan. Query optimization is discussed  in  detail in  section  11.3.
    • Execution Plan : Execution plan is the basic algorithm used for each operation in the Execution plans are classified into following Four types : (a) Left-deep tree query execution plane, (b) Right-deep tree query execution plan, (c) Linear tree execution plan, (d) Bushy execution plan.
  • Left-deep tree query execution plan : In left-deep tree query execution plan, development of plan starts with a single relation and successively adding a operation involving a single relation until the query is For example, Only the left hand side of a join is allowed to participate in result from a previous join and hence named left-deep tree. It is shown in  Figure  11.5.

Advantages : The main  advantages of left-deep  tree query execution plan are

    • It reduces search space.
    • Query optimiser is based on dynamic programming techniques.
    • It is convenient for pipelined evaluation as only one input to each join is pipelined.

Disadvantage : The disadvantages of  left-deep tree query execution  plan are

    • Reduction in search space leads  to  miss some  lower  cost execution  strategies.
  • Right-deep tree query execution plan : It is almost same as left-deep query execution plan with the only difference that only the right hand side of a join is allowed to participate in result from a previous join and hence named right-deep It is applicable on applications having a large main memory. It is shown in Figure 11.6.

  • Linear tree execution plan : The combination of left-deep and right-deep execution plans with a restriction that the relation on one side of each operator is always a base relation is known as linear It is shown  in Figure 11.7.

  • Bushy execution plan : Bushy execution plan is the most general type of execution More than one relation can participate in intermediate results. It is shown in Figure 11.8.

The main advantage of bushy execution plan is the flexibility provided by it in choosing the best execution plan by increasing search space but this flexibility may leads to  considerably  increase  the  search space.

  • Query Code Generator : After selecting the best possible execution plan query is converted into low-level language so that it can be taken as input by runtime database process.
  • Runtime Database Processor : It deals directly with main database and do the necessary operation mentioned in query and returns the result  to user.

Source: Gupta Satinder Bal, Mittal Aditya (2017), Introduction to Basic Database Management System, 2nd Edition-University Science Press (2017)

Leave a Reply

Your email address will not be published. Required fields are marked *