Programmatic SQL Techniques

SQL is a language and can be used programmatically, but it would be incorrect to call SQL a programming language. SQL lacks even the most primitive features of real programming languages. It has no provision for declaring variables, no GOTO statement, no IF statement for testing conditions, no FOR, DO, or WHILE statements to construct loops, no block structure, and so on. SQL is a database sublanguage that handles special- purpose database management tasks. To write a program that accesses a database, you must start with a conventional programming language, such as COBOL, PL/I, FORTRAN, Pascal, or C and then add SQL to the program.

The initial ANSI/ISO SQL standard was concerned exclusively with this programmatic use of SQL. In fact, the standard did not even include the interactive SELECT statement described in Chapters 6 through 9. It only specified the programmatic SELECT statement described later in the section “Data Retrieval in Embedded SQL.” The SQL2 standard, published in 1992, expanded its focus to include interactive SQL (called direct invocation of SQL in the standard) and more advanced forms of programmatic SQL (the dynamic SQL capabilities described in Chapter 20).

Commercial SQL database vendors offer two basic techniques for using SQL within an application program:

  • Embedded SQL. In this approach, SQL statements are embedded directly into the program’s source code, intermixed with the other programming language statements. Special embedded SQL statements are used to retrieve data into the program. A special SQL precompiler accepts the combined source code and, along with other programming tools, converts it into an executable program.
  • Application program interface. In this approach, the program communicates with the DBMS through a set of function calls called an application program interface, or API. The program passes SQL statements to the DBMS through the API calls and uses API calls to retrieve query results. This approach does not require a special precompiler.

The initial IBM SQL products used an embedded SQL approach, and most commercial SQL products adopted it in the 1980s. The original ANSI/ISO SQL standard specified only an awkward module language for programmatic SQL, but commercial SQL products continued to follow the IBM de facto standard. In 1989, the ANSI/ISO standard was extended to include a definition of how to embed SQL statements within the Ada, C, COBOL, FORTRAN, Pascal, and PL/I programming languages, this time following the IBM approach. The SQL2 standard continued this specification.

In parallel with this evolution of embedded SQL, several DBMS vendors who were focused on minicomputer systems introduced callable database APIs in the 1980s. When the Sybase DBMS was introduced, it offered only a callable API. Microsoft’s SQL Server, derived from the Sybase DBMS, also used the API approach exclusively. Soon after the debut of SQL Server, Microsoft introduced Open Database Connectivity (ODBC), another callable API. ODBC is roughly based on the SQL Server API, but with the additional goals of being database independent and permitting concurrent access to two or more different DBMS brands through a common API.

More recently, Java Database Connectivity (JDBC) has emerged as an important API for accessing a relational database from within programs written in Java. With the growing popularity of callable APIs, the callable and embedded approaches are both in active use today. In general, programmers using older languages, such as COBOL and Assembler, will tend to favor the embedded SQL approach. Programmers using newer languages, such as C++ and Java, will tend to favor the callable API approach.

The following table summarizes the programmatic interfaces offered by some of the leading SQL-based DBMS products:

The basic techniques of embedded SQL, called static SQL, are described in this chapter. Some advanced features of embedded SQL, called dynamic SQL, are discussed in Chapter 20. Callable SQL APIs, including the Sybase/SQL Server API, ODBC, and JDBC, are discussed in Chapter 21.

1. DBMS Statement Processing

To understand any of the programmatic SQL techniques, it helps to understand a little bit more about how the DBMS processes SQL statements. To process a SQL statement, the DBMS goes through a series of five steps, shown in Figure 17-1:

  1. The DBMS begins by parsing the SQL statement. It breaks the statement up into individual words, makes sure that the statement has a valid verb, legal clauses, and so on. Syntax errors and misspellings can be detected in this step.
  2. The DBMS validates the statement. It checks the statement against the system catalog. Do all the tables named in the statement exist in the database? Do all of the columns exist, and are the column names unambiguous? Does the user have the required privileges to execute the statement? Semantic errors are detected in this step.
  3. The DBMS optimizes the statement. It explores various ways to carry out the statement. Can an index be used to speed a search? Should the DBMS first apply a search condition to Table A and then join it to Table B, or should it begin with the join and use the search condition afterward? Can a sequential search through a table be avoided or reduced to a subset of the table? After exploring alternatives, the DBMS chooses one of them.
  4. The DBMS then generates an application plan for the statement. The application plan is a binary representation of the steps that are required to carry out the statement; it is the DBMS equivalent of executable code.
  5. Finally, the DBMS carries out the statement by executing the application plan.

Note that the steps in Figure 17-1 vary in the amount of database access they require and the amount of CPU time they take. Parsing a SQL statement does not require access to the database and typically can be done very quickly. Optimization, on the other hand, is a very CPU-intensive process and requires access to the database’s system catalog. For a complex, multitable query, the optimizer may explore more than a dozen different ways of carrying out the query. However, the cost in computer processing time of doing the query the wrong way is usually so high compared to the cost of doing it the right way (or at least a better way) that the time spent in optimization is more than gained back in increased query execution speed.

When you type a SQL statement to interactive SQL, the DBMS goes through all five steps while you wait for its response. The DBMS has little choice in the matter—it doesn’t know which statement you are going to type until you type it, so none of the processing can be done ahead of time. In programmatic SQL, however, the situation is quite different. Some of the early steps can be done at compile-time, when the programmer is developing the program. This leaves only the later steps to be done at runtime, when the program is executed by a user. When you use programmatic SQL, all DBMS products try to move as much processing as possible to compile-time, because once the final version of the program is developed, it may be executed thousands of times by users in a production application. In particular, the goal is to move optimization to compile-time if at all possible.

2. Embedded SQL Concepts

The central idea of embedded SQL is to blend SQL language statements directly into a program written in a host programming language, such as C, Pascal, COBOL, FORTRAN, PL/I, or Assembler. Embedded SQL uses the following techniques to embed the SQL statements:

  • SQL statements are intermixed with statements of the host language in the source program. This embedded SQL source program is submitted to a SQL precompiler, which processes the SQL statements.
  • Variables of the host programming language can be referenced in the embedded SQL statements, allowing values calculated by the program to be used by the SQL statements.
  • Program language variables are also used by the embedded SQL statements to receive the results of SQL queries, allowing the program to use and process the retrieved values.
  • Special program variables are used to assign NULL values to database columns and to support the retrieval of NULL values from the database.
  • Several new SQL statements that are unique to embedded SQL are added to the interactive SQL language, to provide for row-by-row processing of query results.

Figure 17-2 shows a simple embedded SQL program, written in C. The program illustrates many, but not all, of the embedded SQL techniques. The program prompts the user for an office number, retrieves the city, region, sales, and target for the office, and displays them on the screen.

Don’t worry if the program appears strange, or if you can’t understand all of the statements that it contains before reading the rest of this chapter. One of the disadvantages of the embedded SQL approach is that the source code for a program becomes an impure blend of two different languages, making the program hard to understand without training in both SQL and the programming language. Another disadvantage is that embedded SQL uses SQL language constructs not used in interactive SQL, such as the WHENEVER statement and the INTO clause of the SELECT statement—both used in this program.

3. Developing an Embedded SQL Program

An embedded SQL program contains a mix of SQL and programming language statements, so it can’t be submitted directly to a compiler for the programming language. Instead, it moves through a multistep development process, shown in Figure 17-3. The steps in the figure are actually those used by the IBM mainframe databases (DB2, SQL/DS), but all products that support embedded SQL use a similar process:

  1. The embedded SQL source program is submitted to the SQL precompiler, a programming tool. The precompiler scans the program, finds the embedded SQL statements, and processes them. A different precompiler is required for each programming language supported by the DBMS. Commercial SQL products typically offer precompilers for one or more languages, including C, Pascal, COBOL, FORTRAN, Ada, PL/I, RPG, and various assembly languages.
  2. The precompiler produces two files as its output. The first file is the source program, stripped of its embedded SQL statements. In their place, the precompiler substitutes calls to the private DBMS routines that provide the runtime link between the program and the DBMS. Typically, the names and calling sequences of these routines are known only to the precompiler and the DBMS; they are not a public interface to the DBMS. The second file is a copy of all the embedded SQL statements used in the program. This file is sometimes called a database request module, or DBRM.
  3. The source file output from the precompiler is submitted to the standard compiler for the host programming language (such as a C or COBOL compiler). The compiler processes the source code and produces object code as its output. Note that this step has nothing in particular to do with the DBMS or with SQL.
  4. The linker accepts the object modules generated by the compiler, links them with various library routines, and produces an executable program. The library routines linked into the executable program include the private DBMS routines described in Step 2.
  5. The database request module generated by the precompiler is submitted to a special BIND program. This program examines the SQL statements, parses, validates, and optimizes them, and produces an application plan for each statement. The result is a combined application plan for the entire program, representing a DBMS-executable version of its embedded SQL statements. The BIND program stores the plan in the database, usually assigning it the name of the application program that created it.

The program development steps in Figure 17-3 correlate with the DBMS statement processing steps in Figure 17-1. In particular, the precompiler usually handles statement parsing (the first step), and the BIND utility handles verification, optimization, and plan generation (the second, third, and fourth steps). Thus, the first four steps of Figure 17-1 all take place at compile-time when you use embedded SQL. Only the fifth step, the actual execution of the application plan, remains to be done at runtime.

The embedded SQL development process turns the original embedded SQL source program into two executable parts:

  • An executable program. Stored in a file on the computer in the same format as any executable program
  • An executable application plan. Stored within the database in the format expected by the DBMS

The embedded SQL development cycle may seem cumbersome, and it is more awkward than developing a standard C or COBOL program. In most cases, all of the steps in Figure 17-3 are automated by a single command procedure, so the individual steps are made invisible to the application programmer. The process does have several major advantages from a DBMS point of view, shown next.

  • The blending of SQL and programming language statements in the embedded SQL source program is an effective way to merge the two languages. The host programming language provides flow of control, variables, block structure, and input/output functions; SQL handles database access and does not have to provide these other constructs.
  • The use of a precompiler means that the compute-intensive work of parsing and optimization can take place during the development cycle. The resulting executable program is very efficient in its use of CPU resources.
  • The database request module produced by the precompiler provides portability of applications. An application program can be written and tested on one system, and then its executable program and DBRM can be moved to another system. After the BIND program on the new system creates the application plan and installs it in the database, the application program can use it without being recompiled itself.
  • The program’s actual runtime interface to the private DBMS routines is completely hidden from the application programmer. The programmer works with embedded SQL at the source-code level and does not have to worry about other, more complex interfaces.

4. Running an Embedded SQL Program

Recall from Figure 17-3 that the embedded SQL development process produces two executable components, the executable program itself and the program’s application plan, stored in the database. When you run an embedded SQL program, these two components are brought together to do the work of the application:

  1. When you ask the computer system to run the program, the computer loads the executable program in the usual way and begins to execute its instructions.
  2. One of the first calls generated by the precompiler is a call to a DBMS routine that finds and loads the application plan for the program.
  3. For each embedded SQL statement, the program calls one or more private DBMS routines, requesting execution of the corresponding statement in the application plan. The DBMS finds the statement, executes that part of the plan, and then returns control to the program.
  4. Execution continues in this way, with the executable program and the DBMS cooperating to carry out the task defined by the original embedded SQL source program.

4.1. Runtime Security

When you use interactive SQL, the DBMS enforces its security based on the user-id you supply to the interactive SQL program. You can type any SQL statement you want, but the privileges granted to your user-id determine whether the DBMS will or will not execute the statement you type. When you run a program that uses embedded SQL, there are two user-ids to consider:

  • The user-id of the person who developed the program, or more specifically, the person who ran the BIND program to create the application plan
  • The user-id of the person who is now executing the program and the corresponding application plan

It may seem strange to consider the user-id of the person who ran the BIND program (or more generally, the person who developed the application program or installed it on the computer system), but in fact, DB2 and several other commercial SQL products use both user-ids in their security scheme. To understand how the security scheme works, suppose that user JOE runs the ORDMAINT order maintenance program, which updates the ORDERS, SALES, and OFFICES tables. The application plan for the ORDMAINT program was originally bound by user-id OPADMIN, which belongs to the order-processing administrator.

In the DB2 scheme, each application plan is a database object, protected by DB2 security. To execute a plan, JOE must have the EXECUTE privilege for it. If he does not, execution fails immediately. As the ORDMAINT program executes, its embedded INSERT, UPDATE, and DELETE statements update the database. The privileges of the OPADMIN user determine whether the plan will be allowed to perform these updates. Note that the plan may update the tables even if JOE does not have the required privileges. However, the updates that can be performed are only those that have been explicitly coded into the embedded SQL statements of the program. Thus, DB2 provides very fine control over database security. The privileges of users to access tables can be very limited, without diminishing their ability to use canned programs.

Not all DBMS products provide security protection for application plans. For those that do not, the privileges of the user executing an embedded SQL program determine the privileges of the program’s application plan. Under this scheme, the user must have privileges to perform all of the actions performed by the plan, or the program will fail. If the user is not to have these same permissions in an interactive SQL environment, access to the interactive SQL program itself must be restricted, which is a disadvantage of this approach.

4.2. Automatic Rebinding

Note that an application plan is optimized for the database structure as it exists at the time the plan is placed in the database by the BIND program. If the structure changes later (for example, if an index is dropped or a column is deleted from a table), any application plan that references the changed structures may become invalid. To handle this situation, the DBMS usually stores, along with the application plan, a copy of the original SQL statements that produced it.

The DBMS also keeps track of all the database objects upon which each application plan depends. If any of these objects are modified by a DDL statement, the DBMS can find the plans that depend on it and automatically marks those plans as invalid. The next time the program tries to use the plan, the DBMS can detect the situation, and in most cases, it will automatically rebind the statements to produce a new bind image. Because the DBMS has maintained a great deal of information about the application plan, it can make this automatic rebinding completely transparent to the application program. However, a SQL statement may take much longer to execute when its plan is rebound than when the plan is simply executed.

Although the DBMS can automatically rebind a plan when one of the structures upon which it depends is changed, the DBMS will usually not automatically detect changes in the database that may make a better plan possible. For example, suppose a plan uses a sequential scan of a table to locate particular rows because no appropriate index existed when it was bound. It’s possible that a subsequent CREATE INDEX statement will create an appropriate index. To take advantage of the new structure, you must explicitly run the BIND program to rebind the plan.

Source: Liang Y. Daniel (2013), Introduction to programming with SQL, Pearson; 3rd edition.

Leave a Reply

Your email address will not be published. Required fields are marked *