Data Retrieval in Embedded SQL

Using the embedded SQL features described thus far, you can embed any interactive SQL statement except the SELECT statement in an application program. Retrieving data with an embedded SQL program requires some special extensions to the SELECT statement. The reason for these extensions is that there is a fundamental mismatch between the SQL language and programming languages such as C and COBOL: a SQL query produces an entire table of query results, but most programming languages can manipulate only individual data items or individual records (rows) of data.

Embedded SQL must build a bridge between the table-level logic of the SQL SELECT statement and the row-by-row processing of C, COBOL, and other host programming languages. For this reason, embedded SQL divides SQL queries into two groups:

Single-row queries. You expect the query results to contain a single row of data. Looking up a customer’s credit limit or retrieving the sales and quota for a particular salesperson are examples of this type of query.
Multirow queries. You expect that the query results may contain zero, one, or many rows of data. Listing the orders with amounts over $20,000 or retrieving the names of all salespeople who are over quota are examples of this type of query.

Interactive SQL does not distinguish between these two types of queries; the same interactive SELECT statement handles them both. In embedded SQL, however, the two types of queries are handled very differently. Single-row queries are simpler to handle and are discussed in the next section. Multirow queries are discussed shortly.

1. Single-Row Queries

Many useful SQL queries return a single row of query results. Single-row queries are especially common in transaction-processing programs, where a user enters a customer number or an order number and the program retrieves relevant data about the customer or order. In embedded SQL, single-row queries are handled by the singleton SELECT statement, shown in Figure 17-19. The singleton SELECT statement has a syntax much like that of the interactive SELECT statement. It has a SELECT clause, a FROM clause, and an optional WHERE clause. Because the singleton SELECT statement returns a single row of data, there is no need for a GROUP BY, HAVING, or ORDER BY clause. The INTO clause specifies the host variables that are to receive the data retrieved by the statement.

Figure 17-20 shows a simple program with a singleton SELECT statement. The program prompts the user for an employee number and then retrieves the name, quota, and sales of the corresponding salesperson. The DBMS places the three retrieved data items into the host variables repname, repquota, and repsales, respectively.

Recall that the host variables used in the INSERT, DELETE, and UPDATE statements in the previous examples were input host variables. In contrast, the host variables specified in the INTO clause of the singleton SELECT statement are output host variables. Each host variable named in the INTO clause receives a single column from the row of query results. The select list items and the corresponding host variables are paired in sequence, as they appear in their respective clauses, and the number of query results columns must be the same as the number of host variables. In addition, the data type of each host variable must be compatible with the data type of the corresponding column of query results.

Most DBMS brands will automatically handle reasonable conversions between DBMS data types and the data types supported by the programming language. For example, most DBMS products will convert MONEY data retrieved from the database into packed decimal (COMP-3) data before storing it in a COBOL variable, or into floating point data before storing it in a C variable. The precompiler uses its knowledge of the host variable’s data type to handle the conversion correctly.

Variable-length text data must also be converted before being stored in a host variable. Typically, a DBMS converts VARCHAR data into a null-terminated string for C programs and into a variable-length string (with a leading character count) for Pascal programs. For COBOL and FORTRAN programs, the host variable must generally be declared as a data structure with an integer count field and a character array. The DBMS returns the actual characters of data in the character array, and it returns the length of the data in the count field of the data structure.

If a DBMS supports date/time data or other data types, other conversions are necessary. Some DBMS products return their internal date/time representations into an integer host variable. Others convert the date/time data to text format and return it into a string host variable. Table 17-1 summarized the data type conversions typically provided by DBMS products, but you must consult the embedded SQL documentation for your particular DBMS brand for specific information.

1. The not found Condition

Like all embedded SQL statements, the singleton SELECT statement sets the values of the SQLCODE and SQLSTATE variables to indicate its completion status:

If a single row of query results is successfully retrieved, SQLCODE is set to zero and SQLSTATE is set to 00000; the host variables named in the INTO clause contain the retrieved values.
If the query produced an error, SQLCODE is set to a negative value and SQLSTATE is set to a nonzero error class (the first two characters of the five-digit SQLSTATE string); the host variables do not contain retrieved values.
If the query produced no rows of query results, a special NOT FOUND warning value is returned in SQLCODE, and SQLSTATE returns a NO DATA error class.
If the query produced more than one row of query results, it is treated as an error, and a negative SQLCODE is returned.

The SQL1 standard specifies the NOT FOUND warning condition, but it does not specify a particular value to be returned. DB2 uses the value +10 0, and most other SQL products follow this convention, including the other IBM SQL products, Ingres, and SQLBase. This value is also specified in the SQL2 standard, but as noted previously, SQL2 strongly encourages the use of the new SQLSTATE error variable instead of the older SQLCODE values.

2. Retrieving null Values

If the data to be retrieved from a database may contain NULL values, the singleton SELECT statement must provide a way for the DBMS to communicate the NULL values to the application program. To handle NULL values, embedded SQL uses indicator variables in the INTO clause, just as they are used in the VALUES clause of the INSERT statement and the SET clause of the UPDATE statement.

When you specify a host variable in the INTO clause, you can follow it immediately with the name of a companion host indicator variable. Figure 17-21 shows a revised version of the program in Figure 17-20 that uses the indicator variable repquota_ind with the host variable repquota. Because the NAME and SALES columns are declared NOT NULL in the definition of the SALESREPS table, they cannot produce NULL output values, and no indicator variable is needed for those columns.

After the SELECT statement has been executed, the value of the indicator variable tells the program how to interpret the returned data:

An indicator value of zero means the host variable has been assigned a retrieved value by the DBMS. The application program can use the value of the host variable in its processing.
A negative indicator value means the retrieved value was NULL. The value of the host variable is irrelevant and should not be used by the application program.
A positive indicator value indicates a warning condition of some kind, such as a rounding error or string truncation.

Because you cannot tell in advance when a NULL value will be retrieved, you should always specify an indicator variable in the INTO clause for any column of query results that may contain a NULL value. If the SELECT statement produces a column containing a NULL value and you have not specified an indicator variable for the column, the DBMS will treat the statement as an error and return a negative SQLCODE. Thus, indicator variables must be used to successfully retrieve rows containing NULL data.

Although the major use of indicator variables is for handling NULL values, the DBMS also uses indicator variables to signal warning conditions. For example, if an arithmetic overflow or division by zero makes one of the query results columns invalid, DB2 returns a warning SQLCODE of +8 02 and sets the indicator variable for the affected column to – 2. The application program can respond to the SQLCODE and examine the indicator variables to determine which column contains invalid data.

DB2 also uses indicator variables to signal string truncation. If the query results contain a column of character data that is too large for the corresponding host variable, DB2 copies the first part of the character string into the host variable and sets the corresponding indicator variable to the full length of the string. The application program can examine the indicator variable and may want to retry the SELECT statement with a different host variable that can hold a larger string.

These additional uses of indicator variables are fairly common in commercial SQL products, but the specific warning code values vary from one product to another. They are not specified by the ANSI/ISO SQL standard. Instead, the SQL2 standard specifies error classes and subclasses to indicate these and similar conditions, and the program must use the GET DIAGNOSTICS statement to determine more specific information about the host variable causing the error.

3. Retrieval Using Data Structures

Some programming languages support data structures, which are named collections of variables. For these languages, a SQL precompiler may allow you to treat the entire data structure as a single, composite host variable in the INTO clause. Instead of specifying a separate host variable as the destination for each column of query results, you can specify a data structure as the destination for the entire row. Figure 17-22 shows the program from Figure 17-21, rewritten to use a C data structure.

When the precompiler encounters a data structure reference in the INTO clause, it replaces the structure reference with a list of the individual variables in the structure, in the order they are declared within the structure. Thus, the number of items in the structure and their data types must correspond to the columns of query results. The use of data structures in the INTO clause is, in effect, a shortcut. It does not fundamentally change the way the INTO clause works.

Support for the use of data structures as host variables varies widely among DBMS brands. It is also restricted to certain programming languages. DB2 supports C and PL/I structures but does not support COBOL or assembly language structures, for example.

4. Input and Output Host Variables

Host variables provide two-way communication between the program and the DBMS. In the program shown in Figure 17-21, the host variables repnum and repname illustrate the two different roles played by host variables:

The repnum host variable is an input host variable, used to pass data from the program to the DBMS. The program assigns a value to the variable before executing the embedded statement, and that value becomes part of the SELECT statement to be executed by the DBMS. The DBMS does nothing to alter the value of the variable.
The repname host variable is an output host variable, used to pass data back from the DBMS to the program. The DBMS assigns a value to this variable as it executes the embedded SELECT statement. After the statement has been executed, the program can use the resulting value.

Input and output host variables are declared the same way and are specified using the same colon notation within an embedded SQL statement. However, it’s often useful to think in terms of input and output host variables when you’re actually coding an embedded SQL program. Input host variables can be used in any SQL statement where a constant can appear. Output host variables are used only with the singleton SELECT statement and with the FETCH statement, described in the next section of this chapter.

5. Multirow Queries

When a query produces an entire table of query results, embedded SQL must provide a way for the application program to process the query results one row at a time. Embedded SQL supports this capability by defining a new SQL concept, called a cursor, and adding several statements to the interactive SQL language. Here is an overview of embedded SQL techniques for multirow query processing and the new statements it requires:

The DECLARE CURSOR statement specifies the query to be performed and associates a cursor name with the query.
The OPEN statement asks the DBMS to start executing the query and generating query results. It positions the cursor before the first row of query results.
The FETCH statement advances the cursor to the first row of query results and retrieves its data into host variables for use by the application program. Subsequent FETCH statements move through the query results row by row, advancing the cursor to the next row of query results and retrieving its data into the host variables.
The CLOSE statement ends access to the query results and breaks the association between the cursor and the query results.

Figure 17-23 shows a program that uses embedded SQL to perform a simple multirow query. The numbered callouts in the figure correspond to the numbers in the preceding steps. The program retrieves and displays, in alphabetical order, the name, quota, and year-to-date sales of each salesperson whose sales exceed quota.

The interactive SQL query that prints this information is:

SELECT NAME, QUOTA, SALES

FROM SALESREPS

WHERE SALES > QUOTA

ORDER BY NAME

Notice that this query appears, word for word, in the embedded DECLARE CURSOR statement in Figure 17-23. The statement also associates the cursor name repcurs with the query. This cursor name is used later in the OPEN CURSOR statement to start the query and position the cursor before the first row of query results.

The FETCH statement inside the for loop fetches the next row of query results each time the loop is executed. The INTO clause of the FETCH statement works just like the INTO clause of the singleton SELECT statement. It specifies the host variables that are to receive the fetched data items—one host variable for each column of query results. As in previous examples, a host indicator variable (repquota_ind) is used when a fetched data item may contain NULL values.

When no more rows of query results are to be fetched, the DBMS returns the NOT FOUND warning in response to the FETCH statement. This is exactly the same warning code that is returned when the singleton SELECT statement does not retrieve a row of data. In this program, the WHENEVER NOT FOUND statement causes the precompiler to generate code that checks the SQLCODE value after the FETCH statement. This generated code branches to the label done when the NOT FOUND condition arises, and to the label error if an error occurs. At the end of the program, the CLOSE statement ends the query and terminates the program’s access to the query results.

5.1. Cursors

As the program in Figure 17-23 illustrates, an embedded SQL cursor behaves much like a filename or file handle in a programming language such as C or COBOL. Just as a program opens a file to access the file’s contents, it opens a cursor to gain access to the query results. Similarly, the program closes a file to end its access and closes a cursor to end access to the query results. Finally, just as a file handle keeps track of the program’s current position within an open file, a cursor keeps track of the program’s current position within the query results. These parallels between file input/output and SQL cursors make the cursor concept relatively easy for application programmers to understand.

Despite the parallels between files and cursors, there are also some differences. Opening a SQL cursor usually involves much more overhead than opening a file, because opening the cursor actually causes the DBMS to begin carrying out the associated query. In addition, SQL cursors support only sequential motion through the query results, like sequential file processing. In most current SQL implementations, there is no cursor analog to the random access provided to the individual records of a file.

Cursors provide a great deal of flexibility for processing queries in an embedded SQL program. By declaring and opening multiple cursors, the program can process several sets of query results in parallel. For example, the program might retrieve some rows of query results, display them on the screen for its user, and then respond to a user’s request for more detailed data by launching a second query. The following sections describe in detail the four embedded SQL statements that define and manipulate cursors.

6. The declare cursor Statement

The DECLARE CURSOR statement, shown in Figure 17-24, defines a query to be performed. The statement also associates a cursor name with the query. The cursor name must be a valid SQL identifier. It is used to identify the query and its results in other embedded SQL statements. The cursor name is specifically not a host language variable; it is declared by the DECLARE CURSOR statement, not in a host language declaration.

The SELECT statement in the DECLARE CURSOR statement defines the query associated with the cursor. The SELECT statement can be any valid interactive SQL SELECT statement, as described in Chapters 6 through 9. In particular, the SELECT statement must include a FROM clause and may optionally include WHERE, GROUP BY, HAVING, and ORDER BY clauses. The SELECT statement may also include the UNION operator, as described in Chapter 6. Thus, an embedded SQL query can use any of the query capabilities that are available in the interactive SQL language.

The query specified in the DECLARE CURSOR statement may also include input host variables. These host variables perform exactly the same function as in the embedded INSERT, DELETE, UPDATE, and singleton SELECT statements. An input host variable can appear within the query anywhere that a constant can appear. Note that output host variables cannot appear in the query. Unlike the singleton SELECT statement, the SELECT statement within the DECLARE CURSOR statement has no INTO clause and does not retrieve any data. The INTO clause appears as part of the FETCH statement, described shortly.

As its name implies, the DECLARE CURSOR statement is a declaration of the cursor. In most SQL implementations, including the IBM SQL products, this statement is a directive for the SQL precompiler; it is not an executable statement, and the precompiler does not produce any code for it. Like all declarations, the DECLARE CURSOR statement must physically appear in the program before any statements that reference the cursor that it declares. Most SQL implementations treat the cursor name as a global name that can be referenced inside any procedures, functions, or subroutines that appear after the DECLARE CURSOR statement.

It’s worth noting that not all SQL implementations treat the DECLARE CURSOR statement strictly as a declarative statement, and this can lead to subtle problems. Some SQL precompilers actually generate code for the DECLARE CURSOR statement (either host language declarations or calls to the DBMS, or both), giving it some of the qualities of an executable statement. For these precompilers, the DECLARE CURSOR statement must not only physically precede the OPEN, FETCH, and CLOSE statements that reference its cursor, but it must sometimes precede these statements in the flow of execution or be placed in the same block as the other statements.

In general, you can avoid problems with the DECLARE CURSOR statement by following these guidelines:

Place the DECLARE CURSOR statement right before the OPEN statement for the cursor. This placement ensures the correct physical statement sequence, it puts the DECLARE CURSOR and the OPEN statements in the same block, and it ensures that the flow of control passes through the DECLARE CURSOR statement, if necessary. It also helps to document just what query is being requested by the OPEN statement.
Make sure that the FETCH and CLOSE statements for the cursor follow the OPEN statement physically as well as in the flow of control.

7. The open Statement

The OPEN statement, shown in Figure 17-25, conceptually opens the table of query results for access by the application program. In practice, the OPEN statement actually causes the DBMS to process the query, or at least to begin processing it. The OPEN statement thus causes the DBMS to perform the same work as an interactive SELECT statement, stopping just short of the point where it produces the first row of query results.

The single parameter of the OPEN statement is the name of the cursor to be opened. This cursor must have been previously declared by a DECLARE CURSOR statement. If the query associated with the cursor contains an error, the OPEN statement will produce a negative SQLCODE value. Most query-processing errors, such as a reference to an unknown table, an ambiguous column name, or an attempt to retrieve data from a table without the proper permission, will be reported as a result of the OPEN statement. In practice, very few errors occur during the subsequent FETCH statements.

Once opened, a cursor remains in the open state until it is closed with the CLOSE statement. The DBMS also closes all open cursors automatically at the end of a transaction (that is, when the DBMS executes a COMMIT or ROLLBACK statement). After the cursor has been closed, it can be reopened by executing the OPEN statement a second time. Note that the DBMS restarts the query from scratch each time it executes the OPEN statement.

8. The fetch Statement

The FETCH statement, shown in Figure 17-26, retrieves the next row of query results for use by the application program. The cursor named in the FETCH statement specifies which row of query results is to be fetched. It must identify a cursor previously opened by the OPEN statement.

The FETCH statement fetches the row of data items into a list of host variables, which are specified in the INTO clause of the statement. An indicator variable can be associated with each host variable to handle retrieval of NULL data. The behavior of the indicator variable and the values that it can assume are identical to those described earlier in the “Single-Row Queries” section for the singleton SELECT statement. The number of host variables in the list must be the same as the number of columns in the query results, and the data types of the host variables must be compatible, column by column, with the columns of query results.

As shown in Figure 17-27, the FETCH statement moves the cursor through the query results, row by row, according to these rules:

The OPEN statement positions the cursor before the first row of query results. In this state, the cursor has no current row.

The FETCH statement advances the cursor to the next available row of query results, if there is one. This row becomes the current row of the cursor.
If a FETCH statement advances the cursor past the last row of query results, the FETCH statement returns a NOT FOUND warning. In this state, the cursor again has no current row.
The CLOSE statement ends access to the query results and places the cursor in a closed state.

If there are no rows of query results, the OPEN statement still positions the cursor before the (empty) query results and returns successfully. The program cannot detect that the OPEN statement has produced an empty set of query results. However, the very first FETCH statement produces the NOT FOUND warning and positions the cursor after the end of the (empty) query results.

9. The close Statement

The CLOSE statement, shown in Figure 17-28, conceptually closes the table of query results created by the OPEN statement, ending access by the application program. Its single parameter is the name of the cursor associated with the query results, which must be a cursor previously opened by an OPEN statement. The CLOSE statement can be executed any time after the cursor has been opened. In particular, it is not necessary to FETCH all rows of query results before closing the cursor, although this will usually be the case. All cursors are automatically closed at the end of a transaction. Once a cursor is closed, its query results are no longer available to the application program.

9.1. Scroll Cursors

The SQL1 standard specifies that a cursor can only move forward through the query results. Until the last few years, most commercial SQL products also supported only this form of forward, sequential cursor motion. If a program wants to reretrieve a row once the cursor has moved past it, the program must CLOSE the cursor and reOPEN it (causing the DBMS to perform the query again), and then FETCH through the rows until the desired row is reached.

In the early 1990s, a few commercial SQL products extended the cursor concept with the concept of a scroll cursor. Unlike standard cursors, a scroll cursor provides random access to the rows of query results. The program specifies which row it wants to retrieve through an extension of the FETCH statement, shown in Figure 19-28:

FETCH FIRST retrieves the first row of query results.
FETCH LAST retrieves the last row of query results.
FETCH PRIOR retrieves the row of query results that immediately precedes the current row of the cursor.
FETCH NEXT retrieves the row of query results that immediately follows the current row of the cursor. This is the default behavior if no motion is specified and corresponds to the standard cursor motion.
FETCH ABSOLUTE retrieves a specific row by its row number.
FETCH RELATIVE moves the cursor forward or backward a specific number of rows relative to its current position.

Scroll cursors can be especially useful in programs that allow a user to browse database contents. In response to the user’s request to move forward or backward through the data a row or a screenful at a time, the program can simply fetch the required rows of the query results. However, scroll cursors are also a great deal harder for the DBMS to implement than a normal, unidirectional cursor. To support a scroll cursor, the DBMS must keep track of the previous query results that it provided for a program, and the order in which it supplied those rows of results. The DBMS must also ensure that no other concurrently executing transaction modifies any data that has become visible to a program through a scroll cursor, because the program can use the extended FETCH statement to reretrieve the row, even after the cursor has moved past the row.

If you use a scroll cursor, you should be aware that certain FETCH statements on a scroll cursor may have a very high overhead for some DBMS brands. If the DBMS brand normally carries out a query step by step as your program FETCHes its way down through the query results, your program may wait a much longer time than normal if you request a FETCH NEXT operation when the cursor is positioned at the first row of query results. It’s best to understand the performance characteristics of your particular DBMS brand before writing programs that depend on scroll cursor functionality for production applications.

Because of the usefulness of scroll cursors, and because a few DBMS vendors had begun to ship scroll cursor implementations that were slightly different from one another, the SQL2 standard was expanded to include support for scroll cursors. The Entry SQL level of the standard requires only the older-style, sequential forward cursor, but conformance at the Intermediate SQL or Full SQL levels requires full support for the scroll cursor syntax shown in Figure 17-29. The standard also specifies that if any motion other than FETCH NEXT (the default) is used on a cursor, its DECLARE CURSOR statement must explicitly identify it as a scroll cursor. Using the SQL2 syntax, the cursor declaration in Figure 17-22 would appear as:

exec sql declare repcurs scroll cursor for

select name, quota, sales

from salesreps
where sales > quota
order by name;

Source: Liang Y. Daniel (2013), Introduction to programming with SQL, Pearson; 3rd edition.