SQL and Objects: Sets, Arrays, and Collections

In a relational database, tables are the only database structure used to represent a set of objects. For example, the set of engineers in our personnel database is represented by the rows in the ENGINEERS table. Suppose each engineer has a set of academic degrees (a B.S. in science from MIT, a Ph.D. in electrical engineering from Michigan, and so on) that are to be stored in the database. The number of degrees for each engineer will vary—from none for some engineers to perhaps half a dozen for others. In a pure relational database, there is only one correct way to add this information to the data model. A new table, DEGREES, must be created, as shown in Figure 24-5. Each row in the DEGREES table represents one individual academic degree held by one of the engineers. A column in the DEGREES table holds the employee number of the engineer holding the degree described by that particular row, and serves as a foreign key to the ENGINEERS table, linking the two tables in a parent/child relationship. The other columns in the DEGREES table describe the particulars of the degree.

You have seen the type of parent/child relational table structure shown in Figure 24-5 many times in the earlier chapters of this book, and it has been a basic construct of relational databases since the beginning. However, there are some disadvantages to having this be the only way in which sets of data attributes can be modeled. First, the database tends to have a great many tables and foreign key relationships and becomes hard to understand. Second, many common queries need to join three, four, or more tables to get the required answers. Third, with the implementations of relational joins provided by most DBMS systems, the performance of queries will deteriorate as they involve more and more joins.

An object-oriented model of the engineers and their degrees would tend to reject the table structure of Figure 24-5. It would claim that the degrees are not substantial objects in their own right and deserving of their own table. Instead, they are attributes of the engineer holding the degrees. True, a variable number of degrees are associated with each engineer, but the object-oriented model would have no problem with representing this situation as an array or a set of data within the engineer object.

The object-relational databases support this object-oriented view of data by supporting sets, arrays, or other collection data types. A column within a table can be defined to have one of these data types. It will then contain not a single data item value, but a set of data item values. Special SQL extensions allow a user, or more often a stored procedure, to manipulate the set of data items as a whole or to access individual members of the set.

1. Defining Collections

Informix Universal Server supports collections of attributes through its collection data types. Three different collection data types are supported:

  • Lists. A list is an ordered collection of data items, all of which have the same type. Within a list, there is the concept of a first item, a last item, and the n-th item. The items in the list are not required to be unique. For example, a list of the first names of the employees hired in the last year, in order of hire, might be {‘Jim’, ‘Mary’, ‘Sam’, ‘Jim’, ‘John’}.
  • Multisets. A multiset is an unordered collection of data items, all of which have the same type. There is no concept of a sequencing to the items in a multiset; its items have no implied ordering. The items are not required to be unique. The list of employee first names could be considered a multiset if you didn’t care about the order of hire: {‘Jim’, ‘Sam’, ‘John’, ‘Jim’, ‘Mary’}.
  • Sets. A set is an unordered collection of unique data items, all of which have the same type. As in a multiset, there is no concept of first or last; the set has no implied ordering. The items must have unique values. The first names in the previous examples wouldn’t qualify, but the last names might: {‘Johnson’, ‘Samuels’, ‘Wright’, ‘Jones’, ‘Smith’}.

To illustrate the concept of collection data, we will expand the tables in our example object-relational database as follows:

  • The REPS table will include sales targets for each of the first, second, third, and fourth quarters. The quarterly targets can naturally be represented as a list column added to the REPS table. The quarters have a natural ordering (first through fourth), the quota for each quarter has the same data type (money), and the values are not necessarily unique (that is, the quotas for the first and second quarters might be the same).
  • The ENGINEERS table will include information about the academic degrees that each engineer holds. Two items of data will actually be stored about each degree—the actual degree (B.S., Ph.D., MBA, and so on) and the school. This data will be stored as a multiset column added to the ENGINEERS table, because it’s possible to have two identical entries—for example, an engineer may have a B.S. degree in engineering and a B.S. degree in business from the same school.
  • The TECHNICIANS table will include information about the projects to which each technician is assigned. Each technician may be assigned to two or more projects, but each project has a unique name. This data will be stored as a set column added to the TECHNICIANS table. The data values must be unique, but no particular order is associated with them.

Here are some Informix ALTER TABLE statements that implement these changes to the previously defined tables:

ALTER TABLE REPS
        ADD QTR_TGT LIST(MONEY(9,2)); /* four quarterly targets */
ALTER TABLE TECHNICIANS
        ADD PROJECT SET(VARCHAR(15)); /* projects assigned */
ALTER TABLE ENGINEERS (
        ADD DEGREES MULTISET(ROW( /* degree info */
     DEGREE VARCHAR(3),
     SCHOOL VARCHAR(15));

These collection column types create a row-within-a-row structure within the table that contains them, as shown in Figure 24-6. In the case of the ENGINEERS table, the structure might more accurately be described as a table within a table. Clearly, the relational model of row/column tables with atomic data items has been stretched considerably by the introduction of collection data types.

Informix Universal Server allows collections to be used quite generally and intermixed with other object-relational extensions. A collection can be a field of a row data type. The items of a collection can be row data types. It’s also possible to define collections within collections where that makes sense. For example, the projects in this example might have subprojects that must be tracked for each technician. At each level of additional complexity, the complexity of the stored procedure language (SPL) and SQL expressions that are required to manipulate the data items and process them increases accordingly.

Oracle also provides extensive support for collection-type data, through two different Oracle object-relational extensions:

  • Varying arrays. A varying array is an ordered collection of data items, all having the same data type. There is no requirement that the items in the array be unique. You define the maximum number of data items that can occur when you specify a varying array type for a column. Oracle provides extensions to SQL to access the individual items within the array.
  • Nested tables. A nested table is an actual table within a table. A column with a nested table type contains individual data items that are themselves tables. Oracle actually stores the nested table data separately from the main table that contains it, but it uses SQL extensions to process nested references to the inner table. Unlike a varying array, a nested table can contain any number of rows.

A column within a table can be declared to have a VARRAY (varying array) or TABLE OF (nested table) structure. Here are some Oracle CREATE TYPE and CREATE TABLE statements that use varying arrays and nested tables to achieve table structures like those shown in Figure 24-6:

CREATE TABLE REPS (
    EMPL_NUM INTEGER,
        NAME NAME_TYPE,
     ADDRESS ADDR_TYPE,
     SLS_MGR INTEGER, /* employee number of mgr */
      SALARY MONEY(9,2), /* annual salary */
       QUOTA MONEY(9,2), /* sales quota */
     QTR_TGT VARRAY(4) OF NUMBER(9,2)); /* four quarterly tgts */

      CREATE TYPE DEGR_TYPE AS OBJECT ( (
      DEGREE VARCHAR(3),
      SCHOOL VARCHAR(15));
      CREATE TABLE ENGINEERS (
    EMPL_NUM INTEGER,
   NAME NAME_TYPE,
     ADDRESS ADDR_TYPE,
      SALARY NUMBER(9,2), /* annual salary */
   YRS_EXPER INTEGER, /* years of experience */
     DEGREES TABLE OF DEGR_TYPE);
NESTED TABLE DEGREES STORE AS DEGREES_TABLE;

The quarterly target information for the REPS table is most easily represented as an Oracle varying array column. There will be exactly four quarters of information, so the maximum size of the array is known in advance. In this example, the varying array contains a simple data item as its element, but it’s also common to define varying arrays whose items are themselves abstract (structured) data types.

The academic degree information for the ENGINEERS table is represented as a nested table. For a data item like this one, you could decide to place an upper limit on the number of rows and use a varying array structure instead, but in general, if the maximum number of items is unknown, a nested table is the right choice. In this case, the nested table has an abstract data type composed of two attributes. Each row of the nested table will contain information about a degree granted and the school that granted it.

2. Querying Collection Data

Collection-valued columns complicate the process of querying the tables that contain them. In the SELECT item list, they generate multiple data values for each row of query results. In search conditions, they don’t contain individual data items, but it’s sometimes convenient to treat them as sets of data. The object-relational databases typically provide a limited set of SQL extensions or extend existing SQL concepts to provide simple queries involving collection data. For more advanced queries, they require you to write stored procedure language programs with loop structures that process the collection data items one by one.

For query purposes, Informix treats the collection types as if they were a set of data values, like the values that might be returned by a subquery. You can match individual items within a collection using the SQL IN search condition. Here is a query that finds any technicians who work on a project named “bingo”:

 SELECT EMPL_NUM, NAME

   FROM TECHNICIANS

  WHERE ‘bingo’ IN (PROJECTS);

The name of the collection-valued column (in this case, the set-valued column PROJECTS) appears in parentheses. Informix treats the members of the collection as a set and applies the IN matching condition. In interactive SQL, you can put a collection-valued column in the select item list. Informix displays the collection of data as either a SET, LIST, or MULTISET in the displayed output. To process collection­valued data in the select list of a programmatic request (that is, from a program using ESQL or a call-level API), you must use special API extensions and/or extensions to the Informix stored procedure language.

Oracle provides additional capabilities for processing nested tables within SQL queries. A special THE keyword flattens the nested table, in effect, producing an unnested table with one row for each row of the nested table within each row of the main table. Here’s a query that shows the schools from which one of the engineers has received degrees:

SELECT NEST.SCHOOL

  FROM THE (SELECT DEGREES

FROM ENGINEERS

WHERE EMPL_NUM = 1234) NEST;

The query within the inner parentheses is a query against the main (ENGINEERS) table. It selects the column containing the nested table, but it could select other columns as well. The THE operation, applied to the query results, flattens them out, creating a row for each nested row within each row of the main table. This flattened table is assigned an alias (NEST in this example), and it becomes the source of candidate query results rows from the FROM clause of the main, top-level query. With this table as a source, the main query in this example is quite simple; it selects one column that originated in the nested table.

The ability to flatten nested tables in this way and process them as if they were actually joined versions of two separate tables is actually quite powerful. It allows many queries to be expressed in high-level SQL that would otherwise require you to resort to stored procedures. However, the logic behind such queries and the task of actually constructing them correctly can be extremely complicated, as even this simple example begins to show.

3. Manipulating Collection Data

Extensions to standard SQL syntax are used to insert new rows into a table containing collection-valued columns. Informix provides a trio of constructors—the SET constructor, MULTISET constructor, and LIST constructor—for this purpose. They transform a list of data items into the corresponding collections to be inserted. Here is a pair of INSERT statements that illustrates their use with the tables in Figure 24-6:

INSERT INTO TECHNICIANS

VALUES (1279,

ROW(‘Sam’, ‘R’, ‘Jones’),

ROW(‘164 Elm St.’, ‘Highland’, ‘IL’,

ROW(12345, 6789)), SET{‘atlas’, ‘checkmate’, ‘bingo’}”);

INSERT INTO ENGINEERS

VALUES (1281,

ROW(‘Jeff’, ‘R’, ‘Ames’),

ROW(‘1648 Green St.’, ‘Elgin’, ‘IL’,

ROW(12345, 6789)),

MULTISET{ROW(‘BS’, ‘Michigan’),

ROW(‘BS’, ‘Michigan’),

ROW(‘PhD’,’Stanford’)}”);

The first statement inserts a single row into the TECHNICIANS table with a three-item set in the PROJECTS column. The second inserts a single row into the ENGINEERS table with a three-item multiset in the DEGREES column. Because the members of this particular multiset are themselves row types, the row constructor must be used for each item.

Oracle uses a different approach to constructing the collection-valued data items for insertion into the table. Recall from the discussion of Oracle abstract data types that each Oracle abstract data type automatically has an associated constructor method that is used to build a data item of the abstract type out of individual data items. This concept is extended to varying arrays and nested tables. A constructor method is automatically supplied for each varying array or nested table, and it is used in the INSERT statements:

INSERT INTO TECHNICIANS

VALUES (NAME_TYPE(‘Sam’, ‘R’, ‘Jones’),

ADDR_TYPE(‘164 Elm St.’, ‘Highland’, ‘IL’,

POST_TYPE(12345, 6789)),

PROJECTS(‘atlas’, ‘checkmate’, ‘bingo’));

INSERT INTO ENGINEERS
     VALUES (NAME_TYPE(‘Jeff’, ‘R’, ‘Ames’),
            ADDR_TYPE(‘1648 Green St.’, ‘Elgin’,’IL’,

POST_TYPE(12345, 6789)),

DEGREES(DEGREE_TYPE(‘BS’, ‘Michigan’),

   DEGREE_TYPE(‘BS’, ‘Michigan’),

   DEGREE_TYPE(‘PhD’, ‘Stanford’)));

4. Collections and Stored Procedures

Collections pose special problems for stored procedures that are retrieving and manipulating data in tables that contain them. Both Oracle and Informix provide special stored procedure language facilities for this purpose. In Informix, special SPL collection variables must be used. Here is an SPL stored procedure fragment that handles the PROJECTS collection column from the TECHNICIANS table:

define proj_coll collection; /* holds project collection */
define a_project varchar(15); /* holds individual project */
define proj_cnt integer; /* number of projects */
define empl_name name_type; /* buffer for tech name */

/* Check how many projects the technician is supporting */
select cardinality(projects) into proj_cnt
  from technicians
 where empl_num = 1234;

/* If too many projects, then refuse to add a new one */
if (proj_cnt > 6) then . . .

/* Retrieve row, including project set for the technician */
select name, projects into empl_name, proj_coll
from technicians
where empl_num = 1234;

/* Add the ‘gonzo’ project to the list for this tech */
insert into table(proj_coll)
values (‘gonzo’);

/* Search through project list one by one */
foreach proj_cursor for

select * into a_project

from table(proj_coll)

if (a_project = ‘atlas’) then

begin

update table(proj_coll)(project)

set project = ‘bingo’

where current of proj_cursor;

exit foreach;

end;

end if;

end foreach;

/* Update the database row with modified project list */

update technicians

set projects = proj_coll

where empl_num = 1234;

The example shows several aspects of collection-handling in Informix SPL. First, the collection is retrieved from the database into an SPL variable as a collection data type. It would also be possible to retrieve it into a variable explicitly declared as having a SET type (or in other situations, a LIST or MULTSET type). The collection stored in the variable is then explicitly treated as a table for manipulating items within the collection. To add a new project, an INSERT is performed into the collection table. To find and modify a specific project, a cursor is used to search through the collection table, and a cursor-based UPDATE statement is used to change the value of one member of the collection. Note that the FOREACH loop retrieves each item of the collection into a variable so that the SPL routine can process it. Finally, the collection variable’s contents are used to update the collection column within the table.

Oracle takes a similar approach to processing varying arrays. The individual elements of an array within an abstract data type are available through subscripted references within a structured data type. The typical Oracle PL/SQL process for accessing variable array elements is:

  1. Retrieve the row from the table containing the varying array into a local variable whose data type is defined to match the row structure of the table, or of the particular columns being retrieved.
  2. Execute a FOR loop with an index variable, n, that counts from 1 to the number of elements in the varying array. The number of elements is available through the value of a special attribute of the array column named COUNT.
  3. Within the FOR loop, a subscript is used on the varying array name to access the n-th element of the varying array.

A similar technique can be used to process nested tables; however, it’s usually not necessary. Instead, the THE operator is generally used to flatten the table in a SQL query, and the results are processed with a single cursor-driven FOR loop. The processing may still be complex. In particular, the stored procedure may need to detect whether a particular row coming from the query results is from the same main table row as the previous row and, upon detecting a change in main table rows, perform special processing such as computing subtotals. In this aspect, the processing of both varying arrays and nested tables begins to resemble the nested-loop processing typical of the COBOL report-writing programs of 30 years ago that handled master and detail records.

As the discussion in this section has illustrated, collection types and the processing of individual collection items tend to call for programmatic access through stored procedures rather than for ad hoc SQL use. One of the criticisms of object-oriented databases is that they are a regression from the simplicity of the relational model and reintroduce that need for explicit database navigation that was part of the prerelational databases. Examples like these provide evidence that there is at least a certain amount of truth in the criticism.

Source: Liang Y. Daniel (2013), Introduction to programming with SQL, Pearson; 3rd edition.

Leave a Reply

Your email address will not be published. Required fields are marked *