SQL and Objects: Object-Relational Databases

Object-relational databases typically begin with a relational database foundation, and add selected features that provide object-oriented capabilities. This approach simplifies the addition of object capabilities for the major RDBMS vendors, whose enterprise-class RDBMS products have been developed over the course of 15 or more years and would be tremendously costly to reproduce from scratch. It also recognizes the large installed base of relational systems and gives those customers a smoother upgrade path (not to mention an upgrade revenue stream for the vendors).

The object extensions that are commonly found in object-relational databases are listed below; a brief SQL sketch that combines several of them follows the list:

  • Large data objects. Traditional relational data types are small in size—integers, dates, short character strings; large data objects can store documents, audio and video clips, web pages, and other new media data types.
  • Structured/abstract data types. Relational data types are atomic and indivisible; structured data types allow groups of individual data items to be grouped into higher-level structures that can be treated as entities of their own.
  • User-defined data types. Relational databases typically provide a limited range of built-in data types; object-oriented systems and databases emphasize the user’s ability to define his or her own new data types.
  • Tables within tables. Relational database columns store individual data items; object-relational databases allow columns to contain complex data items, such as structured types or even entire tables. This can be used to represent object hierarchies.
  • Sequences, sets, and arrays. In a traditional relational database, sets of data are represented by rows in their own table, linked to an owning entity by a foreign key; object-relational databases may allow the direct storage of collections of data items (sequences, sets, arrays) within a single column.
  • Stored procedures. Traditional relational databases provide set-based interfaces, such as SQL, for storing, selecting, and retrieving data; object-relational databases provide procedural interfaces, such as stored procedures, that encapsulate the data and provide strictly defined interactions.
  • Handles and object-ids. A pure relational database requires that the data within each row (specifically, the primary key) uniquely identify the row; object-relational databases provide built-in support for row-ids or other unique identifiers for objects.
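As a concrete illustration, the following Oracle8-style sketch combines several of these extensions: a user-defined structured type, a collection stored within a single column, and a large object column. All of the type, table, and column names here are invented for the example.

CREATE TYPE addr_type AS OBJECT (
    street   VARCHAR2(35),
    city     VARCHAR2(15),
    state    CHAR(2),
    zipcode  VARCHAR2(10)
);

CREATE TYPE phone_list AS TABLE OF VARCHAR2(20);

CREATE TABLE customers (
    cust_id  NUMBER PRIMARY KEY,
    name     VARCHAR2(40),
    address  addr_type,    -- user-defined structured type as a column
    phones   phone_list,   -- a collection (table within a table)
    photo    BLOB          -- large data object stored in the database
) NESTED TABLE phones STORE AS customer_phones;

The NESTED TABLE clause tells the DBMS where to store the rows of the embedded collection.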

1. Large Object Support

Relational databases have traditionally focused on business data processing. They store and manipulate data items that represent money amounts, names, addresses, unit quantities, dates, times, and the like. These data types are relatively simple and require small amounts of storage space, from a few bytes for an integer that holds order or inventory quantities to a few dozen bytes for a customer name, employee address, or product description. Relational databases have been optimized to manage rows containing up to a few dozen columns of this type of data. The techniques they use to manage disk storage and to index data assume that data rows will occupy a few hundred to a few thousand bytes. The programs that store and retrieve data can easily hold dozens or hundreds of these types of data items in memory, and can easily store and retrieve entire rows of data at a time through reasonably sized memory buffers. The row-at-a-time processing techniques for relational query results work well.

Many modern types of data have quite different characteristics from traditional business data. A single high-resolution graphical image to be displayed on a PC screen can require hundreds of thousands of bytes of storage or more. A word processing document, such as a contract or the text of this book, can take even more storage. The HTML text that defines web pages and the PostScript files that define printed images are other examples of larger, document-oriented data items. Even a relatively short high-quality audio track can occupy millions of bytes, and video clips can run to hundreds of megabytes or even gigabytes of data. As multimedia applications have become more important, users have wanted to manage these types of data along with the other data in their databases. The capability to efficiently manage large objects, often called binary large objects (BLOBs), was one of the earliest advantages claimed for object-oriented databases.

2. Blobs in the Relational Model

The first approach to supporting BLOBs in relational databases was through the underlying operating system and its file system. Each individual BLOB data item was stored in its own operating system file. The name of the file was placed in a character-valued column within a table, as a pointer to the file. The table’s other columns could be searched to find rows that met certain criteria. When an application needed to manipulate the BLOB content associated with one of the rows, it read the name of the file and retrieved the BLOB data from it. Management of the file input/output was the responsibility of the application program. This approach worked, but it was error-prone and required that a programmer understand both the RDBMS and the file system interfaces. The lack of integration between the BLOB contents and the database was readily apparent. For example, you couldn’t ask the database to compare two BLOB data items to see if they were the same, and the database couldn’t provide even basic text searching capability for BLOB contents.
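For example, an application following this approach might use a table like the hypothetical one below (names invented for illustration), where the image_file column holds only the name of the operating system file that contains the image. The program retrieves the filename with an ordinary SELECT and then performs its own file I/O to read the image:

CREATE TABLE product_images (
    product_id   INTEGER NOT NULL,
    description  VARCHAR(80),
    image_file   VARCHAR(255)    -- OS pathname of the file holding the image
);

SELECT image_file
  FROM product_images
 WHERE product_id = 1234;

-- The application then opens and reads the named file itself; because the
-- DBMS never sees the image contents, it cannot search or compare them.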

Today, most major enterprise-class DBMS systems provide direct support for one or more types of BLOB data. You can define a column as containing one of these BLOB data types and use it in certain situations in SQL statements. There are typically substantial restrictions on the BLOB data, such as not allowing its use in a join condition or a GROUP BY clause.

Sybase provides two large object data types. Its TEXT data type can store up to 2 billion bytes of variable-length text data. You can use a limited set of SQL capabilities (such as the LIKE text-search operator) to search the contents of TEXT columns. A companion IMAGE data type can store up to 2 billion bytes of variable-length binary data. Microsoft SQL Server supports these types, plus an NTEXT data type that allows up to 1 billion characters of 2-byte national language text.

IBM’s DB2 provides a similar set of data types. A DB2 character large object (CLOB) type stores up to 2 billion bytes of text. A DB2 double-byte character large object (DBCLOB) type stores up to 1 billion 2-byte characters. A DB2 binary large object (BLOB) stores up to 2 billion bytes of binary data.

Oracle historically provided two large object data types. A LONG data type stored up to 2 billion bytes of text data. A LONG RAW data type stored up to 2 billion bytes of binary data. Oracle restricted the use of either LONG type to only a single column per table. With the introduction of Oracle8, support for BLOB data was expanded substantially:

  • An Oracle BLOB type stores up to 4 gigabytes of binary data within the database.
  • An Oracle CLOB type stores up to 4 gigabytes of single-byte character data within the database.
  • An Oracle NCLOB type stores up to 4 gigabytes of multibyte (national character set) character data within the database.
  • An Oracle BFILE type stores long binary data in a file external to the database.

The BLOB, CLOB, and NCLOB types are tightly integrated into Oracle’s operation, including transaction support. BFILE data is managed through a pointer within the database to an external operating system file. It is not supported by Oracle transaction semantics. Special Oracle PL/SQL functions are provided to manipulate BLOB, CLOB, and NCLOB data from within PL/SQL stored procedures, as described in the next section.
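A table definition using these types might look like the following hypothetical sketch. The documents table and document_lob column are the ones assumed by the PL/SQL fragment later in this section; the other names are invented:

CREATE TABLE documents (
    document_id   NUMBER PRIMARY KEY,
    title         VARCHAR2(100),
    document_lob  CLOB,     -- document text stored inside the database
    thumbnail     BLOB,     -- binary image data stored inside the database
    source_file   BFILE     -- pointer to an external OS file, outside transaction control
);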

Informix Universal Server’s support for large object data is similar to that of Oracle. It supports simple large objects and smart large objects:

  • An Informix BYTE type is a simple large object that stores binary data.
  • An Informix TEXT type is a simple large object that stores text data.
  • An Informix BLOB type is a smart large object that stores binary data.
  • An Informix CLOB type is a smart large object that stores text (character) data.

Informix simple large objects store up to 1 gigabyte of data. The entire large object must be retrieved or stored as a unit from the application program, or it can be copied between the database and an operating system file. Smart large objects can store up to 4 terabytes of data. Special Informix functions are provided to process smart large objects in smaller, more manageable chunks. These functions provide random access to the contents of an Informix smart object, similar to the random access typically provided for operating system files. Informix also provides advanced controls over logging, transaction management, and data integrity for smart large objects.

3. Specialized BLOB Processing

Because BLOBs can be very large in size compared to the data items typically handled by RDBMS systems, they pose special problems in several areas:

  • Data storage and optimization. Storing a BLOB item in-line with the other contents of a table's row would destroy the optimization that the DBMS performs to fit database data neatly into pages that match the size of disk pages. For this reason, BLOB data is always stored out-of-line in separate storage areas. Most DBMS brands that support BLOBs provide special BLOB storage options, including named storage spaces that are specified when the BLOB type column is created (an Oracle example appears after this list).
  • Storing BLOB data in the database. Because a BLOB can be tens or hundreds of megabytes in size, most programs can’t hold the entire contents of a BLOB in a memory buffer at once. They process portions of the BLOB at a time (for example, pages of a long document or individual frames of a video clip). But Embedded SQL and normal SQL APIs are designed for row-at-a-time processing (through INSERT and UPDATE statements) that store the values for all columns in the row at once. Special techniques are required to put data into a database BLOB column piece by piece, through multiple API calls per BLOB column.
  • Retrieving BLOB data from the database. This is the same issue as storing the data, but in reverse. Embedded SQL and normal SQL APIs are designed for SELECT statement or FETCH statement processing that retrieves data values for all columns of a row at once. But because a stored BLOB value can be tens or hundreds of megabytes in size, most programs can't possibly process it all at once in a memory buffer. Special techniques are required to retrieve the database BLOB column data, piece by piece, so that it can be processed by the application.
  • Transaction logging. Most DBMS’s support transactions by maintaining before and after images of modified data in a transaction log. Because of the potentially large size of BLOB data, the logging overhead could be extreme. For this reason, many DBMS’s don’t support logging for BLOB data, or they allow logging but provide the ability to turn it on and off.
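For instance, Oracle lets each LOB column's out-of-line storage area be named in the table definition. The sketch below is illustrative only; the table and tablespace names are invented:

CREATE TABLE contracts (
    contract_id    NUMBER PRIMARY KEY,
    contract_text  CLOB
)
LOB (contract_text) STORE AS (
    TABLESPACE lob_space        -- keep the CLOB data in its own storage area
    DISABLE STORAGE IN ROW      -- never store LOB values in-line with the row
);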

Several DBMS’s address these issues through extended APIs that specifically support BLOB manipulation. These calls provide random access to individual segments of the BLOB contents, allowing the program to retrieve or store the BLOB in manageable chunks. Oracle8 introduced this capability for manipulating its LOB data types (character and binary) within stored procedures written in the Oracle PL/SQL language. Its capabilities are similar to those provided by other object-relational databases, such as Informix Universal Server.

When a stored procedure reads an Oracle LOB column from a table, Oracle does not actually return the contents of the column. Instead, a locator for the LOB data (in object parlance, a handle for the LOB) is returned. The locator is used with a set of ten special LOB-processing functions that the stored procedure calls to manipulate the actual data stored in the LOB column of the database. Here is a brief description of each LOB-processing function:

  • dbms_lob.read(locator, length, offset, buffer). Reads into the PL/SQL buffer the indicated number of bytes/characters from the LOB identified by the locator, starting at the offset.
  • dbms_lob.write(locator, length, offset, buffer). Writes the indicated number of bytes/characters from the PL/SQL buffer into the LOB identified by the locator, starting at the offset.
  • dbms_lob.append(locator1, locator2). Appends the entire contents of the LOB identified by locator2 to the end of the contents of the LOB identified by locator1.
  • dbms_lob.erase(locator, length, offset). Erases the contents of the LOB identified by the locator at offset for length bytes/characters; for character-based LOBs, spaces are inserted, and for binary LOBs, binary zeroes are inserted.
  • dbms_lob.copy(locator1, locator2, length, offset1, offset2). Copies length bytes/characters from the LOB identified by locator2, starting at offset2, into the LOB identified by locator1, starting at offset1.
  • dbms_lob.trim(locator, length). Trims the LOB identified by the locator to the indicated number of bytes/characters.
  • dbms_lob.substr(locator, length, offset). Returns (as a text string return value) the indicated number of bytes/characters from the LOB identified by the locator, starting at the offset; the return value from this function may be assigned into a PL/SQL VARCHAR variable.
  • dbms_lob.getlength(locator). Returns (as an integer value) the length in bytes/characters of the LOB identified by the locator.
  • dbms_lob.compare(locator1, locator2, length, offset1, offset2). Compares the LOB identified by locator1 to the LOB identified by locator2, starting at offset1 and offset2, respectively, for length bytes/characters; returns zero if they are the same and nonzero if they are not.
  • dbms_lob.instr(locator, pattern, offset, i). Returns (as an integer value) the position within the LOB identified by the locator where the i-th occurrence of pattern is matched; the returned value may be used as an offset in subsequent LOB processing calls.

Oracle imposes one further restriction on updates and modifications to LOB values that are performed through these functions. LOBs can impose an unacceptably high overhead on Oracle’s transaction mechanisms, so Oracle normally does not lock the contents of a LOB data item when the row containing the LOB is read by an application program or a PL/SQL routine. If the LOB data is to be updated, the row must be explicitly locked prior to modifying it. This is done by including a FOR UPDATE clause in the SELECT statement that retrieves the LOB locator. Here is a PL/SQL fragment that retrieves a row containing a LOB that contains document text, and updates 100 characters in the middle of the LOB data:

declare
    lob      CLOB;
    textbuf  varchar(255);
begin
    /* Put text to be inserted into buffer */

    /* Get lob locator and lock LOB for update */
    select document_lob into lob
      from documents
     where document_id = '34218'
       for update;

    /* Write 100 characters of new text 500 bytes into the LOB */
    dbms_lob.write(lob, 100, 500, textbuf);

    commit;
end;
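Retrieving a large LOB works the same way, a piece at a time. The following fragment is a hedged sketch (it assumes the same hypothetical documents table shown earlier) that uses dbms_lob.getlength and dbms_lob.read to walk through the document text in 255-character chunks:

declare
    lob      CLOB;
    textbuf  varchar(255);
    lob_len  integer;
    amount   integer;
    offset   integer := 1;
begin
    /* Get the LOB locator; no FOR UPDATE clause is needed for read-only access */
    select document_lob into lob
      from documents
     where document_id = '34218';

    lob_len := dbms_lob.getlength(lob);

    /* Read the document 255 characters at a time */
    while offset <= lob_len loop
        amount := 255;
        dbms_lob.read(lob, amount, offset, textbuf);
        /* ... process the chunk held in textbuf here ... */
        offset := offset + amount;
    end loop;
end;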

