SQL Basics: Data Types

The ANSI/ISO SQL standard specifies the various types of data that can be stored in a SQL-based database and manipulated by the SQL language. The original SQL1 standard specified only a minimal set of data types. The SQL2 standard expanded this list to include variable-length character strings, date and time data, bit strings, and other types. Today’s commercial DBMS products can process a rich variety of different kinds of data, and there is considerable diversity in the particular data types supported across different DBMS brands. Typical data types include the following:

Integers. Columns holding this type of data typically store counts, quantities, ages, and so on. Integer columns are also frequently used to contain ID numbers, such as customer, employee, and order numbers.
Decimal numbers. Columns with this data type store numbers that have fractional parts and must be calculated exactly, such as rates and percentages. They are also frequently used to store money amounts.
Floating point numbers. Columns with this data type are used to store scientific numbers that can be calculated approximately, such as weights and distances. Floating point numbers can represent a larger range of values than decimal numbers but can produce round-off errors in computations.
Fixed-length character strings. Columns holding this type of data typically store names of people and companies, addresses, descriptions, and so on.
Variable-length character strings. This data type allows a column to store character strings that vary in length from row to row, up to some maximum length. (The SQL1 standard permitted only fixed-length character strings, which are easier for the DBMS to process but can waste considerable space.)
Money amounts. Many SQL products support a MONEY or CURRENCY type, which is usually stored as a decimal or floating point number. Having a distinct money type allows the DBMS to properly format money amounts when they are displayed.
Dates and times. Support for date/time values is also common in SQL products, although the details can vary considerably from one product to another. Various combinations of dates, times, timestamps, time intervals, and date/time arithmetic are generally supported. The SQL2 standard includes an elaborate specification for DATE, TIME, TIMESTAMP, and INTERVAL data types, including support for time zones and time precision (for example, tenths or hundredths of seconds).
Boolean data. Some SQL products, such as Informix Dynamic Server, support logical (TRUE or FALSE) values as an explicit type, and some permit logical operations (comparison, AND /OR, and so on) on the stored data within SQL statements.
Long text. Several SQL-based databases support columns that store long text strings (typically up to 32,000 or 65,000 characters, and in some cases even larger). This allows the database to store entire documents, product descriptions, technical papers, resumes, and similar unstructured text data. The DBMS usually restricts the use of these columns in interactive queries and searches.
Unstructured byte streams. Several DBMS products allow unstructured, variable- length sequences of bytes to be stored and retrieved. Columns containing this data are used to store compressed video images, executable code, and other types of unstructured data. SQL Server’s IMAGE data type, for example, can store a stream of up to 2 billion bytes of data.
Non-Roman characters. As databases grow to support global applications, DBMS vendors have added support for fixed-length and variable-length strings of 16-bit characters used to represent Kanji and other Asian and Arabic characters. While most modern databases support storing and retrieving such characters (often using the UNICODE convention for representing them), support for searching and sorting on these GRAPHIC and VARGRAPHIC types varies widely.

Table 5-4 lists the data types specified in the ANSI/ISO SQL standard.

The differences between the data types offered in various SQL implementations is one of the practical barriers to the portability of SQL-based applications. These differences have come about as a result of innovation as relational databases have evolved to include a broader range of capabilities. This has been the typical pattern:

A DBMS vendor adds a new data type that provides useful new capabilities for a certain group of users.
Other DBMS vendors add the same or similar data types, adding their own innovations to differentiate their products from the others.
Over several years, the popularity of the data type grows, and it becomes a part of the “mainstream” set of data types supported by most SQL implementations.
The standards bodies become involved to try to standardize the new data type and eliminate arbitrary differences between the vendor implementations. The more well-entrenched the data type has become, the more difficult the set of compromises faced by the standards group. Usually, this results in an addition to the standard that does not exactly match any of the current implementations.
DBMS vendors slowly add support for the new standardized data type as an option to their systems, but because they have a large installed base that is using the older (now “proprietary”) version of the data type, they must maintain support for this form of the data type as well.
Over a very long period of time (typically several major releases of the DBMS product), users migrate to the new, standardized form of the data type, and the DBMS vendor can begin the process of obsoleting the proprietary version.

Date/time data provides an excellent example of this phenomenon and the data type variations it creates. DB2 offered early date/time support, with three different date/time data types:

DATE. Stores a date like June 30, 1991
TIME. Stores a time of day like 12:30 M.
TIMESTAMP. A specific instant in history, with a precision down to the nanosecond

Specific dates and times can be specified as string constants, and date arithmetic is supported. Here is an example of a valid query using DB2 dates, assuming that the HIREDATE column contains DATE data:

SELECT NAME, HIRE_DATE

FROM SALESREPS

WHERE HIRE_DATE >= ’05/30/1989′ + 15 DAYS

SQL Server was introduced with a single date/time data type, called DATETIME, which closely resembles the DB2 TIMESTAMP data type. If HIRE_DATE contained DATETIME data, SQL Server could accept this version of the query (without the date arithmetic):

SELECT NAME, HIRE_DATE

FROM SALESREPS

WHERE HIRE_DATE >= ’06/14/1989′

Since no specific time on June 14,1989, is specified in the query, SQL Server defaults to midnight on that date. The SQL Server query thus really means:

SELECT NAME, HIRE_DATE

FROM SALESREPS

WHERE HIRE_DATE >= ’06/14/1989 12:00AM’

If a salesperson’s hire date were stored in the database as noon on June 14, 1989, the salesperson would not be included in the SQL Server query results but would have been included in the DB2 results (because only the date would be stored). SQL Server also supports date arithmetic through a set of built-in functions. Thus, the DB2-style query can also be specified in this way:

SELECT NAME, HIRE_DATE

FROM SALESREPS

WHERE HIRE_DATE >= DATEADD(DAY, 15, ’05/30/1989′)

which is considerably different from the DB2 syntax.

Oracle also supports date/time data, with a single data type called DATE. Like SQL Server’s DATETIME type, an Oracle DATE is, in fact, a timestamp. Also like SQL Server, the time part of an Oracle DATE value defaults to midnight if no time is explicitly specified. The default Oracle date format is different from the DB2 and SQL Server formats, so the Oracle version of the query becomes:

SELECT NAME, HIRE_DATE

FROM SALESREPS

WHERE HIRE_DATE >= ’14-JUN-89′

Oracle also supports limited date arithmetic, so the DB2-style query can also be specified but without the DAYS keyword:

SELECT NAME, HIRE_DATE

FROM SALESREPS

WHERE HIRE_DATE >= ’30-MAY-89′ + 15

Finally, the ANSI/ISO SQL2 standard added support for date/time data with a set of data types based on, but not identical to, the DB2 types. In addition to the DATE, TIME, and TIMESTAMP data types, the standard specifies an INTERVAL data type, which can be used to store a time interval (for example, a timespan measured in days, or a duration measured in hours, minutes, and seconds). The standard also provides a very elaborate and complex method for dealing with date/time arithmetic, specifying the precision of intervals, adjusting for time zone differences, and so on.

As these examples illustrate, the subtle differences in data types among various SQL products lead to some significant differences in SQL statement syntax. They can even cause the same SQL query to produce slightly different results on different database management systems. The widely praised portability of SQL is thus true but only at a general level. An application can be moved from one SQL database to another, and it can be highly portable if it uses only the most mainstream, basic SQL capabilities. However, the subtle variations in SQL implementations mean that data types and SQL statements must almost always be adjusted somewhat if they are to be moved across DBMS brands. The more complex the application, the more likely it is to become dependent on DBMS- specific features and nuances, and the less portable it will become.

Source: Liang Y. Daniel (2013), Introduction to programming with SQL, Pearson; 3rd edition.

Leave a Reply Cancel reply

Login