SQL Server Components:Data Types

All the data values of a column must be of the same data type. (The only exception specifies the values of the SQL_VARIANT data type.) Transact-SQL uses different data types, which can be categorized as follows:

Numeric data types
Character data types
Temporal (date and/or time) data types
Miscellaneous data types

The following sections describe all these categories.

1. Numeric Data Types

Numeric data types are used to represent numbers. The following table shows the list of all numeric data types:

2. Character Data Types

There are two general forms of character data types. They can be strings of single-byte characters or strings of Unicode characters. (Unicode uses 2 bytes to specify one character.) Further, strings can have fixed or variable length. The following character data types are used:

NOTE The VARCHAR data type is identical to the CHAR data type except for one difference: if the content of a CHAR(n) string is shorter than n characters, the rest of the string is padded with blanks. (A value of the VARCHAR data type is always stored in its actual length.)

3. Temporal Data Types

Transact-SQL supports the following temporal data types:

DATETIME
SMALLDATETIME
DATE
TIME
DATETIME2
DATETIMEOFFSET

The DATETIME and SMALLDATETIME data types specify a date and time, with each value being stored as an integer value in 4 bytes or 2 bytes, respectively. Values of DATETIME and SMALLDATETIME are stored internally as two separate numeric values. The date value of DATETIME is stored in the range 01/01/1753 to 12/31/9999. The analog value of SMALLDATETIME is stored in the range 01/01/1900 to 06/06/2079. The time component is stored in the second 4-byte (or 2-byte for SMALLDATETIME) field as the number of three-hundredths of a second (DATETIME) or minutes (SMALLDATETIME) that have passed since midnight.

The use of DATETIME and SMALLDATETIME is rather inconvenient if you want to store only the date part or time part. For this reason, the Database Engine introduced the data types DATE and TIME, which store just the DATE or TIME component of a DATETIME, respectively. The DATE data type is stored in 3 bytes and has the range 01/01/0001 to 12/31/9999. The TIME data type is stored in 3-5 bytes and has an accuracy of 100 nanoseconds (ns).

The DATETIME2 data type stores high-precision date and time data. The data type can be defined for variable lengths depending on the requirement. (The storage size is 6-8 bytes). The accuracy of the time part is 100 ns. This data type isn’t aware of Daylight Saving Time.

All the temporal data types described thus far don’t have support for the time zone. The data type called DATETIMEOFFSET has the time zone offset portion. For this reason, it is stored in 6-8 bytes. (All other properties of this data type are analogous to the corresponding properties of DATETIME2.)

The date value in Transact-SQL is by default specified as a string in a format like ‘mmm dd yyyy’ (e.g., ‘Jan 10 1993’) inside two single quotation marks or double quotation marks. (Note that the relative order of month, day, and year can be controlled by the SET DATEFORMAT statement. Additionally, the system recognizes numeric month values with delimiters of / or -.) Similarly, the time value is specified in the format ‘hh:mm’ and the Database Engine uses 24-hour time (’23:24′, for instance).

NOTE Transact-SQL supports a variety of input formats for datetime values. As you already know, both objects are identified separately; thus, date and time values can be specified in any order or alone. If one of the values is omitted, the system uses the default values. (The default value for time is 12:00 AM.)

Examples 4.4 and 4.5 show the different ways date and time values can be written using the different formats.

Example 4.4

The following date descriptions can be used:

’28/5/1959′ (with SET DATEFORMAT dmy)

‘May 28, 1959’

‘1959 MAY 28’

Example 4.5

The following time expressions can be used:

‘8:45 AM’

‘4 pm’

4. Miscellaneous Data Types

Transact-SQL supports several data types that do not belong to any of the data type groups described previously:

Binary data types
BIT
Large object data types
CURSOR (discussed in Chapter 8)
UNIQUEIDENTIFIER
SQL_VARIANT
TABLE (discussed in Chapters 5 and 8)
XML (discussed in the previous edition of this book)
Spatial (e.g., GEOGRAPHY and GEOMETRY) data types (discussed in Chapter 30)
HIERARCHYID
TIMESTAMP
User-defined data types (discussed in Chapter 5)

The following sections describe each of these data types (other than those designated as being discussed in another chapter).

4.1. Binary and BIT Data Types

BINARY and VARBINARY are the two binary data types. They describe data objects being represented in the internal format of the system. They are used to store bit strings. For this reason, the values are entered using hexadecimal numbers.

The values of the BIT data type are stored in a single bit. Therefore, values of up to eight 1-bit columns are stored in 1 byte. The following table summarizes the properties of these data types:

4.2. Large Object Data Types

Large objects (LOBs) are data objects with the maximum length of 2GB. These objects are generally used to store large text data and to load modules and audio/video files. Transact-SQL supports the following LOB data types:

VARCHAR(max)
NVARCHAR(max)
VARBINARY(max)

You can use the same programming model to access values of standard data types and LOBs. In other words, you can use convenient system functions and string operators to work with LOBs.

The Database Engine uses the max parameter with the data types VARCHAR, NVARCHAR, and VARBINARY to define variable-length columns. When you use max by default (instead of an explicit value), the system analyzes the length of the particular string and decides whether to store the string as a convenient value or as a LOB. The max parameter indicates that the size of column values can reach the maximum LOB size of the current system.

Although the Database Engine decides how a LOB will be stored, you can override this default specification using the sp_tableoption system procedure with the LARGE_VALUE_ TYPES_OUT_OF_ROW option. If the option’s value is set to 1, the data in columns declared using the max parameter will be stored separately from all other data. If this option is set to 0, the Database Engine stores all values for the row size < 8000 bytes as regular row data.

You can apply the FILESTREAM attribute to a VARBINARY(max) column to store large binary data directly in the NTFS file system. The main advantage of this attribute is that the size of the corresponding LOB is limited only by the file system volume size. (This storage attribute will be described in the upcoming “Storage Options” section.)

4.3. UNIQUEIDENTIFIER Data Type

As its name implies, a value of the UNIQUEIDENTIFIER data type is a unique identification number stored as a 16-byte binary string. This data type is closely related to the globally unique identifier (GUID), which guarantees uniqueness worldwide. Hence, using this data type, you can uniquely identify data and objects in distributed systems.

The initialization of a column or a variable of the UNIQUEIDENTIFIER type can be provided using the functions NEWID and NEWSEQUENTIALID, as well as with a string constant written in a special form using hexadecimal digits and hyphens. (The functions NEWID and NEWSEQUENTIALID are described in the section “System Functions” later in this chapter.)

A column of the UNIQUEIDENTIFIER data type can be referenced using the keyword ROWGUIDCOL in a query to specify that the column contains ID values. (This keyword does not generate any values.) A table can have several columns of the UNIQUEIDENTIFIER type, but only one of them can have the ROWGUIDCOL column attribute.

4.4. SQL_VARIANT Data Type

The SQL_VARIANT data type can be used to store values of various data types at the same time, such as numeric values, strings, and date values. (The only types of values that cannot be stored are TIMESTAMP values.) Each value of an SQL_VARIANT column has two parts: the data value and the information that describes the value. (This information contains all properties of the actual data type of the value, such as length, scale, and precision.)

Transact-SQL supports the SQL_VARIANT_PROPERTY function, which displays the attached information for each value of an SQL_VARIANT column. For the use of the SQL VARIANT data type, see Example 5.5 in Chapter 5.

NOTE Declare a column of a table using the SQL_VARIANT data type only if it is really necessary. A column should have this data type if its values may be of different types or if determining the type of a column during the database design process is not possible.

4.5. HIERARCHYID Data Type

The HIERARCHYID data type is used to store an entire hierarchy. (For instance, you can use this data type to store a hierarchy of all employees or a hierarchy of all folder lists.) It is implemented as a Common Language Runtime (CLR) user-defined type that comprises several system functions for creating and operating on hierarchy nodes. The following functions, among others, belong to the methods of this data type: GetLevel(), GetAncestor(), GetDescendant(), Read(), and Write(). (The detailed description of this data type is outside the scope of this book.)

4.6. TIMESTAMP Data Type

The TIMESTAMP data type specifies a column being defined as VARBINARY(8) or BINARY(8), depending on nullability of the column. The system maintains a current value (not a date or time) for each database, which it increments whenever any row with a TIMESTAMP column is inserted or updated and sets the TIMESTAMP column to that value. Thus, TIMESTAMP columns can be used to determine the relative time at which rows were last changed. (ROWVERSION is a synonym for TIMESTAMP.)

NOTE The value stored in a TIMESTAMP column isn’t important by itself. This column is usually used to detect whether a specific row has been changed since the last time it was accessed.

4.7. Storage Options

There are two different storage options, each of which allows you to store LOBs and to save storage space:

FILESTREAM
Sparse columns

The following subsections describe these options.

4.8. FILESTREAM Storage

The Database Engine supports the storage of LOBs using the VARBINARY(max) data type. The property of this data type is that binary large objects (BLOBs) are stored inside the database. This solution can cause performance problems if the stored files are very large, as in the case of video or audio files. In that case, it is common to store such files outside the database, in external files.

The FILESTREAM storage option supports the management of LOBs, which are stored in the NTFS file system. The main advantage of this type of storage is that the Database Engine is able to manage FILESTREAM data even though it is stored outside the database. Therefore, this storage type has the following properties:

You use the CREATE TABLE statement to store FILESTREAM data and use the DML statements (SELECT, INSERT, UPDATE, and DELETE) to query and update such data.
The Database Engine assures the same level of security for FILESTREAM data as for relational data stored inside the database.

The creation of FILESTREAM data will be described in detail in Chapter 5.

4.9. Sparse Columns

The aim of sparse columns as a storage option is quite different from the FILESTREAM storage support. Whereas FILESTREAM is Microsoft’s solution for the storage of LOBs outside the database, sparse columns help to minimize data storage space. These columns provide an optimized way to store column values, which are predominantly NULL. (NULL values are described at the end of this chapter.) If you use sparse columns, NULL values require no disk space, but on the other side, non-NULL data needs an additional 2 to 4 bytes, depending on the data type of the non-NULL values. For this reason, Microsoft recommends using sparse columns only when the overall storage space savings will be at least 20 percent.

You specify and access sparse columns in the same way as you specify and access all other columns of a table. This means that the statements SELECT, INSERT, UPDATE, and DELETE can be used to access sparse columns in the same way as you use them for usual columns. (These four SQL statements are described in detail in Chapters 6 and 7.) The only difference is in relation to creation of a sparse column: you use the SPARSE option (after the column name) to specify that a particular column is a sparse column: col_name data_type SPARSE.

If a table has several sparse columns, you can group them in a column set. Therefore, a column set is an alternative way to store and access all sparse columns in a table. For more information concerning column sets, see Microsoft Docs.

Source: Petkovic Dusan (2020), Microsoft SQL Server 2019: A Beginner’s Guide, Seventh Edition-McGraw-Hill Education.