Creating SQL Database: Database Structure

The SQL1 standard specified a simple structure for the contents of a database, shown in Figure 13-8. Each user of the database has a collection of tables that are owned by that user. Virtually all major DBMS products support this scheme, although some (particularly those focused on special-purpose or embedded applications or personal computer usage) do not support the concept of table ownership. In these systems, all of the tables in a database are part of one large collection.

Although different brands of SQL-based database management systems provide the same structure within a single database, there is wide variation in how they organize and structure the various databases on a particular computer system. Some brands assume a single systemwide database that stores all of the data on that system. Other DBMS brands support multiple databases on a single computer, with each database identified by name. Still other DBMS brands support multiple databases within the context of the computer’s directory system.

These variations don’t change the way you use SQL to access the data within a database. However, they do affect the way you organize your data—for example, do you mix order processing and accounting data in one database, or do you divide it into two databases? They also affect the way you initially gain access to the database—for example, if there are multiple databases, you need to tell the DBMS which one you want to use. To illustrate how various DBMS brands deal with these issues, suppose the sample database were expanded to support a payroll and an accounting application, in addition to the order-processing tasks it now supports.

1. Single-Database Architecture

Figure 13-9 shows a single-database architecture where the DBMS supports one systemwide database. Mainframe and minicomputer databases (such as the mainframe version of DB2 and Oracle) have historically tended to use this approach. Order processing, accounting, and payroll data are all stored in tables within the database. The major tables for each application are gathered together and owned by a single user, who is probably the person in charge of that application on this computer.

An advantage of this architecture is that the tables in the various applications can easily reference one another. The TIMECARDS table of the payroll application, for example, can contain a foreign key that references the OFFICES table, and the applications can use that relationship to calculate commissions. With proper permission, users can run queries that combine data from the various applications.

A disadvantage of this architecture is that the database will grow huge over time as more and more applications are added to it. A DB2 or Oracle database with several hundred tables is common. The problems of managing a database of that size—performing backups, recovering data, analyzing performance, and so on—usually require a full-time database administrator.

In the single-database architecture, gaining access to the database is very simple—there’s only one database, so no choices need to be made. For example, the programmatic SQL statement that connects you to an Oracle database is CONNECT, and users tend to speak in terms of connecting to Oracle, rather than connecting to a specific database. (In fact, in this architecture, the database is usually associated with a single running copy of the DBMS software, so in a very real sense, the user is connecting to the DBMS.)

Oracle and DB2 installations frequently do run two separate databases, one for production work and one for testing. Fundamentally, however, all production data is collected into a single database.

2. Multidatabase Architecture

Figure 13-10 shows a multidatabase architecture where each database is assigned a unique name. Sybase, Microsoft SQL Server, Ingres, and many others use this scheme. As shown in the figure, each of the databases in this architecture is usually dedicated to a particular application. When you add a new application, you will probably create a new database.

The main advantage of the multidatabase architecture over the single-database architecture is that it divides the data management tasks into smaller, more manageable pieces. Each person responsible for an application can now be the database administrator of his or her own database, with less worry about overall coordination. When it’s time to add a new application, it can be developed in its own database, without disturbing the existing databases. Users and programmers are also more likely to remember the overall structure of their own databases.

The main disadvantage of the multidatabase architecture is that the individual databases may become islands of information, unconnected to one another. Typically, a table in one database cannot contain a foreign key reference to a table in a different database. Often, the DBMS does not support queries across database boundaries, making it impossible to relate data from two applications. If cross-database queries are supported, they may impose substantial overhead or require the purchase of additional distributed DBMS software from the DBMS vendor.

If a DBMS uses a multidatabase architecture and supports queries across databases, it must extend the SQL table and column naming conventions. A qualified table name must specify not only the owner of the table, but also which database contains the table. Typically, the DBMS extends the dot notation for table names by prefixing the database name to the owner name, separated by a period (.). For example, in a Sybase or SQL Server database, this table reference:

OP.JOE.OFFICES

refers to the OFFICES table owned by the user JOE in the order-processing database named OP, and the following query joins the SALESREPS table in the payroll database with that OFFICES table:

SELECT OP.JOE.OFFICES.CITY, PAYROLL.GEORGE.SALESREPS.NAME

  FROM OP.JOE.OFFICES, PAYROLL.GEORGE.SALESREPS

 WHERE OP.JOE.OFFICES.MGR = PAYROLL.GEORGE.SALESREPS.EMPL_NUM

Fortunately, such cross-database queries are the exception rather than the rule, and default database and user names can normally be used.

With a multidatabase architecture, gaining access to a database becomes slightly more complex because you must tell the DBMS which database you want to use. The DBMS’ interactive SQL program will often display a list of available databases or ask you to enter the database name along with your user name and password to gain access. For programmatic access, the DBMS generally extends the embedded SQL language with a statement that connects the program to a particular database. The Ingres form for connecting to the database named OP is:

CONNECT ‘OP’

For Sybase and Microsoft SQL Server, the parallel statement is:

USE ‘OP’

3. Multilocation Architecture

Figure 13-11 shows a multilocation architecture that supports multiple databases and uses the computer system’s directory structure to organize them. Several of the earlier minicomputer databases (including Rdb/VMS and Informix) used this scheme for supporting multiple databases. As with the multidatabase architecture, each application is typically assigned to its own database. As Figure 13-11 shows, each database has a name, but it’s possible for two different databases in two different directories to have the same name.

The major advantage of the multilocation architecture is flexibility. It is especially appropriate in applications such as engineering and design, where many sophisticated users of the computer system may all want to use several databases to structure their own information. The disadvantages of the multilocation architecture are the same as those of the multidatabase architecture. In addition, the DBMS typically doesn’t know about all of the databases that have been created, which may be spread throughout the system’s directory structure. There is no master database that keeps track of all the databases, which makes centralized database administration very difficult.

The multilocation architecture makes gaining access to a database more complex once again, because both the name of the database and its location in the directory hierarchy must be specified. The VAX SQL syntax for gaining access to an Rdb/VMS database is the DECLARE DATABASE statement. For example, this DECLARE DATABASE statement establishes a connection to the database named OP in the VAX/VMS directory named SYS$ROOT:[DEVELOPMENT.TEST] :

 DECLARE DATABASE

   FILENAME ‘SYS$ROOT:[DEVELOPMENT.TEST]OP’

If the database is in the user’s current directory (which is often the case), the statement simplifies to:

 DECLARE DATABASE

FILENAME ‘OP’

Some of the DBMS brands that use this scheme allow you to have access to several databases concurrently, even if they don’t support queries across database boundaries. Again, the most common technique used to distinguish among the multiple databases is with a superqualified table name. Since two databases in two different directories can have the same name, it’s also necessary to introduce a database alias to eliminate ambiguity. These VAX SQL statements open two different Rdb/VMS databases that happen to have the same name:

DECLARE DATABASE OP1
       FILENAME ‘SYS$ROOT:[PRODUCTION\]OP’
DECLARE DATABASE OP2
       FILENAME ‘SYS$ROOT:[DEVELOPMENT.TEST]OP’

The statements assign the aliases OP1 and OP2 to the two databases, and these aliases are used to qualify table names in subsequent VAX SQL statements.

As this discussion shows, there can be a tremendous variety in the way various DBMS brands organize their databases and provide access to them. This area of SQL is one of the most nonstandard, and yet it is often the first one that a user encounters when trying to access a database for the first time. The inconsistencies also make it impossible to transparently move programs developed for one DBMS to another, although the conversion process is usually tedious rather than complex.

4. Databases on Multiple Servers

With the rise of database servers and local area networks, the notion of database location embodied in the multilocation architecture is naturally extended to the notion of a physical database server. In practice, most DBMS products today appear to be converging on a multidatabase architecture, implemented within a physical server. At the highest level, a database is associated with a named server on the network. Within the server, there can be multiple named databases. The mapping of server names to physical server locations is handled by the networking software. The mapping of database names to physical files or file systems on a server is handled by the DBMS software.

Source: Liang Y. Daniel (2013), Introduction to programming with SQL, Pearson; 3rd edition.

Leave a Reply

Your email address will not be published. Required fields are marked *