SQL Database Updates: Adding Data to the Database

A new row of data is typically added to a relational database when a new entity represented by the row appears in the outside world. For example, in the sample database:

  • When you hire a new salesperson, a new row must be added to the SALESREPS table to store the salesperson’s data.
  • When a salesperson signs a new customer, a new row must be added to the CUSTOMERS table, representing the new customer.
  • When a customer places an order, a new row must be added to the ORDERS table to contain the order data.

In each case, the new row is added to maintain the database as an accurate model of the real world. The smallest unit of data that can be added to a relational database is a single row. In general, a SQL-based DBMS provides three ways to add new rows of data to a database:

  • Single-row INSERT. A single-row INSERT statement adds a single new row of data to a table. It is commonly used in daily applications—for example, data entry programs.
  • Multi -row INSERT. A multirow INSERT statement extracts rows of data from another part of the database and adds them to a table. It is commonly used in end-of-month or end-of-year processing when old rows of a table are moved to an inactive table.
  • Bulk load. A bulk load utility adds data to a table from a file that is outside of the database. It is commonly used to initially load the database or to incorporate data downloaded from another computer system or collected from many sites.

1. The Single-Row insert Statement

The single-row INSERT statement, shown in Figure 10-1, adds a new row to a table. The INTO clause specifies the table that receives the new row (the target table), and the VALUES clause specifies the data values that the new row will contain. The column list indicates which data value goes into which column of the new row.

Suppose you just hired a new salesperson, Henry Jacobsen, with the following personal data:

Name:              Henry Jacobsen

Age:               36

Employee Number:   111

Title:             Sales Manager

Office:            Atlanta (office number 13)

Hire Date:         July 25, 1990

Quota:             Not yet assigned

Year-to-Date Sales: $0.00

Here is the INSERT statement that adds Mr. Jacobsen to the sample database:

Add Henry Jacobsen as a new salesperson.

INSERT INTO SALESREPS (NAME, AGE, EMPL_NUM, SALES, TITLE,

HIRE_DATE, REP_OFFICE)

VALUES (‘Henry Jacobsen’, 36, 111, 0.00, ‘Sales Mgr’,
        ’25-JUL-90′, 13)

1 row inserted.

Figure 10-2 graphically illustrates how SQL carries out this INSERT statement. Conceptually, the INSERT statement builds a single row of data that matches the column structure of the table, fills it with the data from the VALUES clause, and then adds the new row to the table. The rows of a table are unordered, so there is no notion of inserting the row at the top, at the bottom, or between two rows of the table. After the INSERT statement, the new row is simply a part of the table. A subsequent query against the SALESREPS table will include the new row, but it may appear anywhere among the rows of query results.

Suppose Mr. Jacobsen now receives his first order, from InterCorp, a new customer who is assigned customer number 2126. The order is for 20 ACI-41004 widgets, for a total price of $2340, and has been assigned order number 113069. Here are the INSERT statements that add the new customer and the order to the database:

Insert a new customer and order for Mr. Jacobsen.

INSERT INTO CUSTOMERS (COMPANY, CUST_NUM

VALUES (‘InterCorp’, 2126, 15000.00

1 row inserted.

INSERT INTO ORDERS (AMOUNT, MFR, PRODUCT, QTY, ORDER_DATE, ORDER_NUM, CUST, REP)

VALUES (2340.00, ‘ACI’, ‘41004’, 20, CURRENT_DATE, 113069, 2126, 111)

1 row inserted.

As this example shows, the INSERT statement can become lengthy if there are many columns of data, but its format is still very straightforward. The second INSERT statement uses the system constant CURRENTDATE in its VALUES clause, causing the current date to be inserted as the order date. This system constant is specified in the SQL2 standard and is supported by many of the popular SQL products. Other brands of DBMS provide other system constants or built-in functions to obtain the current date and time.

You can use the INSERT statement with interactive SQL to add rows to a table that grows very rarely, such as the OFFICES table. In practice, however, data about a new customer, order, or salesperson is almost always added to a database through a forms- oriented data entry program. When the data entry is complete, the application program inserts the new row of data using programmatic SQL. Regardless of whether interactive or programmatic SQL is used, however, the INSERT statement is the same.

The table name specified in the INSERT statement is normally an unqualified table name, specifying a table that you own. To insert data into a table owned by another user, you can specify a qualified table name. Of course, you must also have permission to insert data into the table, or the INSERT statement will fail. The SQL security scheme and permissions are described in Chapter 15.

The purpose of the column list in the INSERT statement is to match the data values in the VALUES clause with the columns that are to receive them. The list of values and the list of columns must both contain the same number of items, and the data type of each value must be compatible with the data type of the corresponding column, or an error will occur. The ANSI/ISO standard mandates unqualified column names in the column list, but many implementations allow qualified names. Of course, there can be no ambiguity in the column names anyway, because they must all reference columns of the target table.

1.1. Inserting null Values

When SQL inserts a new row of data into a table, it automatically assigns a NULL value to any column whose name is missing from the column list in the INSERT statement. In this INSERT statement, which added Mr. Jacobsen to the SALESREPS table, the QUOTA and MANAGER columns were omitted:

INSERT INTO SALESREPS (NAME, AGE, EMPL_NUM, SALES, TITLE,

HIRE_DATE, REP_OFFICE)

VALUES (‘Henry Jacobsen’, 36, 111, 0.00, ‘Sales Mgr’,
        ’25-JUL-90′, 13)

As a result, the newly added row has a NULL value in the QUOTA and MANAGER columns, as shown in Figure 10-2. You can make the assignment of a NULL value more explicit by including these columns in the column list and specifying the keyword NULL in the values list. This INSERT statement has exactly the same effect as the previous one:

 I INSERT INTO SALESREPS (NAME, AGE, EMPL_NUM, SALES, QUOTA, TITLE,

MANAGER, HIRE_DATE, REP_OFFICE)

VALUES (‘Henry Jacobsen’, 36, 111, 0.00, NULL, ‘Sales Mgr’,

NULL, ’25-JUL-90′, 13)

1.2. Inserting All Columns

As a convenience, SQL allows you to omit the column list from the INSERT statement. When the column list is omitted, SQL automatically generates a column list consisting of all columns of the table, in left-to-right sequence. This is the same column sequence generated by SQL when you use a SELECT * query. Using this shortcut, the previous INSERT statement could be rewritten equivalently as:

 INSERT INTO SALESREPS

VALUES (111, ‘Henry Jacobsen’, 36, 13, ‘Sales Mgr’,

’25-JUL-90′, NULL, NULL, 0.00)

When you omit the column list, the NULL keyword must be used in the values list to explicitly assign NULL values to columns, as shown in the example. In addition, the sequence of data values must correspond exactly to the sequence of columns in the table.

Omitting the column list is convenient in interactive SQL because it reduces the length of the INSERT statement you must type. For programmatic SQL, the column list should always be specified because it makes the program easier to read and understand. In addition, table structures often change over time to include new columns or drop columns that are no longer used. A program that contains an INSERT statement without an explicit column list may work correctly for months or years, and then suddenly begin producing errors if the number of columns or data types of columns is changed by a database administrator.

2. The Multirow insert Statement

The second form of the INSERT statement, shown in Figure 10-3, adds multiple rows of data to its target table. In this form of the INSERT statement, the data values for the new rows are not explicitly specified within the statement text. Instead, the source of new rows is a database query, specified in the statement.

Adding rows whose values come from within the database itself may seem strange at first, but it’s very useful in some special situations. For example, suppose you want to copy the order number, date, and amount of all orders placed before January 1, 1990, from the ORDERS table into another table, called OLDORDERS. The multirow INSERT statement provides a concise, efficient way to copy the data:

Copy old orders into the OLDORDERS table.

INSERT INTO OLDORDERS (ORDER_NUM, ORDER_DATE, AMOUNT)

SELECT ORDER_NUM, ORDER_DATE, AMOUNT FROM ORDERS

WHERE ORDER DATE < ’01-JAN-90′

9 rows inserted.

This INSERT statement looks complicated, but it’s really very simple. The statement identifies the table to receive the new rows (OLDORDERS) and the columns to receive the data, just like the single-row INSERT statement. The remainder of the statement is a query that retrieves data from the ORDERS table. Figure 10-4 graphically illustrates the operation of this INSERT statement. Conceptually, SQL first performs the query against the ORDERS table and then inserts the query results, row by row, into the OLDORDERS table.

Here’s another situation where you could use the multirow INSERT statement. Suppose you want to analyze customer buying patterns by looking at which customers and salespeople are responsible for big orders—those over $15,000. The queries that you will be running will combine data from the CUSTOMERS, SALESREPS, and ORDERS tables. These three-table queries will execute fairly quickly on the small sample database, but in a real corporate database with many thousands of rows, they would take a long time. Rather than running many long, three-table queries, you could create a new table named BIGORDERS to contain the required data, defined as follows:

Column            Information

AMOUNT         Order amount (from ORDERS)
COMPANY        Customer name (from CUSTOMERS)
NAME           Salesperson name (from SALESREPS)
PERF           Amount over/under quota (calculated from SALESREPS)
MFR            Manufacturer ID (from ORDERS)
PRODUCT        Product ID (from ORDERS)
QTY            Quantity ordered (from ORDERS)

Once you have created the BIGORDERS table, this multirow INSERT statement can be used to populate it:

Load data into the BIGORDERS table for analysis.

INSERT INTO BIGORDERS (AMOUNT, COMPANY, NAME, PERF, PRODUCT, MFR, QTY)

SELECT AMOUNT, COMPANY, NAME, (SALES – QUOTA), PRODUCT, MFR, QTY

FROM ORDERS, CUSTOMERS, SALESREPS

WHERE CUST = CUST_NUM

AND REP = EMPL_NUM

AND AMOUNT > 15000.00

6 rows inserted.

In a large database, this INSERT statement may take a while to execute because it involves a three-table query. When the statement is complete, the data in the BIGORDERS table will duplicate information in other tables. In addition, the BIGORDERS table won’t be automatically kept up to date when new orders are added to the database, so its data may quickly become outdated. Each of these factors seems like a disadvantage. However, the subsequent data analysis queries against the BIGORDERS table can be expressed very simply—they become single-table queries.

Furthermore, each of those queries will run much faster than if it were a three- table join. Consequently, this is probably a good strategy for performing the analysis, especially if the three original tables are large. In this situation, it’s likely that the BIGORDERS table will be used as a temporary table for doing the analysis. It will be created and populated with data, representing a snapshot of the order status in time, the analysis programs will be run, and then the table will be emptied or dropped.

The SQL1 standard specifies several logical restrictions on the query that appears within the multirow INSERT statement:

  • The query cannot contain an ORDER BY It’s useless to sort the query results anyway, because they’re being inserted into a table that is, like all tables, unordered.
  • The query results must contain the same number of columns as the column list in the INSERT statement (or the entire target table, if the column list is omitted), and the data types must be compatible, column by column.
  • The query cannot be the UNION of several different SELECT Only a single SELECT statement may be specified.
  • The target table of the INSERT statement cannot appear in the FROM clause of the query or any subqueries that it contains. This prohibits inserting part of a table into itself.

The first two restrictions are structural, but the latter two were included in the standard simply to avoid complexity. As a result, these restrictions were relaxed in the SQL2 standard. The standard now allows UNION and JOIN operations and expressions in the query, basically allowing the results of a general database query to be retrieved and then inserted into a table with the INSERT statement. It also allows various forms of self-insertion, where the source table for the data to be inserted and the destination table are the same.

3. Bulk Load Utilities

Data to be inserted into a database is often downloaded from another computer system or collected from other sites and stored in a sequential file. To load the data into a table, you could write a program with a loop that reads each record of the file and uses the single-row INSERT statement to add the row to the table. However, the overhead of having the DBMS repeatedly execute single-row INSERT statements may be quite high. If inserting a single row takes half a second under a typical system load, that is probably acceptable performance for an interactive program. But that performance quickly becomes unacceptable when applied to the task of bulk loading 50,000 rows of data. In this case, loading the data would require over six hours.

For this reason, most commercial DBMS products include a bulk load feature that loads data from a file into a table at high speed. The ANSI/ISO SQL standard does not address this function, and it is usually provided as a stand-alone utility program rather than as part of the SQL language. Each vendor’s utility provides a slightly different set of features, functions, and commands.

When SQL is used from within an application program, another technique is frequently provided for more efficiently inserting a large amount of data into a database. The standard programmatic INSERT statement inserts a single row of data, just like the interactive single-row INSERT statements in the preceding examples. But many commercial DBMS products allow data from two or more rows (often up to hundreds of rows) to be supplied as part of a single bulk INSERT statement. All of the supplied data must be for new rows of the single table that is the target of the INSERT statement, and named in the INTO clause.

Executing a bulk INSERT statement for 100 rows of data has exactly the same effect as executing 100 individual single-row INSERT statements. However, it is usually much more efficient, because it involves only one call to the DBMS. Efficiency gains from 20 percent to 30 percent and up to 300 percent or more times over single-row INSERT statements are common, depending on the DBMS brand and the particular kind of data being inserted.

Source: Liang Y. Daniel (2013), Introduction to programming with SQL, Pearson; 3rd edition.

Leave a Reply

Your email address will not be published. Required fields are marked *