User-Defined Functions in SQL Server

In programming languages, there are generally two types of routines:

  • Stored procedures
  • User-defined functions (UDFs)

As discussed in the previous major section of this chapter, stored procedures are made up of several statements that have zero or more input parameters but usually do not return any output parameters. In contrast, functions always have one return value. This section describes the types of UDFs and the creation and use of UDFs.

1. Types of User-Defined Functions

User-defined functions can be

  • Scalar
  • Table-valued

The return value of scalar functions is always a single value. Scalar functions are generally used within a query. Also, scalar functions can be called using the EXECUTE statement, the same way as stored procedures. (An example of a scalar UDF is given in the upcoming “Scalar UDF Inlining” section.)

A table-valued function is a user-defined function that returns data of a table type. The return type of a table-valued function is a table. For this reason, you can use the table-valued function just like you would use a table. (A detailed description of these functions is given in the upcoming “Table-Valued Functions” section.)

2. Creation and Execution of User-Defined Functions

UDFs are created with the CREATE FUNCTION statement, which has the following syntax:

CREATE FUNCTION [schema_name.]function_name

[({@param } type [= default]) {,…}

RETURNS {scalar_type | [@variable] TABLE}

[WITH {ENCRYPTION | SCHEMABINDING}

[AS] {block | RETURN (select_statement)}

schema_name is the name of the schema to which the ownership of the created UDF is assigned. function_name is the name of the new function. @param is an input parameter, while type specifies its data type. Parameters are values passed from the caller of the UDF and are used within the function. default specifies the optional default value of the corresponding parameter. (Default can also be NULL.)

The RETURNS clause defines a data type of the value returned by the UDF. This data type can be any of the standard data types supported by the database system, including the TABLE data type. (The only standard data type that you cannot use is TIMESTAMP.)

UDFs are either scalar-valued or table-valued. A scalar-valued function returns an atomic (scalar) value. This means that in the RETURNS clause of a scalar-valued function, you specify one of the standard data types. Functions are table-valued if the RETURNS clause returns a set of rows (see the next subsection).

The WITH ENCRYPTION option encrypts the information in the system catalog that contains the text of the CREATE FUNCTION statement. In that case, you cannot view the text used to create the function. (Use this option to enhance the security of your database system.)

The alternative clause, WITH SCHEMABINDING, binds the UDF to the database objects that it references. Any attempt to modify the structure of the database object that the function references fails. (The binding of the function to the database objects it references is removed only when the function is altered, so the SCHEMABINDING option is no longer specified.)

Database objects that are referenced by a function must fulfill the following conditions if you want to use the SCHEMABINDING clause during the creation of that function:

  • All views and UDFs referenced by the function must be schema-bound.
  • All database objects (tables, views, or UDFs) must be in the same database as the function.

block is the BEGIN/END block that contains the implementation of the function. The final statement of the block must be a RETURN statement with an argument. (The value of the argument is the value returned by the function.) In the body of a BEGIN/END block, only the following statements are allowed:

  • Assignment statements such as SET
  • Control-of-flow statements such as WHILE and IF
  • DECLARE statements defining local data variables
  • SELECT statements containing SELECT lists with expressions that assign to variables that are local to the function
  • INSERT, UPDATE, and DELETE statements modifying variables of the TABLE data type that are local to the function

By default, only the members of the sysadmin fixed server role and the db_owner and db_ddladmin fixed database roles can use the CREATE FUNCTION statement. However, the members of these roles may assign this privilege to other users by using the GRANT CREATE FUNCTION statement (see Chapter 12).

Example 8.12 shows the creation of the scalar user-defined function called compute_costs.

Example 8.12

— This function computes additional total costs that arise

— if budgets of projects increase USE sample;

GO

CREATE FUNCTION compute_costs (@percent INT =10)

RETURNS DECIMAL(16,2)

BEGIN

DECLARE @additional_costs DEC (14,2), @sum_budget dec(16,2)

SELECT @sum_budget = SUM (budget) FROM project

SET @additional_costs = @sum_budget * @percent/100

RETURN @additional_costs

END

The compute_costs function computes additional costs that arise when all budgets of projects increase. The single input variable, @percent, specifies the percentage of increase of budgets. The BEGIN/END block first declares two local variables: @additional_costs and @sum_budget. The function then assigns to @sum_budget the sum of all budgets, using the SELECT statement. After that, the function computes total additional costs and returns this value using the RETURN statement.

2.1. Invoking User-Defined Functions

Each UDF can be invoked in Transact-SQL statements, such as SELECT, INSERT, UPDATE, or DELETE. To invoke a function, specify the name of it, followed by parentheses. Within the parentheses, you can specify one or more arguments. (Arguments are values or expressions that are passed to the input parameters that are defined immediately after the function name.) When you invoke a function, and all parameters have no default values, you must supply argument values for all of the parameters and you must specify the argument values in the same sequence in which the parameters are defined in the CREATE FUNCTION statement. (As you already know, the scalar UDFs can be invoked using the EXEC statement, too.)

Example 8.13 shows the use of the compute_costs function (Example 8.12) in a SELECT statement.

Example 8.13

USE sample;

SELECT project_no, project_name

FROM project

WHERE budget < dbo.compute_costs(25)

The result is

The SELECT statement in Example 8.13 displays names and numbers of all projects where the budget is lower than the total additional costs of all projects for a given percentage.

NOTE Each function used in a Transact-SQL statement must be specified using its two-part name— that is, schema_name.function_name.

2.2. Scalar UDF Inlining

In all versions prior to SQL Server 2019, scalar user-defined functions are generally a performance issue because they are generally processed in a row-oriented way, meaning that they run for every returned row. That way, they do not allow parallel execution of rows.

SQL Server 2019 supports a new feature called scalar UDF inlining, the goal of which is to improve performance of queries that invoke scalar UDFs. In other words, the feature allows certain scalar UDFs to have their definitions placed directly into the query so that the query does not call the UDF when executing each row.

Scalar UDF inlining will be discussed in detail in Chapter 28 (see Example 28.14).

2.3. Table-Valued Functions

As you already know, functions are table-valued if the RETURNS clause returns a set of rows. Depending on how the body of the function is defined, table-valued functions can be classified as inline or multistatement functions. If the RETURNS clause specifies TABLE with no accompanying list of columns, the function is an inline function. Inline functions return the result set of a SELECT statement as a table variable; that is, a variable of the TABLE data type.

In other words, an inline function can contain only one root SELECT statement that is used to describe its result. This is good from an optimization prospective, because the Query Optimizer may inline a function’s text into a query and thus optimize a query as a whole.

A multistatement table-valued function includes a name followed by TABLE. (The name defines an internal variable of the type TABLE.) You can use this variable to insert rows into it and then return the variable as the return value of the function. Multistatement table-valued functions are difficult to optimize, and are not considered by the Query Optimizer.

Example 8.14 shows an inline function that returns a table variable.

Example 8.14

USE sample;

GO

CREATE FUNCTION employees_in_project (@pr_number CHAR(4))

RETURNS TABLE

AS RETURN (SELECT emp_fname, emp_lname

FROM works_on, employee

WHERE employee.emp_no = works_on.emp_no

AND project_no = @pr_number)

The employees_in_project function is used to display names of all employees that belong to a particular project. The input parameter @pr_number specifies a project number. While the function generally returns a set of rows, the RETURNS clause contains the TABLE data type. (Note that the BEGIN/END block in Example 8.14 must be omitted, while the RETURN clause contains a SELECT statement.)

Example 8.15 shows the use of the employees_in_project function.

Example 8.15

USE sample;

SELECT *

FROM employees_in_project(‘p3’)

The result is

3. Table-Valued Functions and APPLY

The APPLY operator is a relational operator that allows you to invoke a table-valued function for each row of a table expression. This operator is specified in the FROM clause of the corresponding SELECT statement in the same way as the join operator is applied. There are two forms of the APPLY operator:

  • CROSS APPLY
  • OUTER APPLY

The CROSS APPLY operator returns those rows from the inner (left) table expression that match rows in the outer (right) table expression. Therefore, the CROSS APPLY operator is logically the same as the INNER JOIN operator.

NOTE CROSS APPLY is Microsoft’s extension to the SQL standard. You can rewrite most queries with the CROSS APPLY operator using INNER JOIN, but the advantage of the former is that it can yield a better execution plan and better performance, since it can limit the set being joined, before the join occurs.

The OUTER APPLY operator returns all the rows from the inner (left) table expression.

(For the rows for which there are no corresponding matches in the outer table expression, it contains NULL values in columns of the outer table expression.) OUTER APPLY is logically equivalent to LEFT OUTER JOIN.

Examples 8.16 and 8.17 show how you can use APPLY.

Example 8.16

— generate function

CREATE FUNCTION dbo.fn_getjob(@empid AS INT)

RETURNS TABLE AS

RETURN

SELECT job

FROM works_on

WHERE emp_no = @empid

AND job IS NOT NULL AND project_no = ‘p1’;

The fn_getjob() function in Example 8.16 returns the set of rows from the works_on table. This result set is “joined” in Example 8.17 with the content of the employee table.

Example 8.17

— use CROSS APPLY

SELECT E.emp_no, emp_fname, emp_lname, job

FROM employee as E

CROSS APPLY dbo.fn_getjob(E.emp_no) AS A

— use OUTER APPLY

SELECT E.emp_no, emp_fname, emp_lname, job

FROM employee as E

OUTER APPLY dbo.fn_getjob(E.emp_no) AS A

The result is

In the first query of Example 8.17, the result set of the table-valued function fn_getjob() is “joined” with the content of the employee table using the CROSS APPLY operator. fn_getjob() acts as the right input, and the employee table acts as the left input. The right input is evaluated for each row from the left input, and the rows produced are combined for the final output.

The second query is similar to the first one, but uses OUTER APPLY, which corresponds to the outer join operation of two tables.

4. Table-Valued Parameters

Sometimes it is necessary to send many parameters to a routine. One way to do it is to use a temporary table, insert the values into it, and then call the routine. The better way is to use table-valued parameters (see Example 8.18).

Example 8.18

USE sample;

GO

CREATE TYPE departmentType AS TABLE

(dept_no CHAR(4),dept_name CHAR(25),location CHAR(30));

GO

CREATE TABLE #dallasTable

(dept_no CHAR(4),dept_name CHAR(25),location CHAR(30));

GO

CREATE PROCEDURE insertProc

@Dallas departmentType READONLY

AS SET NOCOUNT ON

INSERT INTO #dallasTable (dept_no, dept_name, location)

SELECT * FROM @Dallas

GO

DECLARE @Dallas AS departmentType;

INSERT INTO @Dallas( dept_no, dept_name, location)

SELECT * FROM department

WHERE location = ‘Dallas’

EXEC insertProc @Dallas;

Example 8.18 first defines the type called departmentType as a table. This means that its type is the TABLE data type, so rows can be inserted in it. The second statement creates the temporary table called #dallasTable. In the insertProc procedure, the @Dallas variable, which is of the departmentType type, is specified. (The READONLY clause specifies that the content of the table variable cannot be modified.) In the subsequent batch, data is added to the table variable, and after that the procedure is executed. The procedure, when executed, inserts rows from the table variable into the temporary table #dallasTable. The content of the temporary table is as follows:

The use of table-valued parameters gives you the following benefits:

  • It simplifies the programming model in relation to routines.
  • It reduces the round trips to the server.
  • The resulting table can have different numbers of rows.

5. Changing the Structure of UDFs

The Transact-SQL language also supports the ALTER FUNCTION statement, which modifies the structure of a UDF. This statement is usually used to remove the schema binding. All options of the ALTER FUNCTION statement correspond to the options with the same name in the CREATE FUNCTION statement.

A UDF is removed using the DROP FUNCTION statement. Only the owner of the function (or the members of the db_owner and sysadmin fixed database roles) can remove the function.

Source: Petkovic Dusan (2020), Microsoft SQL Server 2019: A Beginner’s Guide, Seventh Edition-McGraw-Hill Education.

Leave a Reply

Your email address will not be published. Required fields are marked *