Learning Objective
The objective of this SQL Server tutorial is to teach you what is a computed column and why and how to create an index on a computed column.
What is a Computed Column?
A Computed Column is a column whose values are derived or computed by an expression. The expression might be a constant, function or combination of values from one or more columns in the same table. The need for such a column arises when we need more information or insight about a particular piece of data in a table but which does not exist in the table as a column. One such example is total price which is a product of unit price and quantity of a commodity. The unit price and quantity values are available in the table but the total price which is variable and dependent is not. Another example might be an employee table containing the employee’s salary and tax slab to which the employee belongs but not the exact tax amount payable per month as part of salary deduction. In such cases computed columns come to the rescue. There can be one or more Computed Columns defined on a table.
A computed column is of 2 types – PERSISTED and VIRTUAL (or NOT PERSISTED).
A Persisted Computed Column is permanently physically stored as an additional column in the table. It is more efficient of the two as the column is always available in the actual table but requires additional disk space for storage.
The other is when the Computed Column exists as a virtual or logical entity and its values are computed dynamically at run time when it is referred to by a query. This saves disk space but slows query processing as the Computed Column has to generated every time it is referred to in a query. If you want to create a Computed Column as PERSISTED you have to precede the column name with the PERSISTED keyword. Otherwise SQL Server will create it as a virtual Computed Column i.e. not PERSISTED. If you want to add a Computed Column to an already existing table you can do so with ALTER TABLE statement and using or skipping the PERSISTED keyword as per your requirement.
What is an Index on Computed Column?
An Index on a Computed Column is simply an index which one or more Computed Columns as it’s key. Or to put it otherwise it is indexing a Computed Column(s) to enable faster search and retrieval of values from the column(s). However, to be able to create an index on a Computed Column, the Computed Column must satisfy the below 2 conditions.
- Determinism – Determinism means that the column values of a computed column should be fixed and not change with time. The expression for computation should always produce the same value. An example is that the computed tax payable per month for an employee will be the same always until the employee’s salary changes and a different slab becomes applicable or the government changes the tax rate for the employees’ current slab.
- Precision – Precision means that the datatype of the Computed Column or the table columns from which it is derived cannot have FLOAT or REAL numeric datatypes. If the data type of the Computed Column is numeric it has to be a whole number and cannot be decimal which is an approximate value and never absolute.
If the above conditions are met, we can create a non-clustered index using one or more persisted or normal Computed Columns.
Operation
There is no special syntax to create an index on Computed Columns. It is the same CREATE INDEX statement which is used to create the other types of indexes. However, the syntax to create a Computed Column using CREATE TABLE or ALTER TABLE is different and the same is highlighted below.
SQL Server Index on Computed Columns Syntax
The basic syntax to create a Computed Column using CREATE TABLE is as below.
CREATE TABLE table_name
column1 datatype,
column2 datatype,
………………………………
computed_column1 AS expression,
computed_column2 AS expression PERSISTED;
The basic syntax to create a Computed Column using ALTER TABLE is as below.
ALTER TABLE table_name computed_column1 AS expression, computed_column2 AS expression PERSISTED;
In this syntax,
- expression – the expression to compute the Computed Column values.
- PERSISTED – optional keyword to create a PERSISTED Computed Column. If skipped SQL Server will create a virtual Computed Column. Please note that on the demo syntax above I have cited both cases.
SQL Server Index on Computed Columns Example
Let us consider a table called employee which contains the basic information of employees. The table is represented below. The emp_id
column is the primary key column in the table with an associated default clustered index by the name PK_employee
.
emp_id | emp_name | emp_sex | emp_dob |
1 | David Jackson | Male | 8/8/1998 |
2 | Jim Jameson | Female | 11/26/1998 |
3 | Kate Johnson | Female | 1/21/1987 |
4 | Will Ray | Male | 9/19/1989 |
5 | Shane Mathews | Female | 10/13/1987 |
6 | Shed Price | Male | 2/3/1987 |
7 | Viktor Smith | Male | 2/22/2000 |
8 | Vincent Smithson | Female | 9/15/1999 |
9 | Janice Streep | Female | 12/29/2000 |
10 | Laura Wells | Female | 1/1/2000 |
11 | Mac Bull | Male | 5/24/1996 |
12 | Patrick Patterson | Male | 1/7/1999 |
13 | Julie Orbison | Female | 4/24/1988 |
14 | Elice Hemingway | Female | 7/27/1988 |
15 | Wayne Johnson | Male | 3/8/1997 |
Now suppose HR wants the birth year of the employees for some reason. We can do so using the year ()
function on the emp_DOB
column values as below.
Query
SELECT emp_id, year(emp_dob) AS emp_birth_year FROM employee WHERE emp_birth_year=1999;
If we display the estimated execution plan for the query it will be as below. From the query plan we can see that SQL Server is doing an index scan of the default clustered index which is costly both in terms of time and computing resource. This is because there is no Computed Column or an index on it.
Query Execution Plan
So, we add a PERSISTED Computed Column called emp_birth_year
to the employee table.
Adding the Computed Column
The below ALTER TABLE statement does the same.
ALTER TABLE employee
ADD emp_birth_year AS year(emp_dob);
Now we can specify the Computed Column directly in the SELECT query as below.
SELECT emp_id, emp_birth_year
FROM employee
WHERE emp_birth_year=1999;
But this will not change the query processing since we still do not have any index on it. The only way for SQL Server to resolve the query is by scanning the default clustered index.
So now we will create a nonclustered index on the Computed Column to improve the query processing and replace inefficient index scan with efficient index seek.
Creating Index on the Computed Column
The below CREATE INDEX statement does the same.
Using T-SQL
CREATE NONCLUSTERED INDEX ix_cc_yob
ON employee(emp_dob);
The above index can also be created from the SSMS (SQL Server Management Studio) GUI in the following steps.
Using SSMS
- Right click in the Index node under the employee table in Object Explorer and select New Index – > Non-Clustered Index.
2. In the General tab of the New Index dialog box that will open, type in the name of the index in the Index Name textbox. Then go to the Index Key Columns tab in the middle of the dialog and add Computed Column using the Add button on the right as shown in the figure below.
3. Once added it will show in the main window as seen below. Then click OK on all dialogs to create the new index and it will be created.
Checking the new Index
You can check the new index using below T-SQL command or from the SSMS GUI as follows.
Using T-SQL
EXEC SP_HELPINDEX customer;
index_name | index_description | index_keys |
ix_cc_yob | nonclustered located on PRIMARY | emp_birth_year |
PK_employee | clustered, unique, primary key located on PRIMARY | emp_id |
Using SSMS
Checking the New Index in Action
With the index in place if we generate the estimated query plan for the above query, we can see the difference. The estimated query execution plan and the query output are shown below. We can see that the query processor is now doing an index seek on the newly created nonclustered index. So, our purpose is achieved.
Query Execution Plan
Query Resultset
emp_id | emp_birth_year |
8 | 1999 |
12 | 1999 |