Learning Objective
The objective of this SQL Server tutorial is to teach you how to use SELF JOIN to join a table to itself.
What is SELF JOIN in SQL Server?
A SELF JOIN is a join which joins a table to itself. The purpose of such a JOIN is to compare the records in a table which is otherwise not possible. Basically, it is like creating a copy of a table so that the table and its copy can interact like two different tables and a record in the table can be compared to its duplicate and all other records in the table copy. A SELF JOIN is not a special type of JOIN but rather an INNER JOIN or a LEFT OUTER JOIN (also called LEFT JOIN) implemented on the same table. It functions in exactly the same manner with the only difference being that the logic and operation is applied to the same table and its replica.
A SELF JOIN can be created on a common column or on different columns in the same table. Since a SELF JOIN uses the same table therefore it is necessary to use at least one alias name for the table so that SQL understands and treats them as two different tables. A SELF JOIN is used to process hierarchical data, identify duplicate data and sequence data.
SQL Server SELF JOIN Syntax
The basic syntax of SELF JOIN can be any one of the following depending upon whether we use an INNER JOIN or a LEFT JOIN for the purpose.
SELECT column_list FROM table1 alias1 INNER JOIN table1 alias2 ON alias1.columnX=alias2.columnX;
SELECT column_list FROM table1 alias1 LEFT JOIN table1 alias2 ON alias1.columnX=alias2.columnX;
In this syntax,
- column_list – the list of columns specified from the aliased tables in the SELECT statement.
- table1 – the table on which the statement operates.
- alias1/alias2 – alias names for the table to distinguish them as separate entities.
- INNER JOIN/JOIN – SQL keyword combination to implement an INNER JOIN (where the resultset is the intersecting rows of the participating tables).
- columnX – common column on which the JOIIN is made.
SQL Server SELF JOIN with Examples
Let us understand the practical application of SELF JOIN with the help of some examples. We will use a sample table called emp represented below. The table contains the name of employees mapped to the id of their manager and the employee’s salary.
Id | Name | MgrId | Salary |
1 | Kirsten Rose | 11 | 52000 |
2 | Julia Wells | NULL | 75000 |
13 | Korbin Miles | 11 | 51000 |
4 | Robert Fox | 11 | 50000 |
5 | Steven Pitt | 12 | 55000 |
8 | Jane Frost | 12 | 55000 |
9 | Simon Ray | 2 | 60000 |
10 | Sandra Bull | 2 | 66000 |
11 | Albert Spencer | NULL | 76000 |
12 | Raymond Cage | NULL | 71000 |
1) SQL Server SELF JOIN – hierarchical data processing with INNER JOIN
In the above table there is a hierarchical relationship between employee and manager. The MgrId column of the table is a subset of the Id column and contains data values which are common to the Id column data values. We can use a SELF JOIN to identify and report this hierarchical relationship. The following query does so by creating a SELF JOIN on the table using INNER JOIN on two different table columns (i.e. MgrId and Id columns as mentioned above). The table is aliased first as e (i.e. the left table) and then as m (i.e. the right table) and the output is sorted by the column alias ‘Manager Name’ to group the output for easy readability.
SELECT e.name 'Employee Name', m.name 'Manager Name' FROM emp e INNER join emp m ON e.mgrid = m.id ORDER BY 'Manager Name';
It will produce the following output where every employee name is listed with his/her manager name. However, if we examine closely, we will see that the 3 employees (i.e. those who are managers and self-managed are missing from the list). That is because an INNER JOIN only reports intersections i.e. where the JOIN condition is met or where the joined column values match in both tables).
Employee Name | Manager Name |
Kirsten Rose | Albert Spencer |
Korbin Miles | Albert Spencer |
Robert Fox | Albert Spencer |
Simon Ray | Julia Wells |
Sandra Bull | Julia Wells |
Steven Pitt | Raymond Cage |
Jane Frost | Raymond Cage |
2) SQL Server SELF JOIN – hierarchical data processing with LEFT JOIN
The above shortcoming can be overcome and an exhaustive list generated by using a LEFT JOIN which will list all records from the left column including those which do not match the join condition in the right table. The following query does do using a LEFT JOIN and the same JOIN condition and alias names and sorting as above.
SELECT e.name 'Employee Name', m.name 'Manager Name' FROM emp e LEFT join emp m ON e.mgrid = m.id ORDER BY 'Manager Name';
It will produce the following output where every employee name is listed with his/her manager name including those who are self-managed and not reporting to anybody. The Manager Name value for them is NULL as is the normal behavior of LEFT JOIN.
Employee Name | Manager Name |
Julia Wells | NULL |
Albert Spencer | NULL |
Raymond Cage | NULL |
Korbin Miles | Albert Spencer |
Robert Fox | Albert Spencer |
Kirsten Rose | Albert Spencer |
Simon Ray | Julia Wells |
Sandra Bull | Julia Wells |
Steven Pitt | Raymond Cage |
Jane Frost | Raymond Cage |
3) SQL Server SELF JOIN – identifying and reporting duplicate data
As noted above while describing SELF JOIN it is also possible to identify and report duplicate data (i.e. multiple records having the same column value for one or more column) using SELF JOIN. The following query does the same and identifies and lists those employees who have the same salary. The query creates a SELF JOIN on the table using INNER JOIN on the same table column (i.e. Id column) but on a not equal to condition. The table is aliased first as e (i.e. the left table) and then as ee (i.e. the right table). For records where the JOIN condition is satisfied (i.e. where the Id of the left table e is different from the id of the right table ee) it checks the WHERE condition and if satisfied includes the record in the resultset.
SELECT e.name Employee Name, e.salary Salary FROM emp e INNER join emp ee ON ee.id != e.id WHERE ee.salary=e.salary;
It will produce the following output which lists two employees who have the same salary.
Employee Name | Salary |
Steven Pitt | 55000 |
Jane Frost | 55000 |