Learning Objective
The objective of this SQL Server tutorial is to teach you how to use a CORRELATED SUBQUERY in a SQL statement to accept input from the main query for processing and generation of final resultset.
What is CORRELATED SUBQUERY in SQL Server?
There are two types of subqueries – CORRELATED SUBQUERY and UNCORRELATED SUBQUERY (simply put a normal subquery). A correlated subquery accepts input from the main query for processing and hence is dependent upon it. That is why it is called a CORRELATED SUBQUERY i.e. a subquery related to the main query. Without the main query a CORRELATED SUBQUERY does not have a life of its own. A CORRELATED SUBQUERY executes multiple times for every row fetched from the table by the main query. That is why a CORRELATED SUBQUERY is also called a REPEATING SUBQUERY.
The differences between CORRELATED and UNCORRELATED SUBQUERY are listed below.
Pt. No. | Uncorrelated Subquery | Correlated Subquery |
1 | Subquery executes first and only once | Subquery executes after main query and executes multiple times |
2 | Subquery provides input to the main query | Subquery accepts input from the main query |
3 | Subquery resultset value or values are patched to the final resultset | Subquery resultset value or values are used for evaluation or comparison with main query row value or values as per condition specified |
4 | Subquery resultset value or values form part of the final resultset | Subquery resultset value or values do not form part of the final resultset |
5 | Efficient and faster processing since subquery executes only once | Inefficient and slower processing since subquery processes repeatedly for every row fetched by main query from the table |
Operation
A CORRELATED SUBQUERY can be part of SELECT, UPDATE and DELETE statements. A CORRELATED SUBQUERY can return one or more values and is usually used in conjunction with an aggregate function. The value or values returned is compared or evaluated against the specified outer query row value to keep or reject the outer query row (i.e. record) as a part of the final resultset.
SQL Server CORRELATED SUBQUERY Syntax
The basic syntax of a CORRELATED SUBQUERY can be presented as below.
SELECT column_list FROM table1 outer_table WHERE outer_column operator (select column_list FROM table2 inner_table WHERE inner_table.inner_column= outer_table.outer_column);
In this syntax,
- column_list – list of table columns.
- outer – table on which the main query operates.
- outer_column – column belonging to the outer table.
- operator – a comparison, logical or mathematical operator.
- inner – table on which the subquery operates. It can be the same as outer table or another table.
- inner _column – column belonging to the outer table.
1) Example 1: SQL Server CORRELATED SUBQUERY
Let us understand the same with the help of some examples. Suppose we have a table called orders containing the information of orders placed by different customers as below. We will use this table as the sample reference table for our example.
order_id | cust_name | order_date | order_value |
210 | Michael Smith | 6/11/2019 | 5500 |
211 | Michael Smith | 6/11/2019 | 5100 |
212 | Jackie Collins | 6/13/2019 | 3139 |
213 | Jackie Collins | 6/13/2019 | 3000 |
214 | Howard Spencer | 6/13/2019 | 3550 |
215 | Howard Spencer | 6/13/2019 | 3500 |
216 | Michael Smith | 6/22/2019 | 5000 |
217 | Jackie Collins | 6/22/2019 | 5500 |
218 | Jackie Collins | 6/22/2019 | 3000 |
Suppose we want to find out the highest value order placed by each customer with the resultset sorted by order value. We can do so through the following query with a CORRELATED SUBQUERY in it.
SELECT cust_name, order_id, order_value FROM orders o1 WHERE order_value IN (SELECT MAX (order_value) FROM orders o2 WHERE o2.cust_name = o1.cust_name GROUP BY cust_name) ORDER BY order_value DESC;
It will generate the following result which fulfils our requirement.
cust_name | order_id | order_value |
Michael Smith | 210 | 5500 |
Jackie Collins | 217 | 5500 |
Howard Spencer | 214 | 3550 |
Explanation: Here the outer main query executes first and fetches the mentioned column values from every row of the table one by one and passes it to the subquery. The subquery then runs in its entirety determining the highest value order for each customer with the help of the MAX aggregate function. The outer table row order value is then compared with this value. If outer table row order value matches any of the MAX values then that row is retained otherwise rejected. In this manner the query isolates the highest value customer order records for each customer which constitutes the final resultset.
2) Example 2: SQL Server CORRELATED SUBQUERY
Let us consider another example of an employees table containing the data of company employees. The table is represented below. The table has employees belonging to 3 departments identified by the id’s 221, 222, 223.
emp_id | first_name | last_name | emp_salary | dept_id |
1270 | Claudia | Crawford | 52000 | 221 |
1271 | Jeff | Goldsmith | 50000 | 221 |
1272 | Nigel | Knight | 60000 | 222 |
1273 | James | Orwell | 66000 | 222 |
1274 | Cindy | Smith | 65000 | 222 |
1275 | Niki | Bailey | 55000 | 223 |
1276 | Mike | Mattis | 69000 | 223 |
1277 | Angelia | Jackson | 59000 | 223 |
1278 | Martha | DeClarke | 55000 | 223 |
1279 | Albert | Hemingway | 55000 | 221 |
Suppose we want to retrieve the list of employees whose salary is higher than the department average sorted by employee salary. We can do so through the following query using a CORRELATED SUBQUERY in it.
SELECT emp_id, first_name, last_name, emp_salary FROM employees e1 WHERE emp_salary > (SELECT AVG (emp_salary) FROM employees WHERE dept_id = e1.dept_id) ORDER BY emp_salary DESC;
It will generate the following result which fulfils our requirement.
emp_id | first_name | last_name | emp_salary |
1276 | Mike | Mattis | 69000 |
1273 | James | Orwell | 66000 |
1274 | Cindy | Smith | 65000 |
1279 | Albert | Hemingway | 55000 |
Explanation: Just like in the previous example the outer main query executes first and fetches the mentioned column values from every row of the table one by one and passes it to the subquery. The subquery then runs in its entirety determining the average salary for each department with the help of the AVG aggregate function. The outer table row employee salary value is then compared with this value. If outer table employee salary is higher than any of the department values then that row is retained otherwise rejected. In this manner the query isolates the employee records of those employees whose salary above the department average and that constitutes the final resultset.