Learning Objective
The objective of this tutorial is to teach you how to extract data using the GROUP BY clause from a table.
What is GROUP BY clause?
A table in a database has columns of information in it. Each column in a table represents an attribute which has values and sometimes the same value. The group by clause is used to identify rows which have the same value for the specified attribute (i.e. duplicate value) and return a single row of information instead of all the rows where the attribute has the same value.
The GROUP BY clause can be used on one or more columns.
SQL Server GROUP BY Syntax
The generic syntax of SQL Server GROUP by clause is as below.
SELECT expressions FROM tables [WHERE conditions] GROUP BY column_name1, column_name2 ,… [ORDER BY column_name1, column_name2 ,… ASC | DESC]
In this Syntax,
- expressions – expressions defined here the column(s) or calculation you want to retrieve. If you want to retrieve all the columns simply use * in the place of expressions.
- tables – one or more than one table from where you want to retrieve data.
- WHERE conditions – Optional. This is used to specify some conditions while selecting data. In case you are not using WHERE clause all the rows available will be selected.
- GROUP BY – GROUP BY along with the arguments defines the group by the columns that you specify in the GROUP BY clause.
- ORDER BY – Optional. This argument is used to sort the resultset. If you want to sort on more than one column, you need to provide them in comma-separated.
- ASC – Optional. ASC sort the resultset in ascending order. This is the default behavior if no modifier is mentioned.
- DESC – Optional. DESC sorts resultset in descending order by expression.
Note – The SQL Server GROUP BY clause must be specify after WHERE conditions if specified and must preceded by the ORDER BY clause if one is used.
SQL Server GROUP BY Examples
Let us see how it works starting from the simple (i.e. single column scenario) to the complex (i.e. multiple column scenario).
NOTE: Columns are also referred to as fields and the terms are used interchangeably.
We will use the below sample table for reference and example.
1) SQL Server GROUP BY Example – Grouping on single column
The below query will find the number of employees worked from the different location.
SELECT emp_location, count(emp_id) 'no of employees' FROM employee GROUP BY emp_location;
Output
In the above example, we have applied a GROUP BY on the column emp_location and count aggregate function to find out the number of employees belongs to different locations.
2) SQL Server GROUP BY Example – Grouping on multiple columns
Now lets take a look on the usefulness of the GROUP BY on multiple columns.
Suppose we want to find the number of employees in each department on different locations. To achieve this we have to perform GROUP BY operation on both dept_code and emp_location column.
SELECT dept_code, emp_location, count(emp_id) 'no of employees' FROM employee GROUP BY dept_code, emp_location ORDER BY emp_location, dept_code;
Output
3) SQL Server GROUP BY clause and aggregate functions
In real world scenario, most of the time group by are used along with aggregate function to prepare the report.
For example, management wants to know the total number of sales happened in a particular month for a specific product. In this case, count() aggregate function along with the select list.
An aggregate function is used to perform calculation on a group and returns a unique value per group. There are different kind of aggregate function is available in SQL Server. For example, SUM() which is used to get the sum of the specified column. Other commonly used aggregate functions are COUNT(), MIN() (minimum), MAX() (maximum), AVG() (average).
The GROUP BY clause in SQL Server groups the rows on specified column and an aggregate function returns the summary (count, sum, min, max,average etc.) for each group.
For example, the following query returns the number of employees in each department.
SELECT dept_code, count(*) FROM employee GROUP BY dept_code;
If you are specifying more columns in the select expression but not specified in the aggregate function then those column names should be mentioned in the GROUP BY clause. Otherwise the query will produce an error because there is no grantee that those columns will return unique value to the group. For example, the below query will fail.
SELECT emp_gender, dept_code, count(*) FROM employee GROUP BY dept_code;
Msg 8120, Level 16, State 1, Line 2 Column 'employee.emp_gender' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.