Welcome to the world of window functions.
A window function is closely related to an aggregate function. However, instead of collapsing all rows into one, it keeps every row and adds a new column holding, for example, a running total, ranking, or moving average. The set of rows the function looks at to calculate each value is our ‘window frame.’
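To make the difference concrete, here is a minimal sketch. It assumes a made-up three-row orders table and runs in SQLite through Python's built-in sqlite3 module, so no database server is needed (SQLite 3.25 or later is required for window functions). The table and its values are invented for illustration.

```python
import sqlite3

# Hypothetical toy data: (salesorderid, subtotal).
rows = [(1, 100.0), (2, 50.0), (3, 70.0)]

conn = sqlite3.connect(":memory:")
conn.execute("create table orders (salesorderid integer, subtotal real)")
conn.executemany("insert into orders values (?, ?)", rows)

# Plain aggregate: all rows collapse into a single result row.
collapsed = conn.execute("select sum(subtotal) from orders").fetchall()

# Window function: every row is kept and gains a new column.
windowed = conn.execute(
    """
    select salesorderid, subtotal,
           sum(subtotal) over () as grand_total
    from orders
    order by salesorderid
    """
).fetchall()

print(collapsed)  # [(220.0,)]
for row in windowed:
    print(row)    # (1, 100.0, 220.0), then (2, 50.0, 220.0), then (3, 70.0, 220.0)
```

The empty OVER () means "use all rows as the window", which is why each order shows the same grand total.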
If you’re interested in using this powerful tool, keep reading for examples and pictures.
Window functions come in three main types. These are:
Aggregate window functions
These apply aggregate functions like SUM, COUNT, MAX, MIN over a set of rows and return a result for every row, rather than a single result for the whole query.
Ranking window functions
These assign a ‘rank’ to each row within a set of rows, using RANK, DENSE_RANK, ROW_NUMBER or NTILE.
Value window functions
These use LAG, LEAD, FIRST_VALUE and LAST_VALUE to access a value from another row (a previous, following, first or last row) without having to do a self join.
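As a quick taste of the value functions, the sketch below puts the previous and next day's total next to each day without a self join. The daily_sales table and its figures are invented for illustration, and the query runs in SQLite (3.25 or later) through Python's built-in sqlite3 module.

```python
import sqlite3

# Hypothetical daily totals, invented for illustration.
daily = [("2018-01-01", 150.0), ("2018-01-02", 70.0), ("2018-01-03", 200.0)]

conn = sqlite3.connect(":memory:")
conn.execute("create table daily_sales (sale_date text, total real)")
conn.executemany("insert into daily_sales values (?, ?)", daily)

# LAG looks one row back and LEAD one row ahead, in sale_date order.
result = conn.execute(
    """
    select sale_date, total,
           lag(total)  over (order by sale_date) as previous_day,
           lead(total) over (order by sale_date) as next_day
    from daily_sales
    order by sale_date
    """
).fetchall()

for row in result:
    print(row)
# ('2018-01-01', 150.0, None, 70.0)
# ('2018-01-02', 70.0, 150.0, 200.0)
# ('2018-01-03', 200.0, 70.0, None)
```

The first row has no previous day and the last row has no next day, so those cells come back as NULL (None in Python).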
Let’s start off by exploring aggregate window functions and how they work. Using a few real-life examples, I will simplify the syntax and explain the use cases.
You will need:
- The function you want to perform: AVG, SUM, COUNT
- An indication you want to use this function over multiple rows: OVER
- How you want to group your rows: PARTITION BY
- How you want to order your rows: ORDER BY
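The four pieces above can be sketched together as follows. The orders table and its values are made up for illustration, and the query runs in SQLite (3.25 or later) through Python's built-in sqlite3 module. Notice what the ordering piece does: with the grouping alone, SUM returns the total for the whole group on every row, while adding ORDER BY turns it into a running total.

```python
import sqlite3

# Hypothetical toy data: (sale_date, salesorderid, subtotal).
rows = [
    ("2018-01-01", 1, 100.0),
    ("2018-01-01", 2, 50.0),
    ("2018-01-01", 3, 25.0),
    ("2018-01-02", 4, 70.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table orders (sale_date text, salesorderid integer, subtotal real)"
)
conn.executemany("insert into orders values (?, ?, ?)", rows)

result = conn.execute(
    """
    select sale_date, salesorderid, subtotal,
           -- function + OVER + grouping: the whole day's total on every row
           sum(subtotal) over (partition by sale_date) as day_total,
           -- ... + ordering: the total accumulates order by order
           sum(subtotal) over (partition by sale_date
                               order by salesorderid) as running_total
    from orders
    order by sale_date, salesorderid
    """
).fetchall()

for row in result:
    print(row)
# ('2018-01-01', 1, 100.0, 175.0, 100.0)
# ('2018-01-01', 2, 50.0, 175.0, 150.0)
# ('2018-01-01', 3, 25.0, 175.0, 175.0)
# ('2018-01-02', 4, 70.0, 70.0, 70.0)
```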
Setting the Scene
In this case, the sales manager sent us a request. In order to set her team’s targets for next year, she needs historic data. The data is in the sales database, but we need it in a format that is easier to use than the raw table.
How to create a running total
She would like to be able to see daily sales totals as well as individual order IDs so she can ‘drill down’ as needed.
We could use SUM with GROUP BY to get one total per day from the orders table. However, this collapses the order details. A window function will allow the sales manager to see each order with a running total.
By using a window function we can see each order for each day and its subtotal, with a running total for that day alongside. Other aggregate functions work the same way, so you can use COUNT, AVG, MIN or MAX, alone or in combination.
    -- orderdate is used throughout and aliased to sale_date in the output
    select orderdate as sale_date,
           salesorderid,
           subtotal,
           sum(subtotal) over (partition by orderdate order by salesorderid) as total_sales
    from sales.salesorderheader
    where orderdate between '2018-01-01 00:00:00:000' and '2018-12-31 00:00:00:000'
    order by orderdate, salesorderid
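Because the other aggregate functions slot into the same OVER clause, several of them can sit side by side in one query. The sketch below assumes a small made-up orders table (not the real sales database) and runs in SQLite (3.25 or later) through Python's built-in sqlite3 module. It shows a running order count, running average and largest order so far for each day.

```python
import sqlite3

# Hypothetical toy data: (sale_date, salesorderid, subtotal).
rows = [
    ("2018-01-01", 1, 100.0),
    ("2018-01-01", 2, 50.0),
    ("2018-01-01", 3, 30.0),
    ("2018-01-02", 4, 70.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table orders (sale_date text, salesorderid integer, subtotal real)"
)
conn.executemany("insert into orders values (?, ?, ?)", rows)

# The same window (per day, in order-ID order) is reused by each function.
result = conn.execute(
    """
    select sale_date, salesorderid, subtotal,
           count(*)      over (partition by sale_date order by salesorderid) as orders_so_far,
           avg(subtotal) over (partition by sale_date order by salesorderid) as avg_so_far,
           max(subtotal) over (partition by sale_date order by salesorderid) as largest_so_far
    from orders
    order by sale_date, salesorderid
    """
).fetchall()

for row in result:
    print(row)
# ('2018-01-01', 1, 100.0, 1, 100.0, 100.0)
# ('2018-01-01', 2, 50.0, 2, 75.0, 100.0)
# ('2018-01-01', 3, 30.0, 3, 60.0, 100.0)
# ('2018-01-02', 4, 70.0, 1, 70.0, 70.0)
```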
How to RANK rows based on a given criterion
The sales manager has returned.
Although she was happy with the table created for setting targets, she now needs strategies for increasing sales. She wants to see the 2018 sales ranked by dollar value, along with when they happened. Knowing which sales were biggest, she could go and knock on those customers’ doors again.
Using RANK() in a window function ranks each row, in this case, by subtotal. The sales manager can now decide if she wants to target those customers again in the coming year.
    -- orderdate is used throughout and aliased to sale_date in the output
    select orderdate as sale_date,
           salesorderid,
           subtotal,
           rank() over (order by subtotal desc) as sales_rank
    from sales.salesorderheader
    where orderdate between '2018-01-01 00:00:00:000' and '2018-12-31 00:00:00:000'
    order by orderdate
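Rows with identical subtotals receive the same rank, and RANK then skips the positions those tied rows occupy. In the example above, positions two and three are skipped, so the next row is ranked fourth. DENSE_RANK leaves no gaps: tied rows still share a rank, but the next distinct subtotal gets the next consecutive number, so it would be ranked second rather than fourth.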
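To see the two functions side by side, here is a small sketch in which three made-up orders tie for the top spot. The orders table and its values are invented for illustration, and the query runs in SQLite (3.25 or later) through Python's built-in sqlite3 module.

```python
import sqlite3

# Hypothetical toy data: (salesorderid, subtotal); the first three orders tie.
rows = [(1, 500.0), (2, 500.0), (3, 500.0), (4, 300.0), (5, 200.0)]

conn = sqlite3.connect(":memory:")
conn.execute("create table orders (salesorderid integer, subtotal real)")
conn.executemany("insert into orders values (?, ?)", rows)

result = conn.execute(
    """
    select salesorderid, subtotal,
           rank()       over (order by subtotal desc) as sales_rank,
           dense_rank() over (order by subtotal desc) as sales_dense_rank
    from orders
    order by subtotal desc, salesorderid
    """
).fetchall()

for row in result:
    print(row)
# (1, 500.0, 1, 1)
# (2, 500.0, 1, 1)
# (3, 500.0, 1, 1)
# (4, 300.0, 4, 2)   <- RANK skips 2 and 3, DENSE_RANK does not
# (5, 200.0, 5, 3)
```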
Check out part two to learn even more and to see how you can use window functions to your advantage.