Dear readers of our blog, we'd like to recommend you to visit the main page of our website, where you can learn about our product SQLS*Plus and its advantages.
 
SQLS*Plus - best SQL Server command line reporting and automation tool! SQLS*Plus is several orders of magnitude better than SQL Server sqlcmd and osql command line tools.
 

REQUEST COMPLIMENTARY SQLS*PLUS LICENCE

Enteros UpBeat offers a patented database performance management SaaS platform. It proactively identifies root causes of complex revenue-impacting database performance issues across a growing number of RDBMS, NoSQL, and deep/machine learning database platforms. We support Oracle, SQL Server, IBM DB2, MongoDB, Casandra, MySQL, Amazon Aurora, and other database systems.

SQL window functions – what are they and why are they needed?

22 September 2020

SQL window functions

Many developers, even those who have long been familiar with SQL, do not understand window functions, considering them some kind of special magic for the chosen ones. And although the implementation of window functions is supported with SQL Server 2005, someone is still “copying” them from StackOverflow without going into details.

With this article, we will try to debunk the myth about the impregnability of this SQL functionality and show some examples of window functions on the real dataset.

Why not GROUP BY and JOIN

Let us make it clear at once that window functions are not the same as GROUP BY. They do not reduce the number of lines but return the same values as they received at the input. Secondly, unlike GROUP BY, OVER can refer to other lines. And thirdly, they can read moving averages and cumulative sums.

Note Window functions do not change the selection, but only add some additional information about it. For ease of understanding, you can assume that SQL first executes the entire query (except sorting and limit), and then reads the window values.

Okay, GROUP BY has been dealt with. But you can almost always go several ways in SQL. For example, you may want to use subqueries or JOIN. Of course, JOIN is preferable to subqueries for performance, and the performance of JOIN and OVER constructs will be the same. But OVER gives more freedom than a rigid JOIN. And the size of the code will end up being much smaller.

For starters

Window functions start with the OVER operator and are configured with three other operators: PARTITION BY, ORDER BY, and ROWS. We will tell you more about ORDER BY, PARTITION BY and its auxiliary operators LAG, LEAD, RANK.

All examples will be based on Datacamp’s Olympic medalist dataset. The table is called summer_medals and contains the results of the Olympics from 1896 to 2010:

All examples will be based on Datacamp's Olympic medalist dataset

ROW_NUMBER and ORDER BY

As mentioned above, the OVER operator creates a window function. Let us start with a simple function ROW_NUMBER, which assigns a number to each selected record:

SELECT
athlete,
event,
ROW_NUMBER() OVER() AS row_number
FROM Summer_Medals
ORDER BY row_number ASC;

Let us start with a simple function ROW_NUMBER, which assigns a number to each selected record

Each pair of “sportsman – sport” received a number, and these numbers can be accessed by the name of row_number.

ROW_NUMBER can be combined with ORDER BY to determine in which order the lines will be numbered. Let us select all available sports with DISTINCT and number them in alphabetical order:

SELECT
sport,
ROW_NUMBER() OVER(ORDER BY sport ASC) AS Row_N
FROM (
SELECT DISTINCT sport
FROM Summer_Medals
) AS sports
ORDER BY sport ASC;

ROW_NUMBER can be combined with ORDER BY to determine

PARTITION BY and LAG, LEAD, and RANK

PARTITION BY allows you to group rows by the value of a particular column. This is useful if the data is logically divided into some categories and something needs to be done with this row taking into account other rows of the same group (say, to compare a tennis player with other tennis players, but not with runners or swimmers). This operator works only with window functions like LAG, LEAD, RANK, etc.

LAG

The LAG function takes a string and returns the one that went before it. For example, we want to find all Olympic tennis champions (men and women separately) since 2004, and for each of them find out who was the previous champion.

It takes several steps to solve this problem. First, we need to create a tabular expression that will save the result of the query “tennis champions since 2004” as a temporary named structure for further analysis. And then divide them by sex and select the previous champion using LAG:

-- Tabular expression searches for tennis champions and selects the required columns
WITH Tennis_Gold AS (
SELECT
Athlete,
Gender,
Year,
Country
FROM
Summer_Medals
WHERE
Year >= 2004 AND
Sport = 'Tennis' AND
event = 'Singles' AND
Medal = 'Gold')

-- The window function divides by the floor and takes the champion from the previous line
SELECT
Athlete as Champion,
Gender,
Year,
LAG(Athlete) OVER (PARTITION BY gender)
ORDER BY Year ASC) AS Last_Champion
FROM Tennis_Gold
ORDER BY Gender ASC, Year ASC;

divide them by sex and select the previous champion using LAG

The PARTITION BY function in the table returned first all men, then all women. For the winners of 2008 and 2012, the previous champion is given; since the data are only for 3 Olympiads, the 2004 champions have no predecessors, so in the corresponding fields is null.

LEAD

The LEAD function is similar to LAG but returns the next line instead of the previous one. You can find out who became the next champion after this or that athlete:

-- Tabular expression searches for tennis champions and selects the required columns
WITH Tennis_Gold AS (
SELECT
Athlete,
Gender,
Year,
Country
FROM
Summer_Medals
WHERE
Year >= 2004 AND
Sport = 'Tennis' AND
event = 'Singles' AND
Medal = 'Gold')

-- The window function divides by the floor and takes the champion from the following line
SELECT
Athlete as Champion,
Gender,
Year,
LEAD (Athlete) OVER (PARTITION BY gender)
ORDER BY Year ASC) AS Future_Champion
FROM Tennis_Gold
ORDER BY Gender ASC, Year ASC;

The LEAD function is similar to LAG but returns the next line instead of the previous one

RANK

The RANK operator is similar to ROW_NUMBER, but assigns the same numbers to the lines with the same values, and skips the “extra” numbers. There is also DENSE_RANK, which does not skip numbers. This sounds confusing, so it is easier to show it by example.

Here is the ranking of countries by the number of Olympiads they participated in by different operators:

The RANK operator is similar to ROW_NUMBER

  • Row_number – nothing interesting, the lines are simply numbered in ascending order.
  • Rank_number – rows are ranked in ascending order, but no number 3. Instead, 2 lines divide the number 2, followed immediately by number 4.
  • Dense_rank – the same as rank_number, but number 3 is not missing. The numbers go in a row, but no one is the fifth of the five.

Here is the code:

-- The tabular expression selects the countries and counts the years
WITH countries AS (
SELECT
Country,
COUNT(DISTINCT year) AS
FROM
Summer_Medals
WHERE
Country in ('GBR', 'DEN', 'FRA', 'ITA', 'AUT')
GROUP BY
Country)

-- Different window functions rank countries
SELECT
Country,
This is the case,
ROW_NUMBER()
OVER(ORDER BY DESC) AS Row_Number,
RANK()
OVER(ORDER BY DESC) AS Rank_Number,
DENSE_RANK()
OVER(ORDER BY DESC) AS Dense_Rank
FROM countries
ORDER BY DESC;

That’s how we arranged this dataset into shelves using window functions. That’s where our introduction to window functions ends. We hope it was interesting and not as difficult as it might seem.

Of course, this is not all the possibility of window functions. There are many other useful things for them, such as ROWS, NTILE, and aggregation functions (SUM, MAX, MIN, and others), but we will talk about it another time.

Window functions in SQL Server

 
Tags: , ,

MORE NEWS

 

Preamble​​NoSql is not a replacement for SQL databases but is a valid alternative for many situations where standard SQL is not the best approach for...

Preamble​​MongoDB Conditional operators specify a condition to which the value of the document field shall correspond.Comparison Query Operators $eq...

5 Database management trends impacting database administrationIn the realm of database management systems, moreover half (52%) of your competitors feel...

The data type is defined as the type of data that any column or variable can store in MS SQL Server. What is the data type? When you create any table or...

Preamble​​MS SQL Server is a client-server architecture. MS SQL Server process starts with the client application sending a query.SQL Server accepts,...

First the basics: what is the master/slave?One database server (“master”) responds and can do anything. A lot of other database servers store copies of all...

Preamble​​Atom Hopper (based on Apache Abdera) for those who may not know is an open-source project sponsored by Rackspace. Today we will figure out how to...

Preamble​​MongoDB recently introduced its new aggregation structure. This structure provides a simpler solution for calculating aggregated values rather...

FlexibilityOne of the most advertised features of MongoDB is its flexibility.  Flexibility, however, is a double-edged sword. More flexibility means more...

Preamble​​SQLShell is a cross-platform command-line tool for SQL, similar to psql for PostgreSQL or MySQL command-line tool for MySQL.Why use it?If you...

Preamble​​Writing an application on top of the framework on top of the driver on top of the database is a bit like a game on the phone: you say “insert...

Preamble​​Oracle Coherence is a distributed cache that is functionally comparable with Memcached. In addition to the basic function of the API cache, it...

Preamble​​IBM pureXML, a proprietary XML database built on a relational mechanism (designed for puns) that offers both relational ( SQL / XML ) and...

  What is PostgreSQL array? In PostgreSQL we can define a column as an array of valid data types. The data type can be built-in, custom or enumerated....

Preamble​​If you are a Linux sysadmin or developer, there comes a time when you need to manage an Oracle database that can work in your environment.In this...

Preamble​​Starting with Microsoft SQL Server 2008, by default, the group of local administrators is no longer added to SQL Server administrators during the...