What does ROW_NUMBER do?
The ROW_NUMBER
function is a window function, i.e. a function that processes the data sorted in so-called “windows” or “partitions”. You define the sorting and the partitions by using the keywords ORDER BY
and PARTITION BY
after the function call. The ROW_NUMBER
function then numbers all rows in the result set, starting with 1 in ascending order. Each partition starts at 1 again.
How can you work with ROW_NUMBER?
You can use the ROW_NUMBER
function in the column list of your query if you want to see the assigned numbers:
SELECT [CountryID] ,ROW_NUMBER() OVER (ORDER BY [CountryID]) as [RowNumber] ,[CountryName] ,[FormalName] ,[IsoAlpha3Code] FROM [WideWorldImporters].[Application].[Countries]
Now you see a column with the heading RowNumber
and a consecutive number in the content. Great. Well. Let’s see if there’s anything we can do with it. If you look at the result, you can see that the CountryID
of Albania differs from the RowNumber
early in the line. Now you can use the
ROW_NUMBER function to find the records (well, the record) where both values match:
; WITH cte_rn as ( SELECT [CountryID] ,ROW_NUMBER() OVER (ORDER BY [CountryID]) as [RowNumber] ,[CountryName] ,[FormalName] ,[IsoAlpha3Code] FROM [WideWorldImporters].[Application].[Countries] ) SELECT [CountryId] ,[CountryName] ,[FormalName] ,[IsoAlpha3Code] FROM cte_rn WHERE [CountryID] = [RowNumber]
You see that in the end there is only one line left, the one for Afghanistan. For all other lines, the number of lines differs from the ID. If you wanted consecutive IDs, you could now update the table with the CTE and set the CountryID
accordingly (provided that no IDENTITY
column has been defined here, otherwise you have to work with IDENTITY INSERT
).
That’s all very nice, but what can you really do with the ROW_NUMBER
function? Probably the most frequently used ninjas use the ROW_NUMBER
function in combination with CTEs to filter out duplicate rows from a result. One (admittedly quite stupid) query you can see this from is the following one from which we would like to filter the duplicates in the countries:
SELECT c.[CountryId] ,c.[CountryName] ,c.[FormalName] ,s.[StateProvinceID] FROM [Application].[Countries] as c LEFT JOIN [Application].[StateProvinces] as s ON c.[CountryID] = s.[CountryID] ORDER BY c.CountryID
In the result set you can see that the USA appears several times (since several states are defined in the StateProvinces table). Now let’s add a ROW_NUMBER
that numbers the states within a country consecutively:
SELECT c.[CountryId] ,c.[CountryName] ,c.[FormalName] ,s.[StateProvinceID] ,ROW_NUMBER() OVER (PARTITION BY c.[CountryID] ORDER BY s.[StateProvinceID]) as [RowNumber] FROM [Application].[Countries] as c LEFT JOIN [Application].[StateProvinces] as s ON c.[CountryID] = s.[CountryID]
You can see in the result set that in all countries that do not know any federal states, only a NULL
value appears as a federal state (so the LEFT JOIN
had no hit), but in the USA 53 lines were entered (what a coincidence). Note that we have entered a PARTITION BY
(unlike in the previous examples). This means that if the CountryID
changes in the result set, our ROW_NUMBER
function starts counting from 1 again.
If you now want to filter a line from this, you can use this behaviour by keeping only the entries with a ROW_NUMBER
1:
; with cte_rn as ( SELECT c.[CountryId] ,c.[CountryName] ,c.[FormalName] ,s.[StateProvinceID] ,ROW_NUMBER() OVER (PARTITION BY c.[CountryID] ORDER BY s.[StateProvinceID]) as [RowNumber] FROM [Application].[Countries] as c LEFT JOIN [Application].[StateProvinces] as s ON c.[CountryID] = s.[CountryID] ) select [CountryID] ,[CountryName] ,[FormalName] ,[StateProvinceID] ,[RowNumber] FROM cte_rn WHERE [RowNumber] = 1
Your result set now contains only one row for the USA again.
With this pattern you can now filter duplicates from tables in all possible and impossible situations. Note, however, that it would be better to write the queries from the beginning in such a way that no unwanted duplicates occur.
2 thoughts on “ROW_NUMBER”
Jetzt muss ich hier mal schnell ein Lob loswerden.
Manchmal bin ich ja kein so doller T-SQL Wizard und muss hin und wieder auch etwas knobeln.
Dieser Beitrag hat mir relativ schnell geholfen mein heutiges, kleines TSQL Problem rel. schnell zu lösen.
Danke dafür
Hallo Dirk, danke dir. Es freut uns, wenn das weiterhilft.