ROW_NUMBER

What does ROW_NUMBER do?

The ROW_NUMBER function is a window function, i.e. a function that processes the data sorted in so-called “windows” or “partitions”. You define the sorting and the partitions by using the keywords ORDER BY and PARTITION BY after the function call. The ROW_NUMBER function then numbers all rows in the result set, starting with 1 in ascending order. Each partition starts at 1 again.

How can you work with ROW_NUMBER?

You can use the ROW_NUMBER function in the column list of your query if you want to see the assigned numbers:

SELECT
     [CountryID]
    ,ROW_NUMBER() OVER (ORDER BY [CountryID]) as [RowNumber]
    ,[CountryName]
    ,[FormalName]
    ,[IsoAlpha3Code]
FROM [WideWorldImporters].[Application].[Countries]

Now you see a column with the heading RowNumber and a consecutive number in the content. Great. Well. Let’s see if there’s anything we can do with it. If you look at the result, you can see that the CountryID of Albania differs from the RowNumber early in the line. Now you can use the ROW_NUMBER function to find the records (well, the record) where both values match:

; WITH cte_rn as (
    SELECT
         [CountryID]
        ,ROW_NUMBER() OVER (ORDER BY [CountryID]) as [RowNumber]
        ,[CountryName]
        ,[FormalName]
        ,[IsoAlpha3Code]
    FROM [WideWorldImporters].[Application].[Countries]
)
SELECT 
     [CountryId]
    ,[CountryName]
    ,[FormalName]
    ,[IsoAlpha3Code]
FROM cte_rn 
WHERE [CountryID] = [RowNumber]

You see that in the end there is only one line left, the one for Afghanistan. For all other lines, the number of lines differs from the ID. If you wanted consecutive IDs, you could now update the table with the CTE and set the CountryID accordingly (provided that no IDENTITY column has been defined here, otherwise you have to work with IDENTITY INSERT).

That’s all very nice, but what can you really do with the ROW_NUMBER function? Probably the most frequently used ninjas use the ROW_NUMBER function in combination with CTEs to filter out duplicate rows from a result. One (admittedly quite stupid) query you can see this from is the following one from which we would like to filter the duplicates in the countries:

SELECT 
     c.[CountryId]
    ,c.[CountryName]
    ,c.[FormalName]
    ,s.[StateProvinceID]
FROM [Application].[Countries] as c
LEFT JOIN [Application].[StateProvinces] as s 
ON c.[CountryID] = s.[CountryID]
ORDER BY c.CountryID

In the result set you can see that the USA appears several times (since several states are defined in the StateProvinces table). Now let’s add a ROW_NUMBER that numbers the states within a country consecutively:

SELECT 
     c.[CountryId]
    ,c.[CountryName]
    ,c.[FormalName]
    ,s.[StateProvinceID]
    ,ROW_NUMBER() OVER (PARTITION BY c.[CountryID] ORDER BY s.[StateProvinceID]) as [RowNumber]
FROM [Application].[Countries] as c
LEFT JOIN [Application].[StateProvinces] as s 
ON c.[CountryID] = s.[CountryID]

You can see in the result set that in all countries that do not know any federal states, only a NULL value appears as a federal state (so the LEFT JOIN had no hit), but in the USA 53 lines were entered (what a coincidence). Note that we have entered a PARTITION BY (unlike in the previous examples). This means that if the CountryID changes in the result set, our ROW_NUMBER function starts counting from 1 again.

If you now want to filter a line from this, you can use this behaviour by keeping only the entries with a ROW_NUMBER 1:

; with cte_rn as ( 
    SELECT 
         c.[CountryId]
        ,c.[CountryName]
        ,c.[FormalName]
        ,s.[StateProvinceID]
        ,ROW_NUMBER() OVER (PARTITION BY c.[CountryID] ORDER BY s.[StateProvinceID]) as [RowNumber]
    FROM [Application].[Countries] as c
    LEFT JOIN [Application].[StateProvinces] as s 
    ON c.[CountryID] = s.[CountryID]
)
select 
     [CountryID]
    ,[CountryName]
    ,[FormalName]
    ,[StateProvinceID]
    ,[RowNumber]
FROM cte_rn
WHERE [RowNumber] = 1

Your result set now contains only one row for the USA again.

With this pattern you can now filter duplicates from tables in all possible and impossible situations. Note, however, that it would be better to write the queries from the beginning in such a way that no unwanted duplicates occur.

References

Ninja-Notebooks @ GitHub

2 thoughts on “ROW_NUMBER

  1. Jetzt muss ich hier mal schnell ein Lob loswerden.
    Manchmal bin ich ja kein so doller T-SQL Wizard und muss hin und wieder auch etwas knobeln.
    Dieser Beitrag hat mir relativ schnell geholfen mein heutiges, kleines TSQL Problem rel. schnell zu lösen.
    Danke dafür

Leave a Reply

Your email address will not be published. Required fields are marked *