T-SQL Ninja #12

RANK

What does RANK do?

Like the ROW_NUMBER function introduced last week, RANK is also a window function, i.e. a function that processes the data sorted in so-called “windows” or “partitions”. You define the sorting and the partitions by using the keywords ORDER BY and PARTITION BY after the function call. The RANK function then numbers all the rows in the result set, starting at 1 in ascending order. Unlike the ROW_NUMBER function, the RANK function numbers rows that are in the same position in the sort with the same number. With ROW_NUMBER, a random sort is chosen here and numbered uniquely sequentially.

How can you work with RANK?

To use the RANK function, we now create a query in the Wide World Importers database that returns sortable items, but where the sort need not be unique. For this we use a CTE to count the number of orders from the OrderLines in which each product appeared:

WITH cte_sales AS (
    SELECT 
         StockItemID
        ,COUNT(*) AS nOrders
        ,SUM(Quantity*UnitPrice) AS TotalSales 
    FROM 
        Sales.OrderLines
    GROUP BY StockItemID
)
SELECT  
     StockItemID
    ,nOrders
    ,TotalSales
FROM cte_sales 
ORDER BY 
     nOrders DESC
    ,TotalSales DESC

But now we don’t just want to count and display, we want to create a ranking of the most popular products. If we would use the ROW_NUMBER function for this, the two most popular products with the IDs 104 and 120, which have been ordered 1123 times each, would have the numbers 1 and 2, which would have the 1 and which the 2, we could determine, for example, by sorting by TotalSales:

WITH cte_sales AS (
    SELECT 
         StockItemID
        ,COUNT(*) AS nOrders
        ,SUM(Quantity*UnitPrice) AS TotalSales 
    FROM 
        Sales.OrderLines
    GROUP BY StockItemID
)
SELECT  
     StockItemID
    ,nOrders
    ,TotalSales
    ,ROW_NUMBER() OVER (ORDER BY nOrders DESC, TotalSales DESC) as Popularity
FROM cte_sales 
ORDER BY 
     nOrders DESC
    ,TotalSales DESC

From a commercial point of view, this additional sorting according to sales totals may be useful, but it does not reflect the actual popularity of the product. If we go purely by sales figures, both products would have to be number one, and this is exactly what the RANKfunction makes possible. Here we can then also do without the tiebreaker, which distinguished between the first and second line, and accordingly only need an ORDER BY argument:

WITH cte_sales AS (
    SELECT 
         StockItemID
        ,COUNT(*) AS nOrders
        ,SUM(Quantity*UnitPrice) AS TotalSales 
    FROM 
        Sales.OrderLines
    GROUP BY StockItemID
)
SELECT  
     StockItemID
    ,nOrders
    ,TotalSales
    ,ROW_NUMBER() OVER (ORDER BY nOrders DESC, TotalSales DESC) as Popularity
    ,RANK() OVER (ORDER BY nOrders DESC) as Real_Popularity
FROM cte_sales 
ORDER BY 
     nOrders DESC
    ,TotalSales DESC

As a result, you can see that now both products, 104 and 120 have received the (real) popularity 1, while the next product has been assigned the third rank, similar to what is usual in sports competitions. So if you can’t tell from the data which line is the “better” one and want to avoid random numbering, use the RANK function instead of the ROW_NUMBER function. However, it is of course not suitable for sorting out duplicates, since the duplicates would all be assigned the same rank…

References

Ninja-Notebooks @ GitHub

Leave a Reply

Your email address will not be published. Required fields are marked *