CTEs

What are CTEs?

CTE is the abbreviation for “Common Table Expression”. You can think of it as a named intermediate result that you can access over and over again in your query. The important thing is that a CTE defines a table and that all columns in that table must be named. So the CTE contains a SELECT statement where all derived columns must have an alias.

Where do I use CTEs?

There are always scenarios where the use of CTEs is appropriate. The CTE is not always the best and fastest way to write a query. But even if it does not improve the performance of a query, it usually improves the readability. CTEs are very useful in places where you would otherwise use nested queries, that is, in places where the granularity differs from different elements of queries. Let’s look at an example. Let’s query the [Sales].[Orders]. table of the WWI database. First of all, we want to find out which customers have ordered several times from the same sales employee in one day. This is a simple ‘GROUP BY’ query:

SELECT [CustomerID]
      ,[SalespersonPersonID]
      ,[OrderDate]
      ,count(distinct OrderID) as OrdersAtDateBySalesPerson
  FROM [WideWorldImporters].[Sales].[Orders] 
group by CustomerId, SalespersonPersonID, OrderDate
order by 4 desc

In the next step, we write the total number of orders of the respective customer on the line. The naïve approach here would be to use a window function and count the orders in the grouping in the same way as for counting:

SELECT [CustomerID]
      ,[SalespersonPersonID]
      ,[OrderDate]
      ,COUNT(distinct OrderID) as OrdersAtDateBySalesPerson
      ,COUNT(distinct OrderID) OVER (partition by CustomerID) as TotalOrders
  FROM [WideWorldImporters].[Sales].[Orders] 
group by CustomerId, SalespersonPersonID, OrderDate
order by 4 desc

However, this approach fails because the use of Distinct in windowed functions is not allowed: Use of DISTINCT is not allowed with the OVER clause. If it is given that there is only one row in the table per order, the result would be the same, so let’s try the following:

SELECT [CustomerID]
      ,[SalespersonPersonID]
      ,[OrderDate]
      ,COUNT(OrderID) as OrdersAtDateBySalesPerson
      ,COUNT(OrderID) OVER (partition by CustomerID) as TotalOrders
  FROM [WideWorldImporters].[Sales].[Orders] 
group by CustomerId, SalespersonPersonID, OrderDate
order by 4 desc

Here we get the message that the use of the OrderID column is not allowed because it is not contained in the Group ByColumn 'WideWorldImporters.Sales.Orders.OrderID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. We can work around this problem too: we can simply count the lines by using a COUNT(1):

SELECT [CustomerID]
      ,[SalespersonPersonID]
      ,[OrderDate]
      ,COUNT(1) as OrdersAtDateBySalesPerson
      ,COUNT(1) OVER (partition by CustomerID) as TotalOrders
  FROM [WideWorldImporters].[Sales].[Orders] 
group by CustomerId, SalespersonPersonID, OrderDate
order by 4 desc

Now, however, we no longer want to know the total number of orders, but a customer rank, which is calculated from this number:

SELECT [CustomerID]
      ,[SalespersonPersonID]
      ,[OrderDate]
      ,COUNT(1) as OrdersAtDateBySalesPerson
      ,DENSE_RANK() OVER (ORDER BY COUNT(1) OVER (partition by CustomerID)) as CustRank
  FROM [WideWorldImporters].[Sales].[Orders] 
group by CustomerId, SalespersonPersonID, OrderDate
order by 4 desc

And here at the latest we are at the end with the window functions, because they must not contain each other: Windowed functions cannot be used in the context of another windowed function or aggregate. The solution to this problem is provided by CTEs: Similar to the way we count the total number of orders for the customer in a view and then could join them with the query, we can do this with a CTE. For this we first write the CTE. You can always recognize this by the fact that it starts with the keyword ‘WITH’. We then join this CTE with our query, which calculates the rank of the lines:

WITH cte_rank as (
    SELECT [CustomerID]
        ,COUNT(1) as TotalOrders
    FROM [WideWorldImporters].[Sales].[Orders] 
    GROUP BY CustomerID
)
SELECT o.[CustomerID]
    ,[SalespersonPersonID]
    ,[OrderDate]
    ,COUNT(1) OVER (PARTITION BY o.CustomerID, SalespersonPersonID, OrderDate)  as OrdersAtDateBySalesPerson
    ,DENSE_RANK() OVER (ORDER BY r.TotalOrders)
FROM [WideWorldImporters].[Sales].[Orders] o 
LEFT JOIN cte_rank r ON o.CustomerID = r.CustomerID
ORDER BY 4 DESC

So here the CTE replaces a nested SELECT statement or view that we could otherwise use for these purposes as well. Similarly, CTEs can also be used to access the results of ROW_NUMBER or RANK lines, because if we only want to see the first order of a customer on the day, we can do this as follows: we number the orders on that day and select the one with line number 1.

WITH cte_rownumbers AS (
    select [CustomerID]
        ,[OrderDate]
        ,ROW_NUMBER() OVER (PARTITION BY [CustomerID], [OrderDate] ORDER BY [OrderDate]) as RowNumber
    FROM [WideWorldImporters].[Sales].[Orders]
)
SELECT * FROM cte_rownumbers
WHERE RowNumber = 1

This way you can also delete duplicates from your tables.

Referenzen:

Ninja-Notebooks @ GitHub

2 thoughts on “CTEs

  1. Ha, wenn in Beitrag 1 schon Window Functions als “naiver Ansatz” bezeichnet werden, bin ich echt gespannt auf die nächsten 52 Posts! 😉 Weiter so, Jungs!

Leave a Reply

Your email address will not be published. Required fields are marked *