What are FIRST_VALUE and LAST_VALUE?
Like the ROW_NUMBER
and RANK
functions introduced in weeks 12 to 14, FIRST_VALUE
and LAST_VALUE
are also part of the window functions. This means they are functions that operate on an ordered set of rows in the result set. According to their names, the functions return the first or last row of the window on which they operate.
How can you work with FIRST_VALUE and LAST_VALUE?
To find the first and last rows of a result window, we take the Sales.Order
table in the WideWorldImporters
database to find the first order of each customer. We want to look at the following columns of the table:
SELECT OrderDate ,OrderID ,ExpectedDeliveryDate ,CustomerID FROM [Sales].[Orders]
To find now for each order the first order of the respective customer, we use the FIRST_VALUE
function. This function now calculates the first values for each row in the orders table:
SELECT OrderDate as DateCurrentOrder ,FIRST_VALUE(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate) as DateFirstOrder ,OrderID as IDCurrentOrder ,FIRST_VALUE(OrderID) OVER (PARTITION BY CustomerID ORDER BY OrderDate) as IDirstOrder ,ExpectedDeliveryDate as ExpectedDeliveryCurrentOrder ,FIRST_VALUE(ExpectedDeliveryDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate) as ExpectedDeliveryFirstOrder ,CustomerID FROM [Sales].[Orders] ORDER BY CustomerID
Now we can use these values, for example, to calculate for each order how many days after the first order it was placed:
SELECT OrderDate as DateCurrentOrder ,FIRST_VALUE(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate) as DateFirstOrder ,DATEDIFF(DAY, FIRST_VALUE(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate), OrderDate) DaysSinceFirstOrder ,OrderID as IDCurrentOrder ,FIRST_VALUE(OrderID) OVER (PARTITION BY CustomerID ORDER BY OrderDate) as IDirstOrder ,ExpectedDeliveryDate as ExpectedDeliveryCurrentOrder ,FIRST_VALUE(ExpectedDeliveryDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate) as ExpectedDeliveryFirstOrder ,CustomerID FROM [Sales].[Orders] ORDER BY CustomerID
In the same way, of course, we can now determine the days from the order to the customer’s last order by using LAST_VALUE
:
SELECT OrderDate as DateCurrentOrder ,FIRST_VALUE(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate) as DateFirstOrder ,LAST_VALUE(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate) DateLastOrder ,DATEDIFF(DAY, FIRST_VALUE(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate), OrderDate) DaysSinceFirstOrder ,OrderID as IDCurrentOrder ,FIRST_VALUE(OrderID) OVER (PARTITION BY CustomerID ORDER BY OrderDate) as IDirstOrder ,ExpectedDeliveryDate as ExpectedDeliveryCurrentOrder ,FIRST_VALUE(ExpectedDeliveryDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate) as ExpectedDeliveryFirstOrder ,CustomerID FROM [Sales].[Orders] ORDER BY CustomerID
But to our great surprise, we now see as LAST_VALUE
always the value of the current line as the last order date of the customer. This is because we already explained in the first articles about window functions that the default window always extends from the beginning of the definition area to the current line. To change this we need to specify a so called “range clause” which has the form ROWS BETWEEN a AND b
:
SELECT OrderDate as DateCurrentOrder ,FIRST_VALUE(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate) as DateFirstOrder ,LAST_VALUE(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) DateLastOrder ,DATEDIFF(DAY, FIRST_VALUE(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate), OrderDate) DaysSinceFirstOrder ,OrderID as IDCurrentOrder ,FIRST_VALUE(OrderID) OVER (PARTITION BY CustomerID ORDER BY OrderDate) as IDirstOrder ,ExpectedDeliveryDate as ExpectedDeliveryCurrentOrder ,FIRST_VALUE(ExpectedDeliveryDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate) as ExpectedDeliveryFirstOrder ,CustomerID FROM [Sales].[Orders] ORDER BY CustomerID
With this background knowledge, we can now also indicate the distance to the customer’s most recent order:
SELECT OrderDate as DateCurrentOrder ,FIRST_VALUE(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate) as DateFirstOrder ,LAST_VALUE(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) DateLastOrder ,DATEDIFF(DAY, FIRST_VALUE(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate), OrderDate) DaysSinceFirstOrder ,DATEDIFF(DAY, OrderDate, LAST_VALUE(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)) DaysToLastOrder ,OrderID as IDCurrentOrder ,FIRST_VALUE(OrderID) OVER (PARTITION BY CustomerID ORDER BY OrderDate) as IDirstOrder ,ExpectedDeliveryDate as ExpectedDeliveryCurrentOrder ,FIRST_VALUE(ExpectedDeliveryDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate) as ExpectedDeliveryFirstOrder ,CustomerID FROM [Sales].[Orders] ORDER BY CustomerID
FIRST_VALUE
and LAST_VALUE
are really useful functions, nevertheless you should be careful with the techniques shown here, because window functions are potential performance killers, because in case of doubt a multiple of the returned result set has to be read and processed by the server to execute the windows completely.