FIRST_VALUE und LAST_VALUE

What are FIRST_VALUE and LAST_VALUE?

Like the ROW_NUMBER and RANK functions introduced in weeks 12 to 14, FIRST_VALUE and LAST_VALUE are also part of the window functions. This means they are functions that operate on an ordered set of rows in the result set. According to their names, the functions return the first or last row of the window on which they operate.

How can you work with FIRST_VALUE and LAST_VALUE?

To find the first and last rows of a result window, we take the Sales.Order table in the WideWorldImportersdatabase to find the first order of each customer. We want to look at the following columns of the table:

SELECT 
     OrderDate
    ,OrderID
    ,ExpectedDeliveryDate
    ,CustomerID
FROM [Sales].[Orders]

To find now for each order the first order of the respective customer, we use the FIRST_VALUE function. This function now calculates the first values for each row in the orders table:

SELECT 
     OrderDate as DateCurrentOrder
    ,FIRST_VALUE(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate) as DateFirstOrder
    ,OrderID as IDCurrentOrder
    ,FIRST_VALUE(OrderID) OVER (PARTITION BY CustomerID ORDER BY OrderDate) as IDirstOrder
    ,ExpectedDeliveryDate as ExpectedDeliveryCurrentOrder
    ,FIRST_VALUE(ExpectedDeliveryDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate) as ExpectedDeliveryFirstOrder
    ,CustomerID
FROM [Sales].[Orders]
ORDER BY CustomerID

Now we can use these values, for example, to calculate for each order how many days after the first order it was placed:

SELECT 
     OrderDate as DateCurrentOrder
    ,FIRST_VALUE(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate) as DateFirstOrder
    ,DATEDIFF(DAY, FIRST_VALUE(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate), OrderDate) DaysSinceFirstOrder
    ,OrderID as IDCurrentOrder
    ,FIRST_VALUE(OrderID) OVER (PARTITION BY CustomerID ORDER BY OrderDate) as IDirstOrder
    ,ExpectedDeliveryDate as ExpectedDeliveryCurrentOrder
    ,FIRST_VALUE(ExpectedDeliveryDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate) as ExpectedDeliveryFirstOrder
    ,CustomerID
FROM [Sales].[Orders]
ORDER BY CustomerID

In the same way, of course, we can now determine the days from the order to the customer’s last order by using LAST_VALUE:

SELECT 
     OrderDate as DateCurrentOrder
    ,FIRST_VALUE(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate) as DateFirstOrder
    ,LAST_VALUE(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate) DateLastOrder
    ,DATEDIFF(DAY, FIRST_VALUE(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate), OrderDate) DaysSinceFirstOrder
    ,OrderID as IDCurrentOrder
    ,FIRST_VALUE(OrderID) OVER (PARTITION BY CustomerID ORDER BY OrderDate) as IDirstOrder
    ,ExpectedDeliveryDate as ExpectedDeliveryCurrentOrder
    ,FIRST_VALUE(ExpectedDeliveryDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate) as ExpectedDeliveryFirstOrder
    ,CustomerID
FROM [Sales].[Orders]
ORDER BY CustomerID

But to our great surprise, we now see as LAST_VALUE always the value of the current line as the last order date of the customer. This is because we already explained in the first articles about window functions that the default window always extends from the beginning of the definition area to the current line. To change this we need to specify a so called “range clause” which has the form ROWS BETWEEN a AND b:

SELECT 
     OrderDate as DateCurrentOrder
    ,FIRST_VALUE(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate) as DateFirstOrder
    ,LAST_VALUE(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) DateLastOrder
    ,DATEDIFF(DAY, FIRST_VALUE(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate), OrderDate) DaysSinceFirstOrder
    ,OrderID as IDCurrentOrder
    ,FIRST_VALUE(OrderID) OVER (PARTITION BY CustomerID ORDER BY OrderDate) as IDirstOrder
    ,ExpectedDeliveryDate as ExpectedDeliveryCurrentOrder
    ,FIRST_VALUE(ExpectedDeliveryDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate) as ExpectedDeliveryFirstOrder
    ,CustomerID
FROM [Sales].[Orders]
ORDER BY CustomerID

With this background knowledge, we can now also indicate the distance to the customer’s most recent order:

SELECT 
     OrderDate as DateCurrentOrder
    ,FIRST_VALUE(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate) as DateFirstOrder
    ,LAST_VALUE(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) DateLastOrder
    ,DATEDIFF(DAY, FIRST_VALUE(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate), OrderDate) DaysSinceFirstOrder
    ,DATEDIFF(DAY, OrderDate, LAST_VALUE(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)) DaysToLastOrder
    ,OrderID as IDCurrentOrder
    ,FIRST_VALUE(OrderID) OVER (PARTITION BY CustomerID ORDER BY OrderDate) as IDirstOrder
    ,ExpectedDeliveryDate as ExpectedDeliveryCurrentOrder
    ,FIRST_VALUE(ExpectedDeliveryDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate) as ExpectedDeliveryFirstOrder
    ,CustomerID
FROM [Sales].[Orders]
ORDER BY CustomerID

FIRST_VALUE and LAST_VALUE are really useful functions, nevertheless you should be careful with the techniques shown here, because window functions are potential performance killers, because in case of doubt a multiple of the returned result set has to be read and processed by the server to execute the windows completely.

References

Ninja-Notebooks @ GitHub

Leave a Reply

Your email address will not be published.