What is PERCENTILE_DISC?
Like the CUME_DIST
function introduced last week, PERCENTILE_DISC
is a statistical function. It was introduced in SQL Server 2012 and calculates quantiles. If you want to use the 0.5-quantile of a set (or ‘sample’ if we want to stick to the usual terminology used in statistics) you understand the value where half of all values are smaller and the other half are larger than this value. Accordingly, the 0.1-quantile is the value where 10% of the values are smaller and 90% of the values are larger than it. You can calculate quantiles with PERCENTILE_DISC
. As with CUME_DIST
, a window with sorting is specified and the value of the ORDER BY
columns is returned, where the distribution function, i.e. CUME_DIST
in the same window is greater than the passed value for the first time.
How can you work with PERCENTILE_DISC?
Let’s make a query in the WideWorldImporters database, looking at the distribution of orders in the year:
SELECT InvoiceDate ,CUME_DIST() OVER (PARTITION BY DATEPART(YEAR, InvoiceDate) ORDER BY InvoiceDate) as SalesDistribution FROM Sales.Invoices
Let’s assume that we now want to know on which day of the year 90% of the orders were received. Then we calculate the 0.9 quantile of orders as follows:
SELECT InvoiceDate ,CUME_DIST() OVER (PARTITION BY DATEPART(YEAR, InvoiceDate) ORDER BY InvoiceDate) as SalesDistribution ,PERCENTILE_DISC(0.9) WITHIN GROUP (ORDER BY InvoiceDate) OVER (PARTITION BY DATEPART(YEAR, InvoiceDate)) as BreakevenPoint FROM Sales.Invoices ORDER BY InvoiceDate
It is important to note that PERCENTILE_DISC
does not have the ORDER BY
condition in the normal window, i.e. in the OVER
clause, but in a special WITHIN GROUP
clause. This is because, unlike the OVER
clause, we cannot specify multiple sort columns here, but are restricted to a single column for sorting, which is also the return value of the function.
Now you can see on the line of each order when in the year of the order 90% of the orders were received. In this granularity this might be helpful in an application, but of course we can also do this calculation in a CTE, whose results we then summarize:
WITH CTE_Percentile AS ( SELECT InvoiceDate ,CUME_DIST() OVER (PARTITION BY DATEPART(YEAR, InvoiceDate) ORDER BY InvoiceDate) as SalesDistribution ,PERCENTILE_DISC(0.9) WITHIN GROUP (ORDER BY InvoiceDate) OVER (PARTITION BY DATEPART(YEAR, InvoiceDate)) as BreakevenPoint FROM Sales.Invoices ) SELECT DISTINCT DATEPART(YEAR, InvoiceDate) as InvoiceYear ,BreakevenPoint FROM CTE_Percentile
It is important to note that PERCENTILE_DISC
will always return a concrete value that exists in your table, i.e. it will not interpolate or anything, but will return the value where CUME_DIST
is above the searched quantile for the first time. Therefore the name of the function: PERCENTILE_DISC
calculates on the discrete set of rows exactly the value that fulfills the condition.