## What is PERCENTILE_DISC?

Like the `CUME_DIST`

function introduced last week, `PERCENTILE_DISC`

is a statistical function. It was introduced in SQL Server 2012 and calculates quantiles. If you want to use the 0.5-quantile of a set (or ‘sample’ if we want to stick to the usual terminology used in statistics) you understand the value where half of all values are smaller and the other half are larger than this value. Accordingly, the 0.1-quantile is the value where 10% of the values are smaller and 90% of the values are larger than it. You can calculate quantiles with `PERCENTILE_DISC`

. As with `CUME_DIST`

, a window with sorting is specified and the value of the `ORDER BY`

columns is returned, where the distribution function, i.e. `CUME_DIST`

in the same window is greater than the passed value for the first time.

## How can you work with PERCENTILE_DISC?

Let’s make a query in the WideWorldImporters database, looking at the distribution of orders in the year:

SELECT InvoiceDate ,CUME_DIST() OVER (PARTITION BY DATEPART(YEAR, InvoiceDate) ORDER BY InvoiceDate) as SalesDistribution FROM Sales.Invoices

Let’s assume that we now want to know on which day of the year 90% of the orders were received. Then we calculate the 0.9 quantile of orders as follows:

SELECT InvoiceDate ,CUME_DIST() OVER (PARTITION BY DATEPART(YEAR, InvoiceDate) ORDER BY InvoiceDate) as SalesDistribution ,PERCENTILE_DISC(0.9) WITHIN GROUP (ORDER BY InvoiceDate) OVER (PARTITION BY DATEPART(YEAR, InvoiceDate)) as BreakevenPoint FROM Sales.Invoices ORDER BY InvoiceDate

It is important to note that `PERCENTILE_DISC`

does not have the `ORDER BY`

condition in the normal window, i.e. in the `OVER`

clause, but in a special `WITHIN GROUP`

clause. This is because, unlike the `OVER`

clause, we cannot specify multiple sort columns here, but are restricted to a single column for sorting, which is also the return value of the function.

Now you can see on the line of each order when in the year of the order 90% of the orders were received. In this granularity this might be helpful in an application, but of course we can also do this calculation in a CTE, whose results we then summarize:

WITH CTE_Percentile AS ( SELECT InvoiceDate ,CUME_DIST() OVER (PARTITION BY DATEPART(YEAR, InvoiceDate) ORDER BY InvoiceDate) as SalesDistribution ,PERCENTILE_DISC(0.9) WITHIN GROUP (ORDER BY InvoiceDate) OVER (PARTITION BY DATEPART(YEAR, InvoiceDate)) as BreakevenPoint FROM Sales.Invoices ) SELECT DISTINCT DATEPART(YEAR, InvoiceDate) as InvoiceYear ,BreakevenPoint FROM CTE_Percentile

It is important to note that `PERCENTILE_DISC`

will always return a concrete value that exists in your table, i.e. it will not interpolate or anything, but will return the value where `CUME_DIST`

is above the searched quantile for the first time. Therefore the name of the function: `PERCENTILE_DISC`

calculates on the discrete set of rows exactly the value that fulfills the condition.