T-SQL Ninja #44

MAXDOP

What’s MAXDOP?

You’ve probably noticed when viewing execution plans in SQL Server that there are often two small arrows on a yellow background (in SQL Server Management Studio) and the operation “Parallelism (Gather Streams)”. What this means, simply put, is that at some points SQL Server decides to execute certain operations in parallel in multiple threads. You can control how many processors SQL Server can use for this by using the MAXDOP query hint. MAXDOP stands for “Max Degree Of Parallelism”.

How can you set the MAXDOP?

The MAXDOP is a query hint, and as with all query hints, be careful here, as you are overriding the behavior that either the relational database engine developers or your database administrator have provided. And if you do so, you should at least have a good reason for doing so.

So much for the warning, but now for setting the MAXDOP. In general, there are fixed values for MAXDOP for the server or database during setup, these can be quite different, but the value for the database will then overwrite the value for the server. Both values can be overwritten by the query. To do this, add an OPTION (MAXDOP x) with an integer value x at the end of your query. Special features include the value 0, which explicitly tells the query to use as many cores as available, and the value 1, which tells the query to use only one core, thus explicitly disabling parallelization.

Let us now take a query that causes parallelization. To do this, we query the OrderLines table in the WideWorldImporters database in the Sales schema and get the query plan for the query:

SET SHOWPLAN_XML ON;
GO

SELECT
     [OrderLineID]
    ,[OrderID]
    ,[StockItemID]
    ,[Description]
    ,[PackageTypeID]
    ,[Quantity]
    ,[UnitPrice]
    ,[TaxRate]
    ,[PickedQuantity]
    ,[PickingCompletedWhen]
    ,[LastEditedBy]
    ,[LastEditedWhen]
FROM [Sales].[OrderLines]

Here you see only a scan of the clustered index, which is quite correct, since the whole table is queried here. Let’s now adjust the query so that we see parallelized operations. Therefore we sort the query by a column that is not leading in the clustered index:

SELECT
     [OrderLineID]
    ,[OrderID]
    ,[StockItemID]
    ,[Description]
    ,[PackageTypeID]
    ,[Quantity]
    ,[UnitPrice]
    ,[TaxRate]
    ,[PickedQuantity]
    ,[PickingCompletedWhen]
    ,[LastEditedBy]
    ,[LastEditedWhen]
FROM [Sales].[OrderLines]
ORDER BY [StockItemID]

Now you can see the scan in parallel execution in the execution plan (if your database is not configured to not allow parallelism), which you can recognize by the yellow circle with the little arrows at the operation (in SSMS) or by the green arrow (in Azure Data Studio). In addition, two more operations have been added to the execution plan: a sort operation (logical, since we want to sort the result set) and a “Parallelism (Gather Streams)” operation where the results of the various operations executed in parallel are merged back into the result set. You will also see an attribute “EstimatedAvailableDegreeOfParallelism” in the execution plan XML, which tells you what the Query Optimizer estimates how many parallel processes can be used.

Now let’s try to change the behavior and prevent parallel execution by setting the MAXDOP to 1:

SELECT
     [OrderLineID]
    ,[OrderID]
    ,[StockItemID]
    ,[Description]
    ,[PackageTypeID]
    ,[Quantity]
    ,[UnitPrice]
    ,[TaxRate]
    ,[PickedQuantity]
    ,[PickingCompletedWhen]
    ,[LastEditedBy]
    ,[LastEditedWhen]
FROM [Sales].[OrderLines]
ORDER BY [StockItemID]
OPTION (MAXDOP 1)

In the execution plan, you will no longer see arrows on the scan of the clustered index and no more parallelism operation. The query is no longer executed in parallel. You can also see in the source code of the plan that the EstimatedAvailableDegreeOfParallelism is now set to 1. So you can influence the behaviour of your server with the MAXDOP query hint and control whether certain queries are processed in parallel on multiple processors. Overall, of course, this increases CPU costs, as parallelization causes overhead, but overall, execution time often decreases as these higher costs are processed in multiple threads simultaneously. As mentioned at the beginning, you should be careful when using such hints, because usually both the SQL Server developers and the system administrators have thought of something when setting the parameters, but you have the possibility to make especially resource-hungry queries that require many CPU operations and run at times when other processes are also active on the server a bit more “friendly” by preventing them from blocking all processor cores that are available. Or make them run as fast as possible by grabbing all resources that are available. The choice is yours.

References

Ninja-Notebooks @ GitHub

Leave a Reply

Your email address will not be published. Required fields are marked *