STATISTICS IO

What is STATISTICS IO?

“Yoo-hoo, ninja, I have this query, it’s really slow, it wasn’t like that yesterday, for sure”. With queries like this one every ninja will be surprised sooner or later and this proves that the other users have certain ninja skills, too, because you are certainly never prepared for this situation. So what to do? Of course we’ll see what the query does. First search for victims in the Execution Plan, but how much data is read at all? The Katana that decomposes this question is STATISTICS IO.

How can you work with STATISTICS IO?

If you want to see how many disk accesses a query causes, you can get this information by sending a SET STATISTICS IO ON before the query. What you will then see is a set of information that we would like to explore a bit in the following. Let’s use the statistics to query the StockItems table in the Wide World Importers database:

SET STATISTICS IO ON
SELECT 
       [StockItemID]
      ,[StockItemName]
      ,[UnitPrice]
FROM [Warehouse].[StockItems]

In addition to the results, the Messages tab also contains information about the execution:

Table 'StockItems'. Scan count 1, logical reads 16, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

If we now break down this information, we see that the table has been scanned once, which is not surprising given that we are querying the entire table. We also learn that 16 logical reads have taken place. This means that there were 16 data pages in the query that the server read from the cache. The physical reads are data pages that were read from disk for the query, so they were not in the cache. Now let’s restrict the result set a bit:

SET STATISTICS IO ON
SELECT 
       [StockItemID]
      ,[StockItemName]
      ,[UnitPrice]
FROM [Warehouse].[StockItems]
WHERE UnitPrice > 50.0

Here we also see the information that 16 database pages have been read:

Table 'StockItems'. Scan count 1, logical reads 16, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Why is that, now that we only get 23 lines back? This is because UnitPrice is not an index column, so we still have to read every database page to find out whether a row is included in the result set or not. It looks different if we restrict to an index column:

SET STATISTICS IO ON
SELECT 
       [StockItemID]
      ,[StockItemName]
      ,[UnitPrice]
FROM [Warehouse].[StockItems]
WHERE StockItemID = 150

This time, to find the entry we are interested in, not all data pages of the table had to be read, but only 2:

Table 'StockItems'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Accordingly, there is no scan here and we see a scan count of 0. The two pages that had to be read are the two index levels of the primary key. But when do we see more than 1 scan? Therefore we look at the following query:

SET STATISTICS IO ON
SELECT 
       [StockItemID]
      ,[StockItemName]
      ,[UnitPrice]
FROM [Warehouse].[StockItems]
WHERE StockItemID = 150 or StockItemID = 120

Here, the correct line must be searched for twice using the primary key. And while the execution plan here shows an Index Seek, in Statistics IO we see two scan processes.

This apparent contradiction is one that is based on pure naming only, since according to the documentation, the scan count is defined as follows: “The number of searches or scans started in any direction after reaching the sheet level to retrieve all values to create the last dataset for output.”

Finally we come back to the search for the price, let’s create an index here now:

BEGIN TRY DROP INDEX idx_StockItemUnitPrice ON [Warehouse].[StockItems] END TRY 
BEGIN CATCH END CATCH 
CREATE NONCLUSTERED INDEX idx_StockItemUnitPrice 
ON [Warehouse].[StockItems] (UnitPrice) INCLUDE ([StockItemID], [StockItemName])

Since this index contains all columns needed to answer the query, only the index needs to be accessed here. Of course we see this in our IO statistics:

Table 'StockItems'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

For this simple query with our index, we have reduced the number of reads from 16 to 2 database pages, that is, by 87{cd57dc5c024b062e80cf443c8eab5674adc0a8508c08485cd20648099fe1d76c}. However, optimizing the number of reads for different queries is a science in itself, which would go beyond a simple ninja contribution. Here the ninja bows in fear towards the great ninja master Uwe Ricken-San, in whose Blog you can find more detailed information on this topic.

References

Ninja-Notebooks @ GitHub

Leave a Reply

Your email address will not be published.