STRING_AGG

What is STRING_AGG?

Last week you saw how to use STUFF and SELECT FOR XML to create a comma-separated list from the rows of a table. Since SQL Server 2017, this has become even easier with the introduction of the STRING_AGGfunction. The STRING_AGG function concatenates strings and inserts a separator.

What can you do with STRING_AGG?

Remember last week when we created a comma separated list of transaction types with the following statement

SELECT 
    STUFF(
        (SELECT ','+TransactionTypeName
            FROM [Application].[TransactionTypes] 
            FOR XML PATH ('')
        ), 1, 1, '')

This is not necessarily elegant, but works on all versions of SQL Server. The unpleasant thing about it is that we have to nest SELEC statements for this procedure if we want to add other columns.

More elegant is the use of STRING_AGG, because here no nested SELECT is necessary anymore, the query is very simple and direct:

SELECT STRING_AGG(TransactionTypeName, ', ')
FROM [Application].[TransactionTypes]

This query is of course much more elegant and simple than last week’s query. This is of course a big advantage of this feature. But it gets even more exciting if we now want to include another column, the TransactionTypeID in the query. For the STUFF query, the fact that STUFF is an aggregate function such as SUMMIN or MAX will then take effect, which is why we need a GROUP BY. The query is then:

SELECT
     TransactionTypeID
    ,STUFF(
        (SELECT ','+TransactionTypeName
            FROM [Application].[TransactionTypes] 
            FOR XML PATH ('')
        ), 1, 1, '')
FROM [Application].[TransactionTypes]
GROUP BY TransactionTypeID

The same applies to STRING_AGG where the “aggregate” is already in the name. Here we have to write instead:

SELECT
     TransactionTypeID
    ,STRING_AGG(TransactionTypeName, ',')
FROM [Application].[TransactionTypes]
GROUP BY TransactionTypeID

But careful, now also the output of the two queries is different. While the STUFF query executes the sub-query for each row of the result set, i.e. writes the entire comma-separated list on each row, the second query writes only those TransactionTypeNames on the row that belong to the corresponding TransactionTypeID, i.e. only one entry each. If this is not the behaviour we are looking for, we would have to work with a subquery, a CTE or OUTER APPLY here as well:

SELECT
     TransactionTypeID
    ,subqery.CSV
FROM [Application].[TransactionTypes]
OUTER APPLY (
    SELECT 
         STRING_AGG(TransactionTypeName, ',') as CSV 
    FROM [Application].[TransactionTypes]
) subqery

What you should take from the dojo today: If you are using a database that supports this feature, use STRING_AGG, which is the much simpler, more elegant and often even more performant way of linking strings of a column. But be careful when replacing the STUFF formulation 1:1 in old applications, the aggregation might not work for you, so be sure to test your results… But as real ninjas you do that anyway.

References

Ninja-Notebooks @ GitHub

Leave a Reply

Your email address will not be published.