What is STRING_AGG?
Last week you saw how to use
SELECT FOR XML to create a comma-separated list from the rows of a table. Since SQL Server 2017, this has become even easier with the introduction of the
STRING_AGG function concatenates strings and inserts a separator.
What can you do with STRING_AGG?
Remember last week when we created a comma separated list of transaction types with the following statement
SELECT STUFF( (SELECT ','+TransactionTypeName FROM [Application].[TransactionTypes] FOR XML PATH ('') ), 1, 1, '')
This is not necessarily elegant, but works on all versions of SQL Server. The unpleasant thing about it is that we have to nest
SELEC statements for this procedure if we want to add other columns.
More elegant is the use of
STRING_AGG, because here no nested
SELECT is necessary anymore, the query is very simple and direct:
SELECT STRING_AGG(TransactionTypeName, ', ') FROM [Application].[TransactionTypes]
This query is of course much more elegant and simple than last week’s query. This is of course a big advantage of this feature. But it gets even more exciting if we now want to include another column, the
TransactionTypeID in the query. For the
STUFF query, the fact that
STUFF is an aggregate function such as
MAX will then take effect, which is why we need a
GROUP BY. The query is then:
SELECT TransactionTypeID ,STUFF( (SELECT ','+TransactionTypeName FROM [Application].[TransactionTypes] FOR XML PATH ('') ), 1, 1, '') FROM [Application].[TransactionTypes] GROUP BY TransactionTypeID
The same applies to
STRING_AGG where the “aggregate” is already in the name. Here we have to write instead:
SELECT TransactionTypeID ,STRING_AGG(TransactionTypeName, ',') FROM [Application].[TransactionTypes] GROUP BY TransactionTypeID
But careful, now also the output of the two queries is different. While the
STUFF query executes the sub-query for each row of the result set, i.e. writes the entire comma-separated list on each row, the second query writes only those
TransactionTypeNames on the row that belong to the corresponding
TransactionTypeID, i.e. only one entry each. If this is not the behaviour we are looking for, we would have to work with a subquery, a CTE or
OUTER APPLY here as well:
SELECT TransactionTypeID ,subqery.CSV FROM [Application].[TransactionTypes] OUTER APPLY ( SELECT STRING_AGG(TransactionTypeName, ',') as CSV FROM [Application].[TransactionTypes] ) subqery
What you should take from the dojo today: If you are using a database that supports this feature, use
STRING_AGG, which is the much simpler, more elegant and often even more performant way of linking strings of a column. But be careful when replacing the
STUFF formulation 1:1 in old applications, the aggregation might not work for you, so be sure to test your results… But as real ninjas you do that anyway.