What is STRING_AGG?
Last week you saw how to use STUFF
and SELECT FOR XML
to create a comma-separated list from the rows of a table. Since SQL Server 2017, this has become even easier with the introduction of the STRING_AGG
function. The STRING_AGG
function concatenates strings and inserts a separator.
What can you do with STRING_AGG?
Remember last week when we created a comma separated list of transaction types with the following statement
SELECT STUFF( (SELECT ','+TransactionTypeName FROM [Application].[TransactionTypes] FOR XML PATH ('') ), 1, 1, '')
This is not necessarily elegant, but works on all versions of SQL Server. The unpleasant thing about it is that we have to nest SELEC
statements for this procedure if we want to add other columns.
More elegant is the use of STRING_AGG
, because here no nested SELECT
is necessary anymore, the query is very simple and direct:
SELECT STRING_AGG(TransactionTypeName, ', ') FROM [Application].[TransactionTypes]
This query is of course much more elegant and simple than last week’s query. This is of course a big advantage of this feature. But it gets even more exciting if we now want to include another column, the TransactionTypeID
in the query. For the STUFF
query, the fact that STUFF
is an aggregate function such as SUM
, MIN
or MAX
will then take effect, which is why we need a GROUP BY
. The query is then:
SELECT TransactionTypeID ,STUFF( (SELECT ','+TransactionTypeName FROM [Application].[TransactionTypes] FOR XML PATH ('') ), 1, 1, '') FROM [Application].[TransactionTypes] GROUP BY TransactionTypeID
The same applies to STRING_AGG
where the “aggregate” is already in the name. Here we have to write instead:
SELECT TransactionTypeID ,STRING_AGG(TransactionTypeName, ',') FROM [Application].[TransactionTypes] GROUP BY TransactionTypeID
But careful, now also the output of the two queries is different. While the STUFF
query executes the sub-query for each row of the result set, i.e. writes the entire comma-separated list on each row, the second query writes only those TransactionTypeNames
on the row that belong to the corresponding TransactionTypeID
, i.e. only one entry each. If this is not the behaviour we are looking for, we would have to work with a subquery, a CTE or OUTER APPLY
here as well:
SELECT TransactionTypeID ,subqery.CSV FROM [Application].[TransactionTypes] OUTER APPLY ( SELECT STRING_AGG(TransactionTypeName, ',') as CSV FROM [Application].[TransactionTypes] ) subqery
What you should take from the dojo today: If you are using a database that supports this feature, use STRING_AGG
, which is the much simpler, more elegant and often even more performant way of linking strings of a column. But be careful when replacing the STUFF
formulation 1:1 in old applications, the aggregation might not work for you, so be sure to test your results… But as real ninjas you do that anyway.