Aaron Bertrand wrote a great article over at MSSQLTips.com titled, “Making a more reliable and flexible sp_MSforeachdb”. In that article he eschews the use of relying on the undocumented (and unsupported) sp_MSforeachdb system stored procedure that can be found in SQL Server’s master system database.
As Aaron points out in his article, most people who do a lot of work with SQL Server have probably used the sp_MSforeachdb procedure once or twice in the past. The concept of this procedure is simple: you pass in a string containing a SQL statement and it loops over each and every database on the server and executes that SQL against it. Aaron points out a few reasons why relying on this stored procedure is bad including the fact that it sometimes doesn’t actually hit all of the databases. He goes on to provide the source code for an alternative stored procedure that can also be used to execute SQL against multiple databases on a server but in a much more reliable and consistent way. He named this alternative stored procedure ‘sp_foreachdb’.
One of the most useful features of Aaron’s sp_foreachdb is the @database_list parameter which lets you pass in a comma-delimited list of database names that you want the command to be run against. I find this feature particularly useful for the multi-tenanted environment in which I work every day. In our setup each tenant has its own separate SQL Server database with an identical schema. In some cases these tenant databases might share a SQL Server instance with other non-tenant databases. The sp_foreachdb procedure makes it easy to write queries that can pull a particular set of data out of subsets of tenant databases and aggregate them into one combined dataset that we can use to identify patterns, track feature usage, or simply give us a 10,000 foot view of the data that spans across all of our tenants.
I started out using the sp_foreachdb procedure for searching on a particular value within a particular column across all tenant databases. I was surprised when the procedure wasn’t returning any results even though I was nearly certain that at least some of the tenant databases contained the value I was searching on. Doing some direct querying against a few of the databases for the value proved that I was right; so why wasn’t the same query being run via sp_foreachdb finding results in these databases?
After reading the source code for a bit and planting a few ‘PRINT’ statements I discovered that a good portion of the databases I was passing in via the @database_list parameter were being truncated before the stored procedure began running the provided query against them. A bit more digging turned up some code that parses the @database_list parameter and massages the values a bit. If you pass in a value of “databaseA,databaseB,databaseC”, the stored procedure will parse that into “N’databaseA’,N’databaseB’,N’databaseC’”. This modified database list value is then used for an IN clause when selecting databases out of the sys.databases view. The resulting list of databases and then later used in a cursor for executing the provided SQL. The issue I was seeing was happening somewhere in the block of code that transforms the @database_list parameter which looks like this:
;WITH n(n) AS ( SELECT ROW_NUMBER() OVER (ORDER BY s1.name) - 1 FROM sys.objects AS s1 CROSS JOIN sys.objects AS s2 ) SELECT @dblist = REPLACE(REPLACE(REPLACE(x,'</x><x>',','), '</x>',''),'<x>','') FROM ( SELECT DISTINCT x = 'N''' + LTRIM(RTRIM(SUBSTRING( @database_list, n, CHARINDEX(',', @database_list + ',', n) - n))) + '''' FROM n WHERE n <= LEN(@database_list) AND SUBSTRING(',' + @database_list, n, 1) = ',' FOR XML PATH('') ) AS y(x);
This code took me a bit to grok as there is a lot going on here. Let’s break this down into smaller pieces:
- Starting on line 1 a Command Table Expression (CTE) is declared and given the alias “n”. This CTE just serves to create a temporary table full of sequential numbers from 0 to x where x is the total number of rows in the sys.objects table raised to the power of 2. On my local installation of SQL Server 2008 R2 I have 78 rows in sys.objects so this CTE results in a set of numbers from 0 to 6084.
- The SELECT statement that starts on line 11 is selecting rows out of the CTE for values up to and including the full length of the @database_list parameter. The select list and where clauses are using the SUBSTRING function to “chunk” the comma-separated @database_list parameter into pieces separated by commas. The result of this query is one row per distinct database name. Finally the FOR XML PATH(‘’) command takes that result set and flattens it into one big XML string. At this point the result for our example input would be: <x>N’databaseA'</x><x>N’databaseB'</x><x>N’databaseC'</x>
- Finally, the select list clause on lines 7-8 is taking that XML string and stripping out the <x></x> nodes to leave a comma separated string of database names.
This code is fairly complex and might seem like overkill for the task of simply converting ‘databaseA,databaseB,databaseC’ into ‘N’databaseA’,N’databaseB’,N’databaseC’, but it will gracefully handle things like duplicated database names, whitespace between entries, and leading/trailing commas. The biggest weakness of this code (aside from complexity) is the fact that it relies on the sys.objects view to create the crucial “numbers table” that it needs to parse out the database list. If the length of the @database_list parameter exceeds the number of rows returned from the CTE then you’ll end up losing some of the databases specified in the list. You’re probably thinking that 6084 rows in the numbers table should be more than enough for any @database_list parameter you might want to use, and you’d be right, but there are circumstances where you won’t get anywhere near 6084 rows from the CTE.
It turns out the number of rows you get from sys.objects can be affected pretty dramatically by the permissions of the user that is performing the query. I was seeing this issue when the query was being run by a user with restricted permissions which was causing the number of rows returned by sys.objects to be quite small. On my SQL Server 2008 R2 installation that I have my local workstation a “fresh” SQL login with no extra defined permissions only gets 6 rows from sys.objects in the master database. Having so few rows available in the CTE of the query above means that database names that fall into portions of the string that fall outside he bounds of the number range will never be included in the final query. In order to ensure that the @database_list parameter will behave properly we need an approach that will not be impacted by the permissions of the user running the query.
I came up with a few possible ways to work around the potential limitations of relying on sys.objects to build a number range:
- Require that users pass in a properly formatted database_list: The easiest way to deal with this issue as the author or editor of the stored procedure code is to put the burden on the caller to provide a list that’s in a format that can be consumed in the WHERE clause of the query against sys.databases so that you don’t need to do any kind of adjustment. The downside to this approach is that it’s not very user-friendly. As a consumer of the stored procedure I’d much rather just be able to pass in a simple comma-separated list of database names without having to worry too much about white space or adding the Unicode ‘N’ identifier to the beginning of each database name.
- Pick another system table or view: I’ve seen some people use the undocumented ‘spt_values’ table in the SQL Server master system database for having a SELECT-able range of consecutive numbers. I personally don’t really like this approach because it relies on an undocumented table that may or may not be around in future versions of SQL Server. Also, it’s using that undocumented table for a purpose that it was not originally designed or intended for.
- Build your own numbers table: A lot of SQL Server gurus would say that you should have a pre-built “numbers table” available in every SQL Server instance that you manage for preforming tasks just like the one we’re dealing with here. I think this post from Adam Machanic sums up the virtues of a numbers table pretty well: You REQUIRE a Numbers Table! I like the idea of having a pre-built numbers table available for stuff just like this. You create this table expressly for this purpose and just keep the script for building it around as part of your SQL Server instance setup tasks.
- Build a proper list of numbers on-the-fly: Rather than rely on some pre-existing table (whether it be one that you build or one that’s built-in) we could instead opt to build up the list of numbers that we need in a table variable or temporary table just before we need them.
While thinking about this problem I whittled the possible solutions down to #3 or #4 above. I think both of these approaches would work very well and would avoid building a dependency on using undocumented system tables in a way in which they were not intended to be used. To keep the dependencies of the stored procedure to the bare minimum, I ultimately went with option #4 and used a simple WHILE loop to build up a table variable containing the list of numbers I needed to properly parse and adjust the database_list parameter. You might feel a bit “icky” using a WHILE loop in t-sql, but I feel like the result performs pretty well, will always perform correctly, and is very easy to understand and read. Here’s how the code ended up:
--create a numbers table big enough to let us parse out the @database_list parameter DECLARE @n TABLE (n INT) DECLARE @loopCounter INT, @loopLength INT SET @loopCounter = 0 SET @loopLength = LEN(@database_list) WHILE @loopCounter < @loopLength BEGIN INSERT @n VALUES (@loopCounter) SET @loopCounter = @loopCounter +1 END SELECT @dblist = REPLACE(REPLACE(REPLACE(x,'</x><x>',','),'</x>',''),'<x>','') FROM ( SELECT DISTINCT x = 'N''' + LTRIM(RTRIM(SUBSTRING( @database_list, n, CHARINDEX(',', @database_list + ',', n) - n))) + '''' FROM @n WHERE n <= LEN(@database_list) AND SUBSTRING(',' + @database_list, n, 1) = ',' FOR XML PATH('') ) AS y(x);
This code could also very easily be changed to use a pre-built numbers table if desired. All you would have to do is get rid of the WHILE loop and swap out the name of the table variable for the name of the pre-built numbers table.