The Wentzels website is a collection of my opinions and findings on technology, especially SQL Server. I also occassionally mention politics and economics, I am an anarcho-capitalistic (die-hard libertarian) in the Rothbard tradition.
I also run my LLC from this website and have sections devoted to showcasing specific skills that I have (ipod/ipad development, data architecture, SQL, Oracle, and Drupal). My latest creation is www.tigersbaseball.info which showcases an automated baseball scoring and statistics generation system that uses Drupal. Please bear in mind that I have no artistic capabilities so everything I create tends to look terrible, the content is the key.
Hopefully you never have to deal with determining the real procedure that is being called under-the-covers with Prepared Execution and sp_execute. If you don't understand the issues then this is a pretty good post to get you started. I've also written about this in detail many years ago. For standards reasons it seems like more JDBC drivers are making prepared execution the default (or only) method of making stored procedure calls.
The issue is what do you do when Support Guy calls you up and says, "I see 'sp_execute 70,param,param' is performing horribly in Profiler, can you fix it?" If the problem is already manifest at your customer sites you have a real problem. You really should have caught this in your performance tests, but hopefully even earlier when you defined your supportability goals.
Possible fixes:
I'll expand upon all of these possible solutions in the future.
Whenever I'm writing or executing scripts that might be destructive I of course wrap my code in appropriate transaction control blocks. However, transaction control blocks (BEGIN TRAN/ROLLBACK/COMMIT) may not always be possible if you are running code across .sql files, manipulating foreign keys, setting up replication, needing a custom grain of transaction control, etc. In these cases Database Snapshots work great.
A database snapshot is kinda like a live, queryable copy of a database that is transactionally consistent as of when the snapshot is created. The purpose of this post is not to get into the guts of how database snapshots work. One of the nice features of a database snapshot is that you can RESTORE your live database back to the snapshot with one simple RESTORE command. You probably woouldn't do this on a production database, but it is valuable for dev databases.
However, MS, in my opinion, could've made the CREATE syntax for database snapshots a little bit easier. The big problem is if your database has more than a few db files you have a rather lengthy CREATE DATABASE statement. This stored procedure simplifies the process. You pass it @SnapshotName and it builds the code for you to create a database snapshot. It assumes your "sparse" files will live in the same directory structures as the underlying files. It then creates a RESTORE command that you can run later, if needed, to revert back to your snapshot.

The script will generate output similar to the following...

I use this script all the time in my dev environments. I hope you find it useful too.
It's common for people to want to index all of their foreign key columns. In general this is probably a good idea and is well documented by many reliable sources. However, there are quite a few cases where an index on a foreign key column probably is not useful.
Assume your database has a series of tables with standard "auditing" columns (CreateUsrID, CreateDate, LastUpdateUsrID, LastUpdateDate) where the UsrID columns refer back to a Usr table (UsrID).

This is a common design pattern. In most cases I've seen foreign keys created on CreateUsrID and LastUpdateUsrID that refer back to Usr(UsrID). DRI in this case is probably a good idea. But are indexes on CreateUsrId and LastUpdateUsrID really needed? Indexes probably won't be needed in this design if:
So there you have it. If you aren't traversing your model from parent to child rows or using cascading DRI, then an index on a foreign key column probably won't help you.
The WITH NO_LOG and TRUNCATE_ONLY options of BACKUP LOG were deprecated with SQL 2008. The rationale is that if you really want to use these options just switch your database to SIMPLE recovery mode. It's a good solution, but lots of people have deployed lots of scripts that expect those options to be available in a worst case scenario (such as totally out of disk space). Furthermore, changing your recovery mode will break your log chain.
Another place where people have used these deprecated options is development envs. If you want your scripts to be as close to model as possible (ie, you do log backups in both PROD and DEV, but you don't actually keep your DEV log backups) then you really don't want some envs in SIMPLE and some in FULL recovery mode. I've seen lots of scripts that read metadata to determine if NO_LOG or TRUNCATE_ONLY should be used.
Frankly I've never used NO_LOG or TRUNCATE_ONLY, there's a much better solution that avoids cryptic SQL 2008 error messages and accomplishes the exact same thing...use the NULL file. In Unix operating systems if you send something to the file called /dev/null essentially the data is discarded and the operation reports success. This trick works in SQL Server as well.


Problem solved.
I hate Deferred Name Resolution in SQL Server mostly because it is inconsistent. If you don't know what DNR is, see this article. In a nutshell, I can create a stored proc that references a non-existent table, but I can't reference a non-existent column in an existing table. Here's a simple proof:
USE tempdb
IF NOT EXISTS (select * from sysobjects where id = object_id('Foo'))
PRINT 'Foo does not exist, OK'
GO
CREATE PROC BAR AS SELECT ID FROM Foo
GO
SELECT name, type FROM sysobjects WHERE name = 'BAR' and type = 'P'
GO
DROP PROC BAR
GO
And the output...

...confirms that I could create a proc referencing a non-existent table. But now let's create the table first but reference a non-existent column:
USE tempdb
CREATE TABLE FOO (colA varchar(100))
GO
CREATE PROC BAR AS SELECT ID FROM Foo
GO
SELECT name, type FROM sysobjects WHERE name = 'BAR' and type = 'P'
GO
DROP TABLE FOO
GO
DROP PROC BAR
GO
And the screenshot shows that we failed.

It seems ridiculous to me that I can create a proc referencing a non-existent table, but can't reference a table with a non-existent column. You may be saying, "So what? Why would I ever want to reference a non-existent column?"
Code Reusability and Temp Tables
...that's why. It's good OO programming design to not copy/paste code everywhere. Unfortunately in TSQL we often have to do that if we expect performance to be acceptable. Many people take their copy/paste logic and insert into into a scalar UDF. That won't work if you want peformant code since the scalar UDF is processed in RBAR (row by agonizing row) fashion. An inline table valued function will work, the optimizer will treat the TVF as a kind of "macro" and will expand the code in a performant manner. But there are limitations to this approach that make it's use very limited.
The best approach, by far, in my opinion, is to pass around temp tables into stored procs that are in essence the TVF code, or code you don't want to copy/paste everywhere. So, let's assume you have a complex calculation that you run over a series of table ids in various procedures that you would like to reuse in a performant manner.
The whole process should look something like this:
USE tempdb
GO
CREATE PROC Utility AS
BEGIN
UPDATE #tblUtility SET calc = round(PI(),4)* ID --you're calculation logic goes here
WHERE ID % 2 = 0
END;
GO
CREATE PROC Caller AS
BEGIN
--I'm building my list of IDs that I need into my temp table
CREATE TABLE #tblUtility (ID int, calc decimal(13,4))
INSERT INTO #tblUtility SELECT TOP 5 object_id, NULL FROM master.sys.objects
EXEC Utility
SELECT * from #tblUtility
END
GO
EXEC Caller
GO
DROP Proc Utility
GO
DROP Proc Caller
GO

Very simple and elegant. The problem comes in where I want to do a lot of calculations in my Utility procedure, hence I need a lot of columns, let's assume we need 20 columns declared. We now have to copy/paste those 20 column declarations into every Caller procedure. That's redundant code. The simple solution is to only declare the basic table in Caller with just the ID col, then ALTER the table in the Utility procedure to ADD the calculation columns. That's much less copy/paste code. And much less work when we need to change or add new cols in the future. We don't need 20 new columns to see how the improvement would work. Using our example, this should be much easier to maintain:
USE tempdb
GO
CREATE PROC Utility AS
BEGIN
ALTER TABLE #tblUtility ADD calc decimal(13,4) --note that any calc cols are created here
UPDATE #tblUtility SET calc = round(PI(),4)* ID --you're calculation logic goes here
WHERE ID % 2 = 0
END;
GO
CREATE PROC Caller AS
BEGIN
--I'm building my list of IDs that I need into my temp table
CREATE TABLE #tblUtility (ID int) --note only the ID col is needed here
INSERT INTO #tblUtility SELECT TOP 5 object_id FROM master.sys.objects
EXEC Utility
SELECT * from #tblUtility
END
GO
EXEC Caller
GO
DROP Proc Utility
GO
DROP Proc Caller
GO
But it doesn't work, thanks to our old friend Deferred Name Resolution.

Essentially this is telling us that the UPDATE cannot occur because a column does not exist in an existing table. Two solutions come to mind:
Assuming we go with Option 2, this is what the new Code Reusability Pattern would look like:
USE tempdb
GO
CREATE PROC Utility AS
BEGIN
ALTER TABLE #tblUtility ADD calc decimal(13,4) --note that any calc cols are created here, in the first level util proc
EXEC Utility_SubProc_Calculations
END;
GO
CREATE PROC Utility_SubProc_Calculations AS
BEGIN
UPDATE #tblUtility SET calc = round(PI(),4)* ID --you're calculation logic goes here, in the subproc
WHERE ID % 2 = 0
END;
GO
CREATE PROC Caller AS
BEGIN
--I'm building my list of IDs that I need into my temp table
CREATE TABLE #tblUtility (ID int) --note only the ID col is needed here
INSERT INTO #tblUtility SELECT TOP 5 object_id FROM master.sys.objects
EXEC Utility
SELECT * from #tblUtility
END
GO
EXEC Caller
GO
DROP Proc Utility
GO
DROP Proc Utility_SubProc_Calculations
GO
DROP Proc Caller
GO

Pattern Summary
The key to performant TSQL code reusability, in my opinion, is utility procedures. To get the most benefit we need to create an "extra" layer of abstraction to handle the god-awful Deferred Name Resolution implementation of TSQL. You need to :
With this simple API pattern we overcome both DNR issues and Performant TSQL Code Reuse. More Tips on Writing Better Stored Procedures
Mutexes in SQL Server can be very useful. First, you may be wondering what a mutex is. Mutex is the shortened form of "mutual exclusion" and is an algorithm used in programming to prevent the concurrent use of a shared resource. Generally a SQL developer is trying to make a database more "concurrent", not less, meaning we are looking for ways to prevent, say, a table (or data) from serializing. But in many cases a critical object in the database can only be accessed by one process or thread at a time (single-threaded). The classic example of where this is needed in a database is a queue table, which I've written about previously. There are lots of types of mutexes (semaphores, message passing, simple locking, etc).
But you may have other needs for mutexes. An example is a aggregate or summary table. In this case an aggregate table stores aggregates of numeric data derived from lots of detail tables. Somehow we need to keep the aggregate table up-to-date as changes are trickling in to the detail tables. Methods to do this include:
Each of these mechanisms has its Pros and Cons.
Another solution is to use a true mutex in SQL Server by using sp_getapplock and sp_releaselock. A great use of these mutex devices is to create data caches in SQL Server. Let's assume you need an aggregate table, but maintaining that data from the detail tables is very expensive relative to the frequency that the data is being read. But let's say when the aggregate data is read (such as when a report is run) that many reads will occur (because many reports that use the data run simultaneously.
And that is the exact scenario where I have used this. Some background...we have a series of Reporting Services reports that need to aggregate data a particular way. We display these reports "dashboard-style" meaning there are quite a few charts that display on the page, which is customizable, and each chart is a separate report. SSRS runs these reports in parallel, not serially. Further, although these are separate reports they each tend to utilize the same aggregate data. So we want to maintain a data cache on the db server where each report can share the data. Since SSRS does not guarantee which report will run in what order, and because the dashboard can be customized to include/exclude reports, we need a generic way to determine who builds the cache and to ensure the cache doesn't get poisoned from either another report or another user running the dashboard.
| IF @result NOT IN ( 0, 1 ) -- Only successful return codes BEGIN PRINT @result RAISERROR ( 'Lock failed to acquire for Proj Aggregate table population.', 16, 1 ) END ELSE BEGIN --mutex acquired --cache ageing...we allow 1 min DECLARE @CacheAge DATETIME = DATEADD(mi,-1,GETDATE()) --take this opportunity to purge the DUMMY table of anything older than 1 min DELETE FROM dbo.ProjectCache WHERE CacheInsertDateTime < @CacheAge IF EXISTS ( SELECT input.ProjectID ,pam.ProjectID FROM #ProjectIDList input LEFT JOIN dbo.ProjectCache pam ON input.ProjectID = pam.ProjectID --natural key AND pam.CacheInsertDateTime > @CacheAge --cache has to be within age WHERE pam.ProjectID IS NULL --if we are missing cache rows for the ProjectID, fail it. ) BEGIN --cache is stale or Project requested does not live in cache BEGIN INSERT INTO dbo.ProjectCache SELECT input.ProjectID ... FROM ... END END --get the data SELECT * FROM dbo.ProjectCache EXEC @result = sp_releaseapplock @Resource = 'Proj Aggregate table population' END --mutex acquired COMMIT END | Here we check to see that we obtained a lock. In no case should this ever fail, it will simply wait to obtain the mutex, which is an exclusive lock on the data cache table.
Now that I have my mutex I want to see if the data cache is stale and if so purge it.
Since each report has a set of params we need to make sure our cache has those params (ProjectID).
Here we rebuild the cache.
And finally we return the data from the cache as requested, release the app lock, and commit the transaction. |
Assume now that a split second after the first user enters the procedure a second user makes the same request. And then a third user makes the same request a second after that. Here is the order of events.
| User One | User Two | User Three |
|---|---|---|
Procedure is called
| ||
Procedure is called
| ||
Procedure is called
| ||
| Waiting on Procedure to build data cache... | ||
| Waiting on Procedure to build data cache... | ||
| Waiting on Procedure to build data cache... | ||
| Procedure finally completes. Data cache is rebuilt. Data is returned to the client and mutex is released. | ||
Mutex is granted.
| ||
Mutex is granted.
|
The presence of an Eager Spool in a SQL Server query execution plan should always be concerning. Essentially this is telling you that the optimizer wants a read-consistent view of the data, which might be normal if seen in the query plan for an UPDATE statement, but should raise an eyebrow in most other cases. More importantly, an Eager Spool is a blocking operator, which means that "branch" of the query plan will be "blocked" from continuing until the Eager Spool is satisfied.
I struggled to come up with a repro script to reproduce an Eager Spool that would show the undesirable side effects of an Eager Spool so I decided to just show you a real world example that I found in the past and documented. The "setup" is a large stored proc with lots of "optional parameters" which are performance-draining to start with. Here is the code:
AND (
(
@EncProvObjIds is not null
AND EXISTS ( SELECT 1 FROM dbo.Enc
WHERE RcvGrp.EncObjId = Enc.ObjId
AND Enc.EncProvObjId in (SELECT EncProvObjIdsTable.ObjId FROM @EncProvObjIdsTable EncProvObjIdsTable) )
)
OR @EncProvObjIds is null
)
Here we see that we only want to evaluate the EXISTS clause if we have valued @EncProvObjIds. This is a standard "optional parameter" pattern that I've seen with almost every large database application I've worked with. Here is the query plan for just this section of code:

That indicates the Index Scan on EncEncProvFX5 (an index on a HUGE table) is occurring before it even checks if @EncProvObjIds was valued!!! The Eager Spool indicates that that branch of the query "blocks" until the condition is confirmed, since it's the right most operator of that branch it's scanning everything in the index/table. It is also attempting to get a read-consistent view of the data (else it would be Lazy Spool). No wonder it causes lock escalations...it's scanning a whole index when it doesn't have to, if it would just read the damn param first!!!! I tried a few things...@EncProvObjIds to #TempTable, OPTION RECOMPILE, passing in lots of values in the optional parameter vs no values, setting the param to a local variable to avoid parameter sniffing, even adding an OPTIMIZE FOR clause...nothing worked, always EAGER SPOOL.
I eventually rewrote it like this...
JOIN dbo.Enc
ON RcvGrp.EncObjId = Enc.ObjId
LEFT JOIN #EncProvObjIdsTable EncProvObjIdsTable
ON Enc.EncProvObjId = EncProvObjIdsTable.ObjId
WHERE COALESCE(EncProvObjIdsTable.ObjId,'') = CASE WHEN @EncProvObjIds IS NOT NULL THEN Enc.EncProvObjId ELSE COALESCE(EncProvObjIdsTable.ObjId,'') END
...which generated this query plan:

The only way to get the Eager Spool to disappear was by converting the EXISTS to a standard JOIN. Granted, this *trick* may not work in every situation, in fact, that may be a given. The fix depends on your data and the cardinality of the child table. Regardless, the solution is not what is important here, it is important to understand what an Eager Spool is and how performance-draining they can be. BTW, performance increased 2 orders of magnitude with JUST THIS CHANGE.
Wouldn't it be nice to create stored procedures that had a "debug" or "Print Only" mode. In my last post I began to outline a framework I use to do this easily. I covered the basic changes needed to your existing procedures. In this post I'll cover the two helper procedures you will need and why.
There are two types of SQL statements you will need to concern yourself with..."executing" and "non-executing" statements. Here is a matrix that will show you how our framework will handle these statements:
| Type of Statement | Example | How It Should be Handled in Execution Mode | How It Should be Handled in Print Mode |
|---|---|---|---|
| Executing | EXEC sp_who2 | Execute the given command | The command should be PRINT'd exactly as it was entered. |
| Non-executing | --this is a comment | it should be ignored, said differently, it should be treated as a comment. | PRINT 'this is a comment' |
dbo.StringBuilder
To do all of this work we are going to need a StringBuilder routine. Here is the code:
CREATE FUNCTION dbo.StringBuilder | Our StringBuilder function takes @String as a parameter.
We search for "--" which indicates a non-executing statement. We remove the "--" in favor of a PRINT statement. We will also need to add single quotes around the statement.
The 2147483600 is "almost" the exact size of an INT. We add a CRLF for readability to every line. I like just char(13), but char(10) + char(13) is also viable. |
String Building Pattern
We need to change the guts of our "PrintOnly" procedure and change "executing" statements to look something like this:
SELECT @PrintMsg = @PrintMsg + dbo.StringBuilder ('EXEC sp_addpublication_snapshot ')
And "non-executing" statements need to look like this:
SELECT @PrintMsg = @PrintMsg + dbo.StringBuilder ('--Creating snapshot agent for Pub: ' + @PublicationName)
We are simply doing standard string concatenation and StringBuilder is going to handle the formatting for us.
Note again that "--" will be replaced with a PRINT statement in PrintOnly mode.
In the next post I'll cover some goofiness regarding handling NULLs and escaping single quotes easily.
Have you ever had the need to create complex stored procedures that execute lots of other TSQL commands, yet you were scared of executing them directly due to some destructive nature of the procedure? Wouldn't it be nice to be able to write stored procedures that had a flag like @PrintOnlyMode where it would PRINT the commands it would normally execute, without executing them. Those PRINT statements could then be copy/pasted into another SSMS window and executed, after verifying their safety?
I recently had a need to do just this. We had to build a routine that looked to a custom application metadata table to determine if replication needed to be installed, and if it was installed, was it "up-to-date" with all of the latest pubs/articles/subscribers required for our application to function properly. With replication I knew I wouldn't have a high degree of confidence that a single procedure called, for instance, "dbo.Replication", would be able to do all of these things, even if I wrote it.
So I created a mechanism where I can either EXECUTE the commands directly (the default behavior), else PRINT the commands to the console to be double-checked by a human and executed manually later. Surprisingly this is harder to do than you might think. But a little framework and some helper procedures make the code easier to work with. Here is basic framework:
Add a Parameter to Turn On/Off PrintOnly Mode
,@PrintOnly bit = 0
This is the mechanism to turn on/off PrintOnly Mode. The default will be "execution" mode.
Add an OUTPUT Parameter to Your Stored Proc
,@PrintMsg varchar(max) = '' OUTPUT
This is optional. @PrintMsg will be needed, but it can be a standard variable. @PrintMsg will contain the string we either execute or PRINT. The benefit of making this an OUTPUT parameter is that we can write unit tests (I love TSQLT for this) where we capture the output and execute it to ensure our PRINT statements work as advertised.
Add a PRINT/EXECUTE Block at the End of Your Proc
Done:
IF @PrintOnly = 0
BEGIN
EXEC (@PrintMsg)
END
ELSE
BEGIN
PRINT (@PrintMsg)
END
Your proc will be building a string...you need to determine if you want to PRINT or EXECUTE.
So far, these are the simple steps. Now comes the complicated portions...building strings that can be PRINT'd or EXEC'd. I'll cover that in the next post.
Theme by Danetsoft and Danang Probo Sayekti inspired by Maksimer