DaveWentzel.com            All Things Data

Predicate Pushdown

 

Many times we have a query (I'll work up a real example) that joins tables together, possibly having millions of rows.  The JOIN is very restrictive of both tables.  We then apply a WHERE predicate similar to
 
WHERE dbo.scalarfunction(tbla.Col1) = 'something'
 
Our execution plan will show the FILTER condition applied BEFORE the JOINs.  Obviously the query optimizer doesn't realize that the JOIN will restrict more rows than the function so it attempts to apply the function first. 
 
One simple solution is to put the result of everything BUT the scalar function into a temp table and then apply the function there. 
 
It seems like I'm always rewriting queries like this with very little success.  I usually simply try to use derived tables which show the "procedural logic" I want sql to use.  It never does b/c the optimizer is actually smart enough to realize that my rewritten query is functionally equivalent to the original and so provides the same plan.  Unfortunately I only ever seem to realize this after rewriting functionally equivalent queries for an hour or two.  I've even tried to use a CTE (in places where a temp table couldn't be used...say, in a view), but again, it seems like that darn optimizer just expands the CTE on me. 
 
Someday I'll learn. 
 
If you don't like the temp table trick we need to do "predicate pushdown".  We need to do *something* to the WHERE clause to make it evaluate *after* the JOINs. 
 
Here's one way that I think is self documenting. 
 
Add one last JOIN, a CROSS JOIN, like this
 
CROSS JOIN (SELECT '' AS nul) AS dummy
 
then change the WHERE clause to something like
 
WHERE dbo.scalarfunction(tbla.Col1 + dummy.nul) = 'something'
 
This is predicate pushdown in a nutshell.  We are telling the optimizer to not evaluate any of the WHERE clause until *all* JOINs are evaluated. 
 
There are other tricks that work as well.  For instance, assuming the same scenario as above, and assuming we are only returning a few rows and cols a GROUP BY will work too, applied after the JOIN logic is placed in a derived table block, which is then in a derived block that we can apply the WHERE to. (I'm thinking this is less "self documenting", not to mention the potential performance issues...I'd rather just use a temp table).   
 
Something like this...
 
SELECT
FROM (SELECT ...FROM GROUP BY ) AS tbla
WHERE dbo.scalarfunction(tbla.Col1) = 'something'
 
 
 
 
 
 

Add new comment