DaveWentzel.com All Things Data
Nested Views, Performance, and Predicate Pushdown
I wrote earlier in the week about nested views and performance. That post was focused around simplifying your views to follow a FROM –> INNER JOIN –> LEFT JOIN query pattern. As with everything in SQL, these aren’t hard and fast rules. Sometimes following that pattern may not produce optimal performance if the query optimizer is reducing the nested views too much and we do actually need the views to “materialize” or pipeline. Usually the underlying cause is still a poorly written query or even bad db design, but it’s important to understand how to get nested (or derived tables, or subqueries, etc) to pipeline without reducing.
First, What is Predicate Pushdown
I have more on Predicate Pushdown here. You don't hear this term much in the SQL Server world, it's more prevalent amongst Oracle folks, but it's just as relevant to SQL Server. When nesting views (or queries in general) in Oracle the predicate against the outermost query is "pushed down" to the nested views as far as possible to generate the most optimal EXPLAIN plan. Generally this is a good thing except when
- you don't use bind variables and the result is much hard parsing and predicate pushdown not working optimally
- the predicate being pushed down is not selective enough to result to allow nested loops to be used (where it would normally) and hash joins are done instead.
SQL Server won't have issues with 1, but 2 is always a possibility.
Let’s look at an example. I don’t have a repro script for you to test due to time, but follow along and the concept should be clear. I have a view that nests other views that has has the execution plan shown on the right. Note that at this step of the plan the query engine is processing 1.9 million rows. Rest assured it should only be processing about 42,000 rows at this point.
Note also that in the icons there is an orange circle with 2 arrows…that means sql thinks the query is so difficult it needs to span CPUs (parallelism) to get it completed. It should never do this for this query, it’s very simple. I’ll save you the details…here’s the join that is the problem…
LEFT OUTER JOIN MXW.ProjectCostCodeXCostType AS pccXct ON pcc.ProjectCostCodeID = pccXct.ProjectCostCodeID LEFT OUTER JOIN MXW.MasterFileAdminCostTypeSetup AS mfc on pccXct.CostTypeID = mfc.CostTypeID
The “mfc” view has no RIGHT JOINs, the problem is simply the nesting and the shear number of JOINs in the views. Initially I just pulled out the cols and JOINs I needed from mfc (about 7 tables out of 22), embedded them in this view, checked that performance was correct (it was down to under 2 secs now), and moved on. But that leaves us with duplicated code everywhere which isn’t wise.
The underlying problem is that other predicates were being pushed down unnecessarily, resulting in a suboptimal execution plan.
Here’s another solution that eliminates the unwanted predicate pushdown…you need to tell sql server to materialize the problem view first, then join that to the remaining query. So basically we want pipelining now where normally we would not. We can do this using temp tables in a stored proc, but you can’t use a temp table in a view. The solution is to force the JOIN early. How? Make sql server resolve it first by using a derived table and CROSS JOIN. Here’s the new query.
--LEFT OUTER JOIN MXW.MasterFileAdminCostTypeSetup AS mfc
-- on pccXct.CostTypeID = mfc.CostTypeID
--predicate pushdown for parallelism performance.
select v.* , dummy.nul
from MXW.MasterFileAdminCostTypeSetup v
cross join (select '' as nul) as dummy
where v.CompanyCode = (v.CompanyCode + dummy.nul)
on pccXct.CostTypeID = mfc.CostTypeID
The CROSS JOIN makes sql server resolve the entire derived table before it can proceed to join to pccxct, even though the CROSS JOIN functionally does nothing. That’s what we want. You may think that you don’t need the CROSS JOIN because the derived table in parenthesis will resolve first because of basic rules of arithmetic order of operations (do what’s in parens first). You’d be wrong. Without the CROSS JOIN the optimizer knows the query in parens is reducible to a standard LEFT JOIN. The CROSS JOIN is enough to fool it into not reducing the query.
Note the new query plan, the correct actual/estimated rowcount, lack of parallelism (the double arrows), and subsecond response time.
Occasionally SQL Server will generate a suboptimal query plan that can be traced back to the optimizer reducing a query that shouldn’t be or not honoring a derived table properly.