
What are the benefits of using MD3 over my existing solution?

This will be the last post (for a while) in my MD3 blog series.  I'm going to wrap up with a post on the benefits of using MD3 over other DDL deployment solutions.  

Embodies Free and Open Source Software Principles

  • There is no third party software to deploy or license.  
  • It's completely free and open source (MIT license).  You can look at the code and figure out exactly what will happen when you run it.  Now look at how Visual Studio database projects deploy...there is almost no documentation on how the engine makes its decisions as to what DDL will be run.  So, how do I make money off of MD3?  Services.  I've deployed MD3 for 6 different ISVs...huge ISVs...and they love it.  One ISV has about 100 customers with databases as large as 10TB that MD3 deploys with almost zero downtime.  Another ISV runs over 4000 customers on tiny SQL Express databases...yet they have NO support personnel to handle database upgrades...because the upgrades NEVER fail.  
  • Quick Links
    Introduction to Metadata Driven Database Deployments
    Download MD3 (with samples)
    Documentation on CodePlex/git
    Presentation on MD3
    Other Posts about MD3
    AdventureWorks2013 reverse-engineered into MD3 (takes 5 mins to do with ANY database)

  • It follows the Unix principle of "do one thing and do it well."  MD3 is a bunch of really small, separate components.  Let's say you only want to use MD3 to deploy system/model/lookup data...you can do that without deploying any other component.  

  • Runs any TSQL.  It requires no special, proprietary syntax or formatting.  This makes your code portable if you decide later that MD3 isn't working for you.  This also allows you to transfer your existing custom .sql scripts into MD3 with almost no effort.  

Extensibility

  • Follows OO design principles.  Use only what you need, use everything as designed, or add new features as you need them.  
  • If you need to customize it to fit your unique requirements, you can.  See MD3 Extensibility Stories for examples.  
  • Although I haven't open sourced it, MD3 is totally portable to Oracle.  Procedures like MD3.CreateIndex have a counterpart written in PL/SQL that runs on Oracle.  Obviously properties like "fillfactors" and "clustered indexes" do not exist in Oracle, which has similar properties like "Index-organized tables" and "PCTFREE".  In theory, all MD3 components can be ported to any DBMS platform.  Under-the-covers everything is just SQL, DDL, and The Four Rules.  This allows your application to be DBMS-agnostic.  
  • There is also a component that uses MD3 to generate Zero Downtime Database Upgrades.  Please contact me for more information.  
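To make the properties-based idea concrete, an MD3 index declaration is just a stored procedure call stating the desired end state.  The parameter names below are illustrative, not the exact MD3 signature (check the documentation), but the shape is the point: the same call is correct whether the index is missing, already correct, or present with different properties.

```sql
-- Hypothetical declarative index call; parameter names are illustrative.
-- MD3 compares the declared properties against the live catalog and only
-- issues DDL when something actually differs.
EXEC MD3.CreateIndex
     @SchemaName  = N'dbo',
     @TableName   = N'Sales',
     @IndexName   = N'nciSales_CustomerID',
     @ColumnList  = N'CustomerID',
     @IsClustered = 0,
     @IsUnique    = 0,
     @FillFactor  = 100;
```

Because the script is idempotent, it lives in version control unchanged release after release; only a property change (say, a new fillfactor) causes any DDL to run.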

Feature-rich

  • Handles the deployment of model/system/lookup data without complex, hand-crafted scripts.  No other tool on the market does this.  None.  And this is infinitely flexible and handles any model data deployment scenario that you can think of.  Please see MD3 Model Data Patterns.
  • MD3 will completely reverse-engineer your schema into a format compatible with MD3 (which is really just SQL).  It will even reverse-engineer your model data. How to Reverse-Engineer Your Database with MD3
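To see why hand-crafted model-data scripts are painful, consider what you'd otherwise maintain by hand: a MERGE statement per lookup table, per release.  MD3's model data patterns generate logic equivalent to this sketch for you (the table and column names here are invented for illustration):

```sql
-- One script like this per lookup table is what a declarative
-- model-data deployer replaces.
MERGE dbo.OrderStatus AS tgt
USING (VALUES (1, N'Open'), (2, N'Shipped'), (3, N'Closed'))
      AS src (StatusId, StatusName)
   ON tgt.StatusId = src.StatusId
WHEN MATCHED AND tgt.StatusName <> src.StatusName
    THEN UPDATE SET StatusName = src.StatusName
WHEN NOT MATCHED BY TARGET
    THEN INSERT (StatusId, StatusName)
         VALUES (src.StatusId, src.StatusName)
-- Drop rows removed from the model data; omit this clause if
-- deletes are unsafe in your scenario.
WHEN NOT MATCHED BY SOURCE
    THEN DELETE;
```

With MD3 you declare the rows once and the tool handles the insert/update/delete reconciliation on every deployment.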

Customizable and geared toward DevOps data professionals

  • Uses your existing TSQL development tools.  It works equally well with SSMS, VS, Toad, DBeaver, or whatever other tool you might prefer.  There is no need to context switch to another tool to use MD3.  
  • There is no GUI because a GUI isn't necessary.  You can visualize your deployment by looking at the folder structure of the database deployment scripts.  It's easy to visualize the exact order your stateful database objects will deploy in. 
  • There is nothing written in Python, Perl, Ruby, or whatever, which becomes another language you have to learn to support the tool.  MD3 has a small driver written in Powershell (less than 100 lines of code) that simply recursively executes each script in each folder.  But the driver could be just a simple batch file, some vbscript, or you could integrate it into your existing tool.  
  • Infinite customization.  Do you need to backup your db prior to your deployment?  Just add a script in the appropriate pre-deployment folder.  Do you need to stop or start replication?  Put the necessary sql script in the right location.  Do you need customer-specific versions of scripts?  Just put the correct MD3 calls in the correct folder.  

DBA-friendly

  • Written entirely in T-SQL (with a tad of Powershell, which is not required).  No need to learn another language/tool.  
  • Full logging to a table (MD3.Log) and to the console, and you can pipe the output to a file.  MD3 Deployments in Action
  • Need some special logging?  Just add a PRINT statement that logs whatever you need, wherever you need it in your code.  
  • MD3 will fail on the first error and is totally rerunnable without having to restore your database.  
  • Errors can be converted to WARNINGs which can be logged for later analysis.  This is useful for warning when a foreign key is not trusted.  You might want to fix those untrusted keys later, but you don't want to fail your deployment.  
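The untrusted foreign key warning mentioned above can be driven by a simple catalog query.  The WARNING convention is MD3's; the catalog views are standard SQL Server:

```sql
-- Emit a warning, rather than failing the deployment, for each
-- enabled-but-untrusted foreign key.
SELECT 'WARNING: untrusted foreign key '
       + QUOTENAME(OBJECT_SCHEMA_NAME(parent_object_id)) + '.'
       + QUOTENAME(OBJECT_NAME(parent_object_id)) + '.'
       + QUOTENAME(name)
FROM sys.foreign_keys
WHERE is_not_trusted = 1
  AND is_disabled = 0;
```

The keys can then be re-validated off-hours with ALTER TABLE ... WITH CHECK CHECK CONSTRAINT, outside the downtime window.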

Helps enforce good software development practices

  • You can create a CI Build Loop for your database very quickly with whatever tool you want...Ant, CruiseControl, Jenkins, MSBuild, etc.  Continuous Integration Testing with MD3
  • MD3 has built in "validators" that compare a "net new" and an "upgraded" database to ensure they are schematically identical.  It will even validate your system/model/lookup data.  This is the key piece of any database CI Build Loop.  
  • Works with any version control system without the need for any plugins.  
  • Do your own release numbering (MD3 doesn't care).  
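A minimal version of the "net new" vs. "upgraded" validator is just a set difference over the catalog views.  MD3's real validators check far more (indexes, constraints, model data), but the core idea, with placeholder database names, is simply:

```sql
-- Columns that exist (or differ) in a freshly-created database
-- but not in the upgraded one.  Run the EXCEPT in both directions
-- for a complete comparison.
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE, IS_NULLABLE
FROM NetNewDB.INFORMATION_SCHEMA.COLUMNS
EXCEPT
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE, IS_NULLABLE
FROM UpgradedDB.INFORMATION_SCHEMA.COLUMNS;
```

An empty result in both directions means the upgrade scripts produced a schema identical to a clean install, which is exactly what the CI loop asserts.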

You have just read "What are the benefits of using MD3 over my existing solution?" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

MD3 Extensibility Stories

Over the last several posts we've covered how MD3 handles metadata-driven, properties-based database deployments.  Perhaps you still aren't convinced that a declarative approach to your deployments is worth the effort.  Today I'm going to cover how I've evolved the MD3 system over the years to overcome problems that would have been much harder to solve if I used hand-crafted DDL or a GUI-based "database compare" tool instead of MD3.  Hopefully this post will give you some ideas regarding how you can use a properties-based deployment mindset to make your database deployments better, even if you don't decide to use MD3.   

Adding support for DATA_COMPRESSION to MD3.CreateIndex

I consulted for a huge ISV with ~60 customers running multi-terabyte databases.  When we (finally) certified our application to run on SQL Server 2008 it wasn't long before our customers asked us to consider supporting DATA_COMPRESSION on our indexes.  I did some testing and determined that ROW compression was an awesome new feature.  In no case did it perform worse for us and in most cases not only did we save $$$ on storage costs but we were able to force more data into buffer cache resulting in faster query runtimes.  We decided that the default compression setting for any NEW indexes would be ROW.  So I modified MD3.CreateIndex to support adding ROW to all new indexes unless requested otherwise.  
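The change itself was small because the DDL is generated in one place.  Sketched below, with an invented parameter name, is the essence of defaulting new indexes to ROW compression inside MD3.CreateIndex:

```sql
-- Sketch only: inside MD3.CreateIndex, default the compression property
-- when the caller doesn't specify one.
DECLARE @DataCompression sysname = NULL,   -- caller did not specify
        @sql nvarchar(max) =
            N'CREATE INDEX nciExample ON dbo.T (c1)';

-- New indexes get ROW compression unless explicitly requested otherwise.
IF @DataCompression IS NULL
    SET @DataCompression = N'ROW';

SET @sql += N' WITH (DATA_COMPRESSION = ' + @DataCompression + N')';
PRINT @sql;
```

One edit to the generator changes the behavior of every index script in the product, which is the whole argument for metadata-driven DDL.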

It also made a ton of sense to migrate all of our huge existing indexes to ROW as well.  But there are some side-effects with DATA_COMPRESSION  that made this difficult:  

  1. If we ran ALTER INDEX...DATA_COMPRESSION=ROW on our large indexes this would cause a lot of downtime that we couldn't afford.  
  2. We could run these commands with ONLINE=ON but there are some use cases where SQL Server does not allow an index to be rebuilt ONLINE.  For instance, the clustered index cannot be rebuilt with ONLINE=ON if the underlying table has a column with a LOB data type.  
  3. If your table is replicated you may see some goofy issues regarding how your publications and articles are scripted and snapshotted.  This can cause excessive downtime for your subscribers if you don't handle this according to your requirements.  It also causes excessive log reader latency and lots of tran log usage.  Sometimes it's better to rebuild them offline, if possible.  

So we determined that we needed the following requirements:  

  1. By default, all NEW indexes should have ROW compression.  
  2. EXISTING indexes on replicated tables would NOT have ROW compression enabled by our scripts to avoid Problem 3 above.  However, if a customer wanted to change this value given the caveats in Problem 3 our scripts would honor those changes.  
  3. There were a few EXISTING indexes on replicated tables where the benefits of compression outweighed the costs of Problem 3.  In that case we had an OVERRIDE flag where we could override Requirement 2.  
  2. All EXISTING indexes would be "queued" using Service Broker to have their compression settings changed off-hours on a rolling schedule.  We didn't want to extend database upgrade downtimes for this feature, even if we did ONLINE=ON rebuilds.  
  3. Since these requirements were really complex, there was fear among my colleagues and management.  We added an MD3 configuration parameter to quickly disable the "data compression" code if it was found to be problematic.  

Imagine having to code all of those rules into every index DDL script you have.  Very daunting and error-prone.  With MD3 this was a few days of coding and quite a few days of testing, mostly testing the effects on replication.  We automated as many tests as we could with tsqlt.  We actually didn't have to touch any of the individual MD3 index scripts, merely the code for MD3.CreateIndex.  Although I coded and tested Requirement 5 (the on/off switch) we never had to use it because we had CIT in place testing various customers' databases in our performance lab and we were able to work out the implementation bugs well before the first production deployment.  

I haven't open-sourced any of this "custom" data compression code because it's frankly confusing and very specific to this ISV's implementation.  However, MD3 does support data compression as a standard property.  

A custom FILLFACTOR

MD3.CreateIndex fully supports FILLFACTOR.  I worked for a large ISV that implemented MD3 and most of their indexes used a FILLFACTOR of 100, with a few carefully-tested exceptions.  Quick review: MD3.CreateIndex takes @FillFactor as a parameter.  If @FillFactor does not match the current index's fillfactor then that is detected as a change and the index is rebuilt so it matches the requested properties.  
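That property check is essentially a comparison against sys.indexes.  Simplified (the variable names are illustrative, and the real procedure builds and executes the rebuild DDL rather than printing):

```sql
-- Rebuild only when the declared fill factor differs from what is deployed.
DECLARE @TableName sysname = N'dbo.MyTable',
        @IndexName sysname = N'nciMyTable_Col1',
        @FillFactor int = 100,
        @CurrentFillFactor int;

SELECT @CurrentFillFactor = fill_factor
FROM sys.indexes
WHERE object_id = OBJECT_ID(@TableName)
  AND name = @IndexName;

-- Note: sys.indexes stores 0 to mean "100" (the server default).
IF ISNULL(NULLIF(@CurrentFillFactor, 0), 100) <> @FillFactor
    PRINT 'Property drift detected: rebuild ' + @IndexName
        + ' with FILLFACTOR = ' + CAST(@FillFactor AS varchar(3));
```

Keep that drift check in mind; it is exactly what fired in the story that follows.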

Our release's scripts were tested using CIT on various customer databases and worked flawlessly.  But our largest customer had a HUGE database, so huge that they couldn't afford (read: didn't want to pay for) a non-prod FULL COPY of their database.  They instead had some custom scripts that copied all master tables to their staging environments and then a "representative" set of recent transactional data.  These scripts were very intense to ensure that they could copy data to staging and still maintain relational fidelity.  Frankly, I always thought this process would blow up, but it always seemed to work.  Their staging database was less than 1% the size of their prod db using this process.  

Their staging db upgraded to the next release perfectly and we estimated that applying those MD3 scripts to their HUGE prod db would take about an hour of downtime (we did a "deferred deployment", discussed below).  Using MD3 I never got anxious during a customer upgrade event because I knew our scripts were rigorously tested.  However, that was about to change.  

At 3AM on the Saturday morning of the upgrade event I awoke to my cell phone ringing.  This customer's MD3 upgrade had been running for 6 hours and appeared to be rebuilding EVERY index in their database.  Unfortunately I'd had a few adult beverages the night before so it took me some time to get ready for a bridge call to discuss the problem.  When I looked at MD3.Log I noticed that, indeed, every index was being rebuilt due to a property change...the fillfactors changed.  I queried sys.indexes and saw that each of their not-yet-upgraded indexes had its FillFactor set to 90.  Huh?  

It turns out that over the past year the customer's DBAs decided to change the FillFactors on ALMOST EVERY index in the database to 90.  It took them a year to do this, running a few indexes every night.  Now MD3 was undoing all of this work.  

People were panicking.  Our engineers were pointing fingers at the customer..."why did you change our indexes?".  The customer DBAs were blaming us for not properly testing our upgrade.  Plans were being discussed to roll back the database and redeploy the previous release's JVMs and try the upgrade again later.  

Meanwhile, this wasn't a problem for MD3.  While tempers flared I stopped the deployment and I modified 3 lines of code in MD3.CreateIndex to ignore FILLFACTOR differences when examining existing index properties.  I did a couple of quick tsqlt tests and told everyone on the call that I was ready to restart the deployment.  I was calm and confident, which caused everyone else to calm down.  We restarted the deployment, which completed this time in about 20 minutes.  

Notice how easy it was to overcome a potentially devastating problem with MD3.  That would be much harder to do with other deployment tools.  

As a side note...this customer's "custom fillfactors" spawned talks with them regarding why they decided to alter our fillfactors.  After we looked at their fragmentation statistics and we thought through our application data access we decided that for many tables a fillfactor of 90, instead of 100, was indeed warranted.  So we slowly implemented this change for every customer over the next few releases.  

Another MD3 win.  

Daily Software Releases to Production

When I started consulting at one particular ISV they released new versions of their software every 18 months.  Very anti-agile for a company that claimed to be agile.  There are so many problems with 18 month release cycles and I don't want to get off on a tangent but the biggest problem is that when a release comes so infrequently it leads to a mentality of only testing the release deployment in the last few months of the cycle.  This is one of those cases where "waterfall" development really does suck.  

Suffice it to say that the release deployment was rarely tested properly and customers cringed whenever a new release was deployed.  These events always caused excessive downtime and custom DDL to get the deployment to succeed.  We had customers that flat-out refused to upgrade their application because of this.  One customer was running a 5 year old version of our software on an unsupported release of SQL Server.  During user group conferences the biggest customer complaint was botched upgrade events.  We asked for beta testers for each release but rarely found any willing customers.  Regulatory changes drove customer upgrades far more than the desire for new features or bug fixes did.  

We had to change these mentalities.  MD3 lends itself to proper CIT (see Continuous Integration Testing with MD3) and repeatable deployments with less hand-crafted DDL.  The first release that utilized MD3 was a complete success.  Our sole beta customer raved about the fact that we met our downtime goal (by HOURS) and we had no "Day One Impacts".  Then our "early adopters" reported similar successes.  We managed to deploy that first MD3 release without any significant issues for any customers.  

Morale improved...customer morale, developer morale, support morale.  

The next release had 3 customers volunteer to be beta testers.  Again, no major issues.  The upgrade experience changed so much that customers were now requesting more frequent releases to get bugs fixed and new functionality quicker.  And we delivered.  I won't lie and say there were no issues and customers were completely happy with our software, but deployments were no longer their top complaint...not even in the Top Ten.  

We even had one customer where we released new versions of our software on a weekly basis.  NO LIE.  The customer realized, as we did, that more frequent releases actually manage risk better than infrequent, monolithic deployments.  With good automation in place it was no more difficult to deploy to that customer than it was to deploy to a QA environment.  Just run the deployment.  

I firmly believe that we could've delivered daily releases.  That may sound radical but using MD3 we automated database deployments, CIT, and unit testing.  It was common for a bug fix to be properly QA'd and full regression testing done on the same day it was fixed.  

DevOps and NoOps

DevOps seems to be the new thing.  DevOps is a software development method where your developers and Ops Guys share responsibilities and work together.  MD3 encourages this.  In some shops DBAs may find a perf problem and immediately apply an index with hand-crafted DDL (or via the SSMS GUI).  MD3 syntax is so easy that this can be done and properly checked in to the next build by an Ops Guy.  Some DBAs don't see the value in source control for day-to-day DBA tasks.  I've seen this mentality change first hand with the introduction of MD3.  I don't see this happen using other deployment tools where the DBA may not have a license or access to the tool (Visual Studio database projects come to mind).  

As a data architect I can often guess at what indexes a new table may need during the design phase of a feature.  However, I don't overthink indexes anymore with MD3.  Instead I make my best guess and then let our performance engineers and DBAs monitor the system (missing index DMVs, query plan analysis, etc) and make their own index changes by modifying the MD3 index scripts.  This makes everyone much more productive.  Index selection, IMHO, should be the domain of data professionals who understand indexing, not of Java developers who create a table for simple persistence but forget to apply a PK or a clustered index.  Instead, that job can be passed off to an expert.  

For a while our team was a true DevOps team with data developers, architects, modelers, perf engineers, and DBAs.  We touted it and loved it.  But we realized over time that we were becoming a NoOps team.  There was less operational work to be done as we automated things properly with tools like MD3 that are easy to automate.  We still had DBAs focused on replication, clustering, HADR, backups, etc.  But they spent less time on support tasks as we introduced more automation, usually with MD3.  For instance, I have a fully automated MD3 module that covers general replication tasks; one simple MD3 command adds an article to a publication if it doesn't exist.  
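That add-an-article-if-missing command can be approximated with a guard around the standard replication procedure.  The publication and table names below are placeholders:

```sql
-- Idempotent "add article" for transactional replication: run from the
-- publication database; only calls sp_addarticle when the article is absent.
IF NOT EXISTS (SELECT 1
               FROM dbo.sysarticles a
               JOIN dbo.syspublications p ON p.pubid = a.pubid
               WHERE p.name = N'MyPublication'
                 AND a.name = N'MyTable')
BEGIN
    EXEC sp_addarticle
         @publication   = N'MyPublication',
         @article       = N'MyTable',
         @source_object = N'MyTable';
END
```

Because the script is rerunnable, it can sit in the standard deployment folders like any other MD3 script.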

Don't worry, we didn't fire our Ops Guys.  Instead they embraced these changes and became Dev Guys focused on automating operational tasks and lending operational-domain expertise to Dev Guys.  Our product improved because our ilities (non-functional requirements)...like supportability...improved.  Our Dev Guys now started to question their decisions and think more about the future supportability of the product they were building.  

Here's an example.  I see a lot of shops that automate builds and non-prod deployments using tools like ant and Jenkins.  That's great.  But when it comes time for the production deployment they revert back to manual deployments done by DBAs and Ops Guys.  This makes no sense.  "Prod" is just another environment that should be deployed to using the same tooling as any other environment.  The usual argument for this is that a prod deployment contains extra steps like backups, notifications, replication teardown/setup, user smoke tests, etc.  But those arguments are flawed.  Those items CAN and SHOULD be automated and should be performed and tested in the non-prod environments as well.  The goal is repeatable processes.  

Once you get to this level you are no longer a DevOps shop, you are a NoOps shop.  MD3 doesn't magically make this happen, but it is one tool and set of processes and practices that can help you get there better and faster.  

Deferred Deployments

One ISV that implemented MD3 was very concerned about unnecessary downtime due to schema changes that could be executed shortly after the system was brought back online after a deployment.  An example is index changes.  Some index changes must occur with the application upgrade because the index may introduce a new unique index/key.  Other index changes are simply for performance reasons and can be deployed after the system is live using ONLINE=ON...if possible.  

This was a breeze to implement in MD3.  We added the ability to "defer" certain time-consuming changes when the @Defer option was set in MD3.CreateIndex (for example).  The deferred items were run after the deployment using Service Broker.  Our downtime deployment then consisted of what was minimally needed for the application to be brought back up.  Generally this was only table changes and changes to database "methods" (views, functions, stored procs).  This reduced downtime deployments to just a few minutes.  
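Conceptually, the @Defer path just records the work instead of executing it.  A simplified version, using an invented queue table rather than Service Broker, might look like this:

```sql
-- Sketch of the deferral decision (MD3.DeferredWork is an invented table;
-- the real implementation used Service Broker).
DECLARE @Defer bit = 1,
        @IndexName sysname = N'nciBigTable_Col1',
        @RebuildStatement nvarchar(max) =
            N'ALTER INDEX nciBigTable_Col1 ON dbo.BigTable '
          + N'REBUILD WITH (ONLINE = ON)';

IF @Defer = 1
    -- Queue the statement for off-hours execution.
    INSERT INTO MD3.DeferredWork (ObjectName, SqlStatement, QueuedAtUtc)
    VALUES (@IndexName, @RebuildStatement, SYSUTCDATETIME());
ELSE
    -- Run it now, inside the downtime window.
    EXEC (@RebuildStatement);
```

An off-hours job (or Service Broker activation procedure) then drains the queue, running each statement ONLINE where possible.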

Later we dropped the downtime to 7 seconds using ALTER SCHEMA...TRANSFER and MD3...but that's a different post.  

Summary

The above stories are just a few use cases where MD3 and the underlying principles of declarative, metadata driven deployments helped my clients overcome some daunting database deployment hurdles with ease.  There is a perception in our industry that relational schemas are too difficult to "evolve" over time, leading to the introduction of schema-less NoSQL solutions.  The fact is, relational schemas are easy to evolve when you have the correct tools and practices to do it.  The above stories are just a few cases where we had difficult problems and we solved them quickly using good processes.

 


You have just read "MD3 Extensibility Stories" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

MD3: Why you should care about declarative database deployments

Over the last several posts we've covered how MD3 handles database deployments.  You may still be wondering why you should care about deploying your database code using metadata instead of hand-crafted DDL.  Even if you use a GUI-based "database compare" tool to generate and deploy your DDL you may want to consider MD3 instead.  In this post I'll briefly cover why MD3 is superior to these tools.  IMHO.  

Why you should care about Metadata Driven Database Deployments


Here is a short list of problems that MD3 and/or any "declarative" deployment utility fixes.

  1. Are you fearful of refactoring large tables in your application because you dread writing all of those DDL scripts to migrate the data from the old structures to the new?  
  2. Have you ever written DDL that ran on your dev and QA databases without incident but failed miserably during your production deployment?  
  3. Do you have trouble remembering esoteric DDL that you may only use a couple of times per year?  Example...can you remember when to use WITH CHECK vs WITH NOCHECK when you create your foreign keys?  Isn't it easier to simply remember to select an option to "allow existing data to violate the constraint"?
  4. Do you use "graphical DDL" tools to make your schema changes but worry about the tool recreating expensive objects during your deployment because you can't see the source code or the DDL that it is applying?  Do you wish your "graphical DDL" tool gave you finer-grained control over HOW it deployed database changes?  For instance, would you like to be able to build some indexes during downtime and schedule others for later?  
  5. You have customers on multiple releases of your software/database and you fear that you don't have time to test every database migration path.  You've been burned by this before and management now fears every database schema change, regardless of how minor.  
  6. Your developers want to move at Internet speed.  They hate having to come to the Database Review Board for every database change request.  They want to prototype something quickly and modify the database schema on-the-fly.  The Board doesn't allow this because the DDL to migrate the schema is always buggy and causes botched deployments.  
  7. You've heard your developers at the water cooler bitching about what a gigantic PITA it is to alter a relational schema.  You heard them talking about bringing in a few NoSQL products because they are "schema-less".  Now the developers can alter their schemas without having to deal with huge data migration efforts and buggy, hand-crafted DDL.  (This is complete bunk BTW).  

Using a "declarative" approach to database deployments solves these problems.  And as I've shown over the last several posts, MD3 does all of this for you.  

Why is MD3 better?

Look, I've used quite a few of the database deployment tools on the market.  By and large, they are great.  Ingenious even.  But the problem is that none of these tools solves the full set of data lifecycle issues that I've mentioned above.  None of them.  Some might make schema changes simple with point-and-click tools, but don't support Continuous Integration Testing.  Some make schema changes easy, but don't even attempt to provide a framework to handle the data migration tasks that are inevitable with any schema change.  

Evolutionary Databases

In January 2003 Martin Fowler, a development industry heavyweight, wrote about the need for evolutionary databases.  By that time I had already built the first iteration of MD3, which supported both Oracle and SQL Server.  Even this early version of MD3 had all of the features Mr. Fowler advocated, or at least the underpinnings that would make MD3 a full-featured database deployment tool in the future.  My biggest complaint with this article is that Fowler pointed out very common-sense issues with relational database development but offered no real solutions.  There is a section at the end that mentions "tools to help" but doesn't actually list any tools.  

During my entire career every employer/client has been uncomfortable with database refactoring and "database evolution" because of the points Mr. Fowler makes.  Once exposed to MD3 this mindset slowly changes.  Management, senior developers, and architects realize they now have tooling, processes, and patterns that actually make constantly evolving relational schemas easy.  

The Curse of Relational Databases

Grant Fritchey, a SQL Server industry heavyweight, wrote The Curse of Relational Databases in June 2014.  Eleven-plus years after Fowler wrote about evolutionary databases, Mr. Fritchey points out many of the same problems.  Eleven years and not much progress.  But now there are alternatives to relational databases, and Fritchey notes the NoSQL movement, which touts ways to get around the difficulty of relational schema lock-in.  I've noticed the same things.  He succinctly sums up his article by mentioning that relational folks aren't paying enough attention to the pain that an RDBMS imposes on organizations (my words, not his).  He's right.  But there are simple solutions, like MD3, that can give data developers the same processes and patterns that are working for regular 'ol developers...continuous integration testing, properly versioning database objects with the application code (name one tool that does this) rather than as a separate code base or tool, and declarative programming instead of hand-crafted DDL.  

Summary

Sorry for rambling on in this post but I believe the perceived difficulty in evolving relational schemas may be a turning point in the relational industry.  More shops will turn to NoSQL solutions because of the perceived ability to be unconstrained when it comes to schema.  Folks don't want to plan their schemas up-front anymore.  They want to be agile.  They want their schemas to evolve as their businesses evolve.  They don't want unnecessary schema lock-in.  Unfortunately, I believe NoSQL in many of these cases is the WRONG solution to the schema lock-in problem.  Sure, the schema isn't quite as locked-in with NoSQL, but you're just trading one set of problems for another.  Ever tried writing a report in Crystal with a NoSQL product?  Can't be done.  So all the time you saved on your DDL you wasted on report development.  There are good use cases for NoSQL, but the "schema-less" argument is very weak.  The fact is, with good data lifecycle tooling we can achieve evolutionary databases quite easily without the relational database being a curse.  

In the next post, MD3 Extensibility Stories, we'll look at some use cases where MD3 and declarative database deployments can really save you a lot of time when you need to do schema refactoring.  Stay tuned.  


You have just read "MD3: Why you should care about declarative database deployments" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

Continuous Integration Testing with MD3

Over the last several posts I've covered MD3, my database deployment utility for SQL Server-based applications.  MD3 handles stateful database object deployments.  In those posts I've constantly mentioned that one of the benefits of using a declarative database deployment utility is the ability to create proper "CI build loops".  In this post we're going to finally see how to do this.  Even if you don't use MD3 I'll show you the right way to set up your CIT (continuous integration testing).  MD3 makes CIT ridiculously easy and includes the necessary scripts to make CIT a reality, quickly.  

What is Continuous Integration Testing?

Continuous integration testing (referred to as CIT for the remainder of this post) is the process of checking in (or merging) your code daily and setting up a mechanism to ensure the code actually compiles.  Most shops also run a series of unit tests after the compile to ensure the application functions as designed.  This is one part of a philosophy known as "test driven development".  If there are build breaks then most shops have a "stop the line" mentality where everyone stops their current work to fix whatever is broken in the build.  The theory is that the longer we allow broken builds then the more risk we introduce as features cannot be tested soon after they are developed.  Broken builds beget broken builds.  None of this is controversial and is a standard practice in the software development industry.  

Except when it comes to databases.  Very few shops have proper database CIT in place, if any.  Similarly, very few shops have any database unit testing either.  There are some good reasons for this, the biggest reason is something I've mentioned numerous times in these posts...databases have "state" and it is difficult to account for state when building a database deployment tool, let alone dealing with unit testing and automated deployments.  

Most technology folks simply don't know how to test their database scripts in an automated fashion.  Further, most developers don't even know what to test.  

Before we get into the details of CIT we should first determine just how complicated your CIT needs will be.  We do that by determining how many "releases" of your database/application you must support.  

How many "releases" of your database do you support?  

There are two choices:  

Type1: You support one production database and ultimately deploy all scripts to that database.  
  Examples: web application back-end databases; in-house applications.  
  CIT complexity: very simple.  
  Versions you support: one (the current production).  
  Support for previous versions: zero (not needed).  Once you deploy Version Y to prod your support for Version X ends.  
Type2: You support many customers, each with different releases/versions of your database/application.  You are an ISV (independent software vendor) with many customers and you don't use multi-tenancy.  Some customers are running Version X, some are on Version Y, some on Version Z.  You need to be able to upgrade all of those customers to your latest application release.  Further, you must bug fix those older releases in a reliable manner.  
  CIT complexity: complex.  
  Versions you support: more than one.  

The more versions you support the more complex CIT becomes

You know your database is a "Type1" database if the ultimate target of your database deployments is one, single production database.  Your application is still Type1 even if you have multiple customers but only one multi-tenant database.  In this scenario when you upgrade your database/application then all customers are simultaneously on the new release.  If you support multiple production versions of your database, usually because you are an ISV, then you are a "Type2" database.  We'll see why this matters soon.  

The basic requirements of database deployment testing

Regardless of whether you are a Type1 or Type2 database there are some basic requirements for CIT.  This is why we build CI Loops:

Compile Test
  Requirement: ensure all scripts compile and execute without error.  We want to catch cases where someone accidentally checks in code like this:  CREATE TALBE dbo.Employees.  This will generate a syntax error and will break the deployment.  
  How to do it:
  1. CREATE DATABASE NetNew
  2. Deploy your database scripts to this blank database and ensure no errors are thrown.  Tools like Jenkins will do this and send the output to your email.  
F5 F5 Test
  Requirement: ensure scripts don't fail on subsequent deployments.  In this case the script runs properly the first time, but fails on subsequent deployments.  We call this the "F5 F5" test because it simulates running the script twice in SSMS (by pressing F5 twice).
  How to do it: deploy your database scripts again to NetNew.  There should be no schema or model data differences when comparing them.  
Net New vs Upgrade Test
  Requirement: ensure upgraded databases are identical to net new databases.  We want to ensure that our upgraded databases have the same schema and model data as our net new databases.  If there are differences then we know we have something wrong with our database scripts.  
  How to do it: after deploying to NetNew above we deploy to a database from our previous release(s) and then compare the results.  This could also be our prod db or a customer db.  

On top of this, after each deployment we should rerun our database unit tests, if they exist.  I highly recommend tSQLt for this.    
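A minimal tSQLt test that could run after each deployment might look like the sketch below.  tSQLt.NewTestClass, tSQLt.AssertEquals, and tSQLt.RunAll are real tSQLt procedures; the dbo.StateProvince table and the assertion itself are illustrative assumptions, not part of MD3.

```sql
-- Hedged sketch: one tSQLt test class with one test case.
-- The table and expected count are hypothetical examples.
EXEC tSQLt.NewTestClass 'DeployTests';
GO
CREATE PROCEDURE DeployTests.[test StateProvince has 50 US states]
AS
BEGIN
    DECLARE @actual int;
    SELECT @actual = COUNT(*) FROM dbo.StateProvince WHERE CountryCode = 'US';
    EXEC tSQLt.AssertEquals @Expected = 50, @Actual = @actual;
END;
GO
-- Run the whole suite after every deployment:
EXEC tSQLt.RunAll;
```

Wiring `EXEC tSQLt.RunAll` into your CI tool as the final step of each deployment gives you a pass/fail signal alongside the schema and model data comparisons.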

What are the "differences" we are testing for?

In the table above I mention many times that we are checking for "differences" between two databases.  Specifically we are looking for schema and model data differences.  In the MD3Objects\Validators folder we provide you with two procedures that help to do this, MD3.CompareSchemas and MD3.CompareMasterData.  

MD3.CompareSchemas

MD3.CompareSchemas is a stored procedure that compares the schemas of two databases and notes any differences.  It looks specifically for:

  • missing/different table columns
  • missing objects (triggers, views, constraints)
  • index differences

Any differences that are reported may indicate a problem with your database deployment scripts.  In the screenshot to the right I compare two versions of the AdventureWorks db and report on the differences.  Note in the screenshot that we are missing some stored procedures and triggers, indicating that something is wrong with our deployment.  MD3.CompareSchemas is a great utility for comparing the schemas of any two databases, not just testing that your deployment scripts are working as designed.  
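Assuming both databases live on the same instance, an invocation might look like this.  The parameter names here are my assumption; check the procedure's actual signature in the MD3Objects\Validators folder before using it.

```sql
-- Hypothetical call comparing a net new build against an upgraded copy.
-- Verify the real parameter names in the MD3.CompareSchemas source.
EXEC MD3.CompareSchemas
    @DB1 = 'AdventureWorks_NetNew',
    @DB2 = 'AdventureWorks_Upgraded';
-- Any rows returned (missing columns, missing objects, index differences)
-- indicate a problem in your deployment scripts.
```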

MD3.CompareMasterData

MD3.CompareMasterData compares table data between two databases.  First you need to define which tables should be part of your comparison.  This would include your model data tables (I discussed this in the post MD3 Model Data Patterns).  The first check is a simple rowcount check.  It then looks for data differences by checking the bin_checksum against the keys of the table.  Not only will this show you the model data differences between a net new and upgraded database, but you can run it against a previous release of your database to see what model data has changed in this release.  

The graphic to the right shows that the model data rowcounts are wrong between my net new and upgraded db.  My net new db is missing all of its model data for Production.ProductCategory.  In the graphic below we see what happens when the "data" for the non-key columns in Production.ProductCategory differ.  In this case the bin_checksum does not match and that is reported for a developer to check.  
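The comparison technique described above can be illustrated in plain T-SQL: a rowcount check first, then a checksum of the non-key columns joined on the table's key.  This is only an illustration of the idea, not MD3's actual code.

```sql
-- Step 1: the simple rowcount check between the two databases.
SELECT (SELECT COUNT(*) FROM NetNew.Production.ProductCategory)   AS NetNewRows,
       (SELECT COUNT(*) FROM Upgraded.Production.ProductCategory) AS UpgradedRows;

-- Step 2: find rows whose non-key columns differ, joining on the key
-- and comparing a binary checksum of the remaining columns.
SELECT n.ProductCategoryID
FROM NetNew.Production.ProductCategory   AS n
JOIN Upgraded.Production.ProductCategory AS u
  ON u.ProductCategoryID = n.ProductCategoryID
WHERE BINARY_CHECKSUM(n.Name) <> BINARY_CHECKSUM(u.Name);  -- differing rows
```

Rows returned by the second query are the ones a developer would need to investigate.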

The "hows" of CIT

Hopefully you understand what you should be testing with your database CIT process.  Now we get into how to do it.  I have not included any "CIT scripts" with MD3 because it is heavily based on your requirements...is your db a Type1 (one prod database/one version supported) or Type2 (ISV with many prod databases at different versions) database?  Also, every shop uses a different CI tool.  I prefer Jenkins but TeamCity is also excellent.  You could just write a simple PoSh script that kicks off the necessary tests and sends out notifications of build breaks.  Let's just cover the basics of how you do CIT with your database.  

For every active code branch of your source code you should have the following tests:  

Compile Test
  Purpose: ensure all scripts compile and execute without error.
  Process:
  1. CREATE DATABASE NetNew
  2. Deploy your database scripts from that branch to this database.
  Test assertions:
  1. RunMD3.ps1 completes without error
F5 F5 Test
  Purpose: ensure scripts don't fail on subsequent deployments.
  Process:
  1. Copy NetNew to NetNewCopy.  
  2. Deploy scripts to NetNewCopy.  There should be no schema or model data differences when comparing them.  
  Test assertions:
  1. RunMD3.ps1 completes without error
  2. MD3.CompareMasterData shows no differences
  3. MD3.CompareSchemas shows no differences
Upgrade Test
  Purpose: ensure upgraded databases are identical to net new databases.
  Process:
  1. If you are a Type1 database then deploy scripts to a production copy of your database.  
  2. If you are a Type2 database then deploy scripts to one or more copies of customer databases for every release you support.  More on this next.  
  Test assertions:
  1. RunMD3.ps1 completes without error
  2. MD3.CompareMasterData shows no differences
  3. MD3.CompareSchemas shows no differences

In the Upgrade Test I said that you will run this test on at least one copy of a production database for every release you support.  This is only necessary if you support a Type2 database (you are an ISV).  Let's assume you are currently developing and testing Version 4 of your application and you have customers on Versions 1, 2, and 3.  In this case you want to obtain representative databases for each of those releases for CIT testing.  Copies of customer databases work the best, but older non-prod copies work as well.  The goal is to write our database scripts so that the oldest databases we support can be upgraded without error.  

This becomes a lot of CIT processes that need to be set up.  In the case where you are developing for Version 4 we actually need to perform 5 separate database deployments:

  1. Compile Test
  2. F5 F5 Test
  3. Version 1 Upgrade Test
  4. Version 2 Upgrade Test
  5. Version 3 Upgrade Test

And that is only for your Version 4 code branch.  If you have other branches that you maintain then you should have tests for each of those branches as well.  

Now let's assume that with Version 6 you cease support for Versions 1 and 2.  In that case your CIT process looks like this:

  1. Compile Test
  2. F5 F5 Test
  3. Version 3 Upgrade Test
  4. Version 4 Upgrade Test
  5. Version 5 Upgrade Test

Your CIT process will be the most valuable when your Upgrade Tests are using representative databases.  If you are just going to run CIT against old dev databases you probably will miss a few issues.  You'll get better results using a perf or QA environment from those older releases.  But you'll get the best results if you use actual customer databases to test your upgrades.  I've worked for ISVs with severe data privacy rules and we were still able to get "scrubbed" copies of their prod databases.  If you can get actual databases then you are almost guaranteed that a prod database deployment will never fail since you are running CIT often...maybe even after every code checkin.  Further, you can measure the approximate duration of the deployment which will show you how much downtime is required.  

When to run the CIT Build Loops

You should really use a tool like Jenkins or TeamCity that monitors your source code repository and starts another round of CIT whenever a new checkin occurs.  However, that's potentially a lot of tests that need to be run multiple times per day.  In that case maybe you only want to run the Compile and F5 F5 tests on every checkin, then run the full suite of tests once per day.  It's entirely your call.  The more you run the CIT loops the sooner you will spot problems.  

Summary

In this post I covered how I like to do database CIT.  Your needs may be different depending on how many releases of your prod database(s) you support.  But the concepts are similar.  You want to ensure your code runs correctly, then reruns subsequent times without side-effects.  Those side-effects are errors, schema differences, and differences in model data.  The better your CIT process the more reliable your final production deployments will become.  If you work in a shop where a prod deployment causes you sleepless nights then implementing a good CIT will alleviate that.  

In a later post I'll cover how I used this CIT process to get our deployments to under 7 minutes for a huge OLTP system.  I'll also show how using CIT with MD3 saved us thousands of man-hours.  


You have just read "Continuous Integration Testing with MD3" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

Guacamole 0.9.4

Posts about setting up Guacamole (a client-less RDP gateway) still tend to be the most read posts on my site.  Guac lets you RDP to your desktop using a web browser (or your mobile device).  Very handy.  You can also ssh and vnc as well.  In this post I'll cover how I upgraded guac 0.9.2 to guac 0.9.4, which was released earlier this month.  

My original post on setting up guac

My post on upgrading to 0.9.2

We are going to follow the steps in my original post on setting up guac.  Please refer to that as needed.  

  1. Backups are always prudent.  
  2. Do Steps 4 and 5 in my original blog post but change the file name to 0.9.4.  
  3. Replace the old war file with the new version you just downloaded:  sudo mv guacamole-0.9.4.war /var/lib/tomcat7/webapps/guacamole.war
  4. Do Steps 6-12.  
  5. Restart Ubuntu.  
  6. Close your browser and reopen it (otherwise the JavaScript from the old version will be cached and guac will act strangely).  
  7. Connect to your previous guac URL and log in.  

Seems to be a stable release.  Awesome piece of software...can't say enough good things.  


You have just read Guacamole 0.9.4 on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  


How to Reverse-Engineer Your Database with MD3

Metadata Driven Database Deployments (MD3) is my build-and-deploy system for SQL Server databases.  In the last couple of posts we covered what problems MD3 solves and how it solves them.  In this post in my MD3 blog series I'm going to cover how to reverse-engineer your database using MD3's native capabilities.  

One of the biggest hurdles when migrating to a new build/deploy system, VCS, or data development tool is migrating all of your existing data and code into the new system.  This is actually very simple to do in MD3.  But first remember that MD3 is modular...if you don't want to use the MD3.Create* procedures then you don't need to run the migration scripts for those objects.  You only need to reverse-engineer those objects that you want to use with MD3.  The full reverse-engineering documentation is 32 lines of instructions...that's how easy MD3 is to get up and running.  Here's the process:  

  1. Find a pristine copy of your database to reverse-engineer.  If you work in a shop with one production database then this is simple...script that database.  But if you are an ISV (which MD3 is geared towards) then finding a pristine copy of a database to reverse-engineer can be difficult.  Especially if your database was not properly versioned in the past.  I can help you with this process if you contact me.  
  2. Deploy the MD3 objects into a copy of that database.  Basically just run MD3.ps1 with a clean copy of MD3.  This will deploy everything you need and will not introduce any risk into your existing database.  
  3. Run the MD3.Repopulate* procedures for each object type you want to migrate into MD3.  
  4. Run the MD3.Script* procedures which actually generate the .sql files with the MD3 calls using simple PRINT statements.  
  5. Check your code into source control and begin working on your CI Loops.  
  6. If something isn't working correctly you can simply re-script those items as many times as you need to, or you can manually update the metadata yourself.  

It's that simple.  
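The steps above boil down to a handful of procedure calls per object type.  MD3.RepopulateIndexes is named in this post; the scripting procedure name below follows the MD3.Script* pattern but is my assumption, so verify it against the actual MD3 source.

```sql
-- Sketch of the reverse-engineering flow for indexes.  Run this in a
-- copy of your database after deploying the MD3 objects into it.
EXEC MD3.RepopulateIndexes;   -- harvest index metadata into MD3.Indexes

EXEC MD3.ScriptIndexes;       -- hypothetical name: PRINTs an MD3-formatted
                              -- .sql file built from the MD3.Indexes rows
-- Save the PRINT output as a .sql file in your indexes folder and
-- check it into source control.
```

Repeat the Repopulate/Script pair for each object type you want MD3 to manage; skip the ones you don't.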

Let's look at what the Repopulate and Script procedures are doing under-the-covers.  

Deep Dive - MD3.Repopulate Procedures 

As mentioned in other posts in this series, stateful object properties are stored in tables that look very much like the underlying catalog views where SQL Server maintains its own running metadata (for instance, sys.indexes).  The MD3.Repopulate procedures rebuild MD3's version of these metadata tables.  Here's the code that handles index repopulation:

The bulk of the code starts on Line 25 and merely repopulates MD3.Indexes with metadata from your database.  Notice how succinct the code is...database deployments are simple when you think through the actual requirements and understand The Four Rules.  

Line 21 might be confusing...we blow away what was last scripted...except cases where DropIndex was set.  Why do we maintain DropIndex data?  Remember that indexes are stateful data and we must maintain their history.  MD3 will script our current database, but it knows nothing about the history that existed in prior versions of your database.  So, for instance, perhaps you had some old indexes in previous releases that you determined served no real purpose (i.e., all activity was writes with no reads) and you dropped them.  Now you script your current database into MD3 but you want to use MD3 to drop those old indexes from older customer databases.  By manually adding the DropIndex metadata (just the index name) you can be sure that any databases that still have those indexes will have them properly dropped.  

Essentially you are telling MD3 what the history of your database really was.  No database reverse-engineering tool on the market can handle recreating history for dropped objects, and neither can MD3.  You must supply this history, if you need it.  

By using this method you can re-script your database numerous times while you test your MD3 migration with CI Build Loops without interrupting your current developers who are still actively using your previous deployment method.  

All of the MD3.Repopulate procedures work similarly to MD3.RepopulateIndexes.  Here is an example demonstrating how CHECK CONSTRAINT repopulation works:  

 

Deep Dive - MD3.Script Procedures 

Once you have the metadata scripted to the MD3 metadata tables you need a method to get that metadata into .sql files so you can properly place it under source control.  To do this we have stored procedures that use PRINT statements to "generate SQL from SQL".  You simply save the output to your indexes folder.  The output is an MD3-formatted script that follows the Model Data Pattern discussed earlier.  The output will look something like this (this is from AdventureWorks):  

The screenshot shows the last index in AdventureWorks that was scripted (Line 110).  Note that after we populate MD3.Indexes with simple INSERT statements during a deployment we immediately apply the indexes on Line 113 by calling MD3.RunIndexes, which does all of the heavy lifting.  Very simple. 
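In outline, the generated file is a series of metadata INSERTs followed by a call to the runner procedure.  This condensed excerpt is hypothetical; the actual column list of MD3.Indexes may differ from what is shown here.

```sql
-- Hypothetical excerpt of a scripted index file.  The MD3.Indexes
-- column names are illustrative assumptions.
INSERT INTO MD3.Indexes (SchemaName, TableName, IndexName, KeyColumns, IsUnique)
VALUES ('Sales', 'Currency', 'AK_Currency_Name', 'Name', 1);
-- ...one INSERT per index in the database...

EXEC MD3.RunIndexes;   -- applies the metadata: creates, alters, or drops
                       -- indexes so the database matches MD3.Indexes
```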

Putting It All Together...Reverse-Engineering AdventureWorks

I have included a copy of the reverse-engineered AdventureWorks2012 SQL scripts in MD3 format with the source code (click here for the code).  The scripts include all necessary code to reproduce a "net new" (shell database without the data) copy of AdventureWorks2012.  And this only takes about 10 minutes to reverse-engineer on your own.  


You have just read "How to Reverse-Engineer Your Database with MD3" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

MD3 Deployments in Action

Over the last few posts we've covered how MD3 handles stateful database object deployments.  We've covered all of the MD3.Create procedures and how they help you avoid writing error-prone, hand-crafted DDL.  We then looked at RunMD3.ps1 which is the MD3 executor script.  In this post we are going to look at various outputs you can expect from RunMD3.ps1.  

Why should we build and deploy EVERY database script EVERY time, even if the object hasn't changed?  

In MD3 Script Idempotency and MD3.CreateColumn we looked at why all sql scripts in MD3 must be "rerunnable", meaning that a script cannot have "side-effects" if it is subsequently executed multiple times.  With MD3 every SQL script is run during every deployment.  Folks that are new to MD3 question this approach.  

Quick Links
Introduction to Metadata Driven Database Deployments
Download MD3 (with samples)
Documentation on CodePlex/git
Presentation on MD3
Other Posts about MD3
AdventureWorks2013 reverse-engineered into MD3 (takes 5 mins to do with ANY database)

Just suspend disbelief and you'll see the benefits of doing this.  

  • It’s the safest thing to do…you always deploy what you tested and what you expect.  You never know who altered a stored proc on a database without going through the proper build/deploy mechanism.  Simply redeploying all objects ensures that your database is exactly how you want it.  Developers and QA folks can manipulate their environments and can later "reset" them by simply getting the latest code and deploying it.  
  • It only takes a few milliseconds to check metadata “properties” for an object and determine there is no net change. Even with thousands of database objects that haven't changed in years you will only spend at most a few minutes checking the metadata.  Trust me.  I've run MD3 scripts against HUGE databases where no objects have changed since the last deployment...this still only takes a few minutes.  
  • We are not locked into build-by-build deployments where we run only the scripts that we think have changed. We can upgrade any database to “current” without having to worry about the starting point.  If you have customers who are behind by several releases you can rest assured that they will upgrade and have identical schemas to your newest customer.  
  • Frankly, this is how every other language does build/deploy.  This is an industry standard best practice that encourages continuous integration.  SQL should be no different.  MD3 enables this.  

Running RunMD3.ps1

There are lots of ways to run PoSh scripts.  I like to open a command prompt and start powershell.  You then cd to the root directory where your MD3 deployment scripts reside.  Then you simply call the deployment executor using PoSh dot notation, like this:  .\RunMD3.ps1.  There are two parameters -ServerName and -DatabaseName.  

Prerequisites

  • You can run the PoSh script from any machine with the sql tools installed.  There are no special powershell requirements.  
  • If this is the first time you are running a PoSh script on the given machine then you must run Set-ExecutionPolicy Unrestricted first.  This allows you to run PoSh scripts.  This isn't my requirement, rather a requirement of PowerShell itself.  
  • The specified database must exist.  This is the default behavior.  You can of course issue the command to create your database first, if it doesn't exist, by placing the relevant SQL in 00Pre-install\00CreateDatabase.sql.  In fact, this is a good idea if you are going to create CI Build loops (see Continuous Integration Testing with MD3).  But by default, for safety reasons, MD3 requires the db to exist.  If it doesn't you'll see an error similar to this:  

Errors

Whenever an error is thrown with a Severity >= 16 the deployment will stop and an error message will be displayed.  This works EXACTLY like SSMS.  

In this screenshot we can see that the failure occurred in 3Tables\HumanResources.Employee.sql because the dbo.Flag user-defined data type does not exist.  By using Start-Transcript, or by piping RunMD3.ps1 output, you can capture the execution log seen above to a text file as well.  Execution logs are also saved to a table named MD3.Log.  This table is truncated at the start of every deployment, so if you wish to archive the data from every deployment you must either alter the MD3 system or squirrel the table away to another location.  

The following graphic shows the contents of MD3.Log after an error was thrown:  

In this case we can see the error that was thrown.  The constraint could not be created.  The actual SQL statement that failed, generated by the MD3 system from the properties passed to MD3.CreatePrimaryKey (in this example), is always logged just before the error.  This will help you determine what went wrong.  These errors are extremely rare and should always be caught during continuous integration testing.   

Here is what you can expect to see when MD3 succeeds:  

Warnings

Any WARNINGs will be displayed just before the success message, or can be queried from MD3.Log by looking at WHERE Severity = 'W'.  Out-of-the-box the only WARNINGs MD3 will throw are untrusted constraint violations (if you allow them).  You can code your own by tactically adding calls to MD3.InsertLogMessage with the @Severity = 'W' option.  

Summary

In this post we showed how to run the MD3 executor PoSh script, RunMD3.ps1.  This utility executes any .sql file it finds in its folder or any subfolder.  If a .sql script throws any errors those are bubbled up and logged to MD3.Log and are shown to the user running the PoSh script.  In the next post we'll cover how to reverse-engineer your database with MD3.  


You have just read "MD3 Deployments in Action" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

RunMD3.ps1: The MD3 Executor

Metadata Driven Database Deployments (MD3) is my build-and-deploy system for SQL Server databases.  In the last couple of posts of this blog series I covered what MD3 is and why you should care about metadata-driven, declarative database deployments.  In this post I'm going to cover how MD3, by default, executes its scripts.

RunMD3.ps1

First, there is no requirement to use the script executor, called RunMD3.ps1.  You can always use your own sql script deployment tool, such as Visual Studio db projects.  Or you can write your own.  I've written script deployment tools using batch files, osql, and vbscript.  Do whatever works for you.  For the remainder of this post I'll assume you are using RunMD3.ps1. Before we cover what this script does let's cover how we should deploy sql objects.  

Database Deployment Best Practices

Here are some best practices for any database deployment tool:  

  1. Deploy ALL objects with EVERY deployment.  This aids in CI Loops (see Continuous Integration Testing with MD3) where we should compare upgraded databases to "net new" (shell) databases.  
  2. Order the object deployments such that dependency errors are not thrown.  For instance, we should apply tables before their indexes.  
  3. A "status" should be available that tells the release engineer/DBA exactly what is currently being deployed.  This helps if a given script, for some reason, decides to execute for longer than it should.  
  4. Errors should be easy to recognize and should be logged.  The errors should not be caught by the tool and obfuscated into the tool's error message.  The actual SQL error message should be returned.  

How MD3 handles these requirements

You can of course customize this process to suit YOUR needs.  

  • RunMD3.ps1 only executes .sql files.  This means you can intersperse unit test files (such as tsqlt scripts) or documentation files with your .sql files.  Simply give them a different file extension.  
  • Files and folders are run alphabetically.  This means that you can simply look at your sql script folder structure and determine the order that scripts will execute.  See the folder structure I use in the nearby graphic.  Note that we deploy the MD3 system first (1MD3Objects) before we build any "user" objects like tables (3Tables).
    • Sometimes (very rarely) you'll get dependency errors and this execution method won't work.  For instance, you may have a view (4Views folder) called Bar (deployed in 4Views\Bar.sql) that calls another view called Foo (in 4Views\Foo.sql).  This will fail during your CI testing.  No problem...two possible solutions:  
      • rename Bar.sql to 00Bar.sql and create a new Bar.sql with a comment that states the file has moved due to an ordering dependency.  
      • use subfolders
  • Within a given folder, the .sql scripts are run PRIOR to any subfolders.  For instance, in the 1MD3Objects folder you'll see the objects and subfolders in the nearby graphic.  
    • In this case CreateMD3Schema.sql will run BEFORE any .sql files that are in the Functions folder.  All MD3 objects are deployed to the MD3 schema, which is why CreateMD3Schema.sql is run first.  Otherwise, you'll get a dependency error (mentioned above).  
    • This convention allows you to structure your folders into "software modules" folders.  For instance, you may have thousands of stored procedures under 7StoredProcedures.  You may want to organize those into subfolders like Accounting, HR, ETL, etc.  
  • You can use whatever folder structure you like, and nest as deep as you like.  
  • If you wish to skip a folder from execution, preface the folder with "_".  For instance, notice the _Documentation in the screenshot above.  That folder contains example .sql files.  Those will NOT be executed.  
  • If you wish to skip a file from execution either "zero out" (blank) the file or rename it to something other than .sql.
  • Any valid TSQL will be executed. You don't need any special formatting or syntax.  

If you use MD3's suggested folder hierarchy then ordering dependencies are minimized.  

"Special" Folders and Files

These items break with the "convention over configuration" paradigm that I've designed into MD3 (discussed earlier in this series).  

  • PostApply folder:  anything placed in this folder will run at the end of an MD3 deployment.  This is useful for backup scripts, notification scripts, scripts to restart replication, SQLAgent jobs, etc.  
  • .\00Pre-install\00CreateDatabase.sql ALWAYS runs FIRST.  This script will CREATE DATABASE if needed.  This is useful for CI Build Loops (discussed in the post Continuous Integration Testing with MD3).  You can also put backups scripts here or anything else that needs to run at the start of a deployment.  

Errors, Logging, and Rerunnability

Since every script is (re-)run with each deployment you have to write your code to ensure it is idempotent/rerunnable (see MD3 Script Idempotency).  This is easy to test with proper MD3 Continuous Integration Tests.
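In practice a rerunnable script guards every side-effecting statement with an existence check, so a second execution is a no-op.  This is a generic T-SQL pattern, not MD3-specific code; the table and procedure names are illustrative.

```sql
-- Generic rerunnable-script pattern: each statement checks state first.
-- Adding a column only if it is missing:
IF COL_LENGTH('dbo.Employees', 'MiddleName') IS NULL
    ALTER TABLE dbo.Employees ADD MiddleName nvarchar(50) NULL;
GO
-- Drop-and-recreate for a stateless object like a procedure:
IF OBJECT_ID('dbo.GetEmployee', 'P') IS NOT NULL
    DROP PROCEDURE dbo.GetEmployee;
GO
CREATE PROCEDURE dbo.GetEmployee @EmployeeID int
AS
    SELECT * FROM dbo.Employees WHERE EmployeeID = @EmployeeID;
GO
```

Running this script twice in a row (the "F5 F5" test) produces the same end state with no errors, which is exactly what the CI loop verifies.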

When an ERROR occurs (Severity >= 11) in one of your scripts then MD3 aborts and displays and logs an error. Simply fix the error and redeploy.  No need to restore the database first due to a failed deployment.  

Back to RunMD3.ps1

Now that I've covered what RunMD3.ps1 does and why, let's cover how it does it.  At its core it is a little PowerShell script that recursively calls all folders and sql scripts starting in the root folder from where it is called.  It is not written in something like Python or Perl, which many database professionals may not understand.  Other than this little script, MD3 is written entirely in TSQL.  (I have an Oracle version as well, also entirely written in PL/SQL.  Contact me for details).  

It requires 2 parameters, the $ServerName and $DatabaseName of the database to be installed/upgraded.   The source code is available here.  

It uses NT authentication.  It can be run remotely, but the machine executing RunMD3.ps1 must have the sql client installed.  The entire script is about 200 lines of Powershell.  The bulk of the code is:  

Line 181 runs  .\00Pre-install\00CreateDatabase.sql.  This is discussed above.  If you don't need this functionality simply remove this line of code.  However, this code really helps with CI Testing by building your shell database.  Line 186 calls each .sql script in every subfolder starting with the folder where RunMD3.ps1 lives.  
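In outline, the script's core behavior looks something like the sketch below.  This is a simplification I've written to illustrate the flow, not the actual RunMD3.ps1 source; consult the real script for the exact logic.

```powershell
# Simplified sketch of RunMD3.ps1's core loop (not the real source).
# Runs 00Pre-install\00CreateDatabase.sql first, then every .sql file
# alphabetically, skipping folders prefixed with "_".
param([string]$ServerName, [string]$DatabaseName)

# -b makes sqlcmd return a nonzero exit code on a SQL error
sqlcmd -S $ServerName -d master -E -b -i ".\00Pre-install\00CreateDatabase.sql"

Get-ChildItem -Recurse -Filter *.sql |
    Where-Object { $_.FullName -notmatch '\\_' } |   # skip "_" folders
    Sort-Object FullName |
    ForEach-Object {
        sqlcmd -S $ServerName -d $DatabaseName -E -b -i $_.FullName
        if ($LASTEXITCODE -ne 0) { throw "Deployment failed in $($_.FullName)" }
    }
```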

That's it.  

In the next post we'll take a look at MD3 Deployments in Action.  


You have just read "RunMD3.ps1: The MD3 Executor" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

MD3 Model Data Patterns

Metadata Driven Database Deployments (MD3) is my build-and-deploy system for SQL Server database applications.  In the last couple of posts we covered what problems MD3 solves and how it solves them, specifically involving stateful database objects.  In this post I'm going to cover how MD3 handles deploying "model data".  I'm not aware of ANY database deployment tool by ANY vendor that has the ability to deploy model data in a "stateful" way.  In fact, I have a few MD3 users who started their journey to properties-based, metadata-driven evolutionary databases by first implementing MD3's model data deployers.  

What is Model Data?

You may be familiar with model data by a different name.  I've had clients that have used these terms as well:  

Quick Links
Introduction to Metadata Driven Database Deployments
Download MD3 (with samples)
Documentation on CodePlex/git
Presentation on MD3
Other Posts about MD3
AdventureWorks2013 reverse-engineered into MD3 (takes 5 mins to do with ANY database)
  • lookup data
  • allowable values
  • system data
  • seed data
  • master data

I'll use the term "model data" for the remainder of this post, but the terms are interchangeable.  Put simply, model data is any data that YOU deliver with your database.  For instance, you may have a table named dbo.StateProvince that holds a row for each state or province in the United States.  This data rarely, if ever, changes.  

Model data may:  

  • never change
  • rarely change
  • only be modifiable by the customer
  • only be modifiable by YOU (the vendor)
  • be modified by either YOU or the customer.  

Classes of Model Data

Not all model data is created equal.  There are classes of model data based on who may modify the data.  I've invented these terms; they aren't industry standards.  But I think by describing model data in terms of its "type" it is easier to understand its requirements for deployment.  

  • Type1 (vendor-supplied model data): may not be changed by the customer.  It likely rarely changes, but when it does it is the vendor's responsibility to migrate the database "state".  Example: dbo.StateProvince data is not allowed to be changed by the customer.  Vendor-modifiable: yes.  Customer-modifiable: no.  MD3 implementation: fully supported.
  • Type2 (customer-only modifiable model data): "seeded" by the vendor but after that modifiable only by the customer.  Example: dbo.StateProvince data is initially delivered to the customer, but after that the customer maintains it.  Vendor-modifiable: no.  Customer-modifiable: yes.  MD3 implementation: fully supported.
  • Type3 (fully-modifiable model data): "seeded" by the vendor and modifiable later by either party.  Example: dbo.CustomerType is seeded with "Prospect", "Active", and "Archived" by the vendor.  The customer may add or change these, and based on the business requirements the customer's changes are either honored or discarded by later vendor data changes.  Vendor-modifiable: yes.  Customer-modifiable: yes.  MD3 implementation: MD3 provides simple templates, but the implementation is based on YOUR business requirements.

Often your model data is not as black-and-white as the types listed above.  It is common for some model data to be a combination of Types 2 and 3.  That's OK, MD3 can still handle it.  

I've implemented MD3 model data patterns for several different ISVs (Independent Software Vendors).  In every case, determining the model data type for each piece of model data was the most difficult part of the migration.  Software vendors just don't understand their data, I guess.  Don't be surprised when model data becomes the Achilles' heel of your MD3 migration.  But don't let that scare you: in the process of getting your model data correct you'll learn A LOT about your database and application.  

Scripting your Model Data

Let's look at how MD3 handles model data.  We'll use Sales.Currency as our example model data.  Out-of-the-box this table contains 105 currency rows.  

Let's start by scripting this model data out using MD3's scripting tools.  We'll cover how to reverse-engineer your entire database in a later post, but for now let's start our look at model data by reverse-engineering Sales.Currency.  

Scripting model data is as simple as running a stored procedure, MD3.ScriptModelData  (code).  Here is the list of all scripting options.  

This procedure generates output that can be saved to a .sql file containing all of the commands necessary to deploy the model data for the given table; you can then modify that script as needed for the various "Types" of model data discussed above.  The procedure takes many options that alter the output to fit your needs, and you can regenerate your model data as many times as you like until it is in the correct format.  Here is the basic command to script Sales.Currency.  Before you execute the script, press Ctrl+T to ensure your output is in text mode.  
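The original post showed the command as a screenshot; here is a sketch of the call (the parameter names are assumptions — check MD3.ScriptModelData's actual signature in your copy of MD3):

```sql
-- Press Ctrl+T first so the results come back as text, then save the output
-- to a .sql file.  Parameter names here are illustrative.
EXEC MD3.ScriptModelData
    @SchemaName = 'Sales',
    @TableName  = 'Currency';
```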

Simply save the output wherever you want to store your model data, preferably under version control.  If you follow the out-of-the-box MD3 patterns then this would be saved to 8ModelData\Sales.Currency_modeldata.sql.  

The Model Data Pattern

Now that the data is scripted, let's look at the model data pattern in detail.  The graphic above shows the start of every model data script.  The first thing we do is temporarily disable all FK constraints on the tables associated with this model data table, so we can load and manipulate the model data without having to execute the scripts in dependency order.  Next we build a temp table identical in schema to the underlying table; the WHERE 1 = 0 clause quickly gives us the structure without inserting any data.  Finally we populate the temp table with the data as it existed in the table when it was scripted.  Every row is scripted.  
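In outline, the start of the script follows this shape (a sketch; the FK and column names are illustrative, not MD3's generated output verbatim):

```sql
-- 1. Temporarily disable FK constraints on associated tables so the model
--    data can be manipulated without worrying about dependency order
ALTER TABLE Sales.CurrencyRate NOCHECK CONSTRAINT ALL;

-- 2. Build a temp table with the same schema as the real table;
--    WHERE 1 = 0 copies the structure without copying any rows
SELECT * INTO #Currency FROM Sales.Currency WHERE 1 = 0;

-- 3. Load the temp table with every row of model data exactly as it
--    existed when the table was scripted
INSERT INTO #Currency (CurrencyCode, Name, ModifiedDate)
VALUES ('USD', 'US Dollar', GETDATE()),
       ('EUR', 'Euro',      GETDATE());   -- ...one row per scripted value
```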

The graphic below shows the code after we build the temp table with the scripted model data.  Note that there is an "insert section", an "update section", and a "delete section".  These sections simply compare what is in the temp table versus what is in the actual table.  If a row exists in the temp table but not in the actual table (by key), it is a new row and is inserted.  If a row exists in the actual table but not in the temp table (by key), it is deleted from the actual table.  

The "update section" computes a BINARY_CHECKSUM over the non-key columns of both tables, joins them on the key, and updates the actual table wherever the checksums differ.  
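Sketched against Sales.Currency's key, the three sections look roughly like this (simplified; not MD3's generated code verbatim):

```sql
-- Insert section: rows in the temp table but not in the real table, by key
INSERT INTO Sales.Currency (CurrencyCode, Name, ModifiedDate)
SELECT t.CurrencyCode, t.Name, t.ModifiedDate
FROM #Currency t
WHERE NOT EXISTS (SELECT 1 FROM Sales.Currency c
                  WHERE c.CurrencyCode = t.CurrencyCode);

-- Delete section: rows in the real table but not in the temp table, by key
DELETE c
FROM Sales.Currency c
WHERE NOT EXISTS (SELECT 1 FROM #Currency t
                  WHERE t.CurrencyCode = c.CurrencyCode);

-- Update section: same key, but the checksum of the non-key columns differs
UPDATE c
SET c.Name = t.Name
FROM Sales.Currency c
JOIN #Currency t ON t.CurrencyCode = c.CurrencyCode
WHERE BINARY_CHECKSUM(c.Name) <> BINARY_CHECKSUM(t.Name);
```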

Finally we drop the temp table and print a message reminding the developer to re-enable all FK constraints.  If you use MD3's executor, the FKs are re-enabled automatically and any error messages (DRI errors) are thrown as normal.  

This is the basic "Type1" model data pattern that MD3 provides out-of-the-box.  

Model Data Use Cases

Assume you have a requirement to rename ZWD's description from "Zimbabwe Dollar" to "Zimbabwe DollarBill".  All you need to do is change the model data entry as shown below.  
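In the scripted file this is a one-line edit to the row that loads the temp table (column list illustrative):

```sql
-- The ZWD row in the scripted VALUES list changes from 'Zimbabwe Dollar' to:
INSERT INTO #Currency (CurrencyCode, Name)
VALUES ('ZWD', 'Zimbabwe DollarBill');
```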

You can see the output: no rows inserted, none deleted, and one row updated.  

Suppose the requirement is to add a new currency called ZZZ FooBar Dollar.  Here is the change:  

Note that when executed we have one new row inserted this time, with no updates and no deletes.  Just what we expected.  

Finally, if you wanted to delete the model data for the ZAR Rand you could either remove that row or simply comment it out so you keep a "history" of that data element for future reference:  

Type2 and Type3 Model Data

These examples assumed you only needed to handle Type1 (vendor-only modifiable) model data.  MD3.ScriptModelData only scripts Type1 model data.  The reason is simple: Type2 and Type3 model data are highly customized and specific to your use case.  You handle those cases by altering the output of the reverse-engineered script.  So how can we handle Type2 and Type3 model data?

Type2 model data is quite simple.  In this case the vendor initially supplies "seed data" that is never subsequently modified by the vendor.  We simply take the scripted model data and remove the "update" section.  
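A minimal Type2 sketch: only the insert section remains, so rows the customer has edited are never touched by later deployments (whether you also drop the delete section depends on your requirements):

```sql
-- Type2: seed only the rows the customer does not already have;
-- never update existing rows, so customer changes survive upgrades
INSERT INTO Sales.Currency (CurrencyCode, Name, ModifiedDate)
SELECT t.CurrencyCode, t.Name, t.ModifiedDate
FROM #Currency t
WHERE NOT EXISTS (SELECT 1 FROM Sales.Currency c
                  WHERE c.CurrencyCode = t.CurrencyCode);
```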

Type3 model data is totally dependent on your exact requirements.  I can give you ideas on what needs to change in the reverse-engineered script, but if you fully understand your model data you can probably figure out what you need to do yourself.  I can't stress that enough: you need to understand your data.  After that, you simply modify the model data pattern to fit your needs.  

Summary

In this post we covered model data, or data that is delivered with your database application.  I (and MD3) view model data as just an extension of your database schema.  Model data needs to be "versioned" just like your schema.  It is a "stateful" database object, just like a table is.  When we modify model data we need to take care to ensure we've handled all of the historical data that may reference that model data.  The model data patterns in MD3 can be used without implementing any other MD3 features.  Implementing model data in a repeatable manner is by far the most difficult aspect of deploying MD3.  This is because most developers truly do not understand the model data in their applications.  Can the data be altered later?  Can it be altered by a customer?  How do we handle model data that was altered by the customer?  These are difficult questions when you don't understand your data.  Thankfully, once you understand your data it is very easy to script it into the MD3 model data pattern.  

 



The OO Design Principles of MD3

Over the last few posts we've covered how MD3 handles stateful database object deployments.  You may still be wondering why you should care about deploying your database code using metadata instead of hand-crafted DDL.  In this post I'm going to take a detour from the technical aspects of MD3 and instead cover the design principles behind MD3 that are not found in (m)any other database deployment tools.  Even if you decide MD3 isn't for you, you may learn a few things that will help you become a better data developer.  

I designed MD3 to follow standard object-oriented design principles...using TSQL.  I haven't always succeeded, but the OO principles in MD3 make it much more robust and reliable than standard DDL.  Let's cover a few of these principles, starting with the most important (IMHO).  

"Tell, Don't Ask"  

...is an object-oriented programming principle that reminds programmers to tell their objects what to do rather than ask them for their state and then decide what action to perform based on the response.  We need more of this in SQL and DDL.  Procedural code, such as DDL, does far too much "asking" for information and then making decisions based on the answers.  It would be nice if we database developers could just tell our RDBMS what we wanted and the RDBMS would "do the needful" and make it so.  

Here's an example of some code that probably looks familiar to you...it builds an index if it doesn't exist:  
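(The original post showed the code as an image; here is a representative version.)

```sql
-- "Ask, then Tell": query the catalog first, then decide which DDL to run
IF NOT EXISTS (SELECT 1 FROM sys.indexes
               WHERE name = 'nciFoo'
                 AND object_id = OBJECT_ID('dbo.Foo'))
BEGIN
    CREATE NONCLUSTERED INDEX nciFoo ON dbo.Foo (Bar);
END;
```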

This is an example of "Ask, then Tell", which is terrible.  As data developers we write code like this because we need to ensure the index doesn't already exist before we create it.  It is a code pattern you have to remember and may only use a few times a year, and variants of it will be scattered across tons of index-creation scripts.  All we should really need to do is "tell" our RDBMS what the index should look like, and the RDBMS should figure out how to get us that index in the most efficient manner possible.  I call this "declarative DDL".  This is what MD3 does for you.  This graphic shows how MD3 handles index creation declaratively.  Very simple: we tell MD3 what our index should look like and it ensures that after EVERY deployment we get an index called nciFoo with the declared properties.  It doesn't matter whether this is a net-new database, a database upgraded from a five-year-old version of our application, or a database upgraded from yesterday's build.  
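As a sketch, the declarative call looks something like this (the parameter names are assumptions; check MD3.CreateIndex's actual signature):

```sql
-- Declare the properties the index should have; MD3 decides whether to
-- CREATE, rebuild, or do nothing to converge the database on this state
EXEC MD3.CreateIndex
    @SchemaName = 'dbo',
    @TableName  = 'Foo',
    @IndexName  = 'nciFoo',
    @ColumnList = 'Bar',
    @IsUnique   = 0;
```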

DRY Principle

Index creation DDL with an "if exists" check also violates the DRY principle (Don't Repeat Yourself).  As mentioned before, you may have variants of that code in lots of .sql scripts.  Suppose you need to modify all of those scripts because you decide to implement ROW compression.  You may need to change HUNDREDS of scripts.  With MD3 you simply modify MD3.CreateIndex to default all indexes to ROW compression.  Very easy!  Change one script instead of hundreds.  


So why don't the ANSI SQL standards specify a declarative DDL syntax like this?  Probably because the ANSI standards don't specify concepts like "indexes", "clustering keys", and "fillfactors".  DDL is, in fact, declarative in that you "declare" things like column names and data types, but the process of implementing those changes is still a series of imperative verbs that carry out actions (i.e., CREATE and ALTER).  The ANSI specifications leave the "hows" to each vendor's implementation.  And, of course, SQL and DDL were created before object-oriented programming became the norm.  

So why do none of the major (R)DBMS vendors provide native declarative DDL like MD3?  I have no idea.  They really should.  If you look at tools like Microsoft's Visual Studio database projects you can see that there is an attempt to make things more "tell, don't ask", but it is half-baked at best.  In VS database projects you declare the index using standard CREATE INDEX DDL, like this, which I pulled directly from one of my VS database projects:  
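(The original embedded the script as an image; the following is a representative index file, not the author's original.)

```sql
-- A VS database project index file is just the CREATE statement itself;
-- the deployment engine diffs it against the target database
CREATE NONCLUSTERED INDEX [nciFoo]
    ON [dbo].[Foo] ([Bar] ASC);
```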

This is certainly much easier than having to write an IF EXISTS statement, but it could be better.  The way VS database projects work is that if you need to ALTER an index you simply change the CREATE INDEX command in the index script.  Adding an INCLUDE clause to this index is simply a matter of adding that clause to the command.  When the deployment runs it executes the commands necessary to get you what you requested, whether that is a DROP followed by a CREATE, a CREATE with DROP_EXISTING, or something else.  But if you need to make a bulk change to all indexes, such as adding a FILLFACTOR or COMPRESSION, you can't.  That's why I feel it is half-baked.  

Hopefully one day we'll have full declarative DDL in our RDBMSs.  Even the big NoSQL implementations don't handle DDL any better.  If you want to manipulate an index in Hadoop/Hive you still need CREATE/ALTER/DROP syntax.  No improvement there.  Mongo is a bit better: you don't use DDL at all; instead you call an .ensureIndex() method that ensures your index "looks" a specific way before your job runs.  There are operational problems with this, such as the fact that it may serialize your entire collection while the index is built, but syntactically it is a great improvement over standard SQL DDL and gets very close to full declarative DDL.  

The Single Responsibility Principle

The SRP is a restatement of Unix's philosophy of Do One Thing and Do It Well.  I've tried to make MD3 adhere to the SRP as much as possible.  As I've shown already, other deployment tools make you use all or nothing of their solution; MD3 does not.  I've had customers write their own deployer (covered soon in a post about RunMD3.ps1) to avoid using/learning PowerShell.  I've had others use VS database projects to deploy but use the MD3.Create procedures to manage their stateful objects.  And every client loves the reverse-engineering scripts.  

Most deployment tools, including RunMD3.ps1 and VS database projects, are single-threaded, meaning they run only one script at a time, serially.  I had a client that was EXTREMELY sensitive to downtime and wanted to multi-thread their deployments, yet not have to deal with object dependency issues.  Since MD3 is architected as independent services, we made the deployment multi-threaded very quickly by running the single-threaded RunMD3.ps1 concurrently with itself.  We made the deployment code "multi-threaded" in two days and cut their deployment times by a factor of 4.  

YAGNI

"You ain't gonna need it" is a principle of eXtreme programming that states you should never add a feature until it is needed.  I totally agree.  For instance, I only added index partitioning support to MD3 in 2012, after I finally found a use case where partitioning was the best solution to a problem.  Prior to that I always solved my problems with other approaches that were at least equal to the task.  

I've had co-workers and clients complain that MD3.CreateIndex does not support critical features they think they need.  An oft-cited example is filtered indexes.  My retort is quite simple...I've yet to see a use case where filtered indexes are the best solution.  I'm not saying filtered indexes are useless, I'm merely saying I, professionally, have yet to need them.  Perhaps I'm just lucky and have never worked on a system that truly needed filtered indexes.  Rest assured that the first time I need a filtered index I'll refactor MD3.CreateIndex so it properly supports this feature.   So, if there are features you need either implement them yourself or contact me.  

Separate Interface from Implementation

Some folks call this ISP (Interface Segregation Principle).  I prefer to call it "separating the interface from the implementation" because that seems to make more sense to people.  The "interface" is how you interact with your object; the "implementation" is what your object does under the covers.  You should always try to isolate your implementation details from your users.  As we've seen time and again, standard DDL mixes the "interface" (what you want to do) with the "implementation" (how you do it).  When you use the MD3 procedures you are not exposed to the implementation details.  You don't need to worry about whether to CREATE your object or ALTER it, nor do you need to worry about dependent objects.  Everything is handled for you.  Yet since MD3 is implemented in TSQL, you can see how MD3 handles the implementation and change it if it doesn't suit your requirements.  The implementation details are hidden until you need them.  

Convention over Configuration

This is sometimes known as "coding by convention".  CoC strives to keep the number of configuration options a developer or admin has to worry about to a minimum and to supply sensible defaults instead.  Having lots of configuration options may seem powerful, but have you ever changed a bunch of settings only to find that your software no longer works, and you can't figure out which buttons and knobs are affecting the behavior?  

MD3 has NO configuration file.  None.  Configuration isn't needed.  When you look at your folder hierarchy you should be able to tell the order that scripts will deploy in based on their naming convention.  All folders and scripts execute alphanumerically.  This is covered further in my post on the MD3 executor, coming up soon.  

I've had clients add various knobs and settings to customize MD3 for their specific needs and what they thought was a better implementation.  That's fine.  But the goal of good software is to not have to worry about configuring the software to have it do what you need.  

So how do you configure MD3 without an XML file, an ini file, or a settings table?  You use the MD3 patterns and conventions I've talked about in these posts and will talk about in future posts.  We don't need a configuration option that lets you execute both .sql and .sqlx files (whatever those might be).  We use the convention everyone is familiar with: the .sql file extension indicates a SQL file.  Of course you can change this if you need to; that's totally up to you.  

Write Shy Code

This is sometimes called the Law of Demeter.  I hate that name because it is an unnecessarily wordy, confusing, and obtuse way to explain a simple concept.  Do you even know who Demeter was or what she is famous for?  Probably not.  The LoD can be summed up by saying that your objects should be "loosely coupled": a module should not reveal anything unnecessary to other modules, nor rely on other modules' implementations.  For example, if you need to change an object's state, get the object to do it for you.  This way your code remains isolated from the other code's implementation, which increases the chances that your code stays orthogonal.  "Orthogonal" is a fancy way of saying your code "stays at right angles to other code", which is another wordy way of saying you won't introduce side-effects when you change something.  

We accomplish this in MD3 quite simply: all code has an interface in the form of the MD3.Create procedures.  These procedures take a given set of "properties" as arguments and nothing else.  All state changes are hidden behind those interfaces.  If an interface changes, such as when we add a new "property" to an object (perhaps adding @FilterClause to MD3.CreateIndex), we can rest assured that the change will not have unintended side-effects that break other modules.  And we can always test this with our CI build loops.  

Summary
This post was a divergence from my regular posts on the technical details of the MD3 system.  Understanding the design principles behind MD3 will help you quickly and safely extend your system later, if needed.  I'll cover examples of this in a future post.  The goal of MD3 is to think declaratively (specifying what is to be done, not how) and to create a highly dynamic and adaptable deployment tool.  
 

