DaveWentzel.com All Things Data
The Wooden Badger
"The Wooden Badger" ... this is the story of how I got that nickname. But first, let me say that I really don't "get" Monty Python. I rarely laugh at their sketches. But other people do. Especially programmers for some reason. The same thing happens with This is Spinal Tap. Programmers love it.
Here's an experiment:
- Ask a random person if he enjoyed This is Spinal Tap
- If the response is any of the following:
- "But these go to Eleven" followed by excessive belly laughter
- "It's like, how much more black could this be? And the answer is none. None ... more black."
- "Remember when Harry Shearer got stuck in his pod?"
- Then you've got yourself a programmer.
Frankly, I don't find Spinal Tap all that funny, but programmers do, so it's best to speak on their terms. Basically, when in Rome, speak Roman.
But I digress.
There is one scene from Monty Python and the Holy Grail that is hilarious. King Arthur and his Knights of the Round Table lay siege to a French castle. They build a Trojan Horse, except their version is a Trojan Rabbit. At the end of the scene we see the English watching from a distance while the French wheel the wooden rabbit into the castle. King Arthur asks how this plan is supposed to work. "Now we wait until nightfall and we all jump out of the rabbit, taking the French by surprise." They all simultaneously realize the flaw in their plan (nobody is inside the rabbit) and slap their foreheads. Sir Bedevere responds, "Um, well look, if we built this large, wooden badger." Simultaneous groans can be heard.
So, why do programmers in particular find this so funny? My theory is that we feel we can do the same thing over and over again and expect different results. In my experience, that doesn't happen much. For instance, we all know that not having proper declarative referential integrity is asking for trouble. Yet on every "new development" project I work on I invariably hear someone say, "We don't need foreign keys, we'll actually be able to ensure integrity at the Java tier this time." And somehow the data folks always tend to lose the argument, and the initial version of the system is delivered without keys.
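For anyone who hasn't watched a foreign key earn its keep, here is a minimal sketch of what "declarative referential integrity" buys you. It uses Python's built-in sqlite3 module purely as a stand-in for whatever database the project actually runs on, and the `customer` and `customer_order` tables and the function name are made up for illustration. The point is only that the database itself rejects the orphaned row, no matter which tier tried to write it.

```python
import sqlite3


def orphan_insert_rejected() -> bool:
    """Return True if the database refuses an order for a nonexistent customer."""
    conn = sqlite3.connect(":memory:")
    # Assumption: SQLite stand-in. SQLite leaves FK enforcement off by default.
    conn.execute("PRAGMA foreign_keys = ON")
    # Hypothetical tables, not from any real system.
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE customer_order ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customer(id))"
    )
    conn.execute("INSERT INTO customer (id) VALUES (1)")
    # A valid child row: customer 1 exists.
    conn.execute("INSERT INTO customer_order (id, customer_id) VALUES (10, 1)")
    try:
        # An orphan: customer 999 does not exist.
        conn.execute("INSERT INTO customer_order (id, customer_id) VALUES (11, 999)")
    except sqlite3.IntegrityError:
        rejected = True
    else:
        rejected = False
    conn.close()
    return rejected
```

Drop the `REFERENCES` clause and the second insert sails through, which is exactly the promise the application tier makes every time and rarely keeps.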
There is something even worse than "repeated insanity" and that is the Second System Syndrome. In this case, we take an elegant, simple system, and we refactor it and add needless complexity and call it Version 2. This is much worse because you HAD a working system, now you don't. Imagine what King Arthur's wooden badger would've looked like. More comfortable seating, gilded ornamentation...basically a bunch of useless stuff.
So, how did I get the "Wooden Badger" nickname?
It all started with 2 sprints left in the development cycle. I was asked to do a "whiteboard" design review for an ordered queueing system. Generally a whiteboard review session occurs in the first few sprints because if the design is faulty it's best to know that before too much code is written. There is only one reason to hold a design review this late in the cycle...someone is married to their design approach, knows it is not right, and knows that I will refute it, which would cause lots of rework. My arguments, architecturally sound as they always are, can be rebutted with a simple sentence, "Dave, you know, you may be right, but we have to get this release out the door, so we can't go back to the drawing board now." And this argument only works if it is used at the last minute.
So I went to the whiteboard session and learned about the requirements first:
- There are multiple queues, some ordered, some not, some ordered by different, varying keys.
- We must support tens of thousands of messages per minute.
This doesn't sound like anything I haven't seen before. Let's look at the design:
- The queues will be stored in the database as a single master queue table (not a Service Broker queue).
- There will be 3 outrigger tables to handle supporting details. No foreign keys needed. And since we support multiple, varying keys, let's store them as XML and key-value pairs.
- Ordering logic, when needed, will be performed during enqueueing. And that will require shredding the XML for existing data to determine the keys.
- Since the ordering logic is so complex, ordered queue processing must be single-threaded, so the design uses SQL Server's applock feature...which is a big mutex.
- The queues will not "drain", they will keep 14 days of successful messages in the queue tables, and 30 days for any failure messages.
- Support people will be able to query the queue tables real-time to look at throughput metrics or to look at individual message routing status.
If you didn't understand the queueing mumbo jumbo above, never fear. Rest assured that EVERY bullet point is a well-known, established anti-pattern. So I began my retort:
- Why are we not using JMS? It is an anti-pattern to build queues in a database.
- "It's too hard to support ordering across multiple JMSs without going single-threaded."
- But an applock is a way to make sure a process in SQL Server is single-threaded.
- "Well, we like this pattern better."
- Why not use Service Broker?
- "What is System Broker?"
- Oh boy.
- Why not have separate queues for ordered and unordered processing?
- "Well, it's too late to change the design."
- Why not drain the queues and on dequeueing simply write the dequeued data to a History table?
- "We can do that in Version 2."
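For the curious, the "drain the queue" suggestion can be sketched in a few lines. This is only an illustration under loud assumptions: it uses Python's sqlite3 module as a stand-in for SQL Server, and the `message_queue` and `message_history` tables, the `status` column, and the function names are all hypothetical. The idea is that dequeueing deletes the row from the live queue and writes it to a history table in the same transaction, so the queue table stays small and the support folks query history instead.

```python
import sqlite3
from typing import Optional


def make_queue_db() -> sqlite3.Connection:
    """Create an in-memory database with a live queue and a history table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE message_queue (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            payload TEXT NOT NULL
        );
        CREATE TABLE message_history (
            id INTEGER PRIMARY KEY,
            payload TEXT NOT NULL,
            status TEXT NOT NULL
        );
        """
    )
    return conn


def enqueue(conn: sqlite3.Connection, payload: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO message_queue (payload) VALUES (?)", (payload,))


def dequeue(conn: sqlite3.Connection) -> Optional[str]:
    """Remove the oldest message from the queue and record it in history."""
    with conn:  # the delete and the history insert commit together
        row = conn.execute(
            "SELECT id, payload FROM message_queue ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None  # queue is fully drained
        conn.execute("DELETE FROM message_queue WHERE id = ?", (row[0],))
        conn.execute(
            "INSERT INTO message_history (id, payload, status) VALUES (?, ?, 'done')",
            row,
        )
        return row[1]
```

A single in-memory connection sidesteps everything hard about concurrent consumers, so treat this as a picture of the data flow and nothing more. Retention rules (14 days for successes, 30 for failures) would then apply to the history table only.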
Clearly the designers were happy and nothing was going to change. There was consensus that some people would look at my concerns and consider tweaking the design for Version 2.
Six months later our first customer experienced queueing problems. I again expressed my design concerns and the consensus was that we would begin working on them.
But of course break/fix work is not sexy like feature functionality, so the work was de-prioritized. Another 6 months elapsed until the next customer experienced problems. This time we hit the "knee of the curve" and the design simply could not scale any further. I again suggested we prioritize my simple design changes. I made the case that my few changes above yield radical improvements without a full-scale redesign. We could tweak the implementation and design to be more of a standard queueing system.
(Many developers relate to Star Trek too)
"It sounds like you want to build another one of your wooden badgers Dave."
Me: "I'm sorry, but I'm just trying to adapt a bad design a bit and make it follow best practices a little more. I think the final design won't be perfect, but we'll get the scalability without the cost of a full redesign."
"No, you signed off on the last design, now you just want to keep tweaking it hoping for different results every time. When will you learn that we can't keep designing Wooden Badgers? We have to do things using industry best practices. So, your new nickname is Badger and we are going to redesign this the right way this time."
"Sounds good to me."
UPDATE: We've had a few additional large customer outages due to this design. Each time I've been asked to help apply band-aid fixes. And after every incident I always ask my colleagues, "So, Badgers, when are you going to have that redesign done?"