
Welcome

You've found davewentzel.com.  I am an independent consultant and principal with David Wentzel LLC, focusing on anything related to data.  Here you'll find a collection of my thoughts, mostly on data architecture and SQL Server.  I host a blog as well as a Features section with searchable content.  I also showcase a Portfolio of open source projects I am working on, especially Drupal integration.  I host some family media to showcase some of those skills. Lastly, I have done quite a few presentations on SQL Server topics that you may be interested in.  

Data is one of your company's most valuable assets.  We can help you make the most of it.  

Convention over Configuration

"Convention over configuration" (CoC) is one of the newer catch phrases you'll hear as an architect. I really don't know where this started but when I learned Ruby on Rails it started to make a lot of sense to me. Also known as "coding by convention", this is a design paradigm that seeks to limit the number of decisions that developers need to make, keeping things simple. When something needs to be unconventional you simply "manage by exception" and specify that which is unconventional. In other words, lengthy web.config files are no longer needed unless something is unconventional. This leads to less code, more simplicity, fewer bugs, less documentation, etc.  In this post I'll cover some areas for CoC improvement in SQL, including one of my ideas that I've never seen proposed anywhere else that would eliminate TONS of bugs in SQL, not to mention cut down on needless SQL verbosity.  

We need more of this in SQL Server, RDBMS, and data technologies in general.  That's the gist of this post.  

An Example of CoC
Your programming language du jour probably has classes and some of those will loosely map to your database tables. Instead of a class being named "classEmployee" and the table being "dbo.Employee" we have a convention that "the class name is the table name"...unless of course it is specified as unconventional.  So the class would be Employee as would the underlying table.  Now there is no need to map this relationship explicitly.  Best practices suddenly become implicitly enforceable.  

The Need for CoC in SQL Server...Cursors

TSQL and SQL can be very verbose languages.  For the most part, these languages are declarative and not procedural.  If you can eliminate declaring some things by using sensible defaults, then you are improving the language.  IMHO.  Let's look at a simple example of where CoC could help us be less verbose...cursors.  After that I'll show you my biggest pet peeve with the SQL language and how I propose solving it using a CoC approach.  

How do you write cursors? I bet it's a tedious and error-prone process.  If you say that you NEVER use cursors and NEVER found a good use for them then stop reading, you are apparently a genius and I am not worthy of your time.  For the rest of us that know that cursors are sometimes the ONLY way to do something in tsql, read on.  

Here is a screenshot of a very hacky cursor.  It merely prints out a list of database objects and their type.  This is probably the simplest cursor you could create in SQL Server.  You'll notice a bit of verbosity.  For instance, the FETCH NEXT statement is repeated in the code.  No big deal, but we shouldn't have repeated code.  I'll bet you that at some point in your career you wrote a cursor and, in haste, forgot that second FETCH NEXT statement and put your code into an infinite loop.  
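The screenshot hasn't survived in this copy, so here is a sketch of roughly the same hacky cursor, assuming sys.objects as the source (the exact query in the screenshot may have differed):

```sql
DECLARE @name varchar(2000);   -- lazy datatypes, declared in haste
DECLARE @type varchar(200);

DECLARE cur CURSOR FOR
    SELECT name, type_desc FROM sys.objects;

OPEN cur;
FETCH NEXT FROM cur INTO @name, @type;
WHILE @@FETCH_STATUS = 0
BEGIN
    PRINT @name + ' : ' + @type;
    -- forget this second FETCH NEXT and you have an infinite loop
    FETCH NEXT FROM cur INTO @name, @type;
END
CLOSE cur;
DEALLOCATE cur;
```

Note the repeated FETCH NEXT and the boilerplate OPEN/CLOSE/DEALLOCATE around a loop that does almost nothing.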

Ugh.  

You'll also notice on Line 1 that I was too lazy to see what the actual datatypes were for the cols in question so I just set them to varchar(2000) and varchar(200), which should be big enough.  Right?  I'll bet you've done the same thing in haste at some point in your career too.  

Cursors are just too darn tedious.  They follow a pattern (or you can use one of the templates in SSMS) but even so, it is still tedious, and therefore error-prone.  

One way we could eliminate the datatyping tedium, as well as remove the entire declaration of "cursor fetch variables", is if Microsoft gave us a shortcut similar to Oracle's %ROWTYPE.  In Oracle if you add %ROWTYPE to a variable the variable will assume the datatype of the underlying referenced object.  It does this by declaring a "record variable" over the input.  A bit of TSQL pseudo-code might help to illustrate this.  
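The pseudo-code image is missing from this copy; the idea, sketched as hypothetical TSQL (this %ROWTYPE syntax does not actually exist in SQL Server):

```sql
DECLARE cur CURSOR FOR
    SELECT name, type_desc FROM sys.objects;

-- hypothetical: @rec inherits the cursor's columns and datatypes
DECLARE @rec cur%ROWTYPE;

OPEN cur;
FETCH NEXT FROM cur INTO @rec;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- dot notation against the record variable
    PRINT @rec.name + ' : ' + @rec.type_desc;
    FETCH NEXT FROM cur INTO @rec;
END
CLOSE cur;
DEALLOCATE cur;
```

No per-column variable declarations, and no chance of getting a datatype wrong.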

First, you'll notice we no longer have to declare the individual variables for every column referenced in the cursor.  Our example only has 2 columns, but I'm sure you've written cursors with 10 cols being referenced, first in the "declare" section, then at least twice in the FETCH statements.  That's a lot of repeated code to screw up.  

Instead we declare a record type against the cursor on Line 22 and tell the record type that it should inherit the underlying columns and datatypes.  We can now reference the variable using dot notation on Line 26.  Using something like %ROWTYPE saves a lot of typing and it makes things a lot easier to read.  But it's still too wordy.  

What would be REALLY nice is something like this.  
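Again the image is gone; the syntax I have in mind would look something like this (entirely hypothetical, no version of TSQL supports it):

```sql
DECLARE cur CURSOR FOR
    SELECT name, type_desc FROM sys.objects;

-- hypothetical FOR EACH syntax: the record comes straight from the cursor
FOR EACH rec IN cur
BEGIN
    PRINT rec.name + ' : ' + rec.type_desc;
END
-- no FETCH NEXT, no CLOSE/DEALLOCATE, no per-column variables
```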

Look at how nice and compact this is.  Further, it follows the FOR EACH...LOOP looping construct that every major language has these days.  Here I don't even need %ROWTYPE because I get the record object right from the cursor (cur) object.  No FETCH NEXT statements, no CLOSE/DEALLOCATE statements, no individual variable declarations for each column.  

Yes I know that this syntax severely limits your ability to use advanced cursor techniques.  This syntax is entirely forward-only, fetching only the NEXT row.  I'm cool with that.  I'll bet 99% of cursors you've written have been forward-only, fetch-one-at-a-time cursors anyway.  And when you need that other functionality you can always fall back to the old TSQL cursor syntax.  

That's in the spirit of Convention Over Configuration...or "managing by exception".  

 

My Contribution to CoC - the KEY JOIN (quasi-NATURAL JOIN)

Here is my (I think) original contribution to CoC in SQL Server.  I've never seen this proposed...if it has been then I'm sorry and did not mean to steal anyone's idea.  It's called the KEY JOIN.  

How many times have you seen a query with a structure like this?
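The query image didn't survive the migration; it was something along these lines (table and column names are representative):

```sql
SELECT e.EmployeeName, d.DepartmentName
FROM dbo.Employee e
JOIN dbo.Department d
    ON e.DepartmentId = d.DepartmentId;
```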

No big deal right? I want to see all employees with the data for the department that they are assigned to.  

To me that query seems very "wordy". Think about it, 99% of the time when you are JOINing employee to department it is going to be by DepartmentId. There is no reason why it would ever be anything else.

And, IMHO, in about 99% of EVERY JOIN I've EVER written I've JOINed the tables by the same cols...and those cols are the PK on the parent side to the FK on the child side.

The 1% of the JOINs that aren't by key are really edge cases.  Sometimes these are analytic queries and sometimes these are queries where I'm looking to find "fuzzy" matches so I don't want to use the key.  

For the other 99%, I see an opportunity for CoC.  We can manage the 1% as an exception.  

NATURAL JOIN?

Well, it just so happens that the ANSI SQL standard has a NATURAL JOIN syntax that seeks to alleviate the ON clause, thus aiding readability.  Here is the above query using the NATURAL JOIN syntax:
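The screenshot is missing here; using the same representative tables, the NATURAL JOIN version would be (remember, SQL Server doesn't support this syntax):

```sql
SELECT e.EmployeeName, d.DepartmentName
FROM dbo.Employee e
NATURAL JOIN dbo.Department d;   -- join cols inferred from matching column names
```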

So much easier to read without that ON clause.

But few vendors support NATURAL JOIN (Oracle is one that does). In fact, NATURAL JOIN is highly discouraged from use. Why?

  • a NATURAL JOIN actually joins tables by like col names, not DRI. 
  • it's generally thought that if a key col name changes then the NATURAL JOIN would break everywhere so the ANSI standard protects us from our own devices. But seriously folks, if you rename a key col you better realize that you are going to have A LOT of code that changes.  This is a ridiculous argument.

To be clear, in the employee/department example if I used NATURAL JOIN the join would be by DepartmentId, assuming it is common in both tables.  But if Name is common in both tables then the join would be by that col as well...and clearly dbo.employee.Name is something entirely different from dbo.department.Name.  Also, many tables you work with probably have auditing cols that are always named the same, CreUserId and CreDTime for instance.  In the NATURAL JOIN world those cols would also be part of the equi-JOIN, which is clearly not right.  

So NATURAL JOIN, while far more succinct, is worthless in modern RDBMSs, which is why it is discouraged from use.  Darn.  This is also why most RDBMS vendors don't even bother to support it.  

So, I would love to see something in the ANSI specification (or MS could just implement it as their own extension in TSQL) called something like KEY JOIN.  

KEY JOIN's Conventions

  1. A KEY JOIN will always join two objects by their DRI.  This will be a PK or Unique Key on the parent side and a FOREIGN KEY on the child side.  
  2. If DRI is not present between two objects referenced in a KEY JOIN, then an error should be thrown.  
  3. A KEY JOIN will assume INNER (INNER KEY JOIN) unless otherwise specified.  Just like JOIN is short for INNER JOIN, KEY JOIN is short for INNER KEY JOIN.  
  4. LEFT (OUTER) KEY JOIN would indicate an optional JOIN from tableA to tableB, following the same semantics as a LEFT OUTER JOIN.  
  5. RIGHT KEY JOIN and FULL KEY JOIN would work the same as RIGHT JOIN and FULL JOIN, except the ON clause would be assumed.  
  6. If an ON clause is found during query parsing with a KEY JOIN then an error should be thrown.  

Can anyone see a downside to KEY JOIN?  I can't.  This would really solve readability and errors due to accidentally picking the wrong join condition.  Here is a query I wrote that attempts to show all column names for all tables.  But it's not working.  Why?
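The screenshot is gone; the bug was along these lines (assuming a query over sys.tables and sys.columns, which may not match the original exactly):

```sql
-- BUG: sys.columns should join to sys.tables on object_id, not column_id
SELECT t.name AS TableName, c.name AS ColumnName
FROM sys.tables t
JOIN sys.columns c
    ON t.object_id = c.column_id;   -- wrong col, compiles fine, returns garbage

-- with the hypothetical KEY JOIN the mistake couldn't happen:
-- SELECT t.name, c.name
-- FROM sys.tables t
-- KEY JOIN sys.columns c;
```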

Yeah, the ON clause is wrong.  That was probably easy to spot check, but as your queries grow larger your odds of JOINing incorrectly increase and the ability to spot the bug becomes more difficult.  With a KEY JOIN it is almost impossible to screw the query up.  That's just much easier to read.  

I have no clue why this isn't part of the ANSI standard.  

 

Lastly...CREATE or REPLACE

Here's another pet peeve of mine with SQL Server...the verbosity of simple tasks like deploying a new version of a procedure in a database.  I'm sure EVERY stored proc you've EVER written followed some pattern like this:
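The image is missing from this copy; the familiar pattern is roughly this (proc name and body are placeholders):

```sql
IF EXISTS (SELECT * FROM sys.objects
           WHERE object_id = OBJECT_ID(N'dbo.MyProc') AND type = N'P')
    DROP PROCEDURE dbo.MyProc;
GO
CREATE PROCEDURE dbo.MyProc
AS
BEGIN
    SELECT 1;
END
GO
-- and don't forget to re-apply the permissions the DROP just destroyed
GRANT EXECUTE ON dbo.MyProc TO public;
GO
```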

Maybe you construct Line 1 a little differently, but I'm sure the basic pattern is the same.  

Why oh why can't we just have this?
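This is Oracle's CREATE OR REPLACE syntax, sketched here as the TSQL I wish existed (it is not valid SQL Server syntax):

```sql
CREATE OR REPLACE PROCEDURE dbo.MyProc
AS
BEGIN
    SELECT 1;
END
GO
```

One statement, no existence check, no dropped permissions.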

Oh well.  

How can you spot an opportunity for CoC?

I like to write code using known patterns that I've used for years.  Those patterns (like the cursor example) work 99% of the time and I don't need to ever think about the verbosity of the code I'm trying to write.  I just use the pattern.  In the past few years I've realized that patterns really aren't that good after all.  A pattern is just needless verbosity and configuration that will lead to subtle errors.  If we had a shorthand for the given pattern then we could make that the convention and merely manage by exception.  You should look for patterns in your own work that you can factor out and set up as an implicit convention.  This will save you a lot of bug grief and make your code easier to document.  

I've proposed and up-voted many of the above items on Connect (such as %ROWTYPE) and Microsoft just doesn't care.  As for KEY JOIN...well, one day I'm going to seriously propose that.  

 


You have just read Convention over Configuration on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

Vertica on AWS

In the last post I covered the Vertica Installation Prereqs.  If you really don't want to install Vertica yourself you can always download a fully-functional VM from HP on the myVertica portal.  

Or you can run Vertica on Amazon in AWS.  I won't cover this in detail since AWS costs money and I'm frugal. Also Vertica is a *bit* different on AWS than when run natively. I have a slight bit of knowledge on running Vertica on AWS, so I'll briefly document what I know in case you decide to run Vertica on AWS instead of in-house.  Running in the cloud is an excellent choice if you are an ISV and wish to package Vertica (probably the Community Edition) with your application.  

  • You'll need to be familiar with Amazon Management Console since most of the work is done from there.  
  • You specify the HP Vertica AMI you want to install using install_vertica.
  • You can still use Community Edition with 3 nodes and 1TB of data on AWS gratis.
  • You cannot create a cluster on AWS using MC (or some other things cluster-related). Instead use the Amazon EC2 console.  

You have just read "Vertica on AWS" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  


Installing Vertica 7

In this post we are going to install Vertica 7.0.x from the command line for the first node, then look at how to install Vertica by using the Management Console (MC) in a future post, which is much easier and can be done for all subsequent nodes. The method outlined below is also the EXACT same process you follow when performing an upgrade. The only difference is that prudence dictates you back up your existing databases PRIOR to upgrading.

Installing from the commandline

There is no apt-get for Vertica unfortunately.

  1. From your laptop (remember, Ubuntu server has no GUI) log in to the myVertica portal (accts are free) and download the HP Vertica package for the correct OS and processor architecture that you need.  
  2. I assume you are connecting to your Ubuntu box using PuTTY with SSH. If so you can use pscp to copy your downloaded .deb file to your server. You can google to download pscp if you are running stand-alone PuTTY.
  3. Open a DOS prompt (elevated) and cd to the location of pscp
  4. pscp <location of .deb file> <user>@<ip>:<path on ubuntu>
  5. You should see a progress indicator.  After the copy is complete you are ready to install the .deb package
  6. sudo dpkg -i <pathname>.  This only takes a few secs. 
  7. You now have the Vertica package installed. Now you have to install Vertica.  This is a bit confusing, just remember that installing Vertica is a two-step process.  
  8. You now have to run the install script. This can be run for a single node or it can be simultaneously installed to many cluster nodes using --hosts. In other words, you only need to install the Vertica package once, on your first node. We will install on a single node and look at the other options for our subsequent dev nodes.
  9. sudo /opt/vertica/sbin/install_vertica --hosts 192.168.0.x --deb <path to deb> -L CE --accept-eula.  The ip_address can be an actual hostname as well. Don't use localhost or 127.0.0.1 else you can't add additional nodes to your cluster. This would be considered a "standalone, single node Vertica installation".  And there is no known way to fix it later.  The -L CE instructs the installer that you are using the Community Edition.
  10. It's highly likely that you may see errors. Simply research and fix them one-by-one, re-running install_vertica as many times as it takes to resolve all issues.  Remember that you are running on Ubuntu and not Windows so expect some customizations.  I have some examples of errors at the right.  Eventually you'll get so far and need to ignore certain system requirement checks in install_vertica. For instance, ext3 and ext4 filesystems or LVM presence. If you have ext2 filesystems or LVM present (see first error above) then install_vertica will constantly fail.  It's OK to run a test Vertica instance on these filesystem configurations.  When you are comfortable that any remaining FAIL messages can be ignored, just add --failure-threshold NONE to the install_vertica command.  Never do that on a prod system, but it works for testing and playing with Vertica.
  11. At this point install_vertica should complete after a few minutes.  
  12. Log out.
  13. Log in to Linux using the dbadmin acct that install_vertica created.  
  14. You can now play around. Try the following (or wait until my next blog post where we will do some additional Vertica fun):   /opt/vertica/bin/admintools

You have just read "Installing Vertica 7" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  


On Recruiters

Quick disclaimer:  I don't HATE recruiters.  I love them!  The good ones that is.  Without them I wouldn't have had ANY of the success I think I've had in my career.  But unfortunately most recruiters are sleazeballs.  That's why, over my almost 20 year career, I've only used 4 different recruiters.  And I've never used a job board.  A good recruiter is your partner.  This post is about every other recruiter.  

Have you ever received an email from a recruiter with the text "Please send me an updated copy of your resume in Word format"?  Assuming you thought you were a good fit, I'll bet you responded with an attached copy without giving it a second thought.  Well, this is a bad idea.  I'm going to outline a few horror stories for you.  Then I'll cover some things I've done over the last 20 years to avoid "recruiter fraud".  

Your resume might get published to online databases without your permission

In one case I had a recruiter, who shall remain nameless, take my resume and graciously post it on a public job board without my knowledge.  How do I know this?  Tip #1:  slightly alter every copy of your resume that you send to ANYONE.  Then when you see a copy of your resume somewhere you can trace that "version" back and determine who shared your resume without your consent.  I usually change the grammar on a single sentence and then log the change and who I sent that version to.  

Why go through this rigamarole?  Many recruiters will claim they have a position that is a perfect fit for you.  Meanwhile, there is no position.  The recruiter wants to add your name to their database.  So, what's the problem with that?  Here's a litany:

  1. These internal recruiting firm databases are EXTREMELY valuable intellectual property.  It is very common for a smaller recruiting firm to be bought out by a larger firm solely for the database.  Assuming the database accurately reflects your information then this might be OK.  But what if you live in Philly and have no desire to relocate to San Diego and the new owner of the database has offices in San Diego and you get calls DAILY from San Diego wondering if you are interested in a .net position there?  This happened to me...twice.  The latest time was when a large recruiting firm was bought out by one of the MONSTROUSLY large, ubiquitous job boards.  I had read about the buyout in WSJ and within a few weeks I was getting more spam than ever from India-based recruiters.  After some legwork I found a copy of my resume on this MONSTER of a job board, and after a few hours of fighting with their customer service people, got it removed.  BTW, the copy of my resume they had was over 10 years old.  
  2. When times get tough recruiters rent their database assets to non-recruiters.  They sell your information so you can later be spammed and scammed.  How do I know this?  I have methods that I use to track this that I don't want to divulge.  Hint:  it involves using different email addresses, just like subtle differences in resume text, to track who is using my data without my consent.  
  3. Recruiters will post your resume on some job boards, again, without your consent.  I call this the "buy side vs sell side" recruiting approach.  Most recruiters are "sell-side", meaning they focus on selling a service to their client.  The fact is, this generates the most money in the industry.  The theory is that there are far fewer COMPANIES looking for candidates than candidates looking for companies.  So the recruiter focuses on selling to the client first, then finding a candidate later.  But there are a few buy-side recruiters.  These guys find a superstar candidate...one that is a paper tiger, well-spoken, and knows his skills...and then market that candidate to ANYONE who will listen.  This isn't necessarily a bad thing...I know because it has happened to me.  A recruiter realizes that someone is a HOT commodity and he markets the candidate to all of his clients even if they may not have a current need.  It works!  It only becomes a problem when the buy-side recruiter takes this too far and starts posting your resume, with her contact information instead of yours, on various job boards.  This will eventually come back to hurt you professionally.   

Your resume is easier for a recruiter to alter in Word format

Not all recruiters do this, but most will ALTER your resume before sending it to their client.  THIS PISSES ME OFF.  I put a lot of time and thought into my resume and while others might think my resume is GARBAGE it is MY GARBAGE.  And I don't care to have it altered without my consent.  I have no problem with ANYONE who wishes to provide me with criticism to make my resume better.  There are recruiters who have more experience than I do and I value their inputs.  But ultimately these decisions are mine to make.  

Some alterations are OK.  In most cases the recruiter will insert their firm's logo at the top of your resume and remove your personal contact information.  This is smart since some clients may decide to poach a candidate to avoid paying the recruiter's fees.  If a prospective employer EVER tried to do this with me I would not do business with them.  That shows a total lack of ethics.  Although I find most recruiters to be sleaze-balls, if a recruiter introduces me to a client then that recruiter has earned his fees.  Period.  Most of us believe recruiter fees are outrageously high and we all complain about it, but until our industry cuts out the middleman, then the middleman deserves his pay.  So, I certainly would not mind if a recruiter alters my resume to remove my contact information, but I should be notified FIRST and should need to APPROVE this and this is the ONLY thing that should change.  

I once had an ambitious recruiter alter my contact information at the top of my resume.  At the face-to-face interview I saw this and asked if I could peruse their copy more closely since this was not my version.  I was surprised to see that my blog information, LinkedIn profile address, and even my NAME were altered.  The recruiter altered my NAME!  That is unacceptable.  Smart clients wish to do their own candidate vetting independently of the recruiting firm and the best way to do this is via reading someone's blog, LinkedIn profile, and googling their legal name.  They can't do this if the recruiter altered this data.  

Blatant falsification of my resume

On another face-to-face interview the interviewer wanted to know about my experience with software XYZ.  I mentioned that I had no experience with XYZ and didn't even know what it was.  The interviewer showed me that it was displayed prominently on my resume in the "Skills" section.  

Huh?

The recruiter actually falsified my resume!  I explained that I had no knowledge that XYZ was on my resume and apologized profusely.  The interviewer explained that XYZ was some industry-specific, esoteric piece of software that maybe a handful of people in the world worked with, but they were hoping to find someone with experience in it.  It was not a requirement, rather a "nice to have" for the position.  The interview proceeded and I think I did ok with everything else, and everyone was cordial and happy, but I never got a call back.  

And I didn't expect one.  

If you were that interviewer, would you recommend your company hire a candidate from a recruiting firm that falsifies documentation?  And then pay them a finder's fee for the privilege?  I wouldn't.  If it wasn't for the recruiter being too ambitious I am positive I would've gotten that job.  The recruiter lied his way right out of a commission.  

You're probably thinking that this kind of behavior is an edge case.  Certainly this is not prevalent?  WRONG.  For a number of years I was a team lead at a large ISV and interviewed, on average, about 1 candidate a week for various open positions...DBAs, Java guys, QA folks, etc.  I did a non-scientific study during this time. I asked every candidate to provide me with her copy of her resume.  Post-interview I compared the candidate's version with the recruiter-supplied version.  In more than half of the cases I saw "content" modifications to the resume.  By "content" I mean the change was not:

  • formatting (font change, bold/italic changes, styles)
  • spelling/grammar
  • reordering of sections (skills moved to the top, education to the bottom)
  • removal of candidate contact information
  • insertion of recruiter's logo/contact info

Those changes are OK.  I guess.  I don't approve of someone modifying my stuff, but you could make the case that the recruiter was just being overly "helpful".  Here are some examples of what was changed.  And there is no good reason for it, other than blatant FRAUD:

  • Employment dates were altered to hide periods of unemployment.  
  • Company names were altered.  Usually this was something like "American Bancorp May 2013-May 2014" to something like "Large Banking Institution May 2013 - May 2014".  Why would this need to be altered?  I couldn't think of a good reason.  I did some research and in some cases American Bancorp was also one of the recruiter's clients, and the recruiter was trying to avoid being accused of poaching.  Utterly sleazy.  The recruiter was trying to pass off someone he placed elsewhere.  
  • Employment responsibilities were rewritten to sound like the candidate was more "senior"
  • And the most prevalent edit...SKILLS.  My company used Rational tools and I saw lots of resumes that were altered to include the word "Rational" somewhere.  This was so unbelievably prevalent with one recruiter that I called him and told him we really need people with NFC experience.  He asked what NFC was and I replied that it is a standard ITIL tool used in TOGAF analysis.  (BTW, NFC is really "No Fuck1ng Clue"...I made it up).  Can you believe that I started seeing resumes with NFC experience on them?  I now had ammunition to remove that recruiter from my company's "preferred vendor list".  

Need more proof?

If you still think these stories of recruiter fraud are merely edge cases and not indicative of most recruiters, well, perhaps we should ask a recruiter?  Read this article, written by a recruiter on a recruiting forum website.  The author, and frankly most of the commenters (also recruiters), see nothing wrong with recruiters who alter resumes to make them look better in front of a client.  And they do it without seeking the permission of the candidate. Here are some screenshots that should get your blood boiling:

OMG!  This is fraud folks.  It's like a used car salesman saying, "Yeah, GM didn't put air conditioning in this model but it blows out cold air anyway."  Of course it does...when you test drive it in December.  

Here's another:

Thanks for doing me that favor pal!  See my story above where the recruiter did me a FAVOR and cost me the job.  I can't believe the arrogance.  

Let's move to the comments:

Gotta love the author/editor analogy.  Here are the problems with that fallacious analogy.  The (non-sleazeball) editor always runs his edits past the author before publication.  More importantly, an editor may change grammar, spelling, or style, but doesn't change FACTS.  The editor is not the fact-checker.  Note above that Craig also knows his target market better than the candidate.  Yeah, the day a recruiter knows more about my domain than I do is the day I retire.  What an asshat!

Here's a more accurate analogy.  Let's say the local grocery store decides that the makers of Tide have horrendous marketing and the local grocery store could do better.  Does the local grocery store change Tide's marketing material?  Of course not.  In fact, that is trademark infringement and is illegal.  Resume altering is the same thing.  My resume is my "mark" (it isn't a trademark) and I would prefer my mark not be altered.  

Another comment:

"Sure Rebeccah, go ahead and modify my resume.  I promise not to be shocked at the interview when you totally misrepresent my skills and experience."  Clearly Rebeccah is contradicting herself in the second sentence.  If Rebeccah asked me I would give her my consent but I would ASK to see the changes BEFORE they were submitted.  This avoids the shock.  I'm appalled that Rebeccah would want any of her candidates to be shocked at an interview.  Isn't that lack of preparation?  

And look at that last sentence...she actually modifies executive resumes.  Words fail me!

Rebeccah is clearly inexperienced.  I agree that Times New Roman is a yukky font, but there are very good reasons why it is used.  It is guaranteed to be an installed font with every copy of Word since...forever.  Clearly Rebeccah does not know how Word works when the font du jour she uses is not installed on her client's laptop.  Then there is the issue of the hiring manager that uses OCR and it can't interpret her nifty font.  The point is, TNR is horrible, but the alternatives can be worse.  Be careful!

If a candidate has a resume professionally prepared then that means they are better prepared.  I like that candidate already.  It shows initiative.  It shows attention to detail.  It shows that they understand their limitations.  I had mine professionally prepared...15 years ago.  Since then I had other "professionals" try to modernize it but each one looked horrible and I like the style and am sticking with it.  As an interviewer I want to know how prepared the candidate is.  Craig is conflating "professional resume preparer" with "recruiter".  The two are totally different.  And I don't know any professional resume preparer that will alter CONTENT.  

And again, the analogy used by Craig is wrong.  If the candidate submits a professionally-prepared resume then the CANDIDATE gave his explicit consent.  If the recruiter prepares the resume and submits it then the CANDIDATE did NOT give his consent.  It all boils down to consent, ethics, and honesty.  

"But most technical people can't write a coherent English sentence in a resume and I'm helping the candidate by doing this"

I was told this once by a recruiter who reformatted my resume to make it look more modern without my consent.  To any recruiter reading this:  YOU ARE LYING WHEN YOU DO THIS.  If your candidate can't put together a good resume how do you think they are going to represent YOU in front of YOUR important client?  Get permission from the candidate before altering a resume.  And consider coaching the candidate and having them make the recommended edits. This helps the candidate LEARN how to present themselves to a client better.  When you add lipstick to a pig, you still have a pig. 

Some approaches to combat "recruiter resume fraud"

Here are some approaches I've tried to combat overzealous recruiters:

  1. Only work with reputable recruiters.  I've been in IT since 1997 and have always used recruiters.  Exactly 4 of them.  I've worked or consulted for at least 10 different companies and have used 4 recruiters.  These recruiters I trust implicitly.  I know they modify my resume and I approve of what they do.  I TRUST THEM BECAUSE THEY'VE EARNED IT.  There are lots of recruiting firms and many of them are disreputable.  I understand and accept that by not working with some recruiters that I am likely missing out on GREAT opportunities.  I'm ok with that.  If a company wants to work with a shady recruiting firm then I don't wish to work at that company.  The recruiters I work with know me, my personal foibles, my experience, and what I want out of a company.  They have a long term PERSONAL relationship with me.  And I know their relationship with their client is similarly strong.  When you lay down with dogs, don't be surprised when you get fleas.  ONCE I accepted an opportunity with a client of a disreputable firm because it was an amazing opportunity.  It really was an amazing opportunity.  And the recruiting firm was ALWAYS late paying me.  Once they were 5 months in arrears in fees.  I walked out on them, and the client.  It was a great opportunity, but a bad circumstance.
  2. Try sending your resume to a recruiter in pdf format and tell them you do not use Word.  Sometimes this is enough to thwart recruiters from resume fraud.  
  3. Create an online version of your resume and reference "supporting details" on your Word version that point to that online version.  If you do this in enough places the recruiter can't get too sleazy.  
  4. Whenever sending your resume to a recruiter, use the trick I outlined at the beginning of this post where you track subtle changes to your resume that can be traced back to a recruiter.  At least you'll know who is sleazy later.  
  5. Whenever you send your resume ensure you include a DISCLAIMER in the email text.  Here is the version I use.  This won't stop sleazeball recruiters from doing whatever they want, but it doesn't hurt to put them on notice.  If a recruiter balks at this just walk away.  Don't lower your standards because the recruiter claims to have some amazing opportunity.  They never do.  I've been burned so much by this that I've finally learned this truth.  
  6. Make your Word document as protected as possible without being a nuisance.  None of these settings are foolproof, but everything helps.  Here's what I do when I'm done editing the "version" for a given recruiter:
    1. File|Permissions|Restrict Editing.  
    2. Click Edit restrictions and select "No change (read only)"  
    3. Click "Yes, Start Enforcing Protection"
    4. Select Password and choose a simple password.  Remember that the recruiter is going to want to add their logo.  You aren't trying to thwart the NSA with this password.  
    5. File|Protection|Mark as Final
    6. Now when a recruiter opens the document it will be read only.  This won't stop a recruiter from copy/pasting your resume text into another file, but sleazy recruiters will always be sleazy.  

Regarding References...

When a recruiter asks you for references to call BEFORE any contact with a prospective client, tell them NO.  There is at least one HUGE national recruiting firm that does this under the premise of being a value-add for their clients.  On the surface this sounds reasonable.  They want to check your references before embarrassing themselves with an important client if you are a dud.  

Problems with that:

  1. People can (and do) fake their references.  Calling a reference is not a replacement for a background check, it is merely one part of good due diligence.  I've heard of cases where candidates used their spouse as their "manager reference."  So, if references are so darn important, then make the recruiter do a background check on you too prior to divulging your references.  They won't.  Why?  Background checks cost $$$.  
  2. Recruiters will try to recruit your references.  A recruiter from a HUGE national recruiting firm demanded to call my references.  I had a reference who is a BigShot in my industry.  Seriously, he's well known.  The recruiter called the reference because he had an opportunity for me.  The recruiter spent the entire time trying to recruit my reference.  Then he asked my reference if he knew anyone who was interested in being recruited too.  My reference called me and was furious at me for doing business with a sleazeball like that.  How many times can I go back to the well with this reference?  Don't abuse your references by giving them out to recruiters too early in the process.  
  3. Recruiters from the same recruiting firm will ALL want to talk to your references.  I get calls from this HUGE national recruiting firm at least 4 times a year and it is always a different recruiter.  And they always want my references BEFORE they even tell me about their opportunities.  I mention that Recruiter X already called my references a few months ago and that she should go talk to him.  It doesn't work, they still need to talk to my references.  There is no good reason for this other than that this recruiting firm is sleazy.
  4. I've had a recruiter tell me that I had to provide the references first, but that he wouldn't call them until I was prepared to move forward with a face-to-face interview.  In a moment of weakness I handed over my references.  Guess what?  He called them within the hour trying to recruit them.  How do I know this?  Because one of my references was my own HOME PHONE NUMBER with a made-up name.  The sleazeball recruiter didn't even bother to check the number.  (I love doing little social experiments with recruiters...have you noticed that yet?).  
  5. References will always say good things about the candidate...that's why they are the reference.  I'm not going to allow someone to be my reference unless I am POSITIVE that I will get a glowing recommendation.  Therefore, references, by their very nature, are worthless.  
  6. Your references may have loose lips.  Whenever you use someone as a reference, decorum states you should tell her first.  By asking the person to be a reference you are implicitly telling that person that you are looking for new opportunities.  In some circumstances this could be a problem if your reference goes and blabs.   

Don't give your references to any recruiter until you've gotten through at least the phone interview with the prospective client.  There is no good reason to hand out your references prior to that.  You need to be sure that you are a good fit for the opportunity.  And when the recruiter says that this is their NON-NEGOTIABLE policy, then tell them, "Thank you, but I'll look elsewhere."  

As I've said, I've interviewed dozens of candidates and I would never dream of asking for a reference first.  Most candidates are not good fits and I don't have the time or the inclination to do reference checking first.  I get a feel for the other person at the table and I go with my gut.  Then I get a background check.  Those are much harder to fake.  I couldn't care less about a reference.  

Concluding Thoughts

Whenever any industry is experiencing a boom you can rest assured that the percentage of charlatans will rise proportionally.  We see this with IT people that pad their resume to include the hot new skill that is in demand.  We also see recruiters who are just interested in getting asses in chairs and are not interested in ethics.  

My advice to IT people reading this blog post: Be leery of all recruiters until you've built up trust.  Never allow a recruiter to alter your resume's content and never provide references until you've had the opportunity to talk with the client.  

To any recruiters that might be reading this post I offer you my advice:  Never alter someone's resume without permission.  Never recruit someone's references.  Always be honest in your dealings with your clients and your candidates.  If you work at a sleazy recruiting firm get out now.  Life is too short to be a whore.  I know lots of good recruiters who have left these places and are much happier now.  If you can't abide by my rules, don't contact me.  I'm not interested in your business.  Remember: you called me, I didn't call you!  

I wish you the best of luck in life's endeavors.


You have just read "On Recruiters" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

Job req o' the day

Saw this in an email from a recruiter

JASON is a .net web service now?  Retirement must be tough.

And LINK?  Um, can't think up a snarky comment for that one.  

Back in the dot com days I got probably 5 calls a day from recruiters wanting someone that knew "ESS-QUE-ELL" Server.  "Nope, sorry, don't know it, bye."  I won't talk with a recruiter that doesn't know the absolute basics of our industry, like how to pronounce the technology.  If the recruiter doesn't know the basics of the technology then how can she fulfill her client's requisition?  These recruiters are relying on the client to have a perfect req prepared and the recruiter functions solely as the middle man, trying to mine for gold.  But you'll never find gold if you don't know what gold looks like.  These recruiters cold call possible candidates by scouring job boards and LinkedIn.  Sorry, but I've wasted far too many hours taking phone interviews for ESS-QUE-ELL Server jr dba positions that I would've had no interest in if the recruiter would've understood the client's ask.  

(I don't know this Jason either)

Now, I've heard people use the auto mechanic analogy to dispute my contention that recruiters should understand what they are recruiting.  "You don't need to know anything about an internal combustion engine to go to a mechanic when your car won't run."  

Wrong analogy.  That analogy would apply if a business person asked me how to write a query.  I expect a business person not to know technical matters.  That's why they hired me.  And I hire a mechanic because I don't know engines.  

A better analogy for a recruiter is a used car dealer.  "I won't buy a used car from a dealer that can't tell me if the car has a V-8 or a V-6."  Exactly.  The used car dealer functions in an "agency" capacity (he finds you a car that meets your requirements), just like the recruiter finds a candidate that meets the client's requirements.  

Yes, that's right, I just compared technical recruiters to used car salesmen.  But it's true.  I am buying a product from a recruiter; I expect him to have a cursory knowledge of that which he is selling.  

Not all technical recruiters are bad, just the ones that breathe.

(kidding of course)

(Jason Sudeikis?...wait...no, I don't know this Jason either.)


 

Vertica Installation Prereqs

The best way for an IT person to learn a new technology is to install it and play with it.  Today we'll get our OS prepared for an install of our first Vertica instance. In a later post we'll install a second node and create a K-safe cluster (more on K-safety in a future post). We'll use the free Community Edition.  

Vertica on Linux
This is what I'm going to use for the remainder of these tutorial blog posts on Vertica.  Specifically, Ubuntu.  

Prereqs

  • Ubuntu or another debian-based Linux. You can install on other platforms but my tutorial is geared toward Ubuntu.
  • If using Ubuntu, ensure it is 12.04 LTS. Don't load 14.04 (or whatever the "current" LTS is when you read this) unless you've confirmed Vertica supports that release.
  • SSH installed (Vertica will configure passwordless logins for you).  SSH is how cluster nodes communicate with each other.
  • Ensure sudo is installed: which sudo

Storage Prereqs

  • Your data directory must be separate from your catalog directory
  • Both directories must have identical paths across ALL nodes.  Plan your directory structure.  
  • It is therefore easiest to use /home/catalog and /home/data. The installer will create these directories with the requisite permissions, but the parent folder (in this case /home) must exist first (which of course /home does).
  • As with any RDBMS, plan for data growth. For SQL Server you can google formulas that estimate how much EXTRA disk space you'll need for things like tempdb and logs. Vertica is no different: HP recommends that disk utilization per node be no more than 60% for K-safety=1, which leaves plenty of space for background processes.  K-safety will be explained in a future blog post.  

Disk Readahead

  • Readahead must be at least 2048 sectors.
  • Use sudo /sbin/blockdev --getra /dev/sda1 to view the current readahead (change the device accordingly).
  • sudo /sbin/blockdev --setra 2048 /dev/sda1
  • sudo /sbin/blockdev --setra 2048 /dev/mapper/VERTICA1--vg-root
  • Ensure those commands succeed.
  • sudo nano /etc/rc.local and add the EXACT same setra lines above the "exit 0" line so the setting persists across reboots.
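
For reference, after the edit the tail of my /etc/rc.local looks something like this (the device names are from my lab machine; yours will differ):

```shell
# Persist the Vertica readahead settings across reboots.
# These lines must come BEFORE the exit 0.
/sbin/blockdev --setra 2048 /dev/sda1
/sbin/blockdev --setra 2048 /dev/mapper/VERTICA1--vg-root

exit 0
```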

Install NTP (network time protocol)

  • Node clocks must be synchronized for conflict resolution.
  • sudo apt-get install ntp
  • sudo /etc/init.d/ntp reload
  • cd /usr/sbin
  • sudo ntpq -c rv | grep stratum
  • A stratum of 16 means that you have a problem. You might want to wait an hour or so before freaking out at a 16; sometimes it takes a while to sync. Not sure why.  

Swap File

  • change your paths accordingly
  • sudo dd if=/dev/zero of=/media/swap.img bs=1024 count=3M (a 3GB swap file)
  • sudo mkswap /media/swap.img
  • sudo swapon /media/swap.img
  • sudo nano /etc/fstab and add the following line:
  • /media/swap.img swap swap sw 0 0 

Other OS Tasks

  • sudo apt-get install pstack mcelog sysstat (various support and debugging tools)

Reboot

At this point I would do a sudo shutdown -r now.  I know that technically this isn't required but it seems to be the only way I could make some of these settings "stick".  Probably just me doing something incorrectly.  

Other Installer Notes

  • the installer will create a dbadmin Linux user and verticadb group.  These names can be changed but then YOU must ensure you set things like the home directories correctly.  Therefore /home/dbadmin will be created with necessary chown/chmods for you.  
  • dbadmin will own the db catalog and data files on disk.  
  • dbadmin is configured for passwordless SSH communication between nodes.  
  • The binaries that Vertica installs will be located at /opt/vertica.  
  • The installer must be run with su or sudo.  

Summary

We'll cover the actual installation of your first Vertica node in the next blog post.  Installing to the first node requires some extra steps that are not needed for subsequent nodes.  However, the steps outlined in this blog post must be done by you PRIOR to installing Vertica on ANY node.  


Why isn't my Java catching SQL errors...or...How I learned to stop worrying and love the ring_buffer target

Today's Story

"Hey Dave, I found a bug in SQL Server."

I hadn't even sat my bag down yet.  This was going to be one of those days.  "I'll bet you did find a bug.  SQL Server is horrendously buggy.  We should migrate our app to Cassandra immediately."  

Java Joe could detect the derision and changed his tack.  "Well, can you check out this code for me?  Clearly it throws an error in SSMS but not when I call it from Java."  

(At this point I'll save you from viewing the Java.  Essentially this was the code and, yep, the Java was not, indeed, throwing the very obvious error.)

Time to begin basic Java debugging.  "Have you tried this code from SSMS and if so, did it throw the error correctly?"  I already knew the answer, but I wanted to lead Java Joe down the troubleshooting path.  

"Yes I did and it errors correctly in SSMS, just not in Java which is why I thought maybe it was a bug in a SQL Server driver somewhere.  jdb does not show the error in the stack either.  I'm at a loss so I was hoping you could look."  

"Just because you don't see the error using jdb doesn't mean the error isn't coming back from SQL Server in the output.  The simple fact is, we use so many "middleware" components in our applications that somewhere, somehow, one of those is hoovering the error without re-throwing it to the next tier.  We use jdbc drivers, both 2.0 and 3.0 in places, websphere, jdo/kodo in places, hibernate, ruby, groovy, highly-customized versions of Spring, as well as our own in-house developed DAO (data access objects) factory.  There are probably even more layers between the user and the database error that I'm not aware of, have forgotten, or am repressing.  Somewhere in one of those layers you have the equivalent of an old VB "On Error Resume Next" (arguably, a FINALLY block) and you don't know it."  

This is common sense to me but I knew I would have to prove this fact to Java Joe.  "You see, this is a basic syllogism of error handling.  SQL throws an error, Java does not throw it.  Therefore, the problem is somewhere higher on the stack."  

I didn't quite have Java Joe convinced.  "But I tried everything and I beg to differ but there is nothing wrong with any of our tiers.  Is there any way you can do some magic on SQL Server to ensure that the database engine is really throwing the error?  Maybe SQL Server DDL errors are not passed to any non-SSMS utilities, or something."

Ah, programmers are often scared of their debuggers.  At this point I would generally get angry, but I already had a simple utility in my toolbelt...an Extended Events script.  

And here is the important stuff:

click to download

Note that we are creating a ring_buffer TARGET to capture any error_reported event with the given severity.  You can download the script here.  
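
The downloadable script is what you should actually use; as a sketch, though, the session definition boils down to something like this (the session name and the 4MB max_memory are my choices here, not necessarily what's in the download):

```sql
-- Sketch: capture every error_reported event at severity >= 16
-- into a fixed-size, in-memory FIFO ring buffer.
CREATE EVENT SESSION ErrorCapture ON SERVER
ADD EVENT sqlserver.error_reported (
    ACTION (sqlserver.session_id, sqlserver.sql_text)
    WHERE severity >= 16
)
ADD TARGET package0.ring_buffer (SET max_memory = 4096)  -- in KB, i.e. ~4MB
WITH (STARTUP_STATE = OFF);

ALTER EVENT SESSION ErrorCapture ON SERVER STATE = START;
```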

Extended Events Targets

I see a lot of people that hate the ring_buffer target.  I really couldn't agree more that there are issues with it, specifically:

  • You can lose data.  The ring_buffer is a "FIFO ring buffer" of a fixed memory size.  So, in some situations you may miss some captured data because it was forced out.   But this shouldn't be a surprise.  It is called a "ring buffer" for a reason.  Caveat emptor!
  • No GUI.  You have to parse the XML which can be tedious.  But that's why we have pre-packaged scripts to help us.  

So, yep, those are some good reasons why you shouldn't RELY on the ring_buffer for important work.  But when I hear that people HATE something the contrarian in me wants to know when I should LOVE it.  

Reasons to Love the ring_buffer Target

I've found two really good uses for the ring_buffer that may help you out.  

  1. When I need to capture some data for trending purposes and I don't mind losing it under adverse scenarios.  
  2. Finding errors being thrown real time, such as my allegory above.  

ring_buffer Usage for Trending

Example:  saving deadlocks for later analysis.  The ring_buffer has a set max_memory and when that is reached the oldest events are purged, FIFO-style.  Some deadlock graphs are huge and can fill the ring_buffer quickly.  

Why not just use event_file then and not have to deal with ring_buffer issues?  

  1. Not everyone has the necessary filesystem access to their SQL Servers to write and delete files.  
  2. If you are tracing a lot of events you risk writing huge event_files.  If your Ops Guys monitor your servers it won't be long before they want to know what is going on.  
  3. Big event_files are slow to query.  

If you don't have these issues then PLEASE consider using event_file instead.  Murphy's Law says that even though you don't NEED your trend data now, when you experience severe issues someday you'll curse choosing the ring_buffer when it loses a good chunk of your forensic data.  

My PerformanceCollector utility uses ring_buffer to look for any deadlocks.  On a schedule it will write the contents of the ring_buffer to a permanent table and restart the ring_buffer.  If a few deadlocks roll off the end of the ring_buffer I really don't care.  I'm monitoring for deadlocks to look for trends...basically, bad data access patterns that we've introduced into our application that we need to fix before they get out of control.  

Watching Errors Real Time

So back to our story...Java Joe needed hard evidence that an error was indeed being thrown, as well as what the actual error was.  I had already prepared a stored procedure for just this purpose (Java Joe wasn't the first developer to have this problem...Ruby Ray had a similar problem weeks before).  The stored proc takes a parameter of START, STOP, or ANALYZE.  You definitely don't want this running 24x7 since EVERY error thrown will be logged.  You need to run ANALYZE BEFORE you run STOP.  The process of stopping a ring-buffered session wipes out the buffer which means your data is gone.  

Let's look at our example

Here is our example code again, except this time I annotated it to show the basic workflow.  

  1. Start the event session.  Choose an error severity that you need.  The default is 16.  
  2. run whatever code (or your application) that should be generating errors that you wish to see
  3. run ANALYZE which will generate the output below.
  4. run STOP to stop the event session.  This will save some system resources and about 4MB of memory.  

So, what is the output?

The proc outputs two result sets.  One lists all of the "error_reported" events that were thrown.  The second decodes the XML a bit to make it easier to look at exactly what the errors were.  
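
The second result set comes from shredding the ring_buffer's target_data XML.  A minimal version of that query looks something like this (the session name ErrorCapture is a placeholder for whatever your session is called):

```sql
-- Shred the ring_buffer XML into one row per error_reported event
SELECT
    ev.value('@timestamp', 'datetime2')                            AS event_time,
    ev.value('(data[@name="error_number"]/value)[1]', 'int')       AS error_number,
    ev.value('(data[@name="severity"]/value)[1]', 'int')           AS severity,
    ev.value('(data[@name="message"]/value)[1]', 'nvarchar(max)')  AS message
FROM sys.dm_xe_sessions s
JOIN sys.dm_xe_session_targets st
    ON st.event_session_address = s.address
CROSS APPLY (SELECT CAST(st.target_data AS xml) AS x) AS t
CROSS APPLY t.x.nodes('RingBufferTarget/event') AS q(ev)
WHERE s.name = 'ErrorCapture'
  AND st.target_name = 'ring_buffer';
```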

Let's take a look at the data.  

Note that we actually threw two errors.  

Huh?

This is actually expected based on how I constructed the example.  The first error is 1505 (row 8 above).  The second, not in the screenshot above, is error 1750 (Could not create constraint.  See previous errors.). 

And that is exactly what I would expect.

Summary

Occasionally you may find yourself in need of capturing errors that are occurring on a running system in real time. The way to do this is to use Extended Events.  But the rigamarole required to set up an event session that logs to a file, and then having to query that file, can be cumbersome.  Sometimes you just need something quick and dirty.  For those times a ring_buffer target works wonderfully.  I've used the example above NUMEROUS times to assist developers who are scared of their debuggers to see actual SQL Server errors that are somehow being gobbled up in one of their data access technologies.  


DBCC AUTOPILOT and hypothetical indexes

I gotta admit, I'd never heard of DBCC AUTOPILOT or how to manually use hypothetical indexes.

SQL Server Performance Tuning with Hypothetical Indexes

HP Vertica

I've been using HP Vertica since before HP bought the company.  I first wrote about Vertica in 2010 when I wrote about column-oriented databases.  I'm again using Vertica in my daily responsibilities and I decided to try to get my Vertica certification.  I'm not a Vertica expert, but I find taking certification exams helps you understand your shortcomings with a product.  For instance, I passed my SNIA (SAN) certification on the first try with a perfect score.  However, this was just dumb luck.  Every question on NAS and tape storage I guessed at, and the right answers were fairly obvious.  Since I'm not a full-time SAN guy (I'm a data professional), I don't have much need for tape and NAS, so I really didn't care to learn more about those topics.  But it was interesting learning what SNIA feels is important in the storage world.  

In the process of getting my Vertica certification I thought it would be wise to put together a blog series on Vertica for anyone else who wants to learn this technology rapidly in a hands-on fashion.  In these blog posts I'll cover what Vertica is and how to install it; we'll migrate AdventureWorks to it and do lots of fun labs along the way.  The posts are geared toward data folks who are familiar with MS SQL Server.  I specifically approach learning Vertica by calling out its differences from SQL Server. 

You'll find this is a lot of fun.  Vertica is pretty cool.  

Briefly, what is HP Vertica?

It is a column-oriented, compressed, relational DBMS.  Many people consider this a NoSQL solution, but it does use a dialect of SQL for its manipulations.  It is clustered across grids and nodes like many distributed NoSQL solutions, with built-in data redundancy (called K-safety), which means it has the typical marketing gimmick of "requiring no DBAs".  I can assure you that it performs QUITE NICELY for read-intensive workloads.  There is also an entire analytics platform that comes with Vertica...roughly equivalent to SSAS.  

What makes Vertica unique is that it persists data grouped by column rather than by row as traditional RDBMS offerings do.  If you think of a table as a spreadsheet then retrieving a single row is an ISAM-type operation.  This works very well for OLTP applications.  But in some reporting applications it is possible that you care more about an entire column than about the individual rows.  If your query is something like "give me the total sales for this year" then querying a columnstore will result in far less IO and will run radically faster.  
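
For example, a query like this touches only a couple of columns, so a columnstore can satisfy it without ever reading the rest of each row (the table and column names here are made up for illustration):

```sql
-- Only the sale_date and amount columns need to be read from disk;
-- a row store would have to scan entire rows to get at them.
SELECT SUM(amount) AS total_sales
FROM   sales
WHERE  sale_date >= '2014-01-01';
```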

Even the amount of disk needed to store the data will be less.  Columnstores compress much better because like data types are grouped together.  Also, a table has far more rows than columns, and in a row store each row needs "offset" information to locate the different datatypes packed together; organize your storage by column and you need far fewer offset markers.  TANSTAAFL (there ain't no such thing as a free lunch), as economists say...the cost is that an update to an existing row requires one write per column in the table.  So a columnstore is probably not best for your OLTP tables.  

That's a quick overview and is not all-inclusive.  I'll cover lots more in the next few posts.  

What's it cost?

The Community Edition is free for up to 1TB of data.  And if you consider that compressed columnstores generally take up a fraction of the space of a traditional row-oriented DBMS...that's actually a lot of data.  

HP Vertica Documentation

Vertica documentation in HTML and PDF format

Certification Process

Certification Exam Prep Guide

Getting Started

You need to register with HP to be able to download the Community Edition.  You can do this at my.vertica.com.  After you register you'll immediately be able to download the CE (or the drivers, Management Console, VMs, AWS documentation, etc).  

In the next post I'll cover how to install Vertica on Ubuntu.  You can of course download a VM instead but I believe the best way to learn a product is to start by actually installing the bits.  


git and Powershell Integration

The power of git is its CLI and scripting abilities.  That of course means it is less-than-friendly for a Windows GUI guy.  

Or is it?  

I was recently trying to automate a simple git script in Windows to automatically do a fetch and merge (I know this is a bad idea, but I had my reasons) for all git repositories on my local machine.  Basically I have a folder tree where I keep my source code, with subfolders for each remote repository I connect to...something like the structure on the right.  Most of these repos I use solely for code review purposes and I rarely change any of the code, hence I'm cool with fetch and merge (or pull if you will).  Even in the rare cases where I do want to make changes it is almost always on the master branch.    

Anytime I want to automate something in Windows I use Powershell.  To manipulate git on Windows requires loading Git Bash from within Powershell using some variant of loading a Bash shell in a DOS shell in a Powershell shell.  Kludgey.  It also makes error-handling unreliable. 

Some simple googling turned up the posh-git Powershell module.  

posh-git

F'ing awesome module.  

If you ever longed for the day when git used more normal DOS-style syntax, this is the day.  The author was very ingenious.  Instead of creating PoSH commands that wrap git commands (for instance, invoke-gitfetch, or something stupid like that), which would require us to learn another set of git CLI commands, posh-git actually uses the standard git commands.  With tab-completion of commands too.  

When you "cd" into a git-enabled folder your Posh prompt even tells you what branch you are currently on without any extra intervention.  Note the screenshot on the right.  Ignore the error, that's totally unrelated to posh-git...it's something I'm experimenting with.  

Note the extra "stuff" in brackets.  master is the branch.  It's in cyan meaning it's in sync with its remote.  The other options in burgundy indicate the number of files that you have added/modified/removed/conflicted.  It basically did a git status without you having to.  

In any case, I could rave about posh-git for a long time.  Just note that it helps you automate git on Windows.  

Installing posh-git (or any module) Automatically -- Best Practice Pattern

When writing an automation routine it is best to never assume that the host running the script has all of the requisite software installed and environment configured correctly.  You never know when someone will run the script on an environment that isn't up-to-snuff.  In the Linux world (Ubuntu and equivalents, really) you can code your scripts to perform an apt-get which will get the required software if it isn't present.  Under the covers these things are using curl and wget to download and install what is needed.  

We have no native equivalent in Windows, but that's no reason not to follow the same pattern.  gitFetchAll.ps1 is a Posh script that I will use for the remainder of this blog post.  I wrote a function called loadPoshGit that will load the module regardless of whether it is even currently installed.  The code to do that is very simple.  I really like using this pattern in all of my code where I have a dependency on some small utility that may not be available.  This helps out an Ops Guy who may not know or remember where that certain utility can be found.  This pattern will work for any piece of code that is downloadable from your machine.  
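
The original function isn't shown here, so this is a minimal sketch of what loadPoshGit could look like (the function name comes from the post; the installation mechanics are my assumption and require PowerShellGet's Install-Module, so older machines may need to download the module from GitHub instead):

```powershell
# A sketch of loadPoshGit.  Assumption: Install-Module is available.
function loadPoshGit {
    if (-not (Get-Module -ListAvailable -Name posh-git)) {
        # Module isn't on this machine yet: fetch and install it for this user
        Install-Module posh-git -Scope CurrentUser -Force
    }
    # Load the module into the current session
    Import-Module posh-git
}

loadPoshGit
```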

Automating git on Windows

Now that we can guarantee we have posh-git installed and loaded into our profile we can run some git commands.  gitFetchAll.ps1 is the script I wrote that takes a folder structure of git repositories and remotely fetches and merges any new changes.  Here are the guts of the script:  

$RootFolder is declared by you.  We find every $subfolder in that $RootFolder and traverse it, calling git pull (fetch and merge) in every folder.  
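
The screenshot of the script doesn't reproduce well here, so this is a sketch of the loop it describes ($RootFolder is yours to set; the .git check is my addition to skip folders that aren't repositories):

```powershell
$RootFolder = "C:\source"    # assumption: the parent folder of all your clones

pushd    # remember the caller's working directory so the script has no side-effects
foreach ($subfolder in Get-ChildItem $RootFolder -Directory) {
    # Only bother with folders that are actually git repositories
    if (Test-Path (Join-Path $subfolder.FullName ".git")) {
        cd $subfolder.FullName
        git pull    # fetch and merge from the remote
    }
}
popd     # put the caller back where we found them
```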

That is ridiculously easy.

Let's see what this looks like:  

  • I first perform a cd to get to my script location
  • Then I launch powershell and call gitFetchAll.ps1 in a single command
  • Ignore the WARNING...that's an artifact of some other testing I'm doing with git (my ssh is currently broken on my laptop)
  • You can see that I then cd into each folder and call git pull.
    • ...which gets me the latest code on the remote repo if it can
    • ...or simply notifies me that the code is "Already up-to-date"

 

One last pattern:  pushd and popd

One last pattern I like is on line 62 above: the popd command.  Whenever I have a Posh script that does a "cd" I ensure that I first capture the user's pwd (print working directory) using pushd.  I then ensure that I exit the script in the same folder, using popd.  It's obnoxious to find scripts that don't do that, and it makes automation routines difficult when a script has "side-effects" such as changing your working folder.  

Summary

I use this script every morning to get the latest from my remote repos.  It illustrates how to automate git on Windows using Powershell, which is much less kludgey than cygwin.  The script is lacking good error-handling but I don't require it for my needs; I just wanted to demonstrate some useful git automation routines.  I also outlined two other patterns I like to use in my Posh scripts: auto-installing helper utilities if they are missing, and using pushd and popd to ensure your scripts have no side-effects.  
