DaveWentzel.com            All Things Data

One Last Git Gotcha: The Merge Pattern

In general I love git for all of the reasons I've outlined in the previous posts in this series.  There are some gotchas that can be confusing for developers moving from a centralized VCS.  None of these are a huge deal.  So far.  Today I'll post some cases where we've experienced disasters using git and some lessons learned that may be valuable for others.  Sometimes the strengths of a product ("distributed" source control and merge workflows) can also be its weaknesses if guardrails are not put in place.  

Have an authoritative repo

The distributed nature of git is generally viewed as a blessing.  A developer can pull from and commit to any of many possible repos in an organization, and in theory every repo will "eventually" have a complete copy of the source code.  In reality I've found that doesn't work.  It is necessary to have an "authoritative repo" that has controlled access and is the basis for official builds of released software.  The authoritative repo (ar) should only allow pushes from a limited number of people to ensure we don't accidentally merge bad code into it, which is difficult to undo.  

Consider disallowing git push --force

To start with, never allow your ar to receive a git push --force.  The --force rewrites the structure and sequence of commits and greatly increases the chance that previous commits are lost.  By default git allows --force, but it can be disabled on the server with git config --system receive.denyNonFastForwards true.  An ar should always deny this.  
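As a minimal sketch, here is how that setting can be applied to a single bare server-side repo (the path here is made up); --system applies it machine-wide instead:

```shell
# A bare "authoritative" repo; the path is illustrative.
AR="$(mktemp -d)/ar.git"
git init -q --bare "$AR"

# Reject any non-fast-forward push (i.e., git push --force) to this repo.
git -C "$AR" config receive.denyNonFastForwards true

# Optionally also forbid deleting branches from the ar.
git -C "$AR" config receive.denyDeletes true
```

receive.denyDeletes is a related guardrail: without it, a dev can still "rewrite" history by deleting a branch and re-pushing it.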

It might be valuable to allow this on other repos, however.  On local repos there are good use cases for it.  But on any repo that others have had the ability to git pull/fetch from, you WILL cause grief for those developers if you do a git push --force.  Their local history no longer matches the rewritten HEAD, so their subsequent git push will cause confusion and possibly lost commits.  

Noobs can badly screw up merges if they don't understand git

Assume the following scenario:  

  1. git pull to "get latest" (let's assume you are now at "Version 123")
  2. Work on FileA and commit changes to the local repo over the next few days.  
  3. You are ready to integrate your work into the public repo.  You do a
    1. git pull (assume the public repo is at Version 130), which attempts to merge any other changes from Version 124-130 into the local repo for FileA, FileB, and FileC.  
    2. It takes you a few hours to handle the merge conflicts in FileA...FileB and FileC you have not changed, so you safely IGNORE those changes.  You are now ready to git push.   
  4. Do another git pull (public repo at Version 131), which attempts another merge.  
  5. git commit
  6. git push

This is known as the "merge workflow" and it is the most popular way to use git in a multi-committer environment.  

At Step 5 you will see a git message that says FileA is conflicted and FileB and FileC are changed.  

If we were using svn we would perform an svn commit on FileA ONLY since we made no changes to B and C...those were made by others.  This is because svn is a "file-based" VCS.  git is a "snapshot-based" VCS, so we want to commit A, B, and C even though we didn't change B or C.  Why?  Because we wish to commit the state of the snapshot and don't concern ourselves with the fact that we are committing files we didn't change.  

That's counter-intuitive to svn experts.  If you use git gui or git bash this is all done for you and difficult to screw up.  You simply see staged changes that you commit...there is no option to "remove" a change from the changeset.  But if you use other gui tools you may see options to remove changes you haven't made...just like you see in svn.  Resist the urge to overthink this.  Just commit and push the snapshot as it exists, exactly as you tested it.  In svn it is common for a developer to scan all files being committed and NOT commit some of them because they were already committed by someone else and because the developer knows he didn't change them.  If you do that in git you are effectively hitting the UNDO button on other people's changes between Version 123 and current for files you have not changed.  
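Here is a self-contained sketch of the scenario above, with made-up repo paths and file contents (FileC omitted for brevity): a colleague changes FileA and FileB upstream while you change only FileA, and your merge commit correctly carries the FileB change you never made:

```shell
set -eu
WORK="$(mktemp -d)"
export GIT_AUTHOR_NAME=Dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=Dev GIT_COMMITTER_EMAIL=dev@example.com

# Seed a shared repo at "Version 123" (names are illustrative).
git init -q "$WORK/seed"
cd "$WORK/seed"
echo base-A > FileA
echo base-B > FileB
git add . && git commit -qm "Version 123"
git clone -q --bare "$WORK/seed" "$WORK/public.git"
git clone -q "$WORK/public.git" "$WORK/you"
git clone -q "$WORK/public.git" "$WORK/colleague"

# A colleague edits BOTH files and pushes ("Versions 124-130").
cd "$WORK/colleague"
echo their-A > FileA
echo their-B > FileB
git commit -qam "colleague edits A and B"
git push -q origin HEAD

# You edit only FileA and commit locally.
cd "$WORK/you"
echo your-A > FileA
git commit -qam "your FileA work"

# Pulling merges upstream: FileB merges cleanly, FileA conflicts.
git pull --no-rebase -q origin HEAD || true

# Resolve FileA, then commit the whole SNAPSHOT.  FileB is already
# staged even though you never edited it -- leave it in the commit.
echo merged-A > FileA
git add FileA
git commit -qm "merge upstream into local work"
git push -q origin HEAD
```

Running git status just before that final commit lists FileB under "Changes to be committed" even though you never touched it.  That staged entry is the snapshot doing its job...it is not something to remove.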

The "lost" changes still exist in git, just in a different commit that you'll need to re-merge...usually after you notice bugs and missing features after a few builds.  This behavior is totally different from that of any other VCS I've ever worked with, including ClearCase, VSS, TFS, SVN, and Perforce.  Perhaps Mercurial or Bazaar or other VCSs I'm not aware of work similarly to git...but this causes a lot of grief for devs who try to make git work like svn.  

Solutions to overcome this merge problem:

  • Education works best.  Tell your devs to not overthink it.  Push the snapshot that you tested locally.  Let git manage what to push.  We also run a CIT process on the non-authoritative repos and check for discarded merges (by running unit tests) before we allow pushes to the ar.  
  • git pull often.  Very often.
  • Try to avoid having multiple devs working on the same files/sections of code.  Easier said than done. 
  • Avoid merging, which is contrary to git best practices, I know.  Instead, use rebase as much as possible.  Specifically, we ensure all commits are rebased locally so newly pushed commits always sit on top of other devs' commits at HEAD.  Merging is much reduced then, and the smaller local commits are not propagated to the public repo, where they add confusion for people scanning the git logs (which is why I like squashing commits, which is actually done with an interactive rebase).  
  • Do what github does.  Limit who may commit to an ar.  Here is the github workflow...called the "gatekeeper model" or "maintainer model":
    • only a "maintainer" may push to the important branches on the ar.  The maintainer should be intimately aware of the source code AND the idiosyncrasies of git merging (such as the issue I've described above).
    • users clone the ar at any time and can push to the non-ar at will.  When all tests pass on the non-ar...
    • the dev issues a "pull request" to the maintainer.  The maintainer pulls the changes into the ar from the dev's repo (or, likely, the non-ar).  
    • The maintainer can review the pull request and ensure that the merge is clean.  If not the maintainer can fix it, or more likely send it back to the dev for rework.  
    • Now if anything breaks in the ar it is the maintainer's problem...not a git noob who just discarded a bunch of commits.   
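The rebase approach from the bullets above can be sketched like this (repo names are made up): your local commit is replayed on top of the colleague's pushed commit, so history stays linear and no merge commit ever appears.

```shell
set -eu
WORK="$(mktemp -d)"
export GIT_AUTHOR_NAME=Dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=Dev GIT_COMMITTER_EMAIL=dev@example.com

# Seed a shared repo and two clones (all names are illustrative).
git init -q "$WORK/seed"
( cd "$WORK/seed" && echo base > notes.txt && git add . && git commit -qm "base" )
git clone -q --bare "$WORK/seed" "$WORK/public.git"
git clone -q "$WORK/public.git" "$WORK/you"
git clone -q "$WORK/public.git" "$WORK/colleague"

# A colleague pushes first.
( cd "$WORK/colleague" && echo theirs > theirs.txt && git add . \
  && git commit -qm "colleague work" && git push -q origin HEAD )

# You commit locally, then pull WITH REBASE: your commit is replayed
# on top of the colleague's, so history stays linear -- no merge commit.
cd "$WORK/you"
echo yours > yours.txt
git add . && git commit -qm "your work"
git pull -q --rebase
git log --oneline
```

git config pull.rebase true makes this the default for a repo; squashing local commits before a push is done with git rebase -i.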
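The gatekeeper model itself can be sketched as follows, again with made-up paths: the dev never pushes to the ar; the maintainer fetches FROM the dev's repo, reviews the incoming commits, and only then pushes to the ar.

```shell
set -eu
WORK="$(mktemp -d)"
export GIT_AUTHOR_NAME=Dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=Dev GIT_COMMITTER_EMAIL=dev@example.com

# The authoritative repo plus the maintainer's clone (paths are illustrative).
git init -q "$WORK/seed"
( cd "$WORK/seed" && echo v1 > app.txt && git add . && git commit -qm "v1" )
git clone -q --bare "$WORK/seed" "$WORK/ar.git"
git clone -q "$WORK/ar.git" "$WORK/maintainer"

# A dev works in their own clone and never pushes to the ar directly.
git clone -q "$WORK/ar.git" "$WORK/dev"
( cd "$WORK/dev" && echo v2 > app.txt && git commit -qam "dev feature" )
BR="$(git -C "$WORK/dev" symbolic-ref --short HEAD)"

# "Pull request": the maintainer fetches FROM the dev's repo,
# reviews the incoming commits, then pushes the result to the ar.
cd "$WORK/maintainer"
git remote add dev "$WORK/dev"
git fetch -q dev
git log --oneline "HEAD..dev/$BR"      # the review step
git merge -q "dev/$BR"
git push -q origin HEAD
```

If the review step turns up a bad merge, the maintainer simply skips the merge and push and sends the work back to the dev instead.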

If you ever have a symptom where it appears as though git "lost your work" it is probably due to discarded merges.  
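When that happens, the reflog is the first place to look.  A small illustration with a made-up repo, simulating a dropped commit with git reset and recovering it (cherry-pick is one of several ways to re-apply it):

```shell
set -eu
WORK="$(mktemp -d)"
export GIT_AUTHOR_NAME=Dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=Dev GIT_COMMITTER_EMAIL=dev@example.com

git init -q "$WORK/repo"
cd "$WORK/repo"
echo one > f.txt && git add . && git commit -qm "keep me"
echo two > f.txt && git commit -qam "accidentally discarded"

# Simulate an "undo" that drops the second commit from the branch.
git reset -q --hard HEAD~1

# The work is not gone: the reflog still points at the dropped commit.
git reflog
LOST="$(git rev-parse 'HEAD@{1}')"
git cherry-pick "$LOST"          # re-apply the discarded work
```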

General rule with git:  If you scratch your head wondering why git is trying to merge and push more files than you know you've edited, you should probably just push the files anyway!  That seems counter-intuitive but it works if you remember that git is snapshot-based vs file-based.  (Perhaps the better rule is to seek help from others first.)  

Said differently, when you do a git pull on top of your work you MUST then push those changes back up or they are undone/lost.  

You have just read "One Last Git Gotcha:  The Merge Pattern" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.