April 11, 2016

Newsflash: The DBA is NOT Infallible

Standard practice for updating a production database is to have a human review the proposed change and implement it.  We have done it that way for a long time.  We trust the human, or better yet the expert database administrator (DBA), to make the change properly and to avoid the mistakes made by people unfamiliar with the quirks of a particular production database.

For example, we count on that DBA to catch seemingly benign changes, like adding a column with a default value set.  That may appear to be an innocuous change at first glance, but when the production table row count is very large (greater than 500K, for instance), that simple change can take an exclusive lock on the table that blocks all data manipulation language (DML) statements while every row is rewritten.  That, in turn, would extend the maintenance window and impact the SLA.
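To make that concrete, here is a minimal sketch in PostgreSQL-flavored SQL.  The orders table and its columns are hypothetical, and the exact locking behavior varies by engine and version, but the shape of the trap (and of the safer alternative) is typical:

  -- Risky on a large table: in PostgreSQL (as of this writing), adding
  -- a column WITH a default rewrites every row while holding an
  -- exclusive lock, blocking all DML until the rewrite finishes.
  ALTER TABLE orders ADD COLUMN status varchar(16) DEFAULT 'new';

  -- Safer pattern: add the column without a default (a metadata-only
  -- change), backfill in small batches, then set the default last.
  ALTER TABLE orders ADD COLUMN status varchar(16);
  UPDATE orders SET status = 'new' WHERE order_id BETWEEN 1 AND 50000;
  -- ...repeat the batched UPDATE for the remaining key ranges...
  ALTER TABLE orders ALTER COLUMN status SET DEFAULT 'new';

The point is not this particular workaround; it is that spotting the trap requires context (row counts, lock semantics, the maintenance window) that a tired human may not summon at 2 a.m.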

We count on DBAs to make proper changes and catch errors, and most of the time they get it right. However, as Gliffy experienced recently, humans make mistakes.  On March 21, an administrator deleted the production database.  I repeat: An administrator deleted the entire production database.  It wasn’t until a very long three days later that all data was restored.  A single human error with the database resulted in the entire system being down and customers feeling the impact for three painful days.

Let that sink in for a second.  The individual trusted to make the change, and to weigh all of its possible consequences, made a mistake that led to three whole days of lost business, not to mention long-term damage to customer trust and relationships.

I don’t mean to kick the poor soul while he or she is down.  I’m presenting this incident as evidence of a systemic issue: a breakdown in process and technology.  I’ve been rocking a command line since the 1980s, and in that time I’ve seen the same pattern emerge over and over.  I call that pattern “The Hand on the Rudder.”  Actually, I’m going to start calling it what it is: an anti-pattern.

As humans, we have false confidence that a process is somehow superior to a machine’s simply because we (personally) are the ones making the changes.  “Let’s not trust the autopilot; I’m a human.”  The thing is, unlike a machine, humans become tired, sick, or hungry.  We become distracted thinking about our weekend plans or our sick child at home.  Yet time and time again, we believe there is something inherently valuable in a human pushing the “Enter” key on the keyboard.

I’d posit that we often choose to perform these tasks manually because we do not have confidence in automation.  Garbage in, garbage out.  If we go through the steps manually, we valiantly believe we can catch errors on the fly and respond appropriately.  I’m certain the administrator at Gliffy thought the same thing.  But that administrator was wrong, and it cost the company.  Big time.

If you need a more mainstream argument for trusting automation over humans, let’s consider the self-driving car.  Since 2009, Google’s Self Driving Cars (SDC) have logged 1,452,177 miles.  In that time, the fleet has experienced one lone accident while in autonomous mode.  All other accidents occurred while a human was driving the SDC.  You can read the monthly reports here: https://www.google.com/selfdrivingcar/reports/.

We’ve seen these types of repetitive tasks successfully taken over by automation systems in IT as well.  There was a time when I actually performed manual builds on my workstation and used sneaker-net to carry the result to a test server on a CD-R (kids, sneaker-net is when you walk the file over).  Since CruiseControl was first released, I’ve never performed a manual build.

There was a time when the “webmaster” would update webpages using Notepad and FTP them to a server.  Now we use a webhost’s admin console to make changes.

I know what you’re about to say.  Yes, we still need a person to create the build process.  We still need a human to design the webpages.  But the boring, repetitive tasks in the process are a recipe for disaster because the human brain simply was not made to perform boring, repetitive tasks.  The human mind specializes in creatively solving problems.  This is why we are the most successful species on the planet (No offense, ants…we won on quality, not quantity).  What humans fail at miserably is completing the same task over and over again with a zero failure rate.

To get to that zero failure rate in database change management, we need an automated tool that lets us rehearse changes and predict their impact.  We need the ability to restrict bad behavior and prevent DBAs from making innocent but inevitable mistakes.  We need to integrate that functionality effortlessly throughout our software development lifecycle.  We need Datical DB.
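Datical DB automates this end to end, but even a hand-rolled sketch shows the idea.  Below is a hypothetical pre-deployment guardrail and rehearsal in PostgreSQL-flavored SQL, reusing the orders table and the 500K threshold from the earlier example (both are assumptions for illustration, not Datical internals):

  -- Guardrail: flag any DDL-with-default proposed against a large
  -- table so it gets routed to the safer batched pattern instead.
  SELECT relname, n_live_tup
  FROM pg_stat_user_tables
  WHERE relname = 'orders'
    AND n_live_tup > 500000;

  -- Rehearsal: run the change in a transaction against a staging copy,
  -- inspect the result, and roll back, leaving the data untouched.
  BEGIN;
  ALTER TABLE orders ADD COLUMN status varchar(16);
  -- ...run verification queries here...
  ROLLBACK;

Automating checks like these, and wiring them into every environment from development through production, is exactly the kind of boring, repetitive work machines do better than we do.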

Click here to learn more about Datical DB and how it is helping smart humans bring DevOps and automation to the database at some very smart companies.
