Ensuring Quality Database Changes
This is the fifth post in a series we’re calling CI/CD:DB. Read from the beginning.
In part 3 and part 4 of this series, we talked about defining batches of changes and then processing those batches into deployable packages of working changes that are separated from the source code repository where they were created. This mirrors the notion of Continuous Integration (CI) where we take source code changes and process them into binaries. The processing part of the CI process provides an opportunity to add quality checks into the process and improve the efficiency of the pipeline by stopping problems sooner. So, before we move on to the deployment phase for our packaged batches, we should examine what opportunities exist for us to apply CI-style quality checks to our database batches.
3 Phases of the CI process
As background, let’s think about the typical checks applied for a CI process and how they might map to a group of database changes. The parallels are easiest to see if we break the processing part of CI down into three phases — before, during, and after. Then, laying app code changes and DB changes side-by-side, we get a table like this:
|Phase||App Code Changes||DB Changes|
|Before||Do the inbound code changes meet our coding standards?||Do the inbound code changes meet our coding standards?|
|During||Does the changed code compile?||Do the changes work?|
|After||Do the compiled binaries pass a ‘smoke’ test?||Do the changes, when applied to a database, leave that database in a state where it is healthy and usable for our application?|
Using CI for quality database changes
The idea of using the CI processing step for quality is not new — it was added to the CI practice almost immediately. The concept comes directly from Lean manufacturing processes and reflects the idea that it is easier and cheaper to build something well — with quality — rather than to build something poorly and then invest in a heavy inspection and rework process.
The adage is “You cannot test quality in; you can only build quality in”. It sounds obvious that it is inefficient and ineffective to constantly rebuild and reinspect the same thing repeatedly just to get it to a deployable state, but a lot of companies do exactly that.
In addition to the overall efficiency, CI also improves the experience of an individual trying to get database changes through a pipeline. The quality automation in the CI process gives them immediate feedback on whether the change is good or bad. They can then remediate the problem without having wasted a bunch of their time or, more importantly, other people’s time.
Automation is the key
With good automation, it is relatively easy to add quality checks by plugging in additional steps or tools in any given phase. We’ve mainly focused on the basic functionality of the During phase in part 4 of this series – Automating Database Builds. Unfortunately, “Do the changes work?” is a relatively low bar for quality. It is far too easy to have unacceptable things that “work” just fine, but create security holes or performance problems in our databases. Weighed against that is the time and complexity of the checks required to ensure that our “working” changes are also “safe”. The best way to balance speed and quality is through automation — computers are simply faster than human inspection.
Setting up a database change automation strategy
The strategy for automation is straightforward. You begin in the Before phase with the easy checks — the stuff that can be done quickly and with lightweight effort. In most CI scenarios, this is one or more code scans. In this phase, you are are looking for adherence to coding standards defined by your architects, generally known anti-patterns, known security weaknesses, and any application-specific items. For example, you might scan the code going into a globalized database for non-Unicode variable declarations. These are things that involve relatively lightweight tools that are focused on text pattern-matching in the SQL itself. These are tools that are relatively lightweight, easy to integrate into a CI tool, and require simple automation to consume them.
We covered the During phase in part 4 of this series – Automating Database Builds. The idea is to verify that the proposed batch of changes actually works in a Production-representative database. This is a simple thing to automate in as much as it should be done using exactly the same deployment process as you will eventually use when deploying the batches into Production. While simple in concept as discussed in Part IV, there is more depth to this phase than just running the scripts and we will discuss that in an upcoming post.
Analyzing what the production-representative database looks like after the changes have been processed is more complex. You have to inspect the database after the changes have run (or “worked”) to ensure that the database is “healthy”. This minimally means inspecting the post-processed database and comparing it to an expected model or pattern. For example, looking for indexes on important field types, looking for orphaned objects, etc. It also means thinking about running tests against stored procedures and functions to ensure they respond as expected to some test data.
The After phase requires some thought as the checks can rapidly become very specific to a database, the application it supports, or both. The challenge with automated inspection in this phase is that it requires a strong ability to interact with the database being inspected. The capability to interact with the database, in turn, implies a much more sophisticated tool or tools to accomplish. The benefit is that these tools will catch much more subtle problems. However, they are usually more difficult to integrate into your toolchain and will require sophisticated automation to run efficiently.
Summing it up
Taking a structured approach with modular tooling will enable you to build smart quality checks into your CI process for database changes. The efficiency of your final pipeline will be defined by how well you ensure that your definition of change quality answers all three questions:
- Do the inbound code changes meet our coding standards?
- Do the changes work?
- Do the changes, when applied to a database, leave that database in a state where it is healthy and usable for our application?
It requires some initial investment to elevate your definition of quality to a higher level than ‘it works’, but the payoff in efficiency and productivity makes it worthwhile.
In our next post, we will move on past the CI process and cover the Continuous Delivery (CD) aspects of the pipeline — beginning with how you make your deployment process reliable and predictable.
Want to talk to a Liquibase expert about ensuring quality database changes for your applications? Contact us.
A Guide for Bringing Database Changes into Your CI/CD Pipeline
Automate BigQuery schema change and version control with database DevOps
Google's BigQuery is a fully managed, serverless cloud data warehouse, or database as a service (DBaaS), that brings unparalleled scalability and convenience to data analytics.