July 29, 2020

Trustworthy Database Environments

This is the seventh post in a series we’re calling CI/CD:DB. Read from the beginning.

In Part 6 of this blog series, we dealt with ensuring a consistent process for delivering changes into a target environment. A great process for delivering known-good changes to a target environment is crucial, but by itself it cannot guarantee a perfect deployment. To ensure a perfect deployment to production, we must also make sure that the environments against which the batches are verified, and in which the processes are rehearsed, are trustworthy.

If environments are not trustworthy, they can become a point of inconsistency when inspecting the results of the batch and process. That inconsistency breeds doubt, which means excessive time and effort must be spent verifying things like the following:

  • Making sure the only changes present are the expected ones
  • Making sure that the batches being applied were the sole source of the changes appearing in the database
  • Making sure nothing is missing from the database

Since CI and CD are intended to accelerate throughput by removing friction and enhancing efficiency, spending extra time on routine changes because an environment is not trustworthy is a problem. It would be regarded as a source of waste in the pipeline and a threat to the quality of the overall system.

Trustworthy database environments

So, what do we mean by ‘trustworthy’? The dictionary defines it as an adjective meaning “deserving of trust or confidence; dependable; reliable”. Translating that to a database in a validation (pre-production) environment means that three basic things must be true:

  1. The state of the database must be precisely knowable BEFORE new changes are applied.
  2. There are no sources of change to the database outside of the inbound batch.
  3. The state of the database is guaranteed to be truly representative of what Production will look like at the same point/version.

If these points are always true, then we can minimize or eliminate environment issues when inspecting the database after the batch has been applied.

Know your state

There are two parts to making sure you always know the state of the database you are working with.

  • Ensuring that the schema version is easily knowable by both people and automation scripts
  • Having the ability to reset the database to a known state at will

Both people and automation scripts know the state

The first part is ensuring that the state of the database (effectively its schema version) is easily knowable by both people and automation scripts. This identifier is shorthand for which batches are already present, whether any changes need to be backfilled to bring the database up to the current version, and so on. It exists because it is impractical to completely scan a database every time: scans take too long and can be fooled by environment-specific variations. This is why migration-based tools such as Liquibase maintain change tracking tables in target databases; they provide an instant and ever-present way to reconcile inbound changes against the list of changes already in the database.
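
As a concrete illustration, the sketch below queries Liquibase's DATABASECHANGELOG tracking table to report the most recently applied change set. It assumes a PostgreSQL validation database reachable via psycopg2; the connection string and function name are illustrative assumptions, not part of any standard tooling.

```python
# Sketch: report the most recently applied change set from Liquibase's
# DATABASECHANGELOG tracking table, so both people and scripts can check state.
# The connection string and function name are illustrative assumptions.
import psycopg2  # any DB-API driver would do; PostgreSQL is just an example


def current_schema_version(dsn: str) -> str:
    """Return an identifier for the last change set applied to this database."""
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT filename, id, author, dateexecuted
                FROM databasechangelog
                ORDER BY orderexecuted DESC
                LIMIT 1
                """
            )
            row = cur.fetchone()
    if row is None:
        return "empty database (no change sets applied)"
    filename, change_id, author, executed = row
    return f"{filename}::{change_id}::{author} (applied {executed})"


if __name__ == "__main__":
    print(current_schema_version("dbname=validation user=ci_bot"))
```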

Reset at will

Once the basic version tracking is handled, the second part is the ability to reset the database to a known state at will. The obvious benefit is being able to get back to a known state quickly when something goes wrong, but that is only the most basic reason. The more advanced reason is to use a good reset capability proactively, not just reactively, and avoid a whole class of problems in the first place. Frequent, proactive resets are great for work that happens in the validation stage because they ensure that you are always starting from a known state for every deployment. You can minimize or eliminate the chance of something ‘leftover’ in an environment tripping you up. This is why so many DevOps practices leverage fast resets of their entire environments. Some go so far as to completely replace the environment for each test cycle.
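
A minimal sketch of what such a proactive reset might look like is below, assuming a PostgreSQL validation database and the Liquibase CLI. The database name, file paths, and JDBC URL are assumptions for illustration, and credentials are assumed to be configured elsewhere (for example, in a liquibase.properties file).

```python
# Sketch: a proactive reset that wipes the validation schema and replays the
# changelog so every deployment starts from a known state. Database names,
# paths, and the JDBC URL are assumptions; credentials are assumed to be
# configured elsewhere (for example, in a liquibase.properties file).
import subprocess

import psycopg2


def reset_validation_db(dsn: str, changelog: str, jdbc_url: str) -> None:
    # 1. Drop everything in the schema, including the change tracking tables.
    with psycopg2.connect(dsn) as conn:
        conn.autocommit = True
        with conn.cursor() as cur:
            cur.execute("DROP SCHEMA public CASCADE")
            cur.execute("CREATE SCHEMA public")

    # 2. Replay the authoritative changelog to rebuild the known state.
    subprocess.run(
        ["liquibase", "update", "--changelog-file", changelog, "--url", jdbc_url],
        check=True,
    )


if __name__ == "__main__":
    reset_validation_db(
        dsn="dbname=validation user=ci_bot",
        changelog="db/changelog-master.xml",
        jdbc_url="jdbc:postgresql://validation-host/validation",
    )
```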

One path for changes

Once you have constant visibility into the condition of your databases, you must address the issue of having only one path for changes to get into the database. Per above, you cannot truly know the state of an environment if someone or something can invisibly make changes to it. These are called ‘out-of-band’ changes. Depending on your technical capabilities, there are generally two patterns for dealing with this.

Path 1: If fast refreshes are not possible

The first pattern applies if you do not, or cannot, have fast refreshes of your environment. In this case, you must restrict access to the validation environments so that, outside of truly exceptional cases, only the automation tools can log in and make changes. The analogy here is breaking the glass on a fire alarm: it is permitted if there is a true need, but if you do it, everyone is going to know about it. Teams that use this approach have a process for retrieving credentials from a password vault when an event occurs, so that no human has invisible access to them. Once the event is over, the credentials are rotated and everything returns to the prior arrangement, where the automation tools, now configured to be aware of whatever change was made, are the only things interacting directly with the databases.
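
The sketch below shows roughly what the ‘break the glass’ flow could look like, assuming HashiCorp Vault (via the hvac client) holds the emergency credentials. The secret path, username, and rotation step are hypothetical and shown only to illustrate the checkout-then-rotate pattern.

```python
# Sketch: a 'break the glass' credential flow, assuming HashiCorp Vault (via the
# hvac client) stores the emergency database credentials. The secret path,
# username, and rotation step are hypothetical illustrations of the pattern.
import secrets

import hvac


def break_glass(vault_url: str, vault_token: str, secret_path: str) -> dict:
    """Check out the emergency credentials and leave a loud audit trail."""
    client = hvac.Client(url=vault_url, token=vault_token)
    print(f"BREAK-GLASS: emergency credential checkout for {secret_path}")
    response = client.secrets.kv.v2.read_secret_version(path=secret_path)
    return response["data"]["data"]  # e.g. {"username": ..., "password": ...}


def rotate_after_event(vault_url: str, vault_token: str, secret_path: str) -> None:
    """Once the event is over, store a fresh password so the old one is useless."""
    client = hvac.Client(url=vault_url, token=vault_token)
    client.secrets.kv.v2.create_or_update_secret(
        path=secret_path,
        secret={"username": "db_admin", "password": secrets.token_urlsafe(24)},
    )
    # The matching password change on the database itself would be applied by
    # the same automation, so only the tools ever see the new credential.
```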

Path 2: If you have advanced reset capabilities

The second, more ideal, pattern applies when you do have advanced reset capabilities. It still prefers that only tools access the environments, but adds the ability to automatically get clean snapshots of the authoritative schema without data. Authoritative schema snapshots are usually based on Production, which is generally well secured, so the chance of unexpected changes starts low. As long as you can ensure the integrity of Production, you can use an automated process to generate snapshots of it at will. Further, automation can enforce a checksum-like capability to ensure that the snapshots do not change once generated. With this pattern, you can leverage the proactive reset approach described in the state discussion above to ensure that even if changes do come in from another source, they are obliterated in the next reset.
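
The checksum guard can be as simple as the sketch below: record a SHA-256 digest when a schema snapshot is generated, and refuse to use the snapshot for a reset if the file no longer matches. The snapshot location and naming are hypothetical.

```python
# Sketch: a checksum guard for schema snapshots. Record a SHA-256 digest when a
# snapshot is generated; refuse to use it for a reset if the file has changed.
# The snapshot location and naming are hypothetical.
import hashlib
from pathlib import Path


def record_checksum(snapshot: Path) -> None:
    digest = hashlib.sha256(snapshot.read_bytes()).hexdigest()
    snapshot.with_suffix(snapshot.suffix + ".sha256").write_text(digest)


def verify_checksum(snapshot: Path) -> bool:
    expected = snapshot.with_suffix(snapshot.suffix + ".sha256").read_text().strip()
    actual = hashlib.sha256(snapshot.read_bytes()).hexdigest()
    return expected == actual


if __name__ == "__main__":
    snap = Path("snapshots/prod-schema-2020-07-29.sql")
    record_checksum(snap)  # run once, at snapshot generation time
    assert verify_checksum(snap), "snapshot was modified after it was generated"
```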

Note: The idealized version of the second pattern — ephemeral, single-use environments — can be difficult to establish when you first start your journey. It is probable that you will need a combination of the first and second patterns for some time as you build up to having a reset capability. Plan accordingly.

Validation databases must be representative of production

The third dimension of trustworthiness is that your validation databases must be truly representative of Production at a known point in time. A database that is not accurately representative of Production has no serious value as a test environment. The important thing to remember is that, in test environments, the databases represent Production at a point in time in the future: they show what Production will look like after some number of batches of changes have been applied, but before the latest batch has been applied. This aspect of Production representation is why the first two dimensions, tracking state and limiting change paths, are so important.

Test Data Management

The first two points, however, do not really address data. And, while Test Data Management (TDM) is arguably a separate discipline from CI/CD, it is required if we expect our test results to be predictive of what will actually happen in Production. Furthermore, the automated test suites used in CI/CD rely on good test data. So, while TDM itself is well beyond our discussion here, we do need to address what we should expect from it in order to have a trustworthy test database.

There are two primary expectations for TDM within CI/CD environment management. First, the datasets should be no larger than what is required to accomplish the testing task. Smaller datasets are easier and faster to reset, which supports the overall initiative.

Second, the data must be secure and compliant with all applicable privacy standards. This is more than the typical security reminder: every snapshot and copy of the data increases the security risk accordingly. Keeping the data masked, encrypted, and so on is an important consideration so that our use of it in a ‘lower’ and potentially less secure environment does not become a breach.
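
As a simple illustration of the masking side, the sketch below applies a deterministic, irreversible hash to a few personally identifiable columns before a CSV extract is loaded into a lower environment. The column names and files are assumptions; a real TDM tool would cover far more than this.

```python
# Sketch: mask personally identifiable columns in a CSV extract before it is
# loaded into a lower environment. Column names and files are assumptions; a
# real Test Data Management tool would cover far more than this.
import csv
import hashlib

PII_COLUMNS = {"email", "full_name", "phone"}


def mask(value: str) -> str:
    """Deterministic, irreversible mask so related rows still line up."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]


def mask_csv(src: str, dst: str) -> None:
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            for col in PII_COLUMNS & set(row):
                row[col] = mask(row[col])
            writer.writerow(row)


if __name__ == "__main__":
    mask_csv("extracts/customers.csv", "extracts/customers_masked.csv")
```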

Wrapping up

Having validation environments you can trust is crucial to having a low-friction, automated CI/CD process. No matter how well the changes and deployment processes are defined and executed, if the target environments are unstable, the results will be unpredictable. Being able to track and control changes into a Production-representative environment reduces the friction and time spent flowing changes through the pipeline. It also gives people more faith in the results of automated testing.

We have now discussed creating batches of changes, ensuring that they are ready, and reliably deploying those batches. This covers the basic activities required for a CI/CD flow. With that out of the way, we are going to shift focus, starting with the next post on Sandbox Databases, to the optimization and management activities around CI/CD.
