Measuring Database CI/CD Pipeline Performance
This is the twelfth post and final post in a series we’re calling CI/CD:DB. Read from the beginning.
This blog series has described various aspects of defining a CI/CD pipeline for managing database changes. Once you have this pipeline, the work of maintaining it and, more importantly, improving it begins. In order to do that, you have to be able to measure the pipeline’s performance. Metrics, however, are tricky things — they will impact or define how people behave. So, while it is impossible to act consistently without data, measuring the wrong things can institutionalize misbehaviors and have a serious net negative impact on the organization.
DORA to the rescue
Fortunately, a consultancy called the DevOps Research and Assessment (DORA) has provided what is, so far, the definitive set of top-level metrics for DevOps activities and CI/CD pipelines. The results of that research were published in a 2018 book called Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations by Nicole Forsgren, Gene Kim, and Jez Humble. In the book, they define and explain how four primary measures can be used to understand how effectively an organization is delivering software. Those measures are:
- Delivery Lead Time
- Deployment Frequency
- Change Fail Rate
- Mean Time to Restore Service
The DORA set of outcome-focused metrics can be applied to the full application stack to understand the overall performance of software delivery. As with CI/CD in general, there is no reason they cannot be applied to assess how well database changes — or really changes to any part of the stack — individually perform.
Database pipeline considerations
Are there really are any special considerations necessary for the database portion of the delivery pipeline? If you have a functioning CI/CD pipeline for database changes, why wouldn’t you use the same metrics? The answer is, of course, that you should absolutely use the same metrics; you just have to think about what they mean for your pipeline and your situation. Let’s review the metrics and examine what each of them means in the context of our CI/CD pipeline concept.
Delivery Lead Time
The first metric, Delivery Lead Time, represents a simple flow metric that could be applied to nearly anything. It simply asks the average elapsed time between the time a change is defined to when it is delivered into the Production environment. With a CI/CD pipeline and automation, we know when a change was added to the pipeline and included in a batch. Either of those two points represent a logical ‘start’ point with the endpoint being when the batch completely executed into the Production environment. Whichever of those start points you choose should correspond with the one chosen by the rest of the pieces of the application stack.
Databases have a flow of changes that may move independently of business functionality changes. Consider a simple maintenance change such as adding an index to a column. How quickly the database team can respond to tuning changes is an important factor and should be part of the DB-specific use of this metric. Just be careful the stakeholders understand whether these events are included in rollup data going toward the ‘full-stack’ metrics.
The second metric, Deployment Frequency, is an easy one. How many times in a given time period do you deploy database changes into Production? The only variation on the theme here is again related to maintenance tasks. The database team needs to know the aggregate as well as the maintenance number. For example, a high ratio of maintenance changes relative to changes associated with an application change may be an indication of a deeper issue. As with Delivery Lead Time, everyone should be clear how this rolls up to the rest of the ‘full-stack’ metrics.
Change Fail Rate
Change Fail Rate is simple on the surface. A change that does not work the first time when deployed to Production represents a failure, so that is an obvious counter. The question that comes up for the database team is again related to maintenance and tuning changes. How do those count toward the total?
On the surface it would seem obvious that a tuning change that did not run when applied would count, but what about one that runs successfully but has no measurable effect? Is that a failure? There is no ‘right’ answer, but it bears some thought and it certainly must be understood by both the DB team as well as the broader set of stakeholders. The broader set of stakeholders, for example, may not care about non-event changes while the database team might be quite interested. Such a variation in perception between the DB team and the broader set of stakeholders must be carefully managed for both the health of the databases and the health of the team’s dynamics.
Mean Time to Restore Service
The final metric on the list, Mean Time to Restore Service, also brings some decisions when it is applied to a specific piece of the technology stack such as the database. This metric, unlike the other three, does not easily apply to just one area. For example, can there ever be a situation where something like the database can be compromised without bringing the rest of the application down? That is unlikely for most apps. As a result, it is unlikely that this metric will have a database-specific implementation.
Other things to measure
The DORA metrics are what are called ‘outcome’ metrics. They generally deal with the final result of a change flowing through the pipeline. They are not ‘operational’ metrics — things that you measure to affect the outcome. It is difficult to be prescriptive or specific for operational metrics as technology choices, application architecture, and business situations can affect which ones will be most influential on outcomes for any given team. So, without bias or prescription, here are some ideas of things to measure in your pipeline:
- Average age of Sub-Prod environment (between refreshes)
- Time required to refresh a Sub-Prod database
- Deployment failure rate per Sub-Prod database
- Deployment frequency per Sub-Prod database
- Change failure rate per Sub-Prod database
- Maintenance intervention rate for all databases
- % Changes/Batches delivered without DBA review/intervention
- Database batch failure rate at CI level
- % Changes/Batches delivered without revision after CI level
This list is by no means comprehensive, but hopefully, help get you started or thinking about what is most effective for your team. Do be aware that it is possible to measure too many things and waste energy by measuring irrelevant things. Be sure that what you are measuring helps the team improve the top-level outcome metrics. If it does not, you must ask yourself why you continue to measure it.
Using metrics well requires a bit of art to go with the science and math. While there are no true ‘one size fits all’ metrics when talking about delivering changes to an application system, the DORA metrics are close and are backed by very rigorous and documented research. The task for the database CI/CD pipeline owner is to adopt them and to determine what operational metrics will help their team drive the desired outcomes.