Back in the day, quality was synonymous with testing — how many tests were run, how many passed, and how many failed. The quality of a feature being released was determined by these failures, and by the number of issues that the customer reported.
Cut to the world of software today: we have moved to the world of services. A multitude of these services interact with other services in complex ecosystems, with third-party integrations, accessing and manipulating terabytes of data at near real-time speeds. "Do the features work?" is no longer a sufficient metric to track the quality of a product. The questions about quality have become more diverse and complex, spanning multiple tools, teams, and initiatives:
Does the feature work? How many test cases passed or failed?
How much time did the operations team spend in deploying and maintaining the new service?
How does the feature impact security posture for the product?
Do we have code hotspots that generate a lot of bugs?
Is the product maintaining its SLAs and SLOs?
How quickly can someone debug and fix a P0 issue for this feature?
What is my deployment cadence and how is it affecting incoming defect rates?
To sufficiently answer the quality question today, you need data from multiple engineering tools: Jira, SCM, PagerDuty, security tools, and CI/CD tools, to name a few. You need to be able to correlate these metrics and draw insights on the success of your quality program.
In my previous roles, I helped build and scale quality and performance teams for a rapidly growing SaaS product. Alongside regression results and test coverage, a few software metrics helped me track and answer quality questions around the product:
Change Failure Rate
In a SaaS world, deployments are frequent. Unfortunately, so are customer-impacting outages. As we scaled teams, it became critical to find a balance between how much and how often we deploy and how stable the product is. By tracking deployment frequency and deployment sizes against incoming outages and customer issues, we were able to make data-driven decisions on how often we deploy. Transitioning from weekly to bi-weekly deployments immediately improved stability, gave features more time to soak in staging, and reduced outages.
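As a rough illustration of the tracking described above, here is a minimal Python sketch that computes a change failure rate and an average deployment size from a list of deployment records. The `Deployment` record, its fields, and the sample values are all hypothetical and made up for this example. In practice the data would come from your CI/CD and incident tools.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Deployment:
    # Hypothetical record shape; real data would come from CI/CD and incident tools.
    day: date
    changes: int            # number of changes (e.g. merged pull requests) shipped
    caused_incident: bool   # True if a customer-impacting issue was traced to it


def change_failure_rate(deployments):
    """Fraction of deployments that led to a customer-impacting failure."""
    if not deployments:
        return 0.0
    failed = sum(1 for d in deployments if d.caused_incident)
    return failed / len(deployments)


def average_deployment_size(deployments):
    """Mean number of changes per deployment."""
    if not deployments:
        return 0.0
    return sum(d.changes for d in deployments) / len(deployments)


# Made-up sample data, for illustration only.
deployments = [
    Deployment(date(2024, 1, 1), 40, True),
    Deployment(date(2024, 1, 8), 35, False),
    Deployment(date(2024, 1, 15), 50, True),
    Deployment(date(2024, 1, 22), 20, False),
]

print(f"Change failure rate: {change_failure_rate(deployments):.0%}")
print(f"Average deployment size: {average_deployment_size(deployments):.2f} changes")
```

Computing both numbers for each deployment cadence you try gives you a simple way to compare cadence against stability.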
Code Hotspots
We all have parts of the code that are complex, critical, or just simply magical. Any change to these areas requires rigorous, careful testing and sometimes an altered deployment plan to cover for rollbacks. We had a two-fold problem here: we did not fully know where these hotspots were, and we did not know whether any of them were touched in an upcoming deployment. By correlating incoming issues to the areas of code where the fixes went in, we were able to get a rudimentary map of hotspots, review the release change list against the map, and determine the risk factor for a deployment.
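A rudimentary hotspot map like the one described above can be sketched in a few lines of Python. This is only an illustration, not the tooling we actually used. The function names, the issue IDs, the file paths, and the `threshold` cutoff are all assumptions invented for the example.

```python
from collections import Counter


def build_hotspot_map(bug_fixes):
    """Count how many bug fixes touched each file.

    bug_fixes: iterable of (issue_id, files_changed_by_the_fix) pairs.
    """
    counts = Counter()
    for _issue_id, files in bug_fixes:
        counts.update(set(files))  # count a file at most once per fix
    return counts


def release_risk(changed_files, hotspot_map, threshold=2):
    """Return the files in a release that are known hotspots, most bug-prone first.

    threshold is an arbitrary cutoff chosen for this sketch.
    """
    risky = [f for f in set(changed_files) if hotspot_map[f] >= threshold]
    return sorted(risky, key=lambda f: (-hotspot_map[f], f))


# Hypothetical issue-to-fix data, e.g. joined from an issue tracker and SCM history.
bug_fixes = [
    ("BUG-1", ["billing/invoice.py", "billing/tax.py"]),
    ("BUG-2", ["billing/invoice.py"]),
    ("BUG-3", ["auth/session.py", "billing/invoice.py"]),
    ("BUG-4", ["auth/session.py"]),
]
hotspots = build_hotspot_map(bug_fixes)

# Hypothetical change list for an upcoming release.
release_changes = ["billing/invoice.py", "ui/banner.py", "auth/session.py"]
print(release_risk(release_changes, hotspots))
```

Any file returned by `release_risk` is a signal to add extra testing or a rollback plan for that deployment.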
Release Security Metrics
It is a no-brainer that security is a critical aspect of every SaaS product. But with rapid development and deployment, we had pushed security testing into the final steps of release planning. We learned a hard lesson when, after a feature was developed and tested, we realized that we had overlooked some security aspects and had to go back to the drawing board. Incorporating security tools into the CI/CD pipeline and tracking vulnerabilities and best-practice violations on the release development dashboard saved us some unpleasant surprises downstream.
How are quality metrics changing in your engineering environments?