Release Risk Assessment for QA Teams

Effective Data-driven Approach

Introduction

Software Quality Assurance (QA) teams within an enterprise play a critical role in accelerating the software lifecycle. QA is typically performed in multiple stages and by separate teams (Figure 1). In each of these stages, the QA team’s primary function is to ensure the quality of a software release and determine whether it is ready to be promoted to the next stage. QA teams also play a key role in determining and communicating the confidence in, and the risks of, a release to the organization’s stakeholders, such as development, deployment and the product/business owner (Figure 1).

Figure 1: QA Teams Play a Central Role in a Software Lifecycle

The QA team's primary day-to-day tasks are the following:

  1. Validate and approve builds with low risk to the next test stage or for deployment
  2. Identify quality issues with the build
  3. Provide risk assessment to business owner to meet deadlines

These tasks are time-consuming, costly and, in many cases, manual and error-prone. With the advent of continuous integration and continuous delivery requirements in enterprises, along with the architectural shift to microservices, these tasks are becoming even harder.

Challenges Faced by the QA Teams

Let’s review the challenges faced by the QA teams today in accomplishing their tasks.

  • Validate and approve builds with low risk to the next test stage or for deployment: The primary challenge QA faces in validating and approving low-risk builds for the next test stage or for production is the time constraint imposed by the release deadline. A software release cycle is usually planned with enough time for testing in each of the required QA stages. Unfortunately, reality differs from the planned time frames (Figure 2): either development completion is delayed, or there is greater urgency to deploy the release sooner for business reasons.

    Figure 2: Compressed QA time in a Typical Release Cycle

    Given these practical constraints, the challenge for QA teams is to determine the confidence in a given build/release as well as the risk of shortening test cycles. The QA team does not have a reliable way to prioritize testing of the services in that release that are more at risk over services that are safer. This leads to a best-guess estimate of release quality before promotion to the next test stage or to production, which is not ideal.

  • Identify quality issues with the build: The next challenge faced by QA teams is to identify issues in the release along the risk dimensions of architectural regressions, performance, scalability and security. These issues are not found by traditional automated functional testing, so they tend to surface in production deployments, where they cost more to fix. To identify and debug these issues, QA teams face the unmanageable and error-prone task of evaluating hundreds of metrics per service for each build of a complex distributed or microservices application. Once the issues are identified, QA teams still lack a way to accurately identify the root-cause component responsible for the heightened risk in the build. Today, this level of root-cause debugging costs both the QA and development teams a great deal of time.
  • Provide risk assessment to business owner to meet deadlines: QA teams find it challenging to communicate to the business owner (product management or the VP of Engineering) the risks of promoting a release to production to meet a shortened deadline. Beyond the number of open issues and their severity, they lack quantifiable data to justify the release decision. This presents a huge problem for the business owner, who has to make a business-impacting decision, since promoting a bad release could have serious consequences. Yet this uncertainty is repeated for nearly every release.

Data-driven Approach to Release Risk Assessment

An effective data-driven approach is needed to address these major challenges faced by the QA teams. The OpsMx platform provides a solution for QA teams to be effective in understanding the risks of a given release and the confidence needed to promote the release to the next test stage or deployment. The OpsMx solution approach is shown in Figure 3:

Figure 3: OpsMx Data-driven Approach

Step 1 - Observe: OpsMx non-intrusively observes a new build of a service or of an entire application through various metrics, logs and interactions in the integration test, load/stress test and staging environments. OpsMx can leverage existing monitoring tools or run its own agent to gather the needed data points.

Step 2 - Analyze and Machine Learn: OpsMx uses expert-level algorithms and machine learning to characterize the new build of a service, including the service’s interactions with other services. OpsMx additionally leverages known service-specific templates and user-specified domain rules to further enhance the characterization of the application.

Step 3 - Risk Assessment and Diagnostics Report: In the final step, OpsMx provides the risk assessment and diagnostics report for the build. The report includes the safety score and detailed diagnostics for the build. The safety score for a given build signifies the readiness and suitability of that build, evaluated for architectural regressions, performance, scalability, configuration and security issues. The detailed diagnostics part of the report identifies safety divergences in the various components of the build and flags any significant deviations, as shown in Figure 4.

Figure 4: Risk Assessment and Diagnostics Report - Safety Score and Detailed Diagnostics
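
To make the idea of flagging "significant deviations" concrete, the following Python sketch compares the metrics of a new build against a baseline and reports those whose relative change exceeds a tolerance. It is illustrative only and does not represent the OpsMx algorithms: the metric names, the sample values, the 20% tolerance and the scoring formula (percentage of comparable metrics within tolerance) are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Deviation:
    metric: str
    baseline: float
    current: float
    relative_change: float  # (current - baseline) / baseline


def flag_deviations(baseline, current, tolerance=0.20):
    """Return metrics whose relative change from the baseline exceeds tolerance."""
    flagged = []
    for name, base_value in baseline.items():
        if name not in current or base_value == 0:
            continue  # skip metrics that cannot be compared
        change = (current[name] - base_value) / base_value
        if abs(change) > tolerance:
            flagged.append(Deviation(name, base_value, current[name], change))
    return flagged


def safety_score(baseline, current, tolerance=0.20):
    """Hypothetical score: percentage of comparable metrics within tolerance."""
    comparable = [m for m in baseline if m in current and baseline[m] != 0]
    if not comparable:
        return 0.0
    flagged = flag_deviations(baseline, current, tolerance)
    return 100.0 * (len(comparable) - len(flagged)) / len(comparable)


# Hypothetical sample data for a single service.
baseline_metrics = {"latency_ms": 120.0, "error_rate": 0.01,
                    "cpu_pct": 55.0, "memory_mb": 800.0}
new_build_metrics = {"latency_ms": 180.0, "error_rate": 0.01,
                     "cpu_pct": 58.0, "memory_mb": 820.0}

for d in flag_deviations(baseline_metrics, new_build_metrics):
    print(f"{d.metric}: {d.baseline} -> {d.current} ({d.relative_change:+.0%})")
print(f"safety score: {safety_score(baseline_metrics, new_build_metrics):.0f}")
```

With this sample data only the latency metric is flagged, because it is the only one that moves by more than 20% relative to the baseline. A real assessment would likely weight metric groups differently and account for normal run-to-run variation rather than use a single fixed tolerance.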

Addressing the Challenges of QA Teams

Let’s look at how OpsMx addresses the challenges faced by QA teams with the approach described above.

  • Validate and approve builds with low risk to next test stage or for deployment: With the OpsMx build risk assessment report for each stage, QA teams have a definitive report on the safety and readiness of the build as soon as the stage is complete. If the safety score is above the pass threshold, the QA team can promote the build to the next stage, either manually or in an automated fashion. With OpsMx integrated into all test stages, QA teams can validate the overall build to understand the risk of deployment.
  • Identify quality issues with the build: The OpsMx risk assessment report provides detailed sub-scores for the components of each build across various metric groups. If any significant deviations or issues are found between the current build and the baseline version, OpsMx automatically flags the issue and provides root cause analysis. Issue identification and root cause identification are thus automated, enabling the QA team to ensure the quality of the release and save time in the process. This also allows better communication and collaboration between the development and QA teams, reducing the MTTR for the issues found.
  • Provide risk assessment to business owner to meet deadlines: The OpsMx risk assessment report clearly identifies the safety of the current build compared to the baseline or production release. It is also possible to set thresholds based on business requirements so that a build fails if it falls below expectations. These abilities allow the QA team to clearly articulate to the business owner whether it is safe to deploy a build to meet release deadlines. Because the risk assessment is data-driven, it reduces reliance on human judgment, lowers the chance of error and gives the business owner increased confidence in the decision.
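
The threshold-based promotion described above can be sketched as a simple gate. This is a minimal Python illustration, not the OpsMx implementation: the `gate_build` function, the `Decision` values, the threshold defaults of 80 and 60, and the "manual review" band between them are hypothetical choices made for this example.

```python
from enum import Enum


class Decision(Enum):
    PROMOTE = "promote"
    REVIEW = "manual review"
    FAIL = "fail"


def gate_build(score, pass_threshold=80.0, fail_threshold=60.0):
    """Map a safety score to a release decision (thresholds are assumed values)."""
    if fail_threshold > pass_threshold:
        raise ValueError("fail_threshold must not exceed pass_threshold")
    if score >= pass_threshold:
        return Decision.PROMOTE  # safe to promote to the next stage
    if score < fail_threshold:
        return Decision.FAIL     # below business expectations
    return Decision.REVIEW       # in between: leave the call to the QA team


# Hypothetical safety scores reported at the end of each test stage.
stage_scores = {"integration": 92.0, "load": 71.0, "staging": 55.0}

for stage, score in stage_scores.items():
    print(f"{stage}: score {score:.0f} -> {gate_build(score).value}")
```

In a CI/CD pipeline, a gate like this would typically run as a step at the end of each test stage, with the thresholds set by the business owner per stage or per service.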

The OpsMx solution integrates with existing CI/CD frameworks (Jenkins, Spinnaker, etc.), test frameworks (JMeter, Selenium, LoadRunner, Visual Studio, etc.), APM/log management tools and developer tools on any infrastructure, as shown in Figure 5.

Figure 5: OpsMx Integration into Software Development and Release Ecosystem Tools

Summary

OpsMx provides an effective data-driven solution for software release risk assessment for QA teams in an enterprise. The OpsMx solution can integrate with each test stage of a typical QA cycle for a build. With the OpsMx solution, QA teams can validate and approve builds with low risk to the next test stage or for deployment, identify quality issues with the build and provide a risk assessment to the business owner to meet release deadlines.

For more information about the OpsMx solution for QA teams, visit www.opsmx.com or email info@opsmx.com.