Metrics, and the desire to measure things (especially in software testing), are often used and abused. The craft is rife with misplaced, misunderstood, and at times dangerous measures. In particular, a recent post entitled “5 examples of metrics that matter” goes some way towards supporting fallacies in the software testing space (http://blog.softed.com/2014/04/28/5-examples-of-metrics-that-matter/).
What follows is a series of five explanations as to why these metrics miss their mark.
Let's look at the first *metric*…
1. Residual Risk. This is all about what do we know about (i.e. parts of the solution/risk profile that have been covered, tested) and what do we not know about. Residual risk is a great quality and progress metric as it can tell us about gaps in knowledge and coverage.
Residual risk – what does this mean? According to the Oxford dictionary, residual can mean “remaining after the greater part or quantity has gone”, which we could view as the predominant understanding of the word. According to the same dictionary, risk could mean the “…possibility that something unpleasant or unwelcome will happen”. Put together, residual risk means something unpleasant that is left over after the greater portion of the unpleasant X has been resolved.
Well, according to the Oxford dictionary it could also mean something not accounted for or eliminated due to the unfortunate consequence of an action. This could mean that depending on one’s perspective, residual risk is passive in the sense that we are dealing with *stuff left over* or active as in risk that was not accounted for due to the consequence of an action. So while we could discuss residual risk as a *count* of identified *risks* left over from some list, what does that actually tell us?
Let's look a little further into why residual risk may seem to be a great metric.
The post says “this is all about what do we know about (i.e. parts of the solution/risk profile that have been covered, tested) and what do we not know about.”
In my experience, a number of organisations look at the test cases created and traced against a requirement, usually with the sentiment that M test cases cover N requirements, and thereby create the illusion that covering requirements with test cases equals effective risk mitigation.
This is a fallacy – the idea that test cases tagged against a requirement equal effective coverage, from which risks are then highlighted, is at the very least shallow. In the testing space, when we talk about risk the reference is generally to risk against process. How often has your project been threatened with, or been through, an audit? As soon as the risk of an audit (excuse the pun) appears, mounds of paperwork are created to perpetuate the myth that good process equates to low product risk. This activity is the equivalent of breaking starch and does not fully address the risks that could harm a project.
So while residual risk may identify some risks that could harm *us*, the way this metric is often used is too shallow. Which leads to the next question – how do we make this better?
First of all, let's define what we mean by risk and not assume that everyone has the same understanding. For example, your project may define risk as “the potential that something bad will occur that may lead to the minister/CEO/public/stakeholders/audit becoming directly involved.” This sets the terms of reference and allows the team to come back to what is defined as a risk (as opposed to just labelling everything as a risk without being clear what that means in your context).
Secondly, let's look at the different types or categories of risk. Is it a risk to the product, such as an insecure login screen allowing the system to be compromised? Is it a project risk, such as having the entire test team taken out with a strain of the H1N1 virus? What are the risk areas (e.g. performance – maybe your app requires *good* performance outside of the main city centres)? All in all, what we need to understand is: what is the purpose behind the measure?
Once we begin to understand the measurement’s purpose, we can start categorising the different risk areas and types, which helps to focus a tester’s thinking. This focus ideally allows for idea generation on grouping, testing, and mitigating the identified risks, including understanding the social dynamic of the risks (risks very rarely live without having an impact on someone or something).
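To make the categorising idea a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `Risk` fields, the labels, the `group_open_risks` helper, and the `mitigated` flags are all made up for this example and are not taken from the original metrics post. The three risks are loosely based on the examples above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    description: str
    kind: str        # hypothetical label: "product" or "project"
    area: str        # hypothetical risk area, e.g. "security"
    affects: str     # who or what feels the impact (the social dynamic)
    mitigated: bool = False


def group_open_risks(risks):
    """Group unmitigated risks by (kind, area) instead of reducing them to a count."""
    grouped = defaultdict(list)
    for risk in risks:
        if not risk.mitigated:
            grouped[(risk.kind, risk.area)].append(risk)
    return dict(grouped)


# Illustrative data only; the mitigated flag on the last risk is an assumption.
risks = [
    Risk("Insecure login screen lets the system be compromised",
         "product", "security", "end users"),
    Risk("Entire test team is off sick at once",
         "project", "staffing", "delivery schedule"),
    Risk("Slow responses outside the main city centres",
         "product", "performance", "regional users", mitigated=True),
]

open_risks = group_open_risks(risks)
for (kind, area), items in sorted(open_risks.items()):
    for risk in items:
        print(f"{kind}/{area}: {risk.description} (affects {risk.affects})")
```

A bare count of the open risks would lose the type, the area, and who is affected, which is exactly the information a discussion needs. Note that a register like this can only ever hold the risks someone has already identified.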
So, when we discuss the metric residual risk, there is much more involved than counting how many of the N identified risks are left over. Effective residual risk reporting factors in the definition of risk, the purpose of reporting on residual risk, what information we are capturing, how it is arranged and displayed, and the social dynamics associated with it. When combined, these help us become more informed about the risks and support appropriate decision making. Even so, I contend that there will ALWAYS be unknown risks and therefore the metric, no matter how well defined, will ALWAYS be fallible.
This is the 1st part of a series of five metrics posts that will be discussed. It was written by Brian Osman.
Next Article – Coverage
If you read through Sharon’s comments, it becomes apparent that her metric of “Residual Risk” isn’t actually a metric at all. Specifically, Sharon stated “Hi Joe, I tend not to use numbers in residual risk[…]”, which is directly in opposition to it being a metric (“A system or standard of measurement”, or “[A] measure of some property of a piece of software or its specifications[…]”) – hard to be a metric, when a metric is a measurement, and it’s hard to measure when you don’t have numbers.
That said, the inherent problem comes down to that you don’t know what you don’t know. Sharon is able to discuss and possibly measure what she knows she doesn’t know, but not very easily what she doesn’t know she doesn’t know. But, if she’s using this less as a metric and more as a discussion point (again, see her more recent comments) then this is less of a problem as it can be presented in less of an absolute.
Apologies if this doubles up, previous comment was eaten by a grue. Here’s an abridged version, assuming it doesn’t come through:
If you read Sharon’s comments, it becomes apparent that this isn’t actually a metric. She talks about how she doesn’t use measurements for Residual Risk; as the definition of a metric involves using numbers as measurement, this cannot be a metric she’s actually talking about…she’s using it as a discussion point, which makes this less troublesome (assuming it’s presented as “what we know we don’t know”, rather than “this is everything we don’t know” – it’s impossible to know what you don’t know).
Totally agree that metrics used wrongly are very dangerous. A few points though. Residual Risk is about risk, not requirements coverage. Covering the requirements and counting test cases are not good metrics at all. Risk, however, is very important for testers to consider and be aware of and to discuss. Finding a common language for that discussion is vital and this is a start point. No metric should be used in isolation, all metrics should be used with knowledge and to help grow understanding and provide relevant information. I have found residual risk to be a useful part of the metrics tool box.
Looking forward to your next group of posts!
Oops – Sorry meant to address the author, Oliver, not Brian.
Brian is the author. Little WordPress kink there. Changed now.