This is the second in our series on metrics. We started with a discussion of residual risk and now move on to coverage.
Coverage can be measured in dozens of different ways. For example, we can look at coverage of functions, requirements, architecture and risks. We could also look at code coverage or database coverage. But as testing metrics they are all weak, because they share the same fallacy at their core.
Test coverage metrics ignore the actual quality of what was done. Did we actually cover some feature/code or just touch it? I could cover a piece of code with one test, showing it as tested but there might be a hundred different data constellations that would be relevant to that part of the code. Each one might cause it to fail. Coverage is not guaranteed to answer those questions.
Let’s take the simplest example. We have a line of code that prints a string to the screen. From a functional perspective we have a function described in the specification that requires this to happen. From a code perspective it’s a line of code that needs to be touched. So if I get the code line to execute once, I have definitely covered the code metric and possibly the functional metric too (if that only states a valid output as its success criterion).
But what about a test case with a string 1000 characters long? In some languages a string variable can hold only 255 characters, so that test would fail. The coverage metric nevertheless states that we have “tested” it (see SoftEd blog). This metric has at best a tentative link to whether something has been tested. It tells you that the line has been executed, or the functionality run, at least once successfully, but it tells you nothing valuable about the actual testing (or lack of it) that was done on that line of code or functionality.
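The string example above can be sketched in a few lines of Python. This is a hypothetical illustration, not code from any real project: the function name `to_short_string` and the one-byte length prefix are assumptions, chosen to stand in for a string type that holds at most 255 characters.

```python
def to_short_string(text: str) -> bytes:
    """Encode text with one length byte in front (hypothetical 255-character limit)."""
    data = text.encode("ascii")
    # A single byte can only hold values 0..255, so longer input cannot be encoded.
    return bytes([len(data)]) + data


def test_short_text() -> None:
    # This single call executes every line of to_short_string.
    assert to_short_string("hello") == b"\x05hello"


def fails_on_long_text() -> bool:
    # Same lines of code, different data: 1000 does not fit in one length byte.
    try:
        to_short_string("x" * 1000)
    except ValueError:
        return True
    return False
```

The function has no branches, so a line-coverage tool would report it as fully covered after `test_short_text` alone. The failure only shows up when the data changes, which is exactly what the coverage number cannot see.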
Someone not familiar with the limits of this metric could deduce something that is fundamentally wrong. A manager could think that coverage means that a sufficient amount of testing has been done. They might make deductions about risk, or they could calculate expected future effort by applying the speed of coverage so far. But coverage tells you nothing about what has or has not been tested. It only confirms whether you have ‘touched’ something.
However, that in itself has value! As a tester I can use coverage to tell me whether I have covered the code I was expecting to. It can show which areas of the code have not been tested at all (there will always be some, due to error scenarios that can’t be reproduced in a test environment). Functional coverage might aid me in planning sessions for Session Based Test Management (SBTM), but using such a metric for reporting without further specification and explanation would be wrong. It should not be a metric but another tool for testers to use to improve their work, where it fits the context.
What I’d suggest as an alternative is to take the numbers away. Reporting on coverage goes hand in hand with reporting on test progress. So talk about the application and the (new) functionality. Describe where the application is at and what you expect will be covered in the future. This story will inform better than a metric. It is far less open to speculation and highlights the fact that coverage and progress are not static things. Metrics cannot convey context and complexity.
This is the 2nd article in a series of five on test metrics. This part was written by Oliver Erlewein.
In my earlier years in testing I would have completely supported this article, however as my experience has developed, I have learned that most projects are so restricted on deadlines that we need to be sure that we are testing effectively. Coverage of undocumented requirements is a nice to have, but often a luxury not supported by the timescales.
On (too) many occasions, I have had to restrict the testing to that which will find defects which will be addressed. I would argue that, in the example given, if there is no requirement to restrict the field to 1000 or 255 characters, nor any implied requirement which can be drawn from language coverage etc., then we should not be testing the field length.
If a defect is found from the above test, what will happen with it? At best it will raise a query, which needs to be addressed by the BA (role, not person). A query regarding the requirements should not be found during testing but during analysis, once again leading to a requirement that can be tested, and not an assumption of an implied requirement.
I would also argue that if the test team have done their job correctly, then this metric is an accurate reflection, within acceptable limitations. The statement that should be made in the test plan is that the testing is against the documented (or otherwise) requirements, and that any given percentage of test coverage will be accurate against those requirements. If the requirements are incorrect, or incomplete, then this is not the fault of the test team.
All of that said, I completely agree that metrics out of context can be dangerous things.
I was once asked to produce a simple graph showing the status of each project in a programme of testing. The desire was to show a line on the graph under which a project could be deemed ready to go live. The graph should show all projects, and the data would be produced from defect count and severity. The issue is that the defect count of any project is very rarely valid as a standalone figure. It needs to be put into the context of test progress, otherwise all projects could be deemed ready for go-live before any tests have been executed (zero defects have been raised). This is why a quality gate states the percentage of test cases to be executed as well as the acceptable defect levels.
I disagree, Andy, as I feel that numbers can cause us to become blind. If we are told we have 100% coverage of an area, we tend not to engage and question, but instead go “ok then”. What is 100% to one person might be totally inadequate to another …