Software performance and resilience are key components of the user experience, but as the software industry embraces DevOps, it is starting to fall short on both. Performance issues are often overlooked until the software fails entirely.
However, we all know that performance doesn't suddenly degrade. As software is released through iterations, there is a performance cost every time more code is added, along with additional logic loops where things can fail, affecting the overall stability.
Crippling performance or software availability issues are hardly ever due to a single code change. Instead, it’s usually death by a thousand cuts. Rigorous practices that reinforce performance and resilience, together with continuous testing for both, are great ways to catch a problem before it starts. And as with many aspects of testing, the quality of the performance practice is much more important than the quantity of tests being executed.
Here are seven simple tips to drive an efficient performance and resilience engineering practice.
1. Use benchmarks and change only one variable at a time
In performance and resilience engineering, a benchmark is a standardized problem or test that serves as a basis for evaluation or comparison. We define such tests so that we can compare them to each other. In order to compare, we change one element and measure the impact of that change against another test.
During our continuous integration process, we benchmark new versions of the software to measure how the code changes impact performance and resilience of our software. In some other benchmarks, we want to measure how our software performs on different-sized hardware. As we also support multiple architectures, platforms, operating systems, databases, and file systems, we want to be able not only to define how to get the best performance and reliability, but also to compare them to one another.
These are all valid benchmark practices because we change one element and measure the impact of that change. However, if we were to change the software version under test and the hardware on which we test at the same time, and then try to compare results, we would not be able to conclude whether any change observed is due to one change, the other, or a combination of both—often, the combination of changes will have a different effect from when they happen individually.
In performance engineering, try to do "apples to apples" comparisons, use benchmarks, and change only one variable across multiple versions of the test you want to compare.
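As a minimal sketch of this rule, the check below compares the configuration of two benchmark runs and accepts the pair only when exactly one setting differs. The setting names (`software_version`, `cpu_cores`, `database`) and both helper functions are hypothetical, chosen only for illustration.

```python
def changed_variables(baseline, candidate):
    """Return the sorted names of settings whose values differ between two runs."""
    missing = object()  # sentinel so a setting absent from one run counts as changed
    keys = set(baseline) | set(candidate)
    return sorted(
        k for k in keys if baseline.get(k, missing) != candidate.get(k, missing)
    )


def is_comparable(baseline, candidate):
    """Treat two runs as an apples-to-apples pair only if exactly one variable changed."""
    return len(changed_variables(baseline, candidate)) == 1


# Hypothetical run descriptions.
baseline = {"software_version": "2.3.0", "cpu_cores": 8, "database": "postgres"}
version_only = dict(baseline, software_version="2.4.0")
version_and_hardware = dict(version_only, cpu_cores=16)

print(changed_variables(baseline, version_only))      # ['software_version']
print(is_comparable(baseline, version_only))          # True
print(is_comparable(baseline, version_and_hardware))  # False
```

A check like this could run before any comparison report is generated, so that a pair of runs differing in both version and hardware is rejected rather than misread.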
2. Monitor memory, CPU, disk, and network usage
As performance and resilience engineering is a scientific endeavor, it can only be achieved by seeking to objectively explain the events we observe in a reproducible way. That means we need to measure.
For performance engineering, we must measure not only the software we are testing, but also the hardware we are testing it on. Monitoring memory, CPU, disk, and network usage is key for our analysis. We also must understand how those resources are allocated, as it pertains to our processing needs.
In information technology, we are always transferring data from one point to another and transforming it. Along the way we add redundancy; some of that redundancy is waste or overhead, and some of it is necessary, as it allows us to ensure data integrity and security. Performance engineering is all about removing the wasteful overhead while keeping the redundancy that protects data integrity.
3. Run each test at least three times
Before we can compare test results, we need to make sure the numbers we want to compare are trustworthy. Every time we run a test, we expect that if we run the same test under the same conditions at a different time, we should get the same results and metrics.
But when we run a test for the first time, we have no history of that test under the new conditions to decide if the results we have are repeatable. Keep in mind that previous tests where one component is different cannot be taken into account for result repeatability; only the same test executed multiple times can allow us to gain confidence in our result.
Results we can trust are a key element, so I recommend that you not consider the results of a test for performance comparison unless you have executed that test at least three times. Five times is even better test hygiene. And for a release to customers or a general availability release, many more executions will be necessary.
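A small harness can enforce the minimum run count so that a single execution is never compared by accident. This is only a sketch: `run_repeated` and `MIN_RUNS` are hypothetical names, and the test here is any Python callable timed with the standard library's `time.perf_counter`.

```python
import time

MIN_RUNS = 3  # minimum from the text; five is better test hygiene


def run_repeated(test, runs=MIN_RUNS):
    """Execute the same test callable `runs` times and return each duration in seconds."""
    if runs < MIN_RUNS:
        raise ValueError(f"need at least {MIN_RUNS} runs to compare results, got {runs}")
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        test()
        durations.append(time.perf_counter() - start)
    return durations


# Hypothetical workload standing in for a real test.
durations = run_repeated(lambda: sum(range(100_000)), runs=5)
print(len(durations))  # 5
```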
4. Achieve a result variance under 3 percent
Still on the topic of results, we must prove that the same test repeated at different times produces the same result. A key indicator for that is the variance (also called variability) of the primary metric. As used here, the variance is not the statistical variance; it is the percentage difference between the best and worst executions of the same test.
Let’s consider a performance test where the primary metric is a throughput measurement in transactions per second. If the worst execution reaches one hundred transactions per second and the best reaches one hundred ten, our variance will be 10 percent:
Variance = (larger value – lower value) / lower value
(110 – 100) / 100 = 0.1, or 10 percent
Likewise, for a resilience test where the primary metric is the recovery time in seconds, if the worst recovery time is five minutes (300 seconds) and the best is four minutes (240 seconds), our variance will be 25 percent: (300 – 240) / 240 = 0.25.
The variance is the key indicator of whether our results can be trusted. A variance under 3 percent means our results are reliable. A variance between 3 percent and 5 percent means results are acceptable and repeatable, but with room for improvement regarding stability of the test, environment, or software under test. A variance above 5 percent and up to 10 percent means we cannot repeat our results and should actively investigate why we have such a high variance. And any test with a variance greater than 10 percent cannot be used for performance consideration at all.
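The variance formula and the thresholds above can be sketched in a few lines. The function names and band labels are hypothetical, and the sketch treats anything above 5 and up to 10 percent as needing investigation.

```python
def variance_pct(values):
    """Percentage difference between the largest and smallest value of one metric."""
    lowest, highest = min(values), max(values)
    if lowest <= 0:
        raise ValueError("the metric must be positive to compute a relative variance")
    return (highest - lowest) * 100 / lowest


def classify(variance):
    """Map a variance percentage onto the trust bands described in the text."""
    if variance < 3:
        return "reliable"
    if variance <= 5:
        return "acceptable"
    if variance <= 10:
        return "investigate"
    return "unusable"


throughput_runs = [100, 104, 110]  # transactions per second
recovery_runs = [240, 300]         # recovery time in seconds

print(variance_pct(throughput_runs))  # 10.0
print(variance_pct(recovery_runs))    # 25.0
print(classify(variance_pct(recovery_runs)))  # unusable
```

The same function works whether a higher value is better (throughput) or worse (recovery time), because it only looks at the spread between the two extremes.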
5. Run your load tests for at least half an hour
Load tests are often aimed at measuring the capacity of a system for a specific usage. The goal is to get that system to process the largest workload in the shortest period without failing. For the measurements of such tests to have any basis in reality, in my opinion, the measured performance has to be sustainable for thirty minutes at the very least.
When you think about it, the only thing you have proven with a fifteen-minute load test is that the system can handle the load for fifteen minutes. Additionally, the shorter the run, the more subject to artificial variance it will be.
In performance engineering, we also need warm-up periods, because the first calls to a system are always slower. Even on a warmed-up system, the first few transactions of a test are likely to be slower and not necessarily the same between multiple runs, hence the artificial variance. On a test of thirty minutes or longer, those slow first transactions carry little weight in the overall result and are much less likely to induce variance.
If a load test duration is under thirty minutes, its results will have very little meaning from a performance engineering standpoint. The half-hour minimum is measured time and does not include any warm-up period.
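One way to keep warm-up out of the measurement is to compute throughput only over the window after warm-up ends, and to reject runs whose measured window is shorter than thirty minutes. The sketch below assumes the load tool can export the completion time of each transaction as seconds since test start; the function name and the synthetic data are hypothetical.

```python
MIN_MEASURED_SECONDS = 30 * 60  # half an hour of measured load, warm-up excluded


def steady_state_throughput(completion_times, warmup_seconds, test_end):
    """Transactions per second over the window (warmup_seconds, test_end]."""
    measured = test_end - warmup_seconds
    if measured < MIN_MEASURED_SECONDS:
        raise ValueError(
            f"measured window is {measured} s, need at least {MIN_MEASURED_SECONDS} s"
        )
    counted = sum(1 for t in completion_times if warmup_seconds < t <= test_end)
    return counted / measured


# Synthetic run: one transaction completes every 0.5 s for 35 minutes,
# with the first 5 minutes treated as warm-up.
completion_times = [i * 0.5 for i in range(1, 4201)]
print(steady_state_throughput(completion_times, warmup_seconds=300, test_end=2100))  # 2.0
```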
6. Prove your load results can be sustained for at least two hours
Again, I recommend half an hour at a minimum. As explained in the previous point, the only thing you have proven with a thirty-minute load test is that the system can sustain the load for thirty minutes. While thirty minutes will be enough to detect most new performance changes as they are introduced, in order to make these tests legitimate, it is necessary to also be able to prove they can run for at least two hours at the same load.
Short of running out of space, a peak load should be sustainable indefinitely. Proving the load can be run for two hours is a good first step. I recommend aiming for six, twelve, and twenty-four hours as milestones, and when possible, prove you can run these loads for five consecutive days.
Note that these endurance-under-load tests are to prove sustainability of load results. They do not need to be run against every single code change, but only to prove load numbers’ sustainability.
Start with proving two hours is sustainable. Anything less and your performance number should not be used for performance publications, and definitely not for capacity considerations.
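A simple way to check sustainability is to split the endurance run into fixed windows and verify that throughput never drops meaningfully below the first window. This is a sketch under stated assumptions: the function name is hypothetical, and the 3 percent tolerance is borrowed from the variance tip as an illustration rather than a threshold the text prescribes for endurance runs.

```python
def is_sustained(window_throughputs, window_minutes, min_hours=2, tolerance_pct=3.0):
    """True if the run lasted at least `min_hours` and no window fell more than
    `tolerance_pct` percent below the throughput of the first window."""
    total_hours = len(window_throughputs) * window_minutes / 60
    if total_hours < min_hours:
        return False  # too short to prove anything about sustainability
    floor = window_throughputs[0] * (1 - tolerance_pct / 100)
    return all(tp >= floor for tp in window_throughputs)


# Hypothetical throughput (transactions per second) per 30-minute window.
print(is_sustained([100, 99, 100, 98], window_minutes=30))  # True
print(is_sustained([100, 99, 95, 90], window_minutes=30))   # False: throughput degrades
print(is_sustained([100, 100], window_minutes=30))          # False: only one hour of load
```

The same check can be reused for the six-, twelve-, and twenty-four-hour milestones by raising `min_hours`.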
7. Ensure you have good automation
You cannot have successful performance engineering without good automation. Do you spend more time analyzing your test results (good automation), or executing tests and making changes to existing automation (bad automation)?
If you think you can improve your automation practices, start with these seven principles:
- Know why you automate
- Understand the steps of your automation
- Don't consider only the happy path or the unhappy path
- Build blocks you can stack on top of each other
- Plan automation early
- Scenarize your automation
- Gather metrics from your automation