Quality Software Derived through Incremental Changes
By Niel Nickolaisen, CTO, O.C. Tanner
I started my career outside of IT and was something of a lean-manufacturing nerd before transitioning and becoming an IT nerd. One of my favorite mottos from lean manufacturing is: “I don’t take junk. I don’t make junk. I don’t pass junk along.” I recall this motto whenever I think about software testing because, for way too long, the prevailing attitude was that software engineers would, unintentionally, take, make, or pass along junk (in the form of bugs) and then hope that someone else (usually the QA team) would find and fix it. There are a number of issues with this approach. First, any rework (and fixing mistakes is rework) is much worse than getting it right the first time. Second, the longer it takes to find and fix the bugs, the higher the cost and the more negative the impact. Third, we improve our processes and practices if we have really short feedback loops and, for a software engineer, the feedback loop on the quality of the code is at least twice as long if that feedback comes from QA rather than from the engineer's own testing.
This is all prologue to what we have done over the past five years to improve our software testing model. Our new model is based on a couple of key ideas. First, software engineers are responsible for the quality of the code they write. That means they are also responsible for testing their own code (to keep the feedback loop short and the learning and improvement high). This leads to the second key idea: the QA team uses its expertise to define the processes and tools the software engineers will use to ensure quality code. The QA team sets up the automated testing tools, defines the unit and system testing standards, and assists the software engineers, who remain responsible for not taking junk, not making junk and never passing junk along.
2. What do you think are the biggest challenges that technologists face in working in a more agile and outcomes-based model?
In my experience, the biggest challenges are those that come from changing attitudes, behaviors and practices around agile. For example, a couple of years ago a software company asked me to assess their agile development practices. They were not getting the results they needed. I spent a number of hours with the teams as we followed and mapped out their processes. They were using agile methods and breaking the work into time-boxed sprints.
In each sprint, the engineers wrote code, threw the code “over the wall” to the QA team and then worked on their next stories. The QA team did their testing, but by the time they found bugs the engineers were busy with other work that they did not want to interrupt, and so the bugs tended to pool up. To resolve this issue, they fell into a cycle in which one sprint was used to create new code and another sprint to fix bugs from previous sprints. This not only lengthened the feedback loop between bug creation and bug fix (which also delayed the learning that happens with short feedback loops) but also resulted in a high level of churn from task switching. The engineers would have to recall the stories they had worked on in the previous sprint, become familiar with the code again and then think through the integration points. All of this took much more time and effort than integrating the coding and testing activities would have. Our solution was to do just that: integrate the process. We embedded the QA teams in the software teams and had the QA members of the teams focus on the process and provide the engineers the tools they needed to test their own code.
We have taken a similar approach with our teams. We have integrated teams (software, QA, design and product management) and have placed a lot of emphasis on test automation so that it is simple and easy for the software engineers to test their own code, find their own bugs and fix their own code in near real time. For this to work, we needed to shift from manual to automated testing, and to be effective at automated testing we had to have written test cases we could load into the testing tools. Early in our agile sprints, the QA members of the team assist by moving the test cases into the testing tools.
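To make the idea of loading written test cases into a testing tool concrete, here is a minimal sketch in Python using the standard unittest module. It is only an illustration under stated assumptions: the function order_total, the case table WRITTEN_TEST_CASES and the helper run_written_cases are hypothetical names invented for this example, not the tools or code our teams use.

```python
import unittest


# Hypothetical function under test; it stands in for any code an engineer writes.
def order_total(unit_price_cents, quantity, discount_percent=0):
    """Return the order total in cents after a whole-percent discount."""
    subtotal = unit_price_cents * quantity
    return subtotal - (subtotal * discount_percent) // 100


# Written test cases captured as data: (description, inputs, expected result).
# In practice these might come from a spreadsheet or a test-management tool.
WRITTEN_TEST_CASES = [
    ("single item, no discount", (1000, 1, 0), 1000),
    ("multiple items, no discount", (250, 4, 0), 1000),
    ("ten percent discount", (1000, 2, 10), 1800),
    ("zero quantity", (1000, 0, 50), 0),
]


class OrderTotalTests(unittest.TestCase):
    def test_written_cases(self):
        # Each written case runs as its own sub-test, so a single failure
        # is reported by description without hiding the other cases.
        for description, args, expected in WRITTEN_TEST_CASES:
            with self.subTest(case=description):
                self.assertEqual(order_total(*args), expected)


def run_written_cases():
    """Run the suite programmatically and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(OrderTotalTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Because the cases are plain data, a QA team member can add or adjust a case without touching the test logic, and an engineer can run the whole set in seconds while the code is still fresh.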
This brings up one more potential issue. For portions of our legacy code, we do not have written test cases. That put us in a dilemma: we wanted to use automated testing but were not willing to allocate large portions of the team to writing test cases for hundreds of thousands of lines of code. In our case, we assessed our legacy code and identified large swaths of that code that we agreed we would never change. In effect, we built a fence around that code and declared it off limits except for vital, must-do changes. We did a comprehensive system test to make sure that the off-limits code was performant. As long as we do not make any changes to that legacy code, we treat any tests of that code as passed. We write test cases for any code we write around the legacy code and use automated testing tools for that, and thus avoid having to stop work on everything else for months or years to write legacy test cases.
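One way such a fence could be enforced automatically is a small check that flags any change touching the off-limits code, so that the exception for vital, must-do changes becomes a deliberate decision. The sketch below is an assumption for illustration only: the directory names in FENCED_DIRS and the function fenced_changes are hypothetical, and the list of changed paths would have to come from whatever version-control tooling a team already uses.

```python
from pathlib import PurePosixPath

# Hypothetical fenced-off legacy directories (repository-relative paths).
FENCED_DIRS = ("legacy/billing", "legacy/reports")


def fenced_changes(changed_paths, fenced_dirs=FENCED_DIRS):
    """Return the changed paths that fall inside a fenced-off directory."""
    fences = [PurePosixPath(d).parts for d in fenced_dirs]
    violations = []
    for path in changed_paths:
        parts = PurePosixPath(path).parts
        # Compare whole path components so that "legacy/billing_v2" is not
        # mistaken for something inside "legacy/billing".
        if any(parts[:len(fence)] == fence for fence in fences):
            violations.append(path)
    return violations
```

A build step could call fenced_changes with the files modified in a change and require an explicit sign-off whenever the returned list is not empty.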
3. What set of skills do you think technology leaders need to be successful in the new approach to software testing?
I think that, more than skills in things like test automation, an IT leader today needs a desire to find and resolve root causes and a continuous improvement mindset. In my example above, the teams tried to solve the code quality problem by creating their testing-only sprints. However, they did not find and understand the root cause of the problem before implementing their solution. Thus, their solution did not really make things better and only shifted things to a different problem. Getting to and resolving the root cause requires understanding the entire process, defining the process goals and thinking through the cause-and-effect relationships among the process steps. Then, rather than trying to fix everything in one fell swoop, we identify the highest-value process change, make that change, validate that it made things better and then move to the next improvement. In many cases, it has taken us lots of time to dig the software quality hole we are in. We should expect that it will take us time to dig back out.