Every few months, someone asks why James Bach, Bret Pettichord, and I discuss the software testing community in terms of “schools” or suggests that the idea is misguided, arrogant, divisive, inaccurate, or otherwise A Bad Thing. The most recent discussion is happening on the software-testing list (the context-driven testing school’s list on yahoogroups.com). It’s time that I blogged a clarifying response.
Perhaps in 1993, I started noticing that many test organizations (and many test-related authors, speakers and consultants, including some friends and other colleagues I respected) relied heavily — almost exclusively — on one or two main testing techniques. In my discussions with them, they typically seemed unaware of other techniques or uninterested in them.
- For example, many testers talked about domain testing (boundary and equivalence class analysis) as the fundamental test design technique. You can generalize the method far beyond its original scope (analysis of input fields) to a strategy for reducing the infinity of potential tests to a manageably small, powerful subset via a stratified sampling strategy (see the printer testing chapter of Testing Computer Software for an example of this type of analysis). This is a compelling approach–it yields well-considered tests and enough of them to keep you busy from the start to the end of the project.
- As a competing example, many people talked about scenario testing as the fundamental approach. They saw domain tests are mechanical and relatively trivial. Scenarios went to the meaning of the specification (where there was one) and to the business value and business risk of the (commercial) application. You needed subject matter experts to do really good scenario testing. To some of my colleagues, this greater need for deep understanding of the application was proof in its own right that scenario testing was far more important than the more mechanistic-seeming domain testing.
In 1995, James Bach and I met face-to-face for the first time. (We had corresponded by email for a long time, but never met.) We ended up spending half a day in a coffee shop at the Dallas airport comparing notes on testing strategy. He too had been listing these dominating techniques and puzzling over the extent to which they seemed to individually dominate the test-design thinking of many colleagues. James was putting together a list that he would soon publish in the first draft of the Satisfice Heuristic Test Strategy Model. I was beginning to use the list for consciousness-raising in my classes and consulting, encouraging people to add one or two more techniques to their repertoire.
James and I were pleasantly shocked to discover that our lists were essentially the same. Our names for the techniques were different, but the list covered the same approaches and we had both seen the same tunnel-vision (one or more companies or groups that did reasonably good work–as measured by the quality of bugs they were finding–that relied primarily on just this one technique, for each of the techniques). I think it was in that discussion that I suggested a comparison to Thomas Kuhn’s notion of paradigms (for a summary, read this article in the Stanford Encyclopedia of Philosophy), which we studied at some length in my graduate school (McMaster University, Psychology Ph.D. program).
The essence of a paradigm is that it creates (or defines) a mainstream of thinking, providing insights and direction for future research or work. It provides a structure for deciding what is interesting, what is relevant, what is important–and implicitly, it defines limits, what is not relevant, not particularly interesting, maybe not possible or not wise. The paradigm creates a structure for solving puzzles and people who solve the puzzles seen as important in the field are highly respected. Scientific paradigms often incorporate paradigmatic cases–exemplars–especially valuable examples that serve as models for future work or molds for future thought. At that meeting in 1995, or soon after that, we concluded that the set of dominant test techniques we were looking at could be thought of as paradigms for some / many people in the field.
This idea wouldn’t make sense in a mature science, because there is one dominating paradigm that creates a common (field-wide) vocabulary, a common sense of the history of the field, and a common set of cherished exemplars. However, in less mature disciplines that have not reached consensus, fragmentation is common. Here’s how the Stanford Encyclopedia of Philosophy summarizes it:
“In the postscript to the second edition of The Structure of Scientific Revolutions Kuhn says of paradigms in this sense that they are “the most novel and least understood aspect of this book” (1962/1970a, 187). The claim that the consensus of a disciplinary matrix is primarily agreement on paradigms-as-exemplars is intended to explain the nature of normal science and the process of crisis, revolution, and renewal of normal science. It also explains the birth of a mature science. Kuhn describes an immature science, in what he sometimes calls its ‘pre-paradigm’ period, as lacking consensus. Competing schools of thought possess differing procedures, theories, even metaphysical presuppositions. Consequently there is little opportunity for collective progress. Even localized progress by a particular school is made difficult, since much intellectual energy is put into arguing over the fundamentals with other schools instead of developing a research tradition. However, progress is not impossible, and one school may make a breakthrough whereby the shared problems of the competing schools are solved in a particularly impressive fashion. This success draws away adherents from the other schools, and a widespread consensus is formed around the new puzzle-solutions.”
James and I weren’t yet drawn to the idea of well-defined schools, because we see schools as having not only a shared perspective but also a missionary character (more on that later) and what we were focused on was the shared perspective. But the idea of competing schools allowed us conceptual leeway for thinking about, and describing, what we were seeing. We talked about it in classes and small meetings and were eventually invited to present it (Paradigms of Black Box Software Testing) as a pair of keynotes at the 16th International Conference and Exposition on Testing Computer Software (1999) and then at STAR West (1999).
At this point (1999), we listed 10 key approaches to testing:
- Domain driven
- Stress driven
- Specification driven
- Risk driven
- Random / statistical
- Scenario / use case / transaction flow
- User testing
There’s a lot of awkwardness in this list, but our intent was descriptive and this is what we were seeing. When people would describe how they tested to us, their descriptions often focused on only one (or two, occasionally three) of these ten techniques, and they treated the described technique(s), or key examples of it, as guides for how to design tests in the future. We were trying to capture that.
At this point, we weren’t explicitly advocating a school of our own. We had a shared perspective, we were talking about teaching people to pick their techniques based on their project’s context and James’ Heuristic Test Strategy Model (go here for the current version) made this explicit, but we were still early in our thinking about it. In contrast, the list of dominating techniques, with minor evolution, captured a significant pattern in our observations over perhaps 7-10 of each of our years.
I don’t think that many people saw this work as divisive or offensive — some did and we got some very harsh feedback from a few people. Others were intrigued or politely bored.
Several people were confused by it, not least because the techniques on this list were far from mutually exclusive. For example, you can apply a domain-driven analysis of the variables of individual functions in an exploratory way. Is this domain testing, function testing or exploratory testing?
- One answer is “yes” — all three.
- Another answer is that it is whichever technique is dominant in the mind of the tester who is doing this testing.
- Another answer is, “Gosh, that is confusing, isn’t it? Maybe this model of “paradigms” isn’t the right subdivider of the diverging lines of testing thought.”
Over time, our thinking shifted about which answer was correct. Each of these would have been my answer — at different times. (My current answer is the third one.)
Another factor that was puzzling us was the weakness of communication among leaders in the field. At conferences, we would speak the same words but with different meanings. Even the most fundamental terms, like “test case” carried several different meanings–and we weren’t acknowledging the differences or talking about them. Many speakers would simply assume that everyone knew what term X meant, agreed with that definition, and agreed with whatever practices were impliedly good that went with those definitions. The result was that we often talked past each other at conferences, disagreeing in ways that many people in the field, especially relative newcomers, found hard to recognize or understand.
It’s easy to say that all testing involves analysis of the program, evaluation of the best types of tests to run, design of the tests, execution, skilled troubleshooting, and effective communication of the results. Analysis. Evaluation. Design. Test. Execution. Troubleshooting. Effective Communication. We all know what those words mean, right? We all know what good analysis is, right? So, basically, we all agree, right?
Well, maybe not. We can use the same words but come up with different analyses, different evaluations, different tests, different ideas about how to communicate effectively, and so on.
Should we gloss over the differences, or look for patterns in them?
James Bach, Bret Pettichord and I muddled through this as we wrote Lessons Learned in Software Testing. We brought a variety of other people into the discussions but as I vaguely recall it, the critical discussions happened in the context of the book. Bret Pettichord put the idea into a first-draft presentation for the 2003 Workshop on Teaching Software Testing. He has polished it since, but it is still very much a work in progress.
I’m still not ready to publish my version because I haven’t finished the literature review that I’d want to publish with it. We were glad to see Bret go forward with his talks, because they opened the door for peer review that provides the foundation for more polished later papers.
The idea that there can be different schools of thought in a field is hardly a new one — just check the 1,180,000 search results you get from Google or the 1,230,000 results you get from Yahoo when you search for the quoted phrase “schools of thought”.
Not everyone finds the notion of divergent schools of thought a useful heuristic–for example, read this discussion of legal schools of thought. However, in studying, teaching and researching experimental psychology, the identification of some schools was extremely useful. Not everyone belonged to one of the key competing schools. Not every piece of research was driven by a competing-school philosophy. But there were organizing clusters of ideas with charismatic advocates that guided the thinking and work of several people and generated useful results. There were debates between leaders of different schools, sometimes very sharp debates, and those debates clarified differences, points of agreement, and points of open exploration.
As I think of schools of thought, a school of testing would have several desirable characteristics:
- The members share several fundamental beliefs, broadly agree on vocabulary, and will approach similar problems in compatible ways
- members typically cite the same books or papers (or books and papers that same the same things as the ones most people cite)
- members often refer to the same stories / myths and the same justifications for their practices
- Even though there is variation from individual to individual, the thinking of the school is fairly comprehensive. It guides thinking about most areas of the job, such as:
- how to analyze a product
- what test techniques would be useful
- how to decide that X is an example of good work or an example of weak work (or not an example of either)
- how to interpret test results
- how much troubleshooting of apparent failures, why, and how much troubleshooting by the testers
- how to staff a test team
- how to train testers and what they should be trained in
- what skills (and what level of skill diversity) are valuable on the team
- how to budget, how to reach agreements with others (management, programmers) on scope of testing, goals of testing, budget, release criteria, metrics, etc.
- To my way of thinking, a school also guides thought in terms of how you should interact with peers
- what kinds of argument are polite and appropriate in criticizing others’ work
- what kinds of evidence are persuasive
- they provide forums for discussion among school members, helping individuals refine their understanding and figure out how to solve not-yet-solved puzzles
- I also see schools as proselytic
- they think their view is right
- they think you should think their view is right
- they promote their view
- I think that the public face of many schools is the face(s) of the identified leader(s). Bret’s attempts to characterize schools in terms of human representatives was intended as constructive and respectful to the schools involved.
I don’t think the testing community maps perfectly to this. For example (as the biggest example, in my view), very few people are willing to identify themselves as leaders or members of the other (other than context-driven) schools of testing. (I think Agile TDD is another school (the fifth of Bret’s four) and that there are clear thought-leaders there, but I’m not sure that they’ve embraced the idea that they are a school either.) Despite that, I think the notion of division into competing schools is a useful heuristic.
At my time of writing, I think the best breakdown of the schools is:
- Factory school: emphasis on reduction of testing tasks to routines that can be automated or delegated to cheap labor.
- Control school: emphasis on standards and processes that enforce or rely heavily on standards.
- Test-driven school: emphasis on code-focused testing by programmers.
- Analytical school: emphasis on analytical methods for assessing the quality of the software, including improvement of testability by improved precision of specifications and many types of modeling.
- Context-drive school: emphasis on adapting to the circumstances under which the product is developed and used.
I think this division helps me interpret some of what I read in articles and what I hear at conferences. I think it helps me explain–or at least rationally characterize–differences to people who I’m coaching or training, who are just becoming conscious of the professional-level discussions in the field.
Acknowledging the imperfect mapping, it’s still interesting to ask, as I read something from someone in the field, whether it fits in any of the groups I think of as a school and if so, whether it gives me insight into any of that school’s answers to the issues in the list above–and if so, whether that insight tells me more that I should think about for my own approach (along with giving me better ways to talk about or talk to people who follow the other approach).
When Bret first started giving talks about 4 schools of software testing, several people reacted negatively:
- they felt it was divisive
- they felt that it created a bad public impression because it would be better for business (for all consultants) if it looks as though we all agree on the basics and therefore we are all experts whose knowledge and experience can be trusted
- they felt that it was inappropriate because competent testers all pretty much agree on the fundamentals.
One of our colleagues (a friend) chastised us for this competing-schools analysis. I think he thought that we were saying this for marketing purposes, that we actually agreed with everyone else on the fundamentals and knew that we all agreed on the fundamentals. We assured him that we weren’t kidding. We might be wrong, but we were genuinely convinced that the field faced powerful disagreements even on the very basics. Our colleague decided to respond with a survey that asked a series of basic questions about testing. Some questions checked basic concepts. Others identified a situation and asked about the best course of action. He was able to get a fairly broad set of responses from a diverse group of people, many (most?) of them senior testers. The result was to highlight the broad disagreement in the field. He chose not to publish (I don’t think he was suppressing his results; I think it takes a lot of work to go from an informal survey to something formal enough to publish). Perhaps someone doing a M.Sc. thesis in the field would like to follow this up with a more formally controlled survey of an appropriately stratified sample across approaches to testing, experience levels, industry and perhaps geographic location. But until I see something better, what I saw in the summary of results given to me looked consistent with what I’ve been seeing in the field for 23 years–we have basic, fundamental, foundational disagreements about the nature of testing, how to do it, what it means to test, who our clients are, what our professional responsibilities are, what educational qualifications are appropriate, how to research the product, how to identify failure, how to report failure, what the value of regression testing is, how to assess the value of a test, etc., etc.
So is there anything we can learn from these differences?
- One of the enormous benefits of competing schools is that they create a dialectic.
- It is one thing for theoreticians who don’t have much influence in a debate to characterize the arguments and positions of other people. Those characterizations are descriptive, perhaps predictiive. But they don’t drive the debate. I think this is the kind of school characterization that was being attacked on Volokh’s page.
- It’s very different for someone in a long term debate to frame their position as a contrast with others and invite response. If the other side steps up, you get a debate that sharpens the distinctions, brings into greater clarity the points of agreement, and highlights the open issues that neither side is confident in. It also creates a collection of documented disagreements, documented conflicting predictions and therefore a foundation for scientific research that can influence the debate. This is what I saw in psychology (first-hand, watching leaders in the field structure their work around controversy).
- We know full well that we’re making mistakes in our characterization of the other views. We aren’t intentionally making mistakes, and we correct the ones that we finally realize are mistakes, but nevertheless, we have incorrect assumptions and conclusions within our approach to testing and in our understanding of the other folks’ approaches to testing. Everybody else in the field is making mistakes too. The stronger the debate — I don’t mean the nastier, I mean the more seriously we do it, the better we research and/or consider our responses, etc. — the more of those mistakes we’ll collectively bring to the surface and the more opportunities for common ground we will discover. That won’t necessarily lead to the One True School ever, because some of the differences relate to key human values (for example, think some of us have deep personal-values differences in our notions of the relative worths of the individual human versus the identified process ). But it might lead to the next generation of debate, where a few things are indeed accepted as foundational by everyone and the new disagreements are better informed, with a better foundation of data and insight.
- The dialectic doesn’t work very well if the other side won’t play.
- Pretending that there aren’t deep disagreements won’t make them go away
- Even if the other side won’t engage, the schools-approach creates a set of organizing heuristics. We have a vast literature. There are over 1000 theses and dissertations in the field. There are conferences, magazines, journals, lots of books, and new links (e.g. TDD) with areas of work previously considered separate. It’s not possible to work through that much material without imposing simplifying structures. The four-schools (I prefer five-schools) approach provides one useful structure. (”All models are wrong; some models are useful.” — George Box)
It’s 4:27 a.m. Reading what I’ve written above, I think I’m starting to ramble and I know that I’m running out of steam, so I’ll stop here.
Well, one last note.
It was never our intent to use the “schools” notion to demean other people in the field. What we are trying to do is to capture commonalities (of agreement and disagreement).
- For example, I think there are a lot of people who really like the idea that some group (requirements analysts, business analysts, project manager, programmers) agree on an authoritative specification, that the proper role of Testing is to translate the specification to powerful test cases, automate the tests, run them as regression tests, and report metrics based on these runs.
- Given that there are a lot of people who share this view, I’d like to be able to characterize the view and engage it, without having to address the minor variations that come up from person to person. Call the collective view whatever you want. The key issues for me are:
- Many people do hold this view
- Within the class of people who hold this view, what other views do they hold or disagree with?
- If I am mischaracterizing the view, it is better for me to lay it out in public and get corrected than push it more privately to my students and clients and not get corrected
- The fact is, certainly, that I think this view is defective. But I didn’t need to characterize it as a school to think it is often (i.e. in many contexts) a deeply misguided approach to testing. Nor do I need to set it up as a school to have leeway to publicly criticize it.
- The value is getting the clearest and most authoritatively correct expression of the school that we can, so that if/when we want to attack it or coach it into becoming something else, we have the best starting point and the best peer review process that we can hope to achieve.
Comments are welcome. Maybe we can argue this into a better article.