
Why Test-After Coverage Is Lower



Unit testing after the fact? We do just enough test-after development (TAD) to please our manager, and no more.

Just Ship It! Once the business realizes they have product in hand, there's always pressure to move on to the next feature.
Invariably, any "after the fact" process is given short shrift. This is why post-development code reviews don't reveal deep problems, and even if they do, we rarely go back and make the costly changes needed to truly fix the problems. This is why after-the-fact testers are always getting squeezed out of the narrow space between "done" software and delivery. 

My Sh*t Don't Stink. I just wrote my code, and it works. I know so because I deployed it, cranked up the server, and ran through the application to verify it. Why would I waste any more time building unit tests that I hope to never look at again?

That Can't Possibly Break. Well, two-line methods and such. I can glance at those and think they look good. Lazy initialization? That couldn't break, could it?

That's Gonna Be a Pain. Holy crud, that's a 100-line method with 3+ levels of nesting, complex conditionals, and exceptional conditions (test those? you're kidding). It's going to take me longer to write a reasonably comprehensive set of tests than it took to write the method in the first place.

Worse, the method has numerous private dependencies on code that makes database or service calls. Or just dependencies that have nothing to do with what I'm testing. Just today I tried to instantiate a class and failed, because a class two inheritance levels up depends on another object being properly instantiated and configured. Figuring out how to test that is going to be a nightmare that eats up a lot of time.
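To make the problem concrete, here's a minimal Java sketch (all class names--InvoiceFormatter, ReportingService, ServiceBase, DatabaseRegistry--are hypothetical, invented for illustration). The test cares only about string formatting, yet it can't even construct the object under test, because a superclass two levels up demands a configured database:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

// A configuration holder that the superclass reaches for at construction time.
class DatabaseRegistry {
    private static Object connection;               // never configured in a unit test
    static Object current() {
        if (connection == null)
            throw new IllegalStateException("database not configured");
        return connection;
    }
}

// Two inheritance levels of baggage sit above the class we actually care about.
class ServiceBase {
    protected final Object connection = DatabaseRegistry.current();
}

class ReportingService extends ServiceBase {
}

class InvoiceFormatter extends ReportingService {
    String format(String invoiceNumber, int totalCents) {
        return String.format("%s: $%d.%02d", invoiceNumber, totalCents / 100, totalCents % 100);
    }
}

class InvoiceFormatterTest {
    @Test
    void formatsNumberAndTotal() {
        // Fails before the assertion ever runs: ServiceBase's field initializer
        // demands a configured database that the formatting logic never uses.
        InvoiceFormatter formatter = new InvoiceFormatter();

        assertEquals("INV-7: $12.50", formatter.format("INV-7", 1250));
    }
}
```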

Code written without immediate and constant consideration for how to unit test it is going to be a lot harder to test. Most of us punt when efforts become too daunting.
---
I hear that TAD coverage typically gets up to about 75% on average. Closer inspection reveals that this number is usually all over the map: 82% in one class, 38% in another, and so on. Even closer inspection reveals that classes with a coverage percent of, say, 38, often contain the riskiest (most complex) code. Why? Because they're the hardest to test.

If I were allowed to do only TAD and not TDD, I'd scrap it and invest more in end-to-end testing.

-- Jeff

Premature Passes: Why You Might Be Getting Green on Red



Red, green, refactor. The first step in the test-driven development (TDD) cycle is to ensure that your newly-written test fails before you try to write the code to make it pass. But why expend the effort and waste the time to run the tests? If you're following TDD, you write each new test for code that doesn't yet exist, and so it shouldn't pass.

But reality says it will happen--you will undoubtedly get a green bar when you expect a red bar from time to time. (We call this occurrence a premature pass.) Understanding one of the many reasons why you got a premature pass might help save you precious time.
  • Running the wrong tests. This smack-your-forehead event occurs when you think you included your new test in the run, but didn't, for any of myriad reasons. Maybe you forgot to compile it or link in the new test, ran the wrong suite, disabled the new test, filtered it out, or coded it improperly so that the tool didn't recognize it as a legitimate test. Suggestion: Always know your current test count, and ensure that your new test causes it to increment.
  • Testing the wrong code. You might have a premature pass for some of the same reasons as "running the wrong tests," such as failure to compile (in which case the "wrong code" that you're running is the last compiled version). Perhaps the build failed and you thought it passed, or your classpath is picking up a different version. More insidiously, if you're mucking with test doubles, your test might not be exercising the class implementation that you think it is (polymorphism can be a tricky beast). Suggestion: Throw an exception as the first line of code you think you're hitting, and re-run the tests.
  • Unfortunate test specification. Sometimes you mistakenly assert the wrong thing, and it happens to match what the system currently does. I recently coded an assertTrue where I meant assertFalse, and spent a few minutes scratching my head when the test passed. Suggestion: Re-read (or have someone else read) your test to ensure it specifies the proper behavior.
  • Invalid assumptions about the system. If you get a premature pass, you know your test is recognized and it's exercising the right code, and you've re-read the test... perhaps the behavior already exists in the system. Your test assumed that the behavior wasn't in the system, and following the process of TDD proved your assumption wrong. Suggestion: Stop and analyze your system, perhaps adding characterization tests, to fully understand how it behaves.
  • Suboptimal test order. As you test-drive a solution, you're attempting to take the smallest possible incremental steps to grow behavior. Sometimes you'll choose a less-than-optimal sequence, and you'll get a premature pass because an earlier step unavoidably grew the implementation into a more robust solution than desired. Suggestions: Consider starting over and seeking a different sequence with smaller increments. Try to apply Uncle Bob's Transformation Priority Premise (TPP).
  • Linked production code. If you are devising an API to be consumed by multiple clients, you'll often introduce convenience methods such as isEmpty (which inquires about the size to determine its answer). These convenience methods necessarily duplicate code. If you assert against isEmpty every time you assert against size, you'll get premature passes. Suggestions: Create a test that documents the link from the convenience method to the core functionality. Or combine the related assertions into a single custom assertion (or helper method). The sketch after this list shows both.
  • Overcoding. A different form of "invalid assumptions about the system," you overcode when you supply more of an implementation than necessary while test-driving. This is a hard lesson of TDD--to supply no more code or data structure than necessary when getting a test to pass. Suggestion: Hard lessons are best learned with dramatic solutions. Discard your bloated solution and try again. It'll be better, we promise.
  • Testing for confidence. On occasion you'll write a test that you expect to pass--"I wonder if it works for this edge case." There's nothing wrong with writing a few such additional tests, particularly if they give you confidence, but technically you have stepped outside the realm of TDD and into the realm of TAD (test-after development). Suggestions: Don't hesitate to write more tests to give you confidence, but you should generally have a good idea of whether they will pass or fail before you run them.
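To illustrate the "linked production code" suggestions, here's a hypothetical Portfolio sketch (the class and its tests are invented for illustration, not taken from any real codebase). One test documents the link between isEmpty and size, and a custom assertion bundles the paired checks for other tests:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

import java.util.HashMap;
import java.util.Map;

import org.junit.jupiter.api.Test;

class Portfolio {
    private final Map<String, Integer> holdings = new HashMap<>();

    void purchase(String symbol, int shares) {
        holdings.merge(symbol, shares, Integer::sum);
    }

    int size() {
        return holdings.size();
    }

    boolean isEmpty() {
        return size() == 0;     // convenience method linked to the core query
    }
}

class PortfolioTest {
    private final Portfolio portfolio = new Portfolio();

    // One test documents the link between the convenience method and size()...
    @Test
    void isEmptyMirrorsSizeOfZero() {
        assertTrue(portfolio.isEmpty());

        portfolio.purchase("IBM", 10);

        assertFalse(portfolio.isEmpty());
    }

    // ...and a custom assertion bundles the paired checks for every other test,
    // rather than scattering redundant isEmpty assertions that pass prematurely.
    private void assertCount(int expected) {
        assertEquals(expected, portfolio.size());
        assertEquals(expected == 0, portfolio.isEmpty());
    }

    @Test
    void sizeGrowsWithEachNewSymbol() {
        portfolio.purchase("IBM", 10);
        portfolio.purchase("AAPL", 5);

        assertCount(2);
    }
}
```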
Two key things to remember:
  • Never skip running the tests to ensure you get a red bar.
  • Pause and think any time you get a premature pass.

Seven Steps to Great Unit Test Names





You can find many good blog posts on what to name your tests. We present instead an appropriate strategy for when and how to think about test naming.
  1. Don't sweat the initial name. A bit of thought about what you're testing is essential, but don't expend much time on the name yet. Type in a name, quickly. Use AAA or Given-When-Then to help derive one. It might be terrible--we've named tests "DoesSomething" before we knew exactly what they needed to accomplish. We've also written excessively long test names to capture a stream-of-consciousness train of thought. No worries--you'll revisit the name soon enough (a sketch of the renaming progression follows this list).
  2. Write the test. As you design the test, you'll figure out precisely what the test needs to do. You pretty much have to, otherwise you aren't getting past this step! :-) When the test fails, look at the combination of the fixture name, test method name, and assertion message. These three should (eventually) uniquely and clearly describe the intent of the test. Make any obvious corrections, like removing redundancy or improving the assertion message. Don't agonize about the name yet; it's still early in the process.
  3. Get it to pass. Focus on simply getting the test to pass. This is not the time to worry about the test name. If you have to wait any significant time for your test run, start thinking about a more appropriate name for the test (see step 4).
  4. Rename based on content. Once a test works, you must revisit its name. Re-read the test. Now that you know what it does, you should find it much easier to come up with a concise name. If you had an overly verbose test name, you should be able to eliminate some noise words by using more abstract or simpler terms. You may need to look at other tests or talk to someone to make sure you're using appropriate terms from the domain language.
  5. Rename based on a holistic fixture view. In Eclipse, for example, you can do a ctrl-O to bring up an outline view showing the names for all related tests. However you review all the test names, make sure your new test's name is consistent with the others. The test is a member of a collection, so consider the collection as a system of names.
  6. Rename and reorganize other tests as appropriate. Often you'll question the names of the other tests. Take a few moments to improve them, with particular focus given to the impact of the new test's name. You might also recognize the need to split the current fixture into multiple fixtures.
  7. Reconsider the name with each revisit. Unit tests can act as great living documentation -- but only if intentionally written as such. Try to use the tests as your first and best understanding of how a class behaves. The first thing you should do when challenged with a code change is read the related tests. The second thing you should do is rename any unclear test names.
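Here's a small hypothetical sketch of the renaming progression from steps 1, 4, and 5, using an invented Account class and JUnit:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertThrows;

import org.junit.jupiter.api.Test;

class Account {
    private int balance;

    void deposit(int amount) {
        balance += amount;
    }

    void withdraw(int amount) {
        if (amount > balance)
            throw new IllegalArgumentException("insufficient funds");
        balance -= amount;
    }

    int balance() {
        return balance;
    }
}

class AccountTest {
    // Step 1's quick, throwaway name might have been something like
    // doesSomethingWhenWithdrawingTooMuch(). After the test passes (step 4),
    // re-read it and rename it for the behavior it actually verifies:
    @Test
    void withdrawalExceedingBalanceIsRejected() {
        Account account = new Account();
        account.deposit(100);

        assertThrows(IllegalArgumentException.class, () -> account.withdraw(150));
        assertEquals(100, account.balance());
    }

    // Step 5: read alongside its siblings in the outline view, the new name
    // should fit the set.
    @Test
    void withdrawalReducesBalance() {
        Account account = new Account();
        account.deposit(100);

        account.withdraw(30);

        assertEquals(70, account.balance());
    }
}
```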
The test names you choose may seem wonderful and clear to you, but you know what you intended when you wrote them. They might not be nearly as meaningful to someone who wasn't involved with the initial test-writing effort. Make sure you have some form of review to vet the test names. An uninvolved developer should be able to understand the test as a stand-alone artifact - not having to consult with the test's author (you). If pair programming, it's still wise to get a third set of eyes on the test names before integrating.

Unit tests require a significant investment of effort, but renaming a test is cheap and safe. Don’t resist incrementally driving toward the best name possible. Continuous renaming of tests is an easy way of helping ensure that your investment will return appropriate value.

TDD Process Smells


This list of "process smells" focuses on execution of the practice of test-driven development (TDD)--not on what the individual tests look like. There are no doubt dozens of similar smells; following the rule of 7 (+/- 2), I've chosen the smells I see most frequently.
  • Using code coverage as a goal. If you practice test-driven development, you should be getting close to 100% coverage on new code without even looking at a coverage tool. Existing code, that's another story. How do we shape up a system with low coverage? Insisting solely on a coverage number can lead to a worse situation: Coverage comes up quickly by virtue of lots of poorly-factored tests; changes to the system break lots of tests simultaneously; some tests remain broken, destroying most of the real value in having an automated test suite.
  • No green bar in the last ~10 minutes. One of the more common misinterpretations of TDD is around test size. The goal is to take the shortest step that will generate actionable feedback. Average cycle times of ten minutes or more suggest that you're not learning what it takes to incrementally grow a solution. If you do hit ten minutes, learn to stop, revert to the last green bar, and start over, taking smaller steps.
  • Not failing first. Observing the test fail first affirms that the assumptions you've made about the system are correct. One of the best ways to waste time is to skip the red bar in each TDD cycle. I've encountered numerous cases where developers ran tests under a continual green bar, yet meanwhile their code was absolutely broken. Sometimes it's as dumb as running the tests against the wrong thing in Eclipse.
  • Not spending comparable amounts of time on the refactoring step. If you spend five minutes writing production code, you should spend several minutes refactoring. Even if your changes are "perfect," take the opportunity to look at the periphery and clean up a couple of other things.
  • Skipping something too easy (or too hard) to test. "That's just a simple getter, never mind." Or, "that's an extremely difficult algorithm, I have no idea how to test it, I'll just give up." Simple things often mask problems; maybe that's not just a "simple getter" but a flawed attempt at lazy initialization. And difficult code is often where most of the problems really are; what value is there in only testing the things that are easy to test? Changes are most costly in complex areas; we look for tests to clamp down on the system and help keep its maintenance costs reasonable.
  • Organizing tests around methods, not behavior. This is a rampant problem among developers first practicing TDD. They'll write a single testForSomeMethod, provide a bit of context, and assert something. Later they'll add, to that same test, code that calls someMethod with different data--with a comment, of course, to explain the new circumstance. This introduces the risk of unintentional dependencies between the cases; it also makes things harder to understand and maintain. (The sketch after this list contrasts the two styles.)
  • Not writing the tests first! By definition, that's not TDD, yet novice practitioners easily revert to the old habit of writing production code without a failing test. So what if they do? Take a look at Why TAD Sucks for some reasons why you want to write tests first.
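To contrast the two organizational styles from the "organizing tests around methods" smell, here's a hypothetical FixedStack sketch (class and test names invented for illustration):

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertThrows;

import org.junit.jupiter.api.Test;

class FixedStack {
    private final String[] elements;
    private int size;

    FixedStack(int capacity) {
        elements = new String[capacity];
    }

    void push(String element) {
        if (size == elements.length)
            throw new IllegalStateException("stack is full");
        elements[size++] = element;
    }

    int size() {
        return size;
    }
}

// Method-organized: one ever-growing test per method, with comments standing in
// for names and each new case piled onto the shared context.
class FixedStackMethodTest {
    @Test
    void testPush() {
        FixedStack stack = new FixedStack(1);
        stack.push("a");
        assertEquals(1, stack.size());
        // new circumstance: pushing onto a full stack
        assertThrows(IllegalStateException.class, () -> stack.push("b"));
    }
}

// Behavior-organized: one small test per behavior, each named for what it
// verifies and free of dependencies on the other cases.
class FixedStackTest {
    @Test
    void pushIncreasesSize() {
        FixedStack stack = new FixedStack(1);

        stack.push("a");

        assertEquals(1, stack.size());
    }

    @Test
    void pushOntoFullStackIsRejected() {
        FixedStack stack = new FixedStack(1);
        stack.push("a");

        assertThrows(IllegalStateException.class, () -> stack.push("b"));
    }
}
```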

F.I.R.S.T



Source: Brett Schuchert, Tim Ottinger

In my Object Mentor days Brett and I were looking at ways to improve some class materials on the topic of unit testing. I noticed that our list of properties almost spelled FIRST. We fixed it.

We refer to these as the FIRST principles now. You will find these principles detailed in chapter 9 of Clean Code (page 132). Brett and I have a different remembrance of the meaning of the letter I and so I present for your pleasure the FIRST principles as I remember them (bub!). He had it right the first time so we will cut him some slack.

The concepts are very simple; achieving them once you've gone off the track, of course, can be very hard. It's always better to start with rigor here and maintain it as you go.

Fast: Tests must be fast. If you hesitate to run the tests after a simple one-liner change, your tests are far too slow. Make the tests so fast you don't have to consider them.

A test that takes a second or more is not a fast test, but an impossibly slow test.

A test that takes a half-second or quarter-second is not a fast test. It is a painfully slow test.

If the test itself is fast, but the setup and tear down together might span an eighth of a second, a quarter second, or even more, then you don't have a fast test. You have a ludicrously slow test.

Fast means fast.

A software project will eventually have tens of thousands of unit tests, and team members need to run them all every minute or so without guilt. You do the math: at a quarter-second apiece, ten thousand tests take more than forty minutes; at a millisecond apiece, they take ten seconds.

Isolated: Tests isolate failures. A developer should never have to reverse-engineer tests or the code being tested to know what went wrong. The test class name, the test method name, and the text of the assertion should together state exactly what is wrong and where. If a test does not isolate failures, it is best to replace that test with smaller, more-specific tests.

A good unit test has a laser-tight focus on a single effect or decision in the system under test. And that system under test tends to be a single part of a single method on a single class (hence "unit").

Tests must not have any order-of-run dependency. They should pass or fail the same way in a suite or when run individually. Each suite should be re-runnable (every minute or so) even if tests are renamed or reordered randomly. Good tests interfere with no other tests in any way. They impose their initial state without aid from other tests. They clean up after themselves.
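A minimal sketch of isolated tests, using an invented ShoppingCart class with JUnit 5: each test imposes its own initial state, so the tests pass in any order, alone or in a suite:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.ArrayList;
import java.util.List;

import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

class ShoppingCart {
    private final List<String> items = new ArrayList<>();

    void add(String item) {
        items.add(item);
    }

    int itemCount() {
        return items.size();
    }
}

class ShoppingCartTest {
    private ShoppingCart cart;

    // Each test gets a fresh cart; none relies on what another test left behind.
    @BeforeEach
    void createEmptyCart() {
        cart = new ShoppingCart();
    }

    @Test
    void newCartIsEmpty() {
        assertEquals(0, cart.itemCount());
    }

    @Test
    void addingAnItemIncreasesCount() {
        cart.add("book");

        assertEquals(1, cart.itemCount());
    }
}
```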

Repeatable: Tests must be able to be run repeatedly without intervention. They must not depend upon any assumed initial state, and they must not leave behind any residue that would prevent them from being re-run. This is particularly important when one considers resources outside of the program's memory, like databases, files, and shared memory segments.

Repeatable tests do not depend on external services or resources that might not always be available. They run whether or not the network is up, and whether or not they are in the development server's network environment. Unit tests do not test external systems.
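One common way to keep a unit test repeatable, sketched here with invented names (RateService, PriceConverter): depend on an interface and substitute an in-memory stub for the external service, so the test produces the same result whether or not the network is up:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

interface RateService {
    double rateFor(String currencyCode);
}

class PriceConverter {
    private final RateService rates;

    PriceConverter(RateService rates) {
        this.rates = rates;
    }

    double convert(double amountInDollars, String currencyCode) {
        return amountInDollars * rates.rateFor(currencyCode);
    }
}

class PriceConverterTest {
    @Test
    void convertsUsingProvidedRate() {
        RateService stubRates = currencyCode -> 0.5;   // no network, same answer every run
        PriceConverter converter = new PriceConverter(stubRates);

        assertEquals(50.0, converter.convert(100.0, "GBP"), 0.0001);
    }
}
```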

Self-validating: Tests are pass-fail. Nobody should have to examine the results to determine whether they are valid and reasonable. Authors avoid over-specification so that peripheral changes do not affect the ability of assertions to determine whether tests pass or fail.
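A small sketch of the difference, using an invented Order class: the first test dumps output for a human to inspect; the second validates itself:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

class Order {
    private int totalCents;

    void addLineItem(int priceCents) {
        totalCents += priceCents;
    }

    int totalCents() {
        return totalCents;
    }
}

class OrderTest {
    // Not self-validating: someone has to read the console to decide if this is right.
    @Test
    void printsTotalForManualInspection() {
        Order order = new Order();
        order.addLineItem(1000);
        order.addLineItem(2000);

        System.out.println("total = " + order.totalCents());
    }

    // Self-validating: the assertion decides, and the suite stays strictly pass-fail.
    @Test
    void totalSumsLineItems() {
        Order order = new Order();
        order.addLineItem(1000);
        order.addLineItem(2000);

        assertEquals(3000, order.totalCents());
    }
}
```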

Timely: Tests are written at the right time, immediately before the code that makes them pass. It may seem reasonable to take the more existential stance that it doesn't matter when tests are written as long as they are written, but this is wrong. Writing the test first makes a difference.

Testing post facto requires developers to have the fortitude to refactor working code until they have a battery of tests that fulfill these FIRST principles. Most will take the expensive shortcut of writing fewer, fatter tests. Such large tests are not fast, have poor fault isolation, require great effort to make repeatable, and tend to require external validation. The testing provides less value at higher cost. Eventually developers feel guilty about how much time they're spending "polishing" code that is "finished," and can easily be convinced to abandon the effort.

Why POUT (aka TAD) Sucks

Tim and I are no strangers to controversy and debate. As agile coaches, we get challenged all the time. "TDD is a waste of time, it doesn't catch any defects, you don't want to go past 70% coverage," and so on. They sound like excuses to us. But we're patient, and perfectly willing to have rational discussions about concerns with TDD. And we're the first to admit that TDD is not a panacea.


Still, we haven't seen a better developer technique than TDD for shaping a quality design and sustaining constant, reasonable-cost change over the lifetime of a product. If you show us a better way, we'll start shouting it from the rooftops. We're not married to TDD; it just happens to be the most effective and enjoyable way to code that we've found.


Many of the challenges to TDD come from the crowd that says, "write the code, then come back and write some unit tests." This is known as POUT (Plain Ol' Unit Testing), or what I prefer to call TAD (Test-After Development). The TAD proponents contend that its practical limit of about 70% coverage is good enough, and that there's little reason to write tests first a la TDD.


70% coverage is good enough for what? If you view unit testing as a means of identifying defects, perhaps it is good enough. After all, the other tests you have (acceptance and other integration-oriented tests) should help catch problems in the other 30%. But if you instead view tests as "things that enable your code to stay clean," i.e. as things that give you the confidence to refactor, then you realize that almost a third of your system isn't covered. That third of your system will become rigid and thus degrade far more rapidly in quality over time. We've also found that it's often the more complex (and thus fragile) third of your system!


And why only 70%? On the surface, there should be little difference: why would writing tests after the fact generate any different result from writing them first? "Untestable design" is one aspect of the answer, and "human nature" represents the bulk of the rest. Check out the card (front and back).