Monthly Archives: May 2020
Test Anti-Patterns: The Composite Test

Composite Test

A composite test is one that actually tests multiple units in a single test body. The problem is that it’s difficult to tell which parts of the test are intentional and which are incidental. That leads to breaking changes, because future developers are as likely to modify the test as they are the production code.

Imagine a bank transaction cache that has three defects:

  • Calling get with no account filter returned nothing. It should return all transactions.
  • Filtering by more than one account produced duplicate transactions in the result. Each transaction should appear only once.
  • Filtering by more than one account produced results that were not sorted from newest to oldest.

Consider the following test written to cover these three issues:

    #[test]
    fn by_account_does_not_duplicate_transactions() {
        let env = TestEnv::default();
        let transactions = [
            build(
                Bank::SendMoney(Data {
                    amount: 42,
                    from: account1,
                    to: account2,
                }),
            ),
            build(
                Bank::SendMoney(Data {
                    amount: 24,
                    from: account2,
                    to: account1
                }),
            ),
        ];
        env.store_transactions(transactions);

        let result = env
            .service
            .get(Some(Filter {
                accounts: vec![
                    account1,
                    account2,
                ]
            }));

        assert_eq!(result.total, 2);
        assert_eq!(
            &*result
                .transactions
                .into_iter()
                .rev()
                .map(|t| Transaction::try_from(t).unwrap())
                .collect::<Vec<_>>(),
            &transactions,
        );
    }

Remember the attributes of good unit tests

  1. The test must fail reliably for the reason intended.
  2. The test must never fail for any other reason.
  3. There must be no other test that fails for this reason.

Our good unit test guidelines ask us to write tests that fail for one reason, not multiple. They also ask us to write tests that clearly communicate what is being tested. The test name above tells us about the duplication failure, but it does not communicate the filtering failure or that ordering is an issue. Those requirements appear accidental to the design of the test. The test fails to function as a specification.

Could we get around this problem by "aliasing" the test body multiple times?

#[test]
fn all_transactions_are_returned_without_a_filter() {
    really_test_the_thing();
}

#[test]
fn transactions_are_returned_in_order() {
    really_test_the_thing();
}

fn really_test_the_thing() {
    // Real code goes here
}

If we try this approach, we do satisfy the goal of specifying the test behavior, but now any failure in the test body will cause all of our specifications to fail.

We can avoid the issue by following the practice of TDD, which is to start with a failing test, one unit at a time. I almost always see this issue arise during test maintenance when the engineer wrote the test after the fact instead of practicing TDD.

Let’s begin.

1. No Account Filter

// Example in pseudo-code
#[test]
fn when_no_account_filter_is_given_then_transactions_for_all_accounts_are_returned() {
    // Given
    let env = TestEnv::default();
    let accounts = vec![porky, petunia, daffy, bugs];
    let transactions = env.generate_some_transactions_for_each_account(&accounts);
    env.store_transactions(transactions);

    // When
    let result = env.cache.get(None);

    // Then
    let mut expected_accounts = accounts;
    expected_accounts.sort();
    assert_eq!(result.get_distinct_accounts_sorted_by_name(), expected_accounts);
}

This test communicates what’s failing. It does not rely on equality with a full list of transactions; instead it isolates the behavior under test, namely that no account filter is applied.
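
Both TestEnv helpers in that test (generate_some_transactions_for_each_account and get_distinct_accounts_sorted_by_name) are assumed rather than shown. A minimal sketch of the projection helper, using simplified, assumed types, might look like this:

// Hypothetical sketch; the Account and Transaction types here are simplified assumptions.
type Account = &'static str; // e.g. "bugs", "daffy"

struct Transaction {
    from: Account,
    to: Account,
}

struct QueryResult {
    transactions: Vec<Transaction>,
}

impl QueryResult {
    /// Every account appearing on either side of any returned transaction,
    /// de-duplicated and sorted so the comparison is order-independent.
    fn get_distinct_accounts_sorted_by_name(&self) -> Vec<Account> {
        let mut accounts: Vec<Account> = self
            .transactions
            .iter()
            .flat_map(|t| vec![t.from, t.to])
            .collect();
        accounts.sort();
        accounts.dedup();
        accounts
    }
}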

2. Both Transaction Sides are Returned

If we are filtering by account, we need to make sure transactions are returned whether the account is on the from or to side of the payment.

// Example in pseudo-code
#[test]
fn when_account_filter_is_specified_then_transactions_to_or_from_account_are_returned() {
    // Given
    let env = TestEnv::default();

    let transactions = [
        build(
            Bank::SendMoney(Data {
                amount: 42,
                from: bugs,
                to: daffy,
            }),
        ),
        build(
            Bank::SendMoney(Data {
                amount: 24,
                from: porky,
                to: bugs,
            }),
        ),
        build(
            Bank::SendMoney(Data {
                amount: 100,
                from: porky,
                to: daffy,
            }),
        ),
    ];
    env.store_transactions(transactions);

    // When
    let result = env.cache.get(Some(Filter {
        accounts: vec![bugs],
    }));

    // Then
    result.assert_all(|transaction| transaction.from == bugs || transaction.to == bugs);
}
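
The assert_all call is a custom helper on the result type, not a standard API. A minimal, generic sketch of the same idea (names and signature assumed):

// Hypothetical sketch: assert that every item satisfies a predicate,
// reporting the first offending item when one does not.
fn assert_all<T: std::fmt::Debug>(items: &[T], predicate: impl Fn(&T) -> bool) {
    for item in items {
        assert!(
            predicate(item),
            "item {:?} does not satisfy the expected condition",
            item
        );
    }
}

With a helper like this, the Then section stays a one-liner while still naming exactly which transaction broke the expectation.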

3. Duplicated Transactions

One of the defects the composite test was intended to cover was that when specifying multiple accounts, transactions were duplicated in the results. Let’s test that now.

// Example in pseudo-code
#[test]
fn when_filtered_by_multiple_accounts_then_transactions_are_not_duplicated() {
    // Given
    let env = TestEnv::default();
    let accounts = vec![porky, petunia, daffy, bugs];
    let transactions = env.generate_some_transactions_for_each_account(&accounts);
    env.store_transactions(transactions);

    // When
    let filter = Filter {
        accounts: vec![bugs, daffy],
    };
    let result = env.cache.get(Some(filter));

    // Then
    result.assert_transaction_ids_are_unique();
}
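
Again, assert_transaction_ids_are_unique is an assumed helper on the result type. The underlying check can be as simple as comparing the number of distinct ids to the total; a sketch:

use std::collections::HashSet;

// Hypothetical sketch: fail when any transaction id appears more than once.
fn assert_ids_are_unique(ids: &[u64]) {
    let distinct: HashSet<&u64> = ids.iter().collect();
    assert_eq!(
        distinct.len(),
        ids.len(),
        "expected every transaction id to be unique, but only {} of {} were distinct",
        distinct.len(),
        ids.len()
    );
}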

4. Preserve Sorting

Sorting is different enough behavior that it deserves its own focus. I would have expected a single test on sorting for the unfiltered case; since sorting is usually applied first, that would ordinarily be enough. However, given that there was a defect specifically around sorting when filtering on multiple accounts, adding a second test case is warranted; the existence of the defect proves the need.

I’m going to extend my test helper to allow for specifying dates.

// Example in pseudo-code
#[test]
fn when_filtered_by_multiple_accounts_then_sorting_is_preserved() {
    // Given
    let env = TestEnv::default();

    let transactions = [
        build(
            Bank::SendMoney(Data {
                amount: 42,
                from: bugs,
                to: daffy,
                date: ChristmasDay,
            }),
        ),
        build(
            Bank::SendMoney(Data {
                amount: 24,
                from: porky,
                to: bugs,
                date: ChristmasEve,
            }),
        ),
        build(
            Bank::SendMoney(Data {
                amount: 100,
                from: porky,
                to: daffy,
                date: NewYearsDay,
            }),
        ),
        build(
            Bank::SendMoney(Data {
                amount: 150,
                from: porky,
                to: daffy,
                date: NewYearsEve,
            }),
        ),
    ];
    env.store_transactions(transactions);

    // When
    let filter = Filter {
        accounts: vec![bugs, daffy],
    };
    let result = env.cache.get(Some(filter));

    // Then
    let dates = result.project_transaction_dates_without_reordering();
    // Use named date constants here so that the relationship between the dates is understood at a glance.
    assert_eq!(dates, vec![NewYearsDay, NewYearsEve, ChristmasDay, ChristmasEve]);
}
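
The named dates (ChristmasEve, NewYearsDay, and so on) are assumed test-environment constants rather than real calendar types. If the cache stores plain epoch timestamps, a dependency-free sketch could be:

// Hypothetical sketch: named date constants make the expected ordering readable at a glance.
// The values are illustrative Unix timestamps (UTC midnight); the names match the pseudo-code.
#[allow(non_upper_case_globals)]
mod test_dates {
    pub const ChristmasEve: i64 = 1_577_145_600; // 2019-12-24
    pub const ChristmasDay: i64 = 1_577_232_000; // 2019-12-25
    pub const NewYearsEve: i64 = 1_577_750_400; // 2019-12-31
    pub const NewYearsDay: i64 = 1_577_836_800; // 2020-01-01
}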

Final Notes

As I worked through these examples, I started to see things that could be added to the test context and the production code to make intent clearer and testing even easier. That’s also a benefit of beginning work with a failing test: the act of writing a test that tells a story helps you understand what code you need in order to tell it. I often find that code written to support my tests ends up in my production implementation because it simplifies or clarifies working with the domain.

Test Anti-Patterns: The Run-on Test

Mastering automated testing is much more challenging than people realize at first. Internet code samples are naively simple and often don’t address the issues of getting tests into place for legacy code. It can be really hard to get started with automated testing, and doing it badly can easily sour one’s judgment on its overall value. This post is the first in a series that will identify some testing anti-patterns to help you understand whether your automated testing efforts are going down a bad path. I’ve seen these in the wild many times. Usually these test anti-patterns emerge when people do "Test After Development" (as opposed to "Test Driven Development"), or when people are new to automated testing.

Recall the guidance I wrote about Good unit tests.

In general, automated tests should be structured in a rigorous way:

#[test]
pub fn my_test() {
    // Given
    let context = SomeTestContextThatINeed::new();
    context.setup_with_data_and_or_expectations();

    // When
    let sut = SystemUnderTest {};
    let result = sut.do_the_action_under_test();

    // Then
    assert_eq!(result, Ok(expected_value));
}

  • The Given section should include any context setting required by the test.
  • The When section should include the operation under test. We should strive for a one-liner here as it makes it obvious at a glance what is being tested.
  • The Then section should contain the various checks on expectations.

(Some authors use a different mnemonic to represent the same thing: "Arrange, Act, Assert". This verbiage is identical to "Given, When, Then" in intent. I have come to prefer "Given, When, Then" because it’s in more common usage with project managers and is therefore less domain language to maintain.)

The Structure of a Run-On Test

Here is an example of what’s called a "run-on" test. They are often written by engineers new to unit testing or by engineers who are writing tests after the fact as opposed to practicing TDD:

#[test]
pub fn my_test() {
    // Given
    let context = SomeTestContextThatINeed::new();
    context.setup_with_data_and_or_expectations();

    // When
    let sut = SystemUnderTest {};
    let result = sut.do_the_action_under_test();

    // Then
    assert_eq!(result, Ok(expected_value));

    // Given
    context.alter_the_state_in_some_way();

    // When
    let new_result = sut.do_some_other_action();

    // Then
    assert_eq!(new_result, Ok(some_other_expected_value));
}

The "run-on test" follows the Then section with additional actions that require verification. This is problematic because now the test can fail for more than one reason.

A Run-On Test differs from a test with multiple assert statements. There is a separate argument about whether you should or shouldn’t have multiple assert_eq!() statements in a test body. That’s an interesting and legitimate debate, but it is conceptually distinct from run-on tests, even though the two are often discussed together. Personally, I think multiple asserts are fine so long as there are not multiple When sections in the test. In other words, feel free to assert on all the side effects of a single action.
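
For example, a single When followed by several asserts on its side effects is still a single-reason test. The Account type below is purely illustrative:

// Illustrative only: one action under test, multiple asserts on its side effects.
struct Account {
    balance: i64,
    deposits: Vec<i64>,
}

impl Account {
    fn with_balance(balance: i64) -> Self {
        Account { balance, deposits: Vec::new() }
    }

    fn deposit(&mut self, amount: i64) {
        self.balance += amount;
        self.deposits.push(amount);
    }
}

#[test]
fn depositing_updates_the_balance_and_records_the_deposit() {
    // Given
    let mut account = Account::with_balance(100);

    // When: exactly one action under test
    account.deposit(50);

    // Then: multiple asserts are fine because each verifies a side effect of that single action
    assert_eq!(account.balance, 150);
    assert_eq!(account.deposits, vec![50]);
}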

Question: What if we called sut.do_the_action_under_test() again instead of sut.do_some_other_action()?

If you have altered the setup or parameters in any way, then the test can fail for more than one reason.

Question: Won’t this lead to a lot of duplicated test code?

Possibly. However, you should treat your test code with the same care toward readability as your production code. Extract shared code into reusable, composable components just as you would any other code. Even if there is some duplication, remember that tests are a specification, so duplication between tests matters far less than making the requirements clearly understood.

A More Realistic Example of a Run-On Test

Let’s imagine an engineer who writes a test after implementing some auth middleware.

#[actix_rt::test]
async fn auth_works() {
    let cert = Some("some special cert".to_string());
    let mut app = test::init_service(
        App::new()
            .wrap(crate::middleware::auth::Auth { cert: cert.clone() })
            .configure(|c| {
                c.service(home::routes());
            }),
    )
    .await;

    let req = test::TestRequest::get().uri("/").to_request();

    let resp = test::call_service(&mut app, req).await;

    assert!(resp.status().is_client_error());
    assert_eq!(resp.status(), StatusCode::UNAUTHORIZED);

    let req = test::TestRequest::get()
        .header("Authorization", cert)
        .uri("/")
        .to_request();
    let resp = test::call_service(&mut app, req).await;

    assert!(resp.status().is_success());
}

Why is this a problem?

What is the output if assert!(resp.status().is_client_error()) fails? Something like:

auth works failed. Expected: true Actual: false

Can you tell what requirement is broken from that? Even scanning the test, can you tell? Can we say that a test like this is functioning as a specification? No.

There are two different assertions in the test that could lead to this output:

assert!(resp.status().is_client_error());
...
assert!(resp.status().is_success());

If the first assertion fails, the rest of the test does not execute. Now you don’t know if it’s that one specification that’s broken or if any of the other specs are broken too.

When you encounter this kind of test in the wild, it will likely have many more sections and assertions making it even harder to interpret than what we have here.

Recall Scott Bain’s advice about Good Unit Tests

  1. The test must fail reliably for the reason intended.
  2. The test must never fail for any other reason.
  3. There must be no other test that fails for this reason.

The test above violates two of the three criteria listed here.

Question: Who cares? What is the value to be achieved with those criteria?

If the test fails reliably for the reason intended, encountering the test failure will guide you to the problem.

If the test fails for any reason other than the intended one, then the time spent tracking down and repairing the problem will increase dramatically.

This is a case where a little discipline goes a long way.

After Refactoring

#[actix_rt::test]
async fn auth_not_specified() {
    let cert = Some("some special cert".to_string());
    let mut app = create_web_app_using_cert(cert).await;

    let req = test::TestRequest::get().uri("/").to_request();

    let resp = test::call_service(&mut app, req).await;

    assert!(resp.status().is_client_error());
    assert_eq!(resp.status(), StatusCode::UNAUTHORIZED);
}

#[actix_rt::test]
async fn auth_provided_and_valid() {
    let cert = Some("some special cert".to_string());
    let mut app = create_web_app_using_cert(cert.clone()).await;

    let req = test::TestRequest::get()
        .header("Authorization", cert)
        .uri("/")
        .to_request();
    let resp = test::call_service(&mut app, req).await;

    assert!(resp.status().is_success());
}

#[actix_rt::test]
async fn auth_provided_but_invalid() {
    let cert = Some("some special cert".to_string());
    let mut app = create_web_app_using_cert(cert).await;

    let req = test::TestRequest::get()
        .header("Authorization", "some random invalid cert")
        .uri("/")
        .to_request();
    let resp = test::call_service(&mut app, req).await;

    assert!(resp.status().is_client_error());
    assert_eq!(resp.status(), StatusCode::UNAUTHORIZED);
}
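
The create_web_app_using_cert helper is the setup extracted from the original test. It isn’t shown above, so here is a hedged sketch; the return-type spelling assumes the actix-web 2/3 era APIs implied by #[actix_rt::test], and actix_http is assumed to be an available dependency:

use actix_web::dev::{Service, ServiceResponse};
use actix_web::{test, App, Error};

// Sketch only: shared test setup extracted from the run-on test. In actix-web 2/3 the
// Service trait uses associated types, so the concrete service type is named via impl Trait.
async fn create_web_app_using_cert(
    cert: Option<String>,
) -> impl Service<Request = actix_http::Request, Response = ServiceResponse, Error = Error> {
    test::init_service(
        App::new()
            .wrap(crate::middleware::auth::Auth { cert })
            .configure(|c| {
                c.service(home::routes());
            }),
    )
    .await
}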

Notice also how these tests clarify the specification that must be met. This is a critical point. Tests help tell the story of the code.

Did you notice the missing test in the original sample? We completely missed testing invalid auth.

Given that these test methods are well named and small, it was easy to see during the refactoring that a test case was missing. This is one of the dangers of "Test After" as opposed to TDD: you are likely to omit important test cases. If you are disciplined about writing tests first, that won’t happen, because you can’t write the production code without having a failing test in place first.

Further Reading

Kent Beck wrote a nice summary of the issues with run-on tests and what to do about them here.

Multiple Asserts are OK

Avoid Multiple Asserts in a Single Unit Test

Automated Testing Strategies & Values

Testing software is critically important to ensuring quality. Automated tests provide a lower Mean Time to Feedback (MTTF) for errors and enable developers to make changes without fear of breaking things. The earlier in the SDLC that errors can be detected and corrected, the better. (See the Test Pyramid.) As engineers on the platform, we should practice TDD in order to generate a thorough bed of unit tests. Unit tests alone do not ensure that everything works as expected, so we will need gradually more sophisticated forms of testing.

There are different approaches to testing software. This document articulates the types of automated testing by the point in the SDLC at which each is executed and by what it covers. There may be different strategies for testing at each of these lifecycle points (e.g., deterministic, fuzz, property-based, load, perf, etc.).

  • Unit (Design / Build Time). Target: Component. Actors: Engineer, CI. Runs in process with no external resources. Mock at the architectural boundaries, but otherwise avoid mocks where possible.
  • Integration (Design / Build Time). Target: Component. Actors: Engineer, CI. These tests mostly target the adapters for external systems (e.g., file I/O, databases, 3rd-party APIs, and 1st-party APIs that are not the component under test). Integration tests are not written against real instances of external systems beyond the control of the component in question.
  • Acceptance (Post-Deployment to Test Environment). Target: Platform. Actors: CD. Largely black-box, end-to-end testing. Acceptance tests run against a live running instance of the entire system.
  • Operational (Post-Deployment to Test Environment). Target: Platform. Actors: CD. Examples include:
      • (Performance) Does the system meet its performance goals under normal circumstances?
      • (Load) What does it take to topple the system?
      • (Stress) How does the system recover from various kinds of failure?
      • other…
  • Manual UX Testing (Post-Deployment to Test Environment). Target: Platform. Actors: Engineering, UX, QA, etc. This testing is qualitative and pertains to the “feel” of the platform with respect to the user experience.
  • Smoke (Post-Production Release). Target: Platform. Actors: Engineer. A small suite of manual tests to validate production configuration.
  • Synthetic Transactions (Post-Production Release). Target: Platform. Actors: Automated. Black-box, end-to-end use-case testing that is automated and safe to run in production. These tests are less about correctness and more about proving the service is running.

This list is not exhaustive, but it does represent the more common cases we will encounter.

Testing Pyramid

In general, our heaviest investment in testing should be done at the time the code is written. This means that unit tests should far outweigh other testing efforts. Why?

Unit tests are very low-cost to write and have a very low Mean Time to Feedback (MTTF). This means they have the greatest ROI of any kind of test.

This emphasis on unit testing is often represented as a pyramid.

TDD

TDD is the strongly preferred manner of writing unit tests as it ensures that all code written is necessary (required by a test) and correct. Engineers who are not used to writing code in a TDD style often struggle with the practice in the early stages. If this describes your experience, be satisfied with writing tests for the code you’ve written in the same commit.

The activity of TDD consists of three steps (a minimal illustration follows the list):

  1. (RED) Write a failing unit test.
  2. (GREEN) Write enough production code to make it pass.
  3. (REFACTOR) Now make the code pretty.
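
A minimal, hypothetical pass through the cycle, using a made-up add function:

// RED: write the failing test first; add does not exist yet, so this will not even compile.
#[test]
fn adding_two_numbers_returns_their_sum() {
    assert_eq!(add(2, 2), 4);
}

// GREEN: write just enough production code to make the test pass.
fn add(a: i32, b: i32) -> i32 {
    a + b
}

// REFACTOR: with the test green, improve names and structure while keeping the test passing.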

The unit tests you write should strive to obey the three laws of TDD:

  1. Don’t write any production code unless it is to make a failing unit test pass.
  2. Don’t write any more of a unit test than is sufficient to fail; and compilation failures are failures.
  3. Don’t write any more production code than is sufficient to pass the one failing unit test.

Good unit tests have the following attributes:

  1. The test must fail reliably for the reason intended.
  2. The test must never fail for any other reason.
  3. There must be no other test that fails for this reason.

These are ideals, and practicing TDD this way is often difficult for newcomers to the practice. If this describes you, then try scaling back to submitting the unit tests in the same commit as your production code. Don’t forget to commit early and often!

Further Reading

It’s impossible to fully convey the scope of what you should know about test automation in this document. Below are some resources you may be interested in as you move through your career.

  1. Test Driven Development: By Example by Kent Beck
  2. The Art of Unit Testing: 2nd Edition by Roy Osherove
  3. Working Effectively With Legacy Code by Michael Feathers
  4. Refactoring: Improving the Design of Existing Code (2nd Edition) by Martin Fowler
  5. Performance vs. Load vs. Stress Testing