Saturday, June 6, 2020

Test Patterns: The "Test Data Faker" Instead of the "Test Data Builder"

Many engineers have seen the benefits of using builders in their Unit tests, and you can find blogs and articles expounding on those benefits. My personal favorite is Mark Seemann's series "From Test Data Builders to the identity functor." (I think if you read every one of Mark's articles, you'll be a better person.)

My thoughts on Builders
The benefits of the builder pattern are obvious: you write less code, and your code is more readable and easier to refactor and maintain. Here are two tests; both add a user with an Android smartphone, and both use the exact same data every time they execute.
Constructor Initialization (first) vs. Test Data Builder Pattern (second):
[Test]
public void ConstructorTest()
{
    // arrange
    var user = new User(
        "Kimberly",
        "Kim",
        "k.kim@at.com",
        new Device(
            "Android",
            "1"),
        new Address(
            "123 Sesame Street",
            "Garbage Can",
            "Manhattan",
            "NY",
            "12345"));
    // act
    var service = new UserService();
    var userResponse = service.AddUser(user);
    // assert
    Assert.NotNull(userResponse.Id);
}
[Test]
public void BuilderPatternTest()
{
    // arrange
    var user = new UserBuilder()
        .With(new DeviceBuilder()
            .WithOs("Android")
            .Build())
        .Build();
    // act
    var service = new UserService();
    var userResponse = service.AddUser(user);
    // assert
    Assert.NotNull(userResponse.Id);
}
This pattern works great for Unit testing, but does it work well for Functional testing? In short: ya, no, ya. It works, but it could be better.
The main problem I see when using this pattern in Functional (or System) testing is that the data isn't real and it never changes. So after 20k tests have executed, there will be 20k users with the first name "Kimberly."

A Fake Builder
These two tests are the same: both add a user with an Android smartphone, but the second test doesn't use the same data every time it executes.
Test Data Builder Pattern (first) vs. Test Data Faker Pattern (second):
[Test]
public void BuilderPatternTest()
{
    // arrange
    var user = new UserBuilder()
        .With(new DeviceBuilder()
            .WithOs("Android")
            .Build())
        .Build();
    // act
    var service = new UserService();
    var userResponse = service.AddUser(user);
    // assert
    Assert.NotNull(userResponse.Id);
}
[Test]
public void FakerPatternTest()
{
    // arrange
    var user = new UserFaker()
        .With(new DeviceFaker()
            .WithOs("Android")
            .Fake())
        .Fake();
    // act
    var service = new UserService();
    var userResponse = service.AddUser(user);
    // assert
    Assert.NotNull(userResponse.Id);
}
Now some will say these two tests are not the same because they use different data: the test using faked data will have a different FirstName each time you run it, while the one using the builder will always have Kimberly as the FirstName. Ask yourself: what's the value in having 20k identical users? How often will the production environment have 20k identical users?

So What's the Difference?
In the Builder below, we define the default values in the constructor, so every new instance has the same values (20k Kimberlys in the DB). The Faker below uses Bogus to fake the user's data, so each run can produce unique, human-readable, meaningful data.
UserBuilder.cs (first) and UserFaker.cs (second):
public class UserBuilder : User
{
    public UserBuilder()
    {
        FirstName = "Kimberly";
        LastName = "Kim";
        Email = "K.Kim@earthlink.net";
        Device = new DeviceBuilder();
        Address = new AddressBuilder();
    }

    public UserBuilder WithName(string firstName, string lastName)
    {
        FirstName = firstName;
        LastName = lastName;
        return this;
    }

    public UserBuilder WithEmail(string email)
    {
        Email = email;
        return this;
    }

    public UserBuilder With(Device device)
    {
        Device = device;
        return this;
    }

    public UserBuilder With(Address address)
    {
        Address = address;
        return this;
    }

    public User Build()
    {
        return (User)this;
    }
}
public class UserFaker : User
{
    public UserFaker()
    {
        // Bogus.Person generates one random person; its email is built from the generated names
        var person = new Bogus.Person();
        FirstName = person.FirstName;
        LastName = person.LastName;
        Email = person.Email;
        Device = new DeviceFaker();
        Address = new AddressFaker();
    }

    public UserFaker WithName(string firstName, string lastName)
    {
        FirstName = firstName;
        LastName = lastName;
        return this;
    }

    public UserFaker WithEmail(string email)
    {
        Email = email;
        return this;
    }

    public UserFaker With(Device device)
    {
        Device = device;
        return this;
    }

    public UserFaker With(Address address)
    {
        Address = address;
        return this;
    }

    public User Fake()
    {
        return (User)this;
    }
}
So one test creates 20k Android users that are identical in every way, while the other creates 20k Android users with randomized names, emails, addresses, etc.

The Test Data Faker Pattern is basically the Test Data Builder Pattern, except that instead of defining rigid values that never change, the constructor uses a faking library. In this case the library is Bogus: in the example above, a random "Person" is generated and its values are assigned to the "User" object, which has the added benefit that the email contains the first and last names.
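To make the difference in defaults concrete, here is a minimal, dependency-free sketch. It assumes a simplified, hypothetical `Person` record and uses `System.Random` with small hard-coded name lists as a stand-in for Bogus (this is not the Bogus API). The faked email is derived from the generated names, mirroring the benefit described above.

```csharp
using System;

// Hypothetical, simplified type (the post's User also has a Device and an Address).
public record Person(string FirstName, string LastName, string Email);

public static class DefaultsDemo
{
    // Illustrative name lists; a real faking library has far larger data sets.
    private static readonly string[] FirstNames = { "Kimberly", "Alex", "Priya", "Mateo", "Grace", "Omar" };
    private static readonly string[] LastNames = { "Kim", "Garcia", "Nguyen", "Smith", "Okafor", "Rossi" };

    // Builder-style default: the same hard-coded values on every call.
    public static Person BuilderDefault() =>
        new Person("Kimberly", "Kim", "k.kim@example.com");

    // Faker-style default: random names, with the email derived from them.
    // This hand-rolled generator only stands in for Bogus; it is an assumption.
    public static Person FakerDefault(Random random)
    {
        var first = FirstNames[random.Next(FirstNames.Length)];
        var last = LastNames[random.Next(LastNames.Length)];
        var email = $"{first}.{last}{random.Next(10_000)}@example.com".ToLowerInvariant();
        return new Person(first, last, email);
    }
}
```

Passing the `Random` in, rather than creating one inside, is a design choice: a test can log or fix the seed when it needs to reproduce a particular run.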

I think the Test Data Builder Pattern is perfect for Unit testing, but for Functional testing I prefer the Test Data Faker Pattern. When doing Functional testing, there is real value in not deleting test data after execution completes, and there is real value in having unique, real-world, random data when testing any system. What if we need to test pagination, search/queries, sorting, etc.?
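As a sketch of why identical data can hide bugs in features like sorting and pagination: the hypothetical `UserQueries` class below has a correct pager and a buggy one that forgets to sort. The class, both methods, and the `IsSorted` check are illustrative assumptions, not part of any real system.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical query helpers over a list of last names.
public static class UserQueries
{
    // Correct: sort by last name, then return one page.
    public static List<string> PageByLastName(IEnumerable<string> lastNames, int page, int pageSize) =>
        lastNames.OrderBy(n => n, StringComparer.Ordinal)
                 .Skip(page * pageSize)
                 .Take(pageSize)
                 .ToList();

    // Buggy: forgets to sort before paging.
    public static List<string> PageByLastNameBuggy(IEnumerable<string> lastNames, int page, int pageSize) =>
        lastNames.Skip(page * pageSize)
                 .Take(pageSize)
                 .ToList();

    // The kind of check a Functional test might make on a returned page.
    public static bool IsSorted(IReadOnlyList<string> items)
    {
        for (int i = 1; i < items.Count; i++)
        {
            if (string.CompareOrdinal(items[i - 1], items[i]) > 0) return false;
        }
        return true;
    }
}
```

With 20 copies of "Kim", the buggy pager's output still passes `IsSorted`; with a varied list such as Nguyen, Garcia, Kim, Smith, Adams, it fails, while the correct pager returns Adams, Garcia, Kim, Nguyen, Smith.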

Over time, the Test Data Faker Pattern helps you expose the kinds of edge conditions normally found by users, during fuzz testing, in exploratory testing, in load testing, etc. Eventually your DB will have so much data that you'll be able to find issues developers say could never happen in production (make sure to get that in writing, or place a bet with that savvy developer).

In the "real" world
I don't often meet engineers who use this pattern, because most of the engineers I've worked with either don't test their code at all or do only a minimal amount of testing and know no testing patterns.

I think the natural path most developers take is to start with Constructor Initialization. After writing 20 tests, they notice that 10 of them use the same object, so they switch to the Object Mother pattern. Thirty days later they have an Object Mother class with 200 methods for creating an object, and if they have been extremely diligent, none of the methods create duplicate objects. At some point they hire a new developer who says, "This is not maintainable; we should fix it," but no one wants to break X many tests. So the only option is to memorize 200 user definitions and add a new one, and this is how 200 methods quickly become 400. I think this happens because people are lazy and do what's easiest in the moment, which is why they make these typical mistakes. The really good developers are extremely lazy: they plan and design with good patterns so they can write twice the code in half the time, like me.
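For readers who haven't met it, a minimal Object Mother might look like the sketch below. `User` here is a simplified, hypothetical record and the method names are made up; the point is that each new variation tends to become another near-duplicate static method.

```csharp
using System;

// Hypothetical, simplified user type for illustration only.
public record User(string FirstName, string LastName, string Email, string DeviceOs);

// A minimal Object Mother: one static factory method per "kind" of test user.
public static class UserMother
{
    public static User AndroidUser() =>
        new User("Kimberly", "Kim", "k.kim@example.com", "Android");

    public static User IosUser() =>
        new User("Kimberly", "Kim", "k.kim@example.com", "iOS");

    // Another near-duplicate, added because one test needed a different email.
    // Repeat this for every new need and the class keeps growing.
    public static User AndroidUserWithOtherEmail() =>
        new User("Kimberly", "Kim", "kimberly@example.org", "Android");
}
```

Because every call returns the same hard-coded values, `UserMother.AndroidUser() == UserMother.AndroidUser()` is true (records compare by value), which is the 20k-Kimberlys problem again.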

In the past I usually built my own Faker for generating test data. I find it useful to have entities in the SUT (System Under Test) database with realistic data like names, addresses, phone numbers, account numbers, etc. Over time the system accumulates rich data that can be used for performance testing features that require lots of unique data, like search. I no longer build my own Faker: someone has done it better, it's a lot of work to rebuild at each new gig, and I never put mine in my open source. Instead, I use Bogus. In the FakerPatternTest example above, the Device OS is set to Android and then Fake() is called; any data that is null is faked using Bogus, and a User object is returned.
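The paragraph above describes a slightly different variant from the constructor-based `UserFaker` shown earlier: values are faked at `Fake()` time, and only for fields the test left null. Here is a minimal sketch of that idea under stated assumptions: `LazyUserFaker` and the simplified `User` class are hypothetical, and a `System.Random`-based generator stands in for Bogus.

```csharp
using System;

// Hypothetical, simplified user (the post's User also has a Device and an Address).
public class User
{
    public string FirstName { get; set; } = "";
    public string LastName { get; set; } = "";
    public string Email { get; set; } = "";
    public string DeviceOs { get; set; } = "";
}

// Fakes only what the test did NOT specify; explicit values always win.
public class LazyUserFaker
{
    private static readonly string[] FirstNames = { "Kimberly", "Alex", "Priya", "Mateo" };
    private static readonly string[] LastNames = { "Kim", "Garcia", "Nguyen", "Smith" };
    private static readonly string[] Oses = { "Android", "iOS" };

    private readonly Random _random;
    private string? _firstName, _lastName, _email, _deviceOs;

    public LazyUserFaker(Random? random = null) => _random = random ?? new Random();

    public LazyUserFaker WithName(string firstName, string lastName)
    {
        _firstName = firstName;
        _lastName = lastName;
        return this;
    }

    public LazyUserFaker WithEmail(string email) { _email = email; return this; }

    public LazyUserFaker WithOs(string os) { _deviceOs = os; return this; }

    // Any field still null is faked now; the email is derived from the final names.
    public User Fake()
    {
        var first = _firstName ?? Pick(FirstNames);
        var last = _lastName ?? Pick(LastNames);
        return new User
        {
            FirstName = first,
            LastName = last,
            Email = _email ?? $"{first}.{last}{_random.Next(1000)}@example.com".ToLowerInvariant(),
            DeviceOs = _deviceOs ?? Pick(Oses),
        };
    }

    private string Pick(string[] values) => values[_random.Next(values.Length)];
}
```

Usage mirrors the FakerPatternTest: `new LazyUserFaker().WithOs("Android").Fake()` pins the OS and fakes everything else.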

Imaginary Q&A:
Q: I'm really good at Unit testing, and a Unit test should only test one thing; that's the rule, bro?
A: I don't care, go away. (PS: I don't do Unit tests, I do Functional (or System) tests.)

Q: But I always do as I'm told, and everyone knows you have to delete all data at the end of every test or you'll die and never go to heaven?
A: That wasn't a question it was a statement, and your Unit testing dogma can't defeat me. 

Q: My mom made me wear a helmet to get the mail, and my dad never hugged me!
A: Ya, you sound like you'd be great at Unit testing, buddy.

Q: You said those two tests at the top are the same but they aren't because... bla bla bla
A: Sorry to interrupt you, but skip to the end, I don't care go away

If you have a problem with anything I have said, please keep in mind that I never wanted you to read this; you have violated my privacy. I wrote this for friends of the cause, someone just like you but with better hair. You have wasted your time and disappointed me and your mother. And please do something about that hair! I don't recommend this pattern for Unit testing, so if you think this pattern should not be used for Unit testing, please keep it to yourself! You can sleep well knowing that I both agree and don't care, at the same time. Cheers!

I am the king of functional testing.
