Saturday, June 6, 2020

Test patterns: The "Test Data Faker" instead of the "Test Data Builder"

Many engineers have seen the benefits of using builders in their Unit tests. You can find blogs and articles expounding on those benefits. My personal favorite is the series of articles in From Test Data Builders to the identity functor by Mark Seemann. (I think if you read every one of Mark's articles you'll be a better person.)

My thoughts on Builders
The benefits of the builder pattern are obvious: you'll write less code, and your code will be more readable and easier to refactor and maintain. Here are two tests. Both test adding a user with an Android smartphone, and both use the exact same data every time they execute.
Constructor Initialization (first test) vs. Test Data Builder Pattern (second test)
[Test]
public void ConstructorTest()
{
    // arrange
    var user = new User(
        "Kimberly",
        "Kim",
        "k.kim@at.com",
        new Device(
            "Android",
            "1"),
        new Address(
            "123 Sesame Street",
            "Garbage Can",
            "Manhattan",
            "NY",
            "12345"));
    // act
    var service = new UserService();
    var userResponse = service.AddUser(user);
    // assert
    Assert.NotNull(userResponse.Id);
}
[Test]
public void BuilderPatternTest()
{
    // arrange
    var user = new UserBuilder()
        .With(new DeviceBuilder()
            .WithOs("Android")
            .Build())
        .Build();
    // act
    var service = new UserService();
    var userResponse = service.AddUser(user);
    // assert
    Assert.NotNull(userResponse.Id);
}
This pattern works great for Unit testing, but does it work well for Functional testing? In short: ya, no, ya. It works, but it could be better.
The main problem I see when using this pattern in Functional (or System) testing is that the data isn't realistic and it never changes. After 20k tests have executed, there will be 20k users with the first name "Kimberly".

A Fake Builder
These two tests are the same: both test adding a user with an Android smartphone. The difference is that the second test doesn't use the same data every time it executes.
Test Data Builder Pattern (first test) vs. Test Data Faker Pattern (second test)
[Test]
public void BuilderPatternTest()
{
    // arrange
    var user = new UserBuilder()
        .With(new DeviceBuilder()
            .WithOs("Android")
            .Build())
        .Build();
    // act
    var service = new UserService();
    var userResponse = service.AddUser(user);
    // assert
    Assert.NotNull(userResponse.Id);
}
[Test]
public void FakerPatternTest()
{
    // arrange
    var user = new UserFaker()
        .With(new DeviceFaker()
            .WithOs("Android")
            .Fake())
        .Fake();
    // act
    var service = new UserService();
    var userResponse = service.AddUser(user);
    // assert
    Assert.NotNull(userResponse.Id);
}
Now some will say these two tests are not the same because they use different data: the test using faked data will have a different FirstName each time you run it, while the one using the builder will always have Kimberly as the FirstName. But you have to ask yourself: what's the value in having 20k identical users? How often will the production environment have 20k identical users?

So what's the Difference
In the Builder below, we define the default values in the constructor, so every new instance has the same values (20k Kimberlys in the db). The Faker below uses Bogus to fake the user's data, so each run produces unique, human-readable, meaningful data.
UserBuilder.cs (first class) and UserFaker.cs (second class)
public class UserBuilder : User
{
    public UserBuilder()
    {
        FirstName = "Kimberly";
        LastName = "Kim";
        Email = "K.Kim@earthlink.net";
        Device = new DeviceBuilder();
        Address = new AddressBuilder();
    }

    public UserBuilder WithName(string firstName, string lastName)
    {
        FirstName = firstName;
        LastName = lastName;
        return this;
    }

    public UserBuilder WithEmail(string email)
    {
        Email = email;
        return this;
    }

    public UserBuilder With(Device device)
    {
        Device = device;
        return this;
    }

    public UserBuilder With(Address address)
    {
        Address = address;
        return this;
    }

    public User Build()
    {
        return (User)this;
    }
}
public class UserFaker : User
{
    public UserFaker()
    {
        var person = new Bogus.Person();
        FirstName = person.FirstName;
        LastName = person.LastName;
        Email = person.Email;
        Device = new DeviceFaker();
        Address = new AddressFaker();
    }

    public UserFaker WithName(string firstName, string lastName)
    {
        FirstName = firstName;
        LastName = lastName;
        return this;
    }

    public UserFaker WithEmail(string email)
    {
        Email = email;
        return this;
    }

    public UserFaker With(Device device)
    {
        Device = device;
        return this;
    }

    public UserFaker With(Address address)
    {
        Address = address;
        return this;
    }

    public User Fake()
    {
        return (User)this;
    }
}
So one test will create 20k Android users that are all identical in every way, the other test will create 20k Android users that have randomized addresses, names and emails, etc. 

The Test Data Faker Pattern is basically the Test Data Builder Pattern, except that the constructor uses a faking library instead of defining rigid values that never change. In this case the faking library is Bogus. In the above example a random "Person" is generated and its values are assigned to the "User" object, which has the added benefit that the email contains the first and last names.
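
The tests above call a DeviceFaker that this post never shows. Here is a rough sketch of what one could look like, following the same shape as UserFaker. The Device class, its Os and Version properties, and the WithVersion method are my assumptions (guessed from the new Device("Android", "1") call), not the real classes.

```csharp
using Bogus;

// Hypothetical model: the post never shows Device, so these two
// properties are assumed from the new Device("Android", "1") call.
public class Device
{
    public string Os { get; set; }
    public string Version { get; set; }
}

// Same shape as UserFaker: random defaults in the constructor,
// With* methods to pin whatever the test actually cares about.
public class DeviceFaker : Device
{
    public DeviceFaker()
    {
        var faker = new Faker();
        // Pick one of a fixed set of operating systems at random.
        Os = faker.Random.ArrayElement(new[] { "Android", "iOS" });
        // A random major version between 1 and 12, stored as a string.
        Version = faker.Random.Int(1, 12).ToString();
    }

    public DeviceFaker WithOs(string os)
    {
        Os = os;
        return this;
    }

    public DeviceFaker WithVersion(string version)
    {
        Version = version;
        return this;
    }

    public Device Fake()
    {
        return this;
    }
}
```

Calling new DeviceFaker().WithOs("Android").Fake() pins the OS the test cares about and leaves the version random, which is the whole point of the pattern: fix what the test asserts on, fake the rest.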

I think the Test Data Builder Pattern is perfect for Unit testing, but for Functional testing I prefer the Test Data Faker Pattern. There is real value in not deleting test data after execution completes when doing Functional testing, and there is real value in having unique, realistic, random data when testing any system. What if we need to test pagination, search/queries, sorting, etc.?

The Test Data Faker Pattern ensures that over time you'll expose the unusual conditions normally found by users, during fuzz testing, in exploratory testing, in load testing, etc. Eventually your DB will have so much data that you'll be able to find issues developers say could never happen in production (make sure to get that in writing, or place a bet with that savvy developer).

In the "real" world
I don't often meet engineers who use this pattern, because most of the engineers I've worked with either don't test their code at all or do only a minimal amount of testing and know no testing patterns.

I think the natural path most developers take is to start with Constructor Initialization. After they write 20 tests they notice that 10 of the tests use the same object, so they switch to the Object Mother pattern. Now, 30 days later, they have an Object Mother class with 200 methods for creating an object, and if they have been extremely diligent none of the methods create duplicate objects. At some point they hire a new developer who says this is not maintainable and we should fix it, but no one wants to break X many tests. So the only option is to memorize 200 user definitions and add a new one, and this is how 200 methods quickly become 400. I think this happens because people are lazy and do what's easiest in the moment, and that's why they make these typical mistakes. But the really good developers are extremely lazy: they plan and design with good patterns so they can write twice the code in half the time, like me.

In the past I would usually build my own Faker for generating test data. I find it useful to have entities in the SUT (System Under Test) database that have realistic data like names, addresses, phone numbers, account numbers, etc. Over time the system will have rich data that can be used for performance testing features that require lots of unique data, like search. I no longer build my own Faker: someone has done it better, it's a lot of work to rebuild at each new gig, and I never put mine in my open source. Now I use Bogus. The UserBuilder and UserFaker objects are shown above. In the example above, the Device OS is set to Android and then Fake() is called; any data that is null will be faked using the Bogus Faker, and a User object is returned.
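
Note that the UserFaker shown earlier fakes everything up front in its constructor, while the paragraph above describes faking only what is still null when Fake() is called. A minimal sketch of that second variant, assuming a trimmed-down User with three string properties (the real one also has Device and Address). LazyUserFaker is a made-up name for illustration, not a class from this post or from Bogus.

```csharp
using Bogus;

// Hypothetical, trimmed-down model used only for this sketch.
public class User
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
}

// Collects the values the test pins, then fakes the rest in Fake().
public class LazyUserFaker
{
    private readonly User _user = new User();

    public LazyUserFaker WithName(string firstName, string lastName)
    {
        _user.FirstName = firstName;
        _user.LastName = lastName;
        return this;
    }

    public LazyUserFaker WithEmail(string email)
    {
        _user.Email = email;
        return this;
    }

    public User Fake()
    {
        var faker = new Faker();
        // Only fill properties the test left null.
        _user.FirstName ??= faker.Name.FirstName();
        _user.LastName ??= faker.Name.LastName();
        // Build the email from whatever names ended up on the user,
        // so a pinned name still shows up in a faked email.
        _user.Email ??= faker.Internet.Email(_user.FirstName, _user.LastName);
        return _user;
    }
}
```

One reason to prefer this shape is that no random value is generated for a field the test overrides anyway, and the faked email stays consistent with a pinned name. The trade-off is a bit more code in Fake().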

Imaginary Q&A:
Q: I'm really good at Unit testing, and a Unit test should only test one thing, that's the rule bro?
A: I don't care, go away. (PS: I don't do Unit tests, I do Functional (or System) tests.)

Q: But I always do as I'm told, and everyone always knows you have to delete all data at the end of every test or you'll die and never go to heaven?
A: That wasn't a question it was a statement, and your Unit testing dogma can't defeat me. 

Q: My mom made me wear a helmet to get the mail, and my dad never hugged me!
A: Ya, you sound like you'd be great at Unit testing, buddy.

Q: You said those two tests at the top are the same but they aren't because... bla bla bla
A: Sorry to interrupt you, but skip to the end. I don't care, go away.

If you have a problem with anything I have said, please keep in mind that I never wanted you to read this, you have violated my privacy. I wrote this for friends of the cause, someone just like you but with better hair, you have wasted your time and disappointed me and your mother. And please do something about that hair! I don't recommend this pattern for Unit testing, if you think this pattern should not be used for Unit testing please keep it to yourself!  And you can sleep well in the knowledge that I both agree and don't care, at the same time. Cheers!

I am the king of functional tests

Monday, November 12, 2018

Number one complaint I hear about Chocolatey and why it's no big deal.

The number one complaint I hear from people when I pitch them Chocolatey is: "How do I know that what I'm installing is safe?" Let's look at why people are asking this. First, Chocolatey downloads packages that contain installers and PowerShell scripts and then executes them. Second, anyone can submit a package. It's easy to see why a person might not be comfortable with this scenario. Keep in mind that Chocolatey uses moderation and packages are virus scanned when uploaded, but according to their docs, virus scanning at runtime and during install is for licensed (paid) editions only. Some people I have talked to are still sceptical about using it even with virus scanning. This really boils down to wanting perfect control, or not trusting the moderation.

There is an easy way to make Chocolatey just as secure as your current manual or automated process. If you create your own packages and host your own Chocolatey server internally, you will be no less secure than you are with your current process. Creating your own packages allows you to point to versions of installers your team has downloaded directly from the vendors. You can also control what PowerShell runs during install. Hosting your own Chocolatey server is as easy as setting up a NuGet server.

After you set up a Chocolatey server, the steps to create your local and "safe" packages are:
  1. Download the package from Chocolatey.org
  2. Modify the package to point to your local binaries
  3. Verify the PowerShell doesn't do anything you don't want
  4. Publish the package to your local Chocolatey server
In most cases you'll see that you're really just downloading a package from Chocolatey, modifying the URL to point to your local installer, then publishing it locally. It's quick, and you'll have exactly what you want: perfect control.

I would argue that you don't need perfect control and you can trust moderation. I would still set up a local Chocolatey server, but not for security. Having a local Chocolatey server gives you customized packages and faster downloads.

The number two complaint: "So why the dumb name?" Some people don't get the reference right away.