Wednesday, February 26, 2014

These Aren’t The API Models You’re Looking For

I recently answered a question on Stack Overflow that started me thinking about the differences between domain models and the models you expose through your Web API. While the question was only tangentially related, I thought it would be helpful to think through why I believe that domain models shouldn’t be exposed directly through your API.

I’ve long argued for model-per-view in MVC and I’ve carried that thinking over to Web API as well. In an MVC context, I think that separation of concerns is the strongest argument in favor of using separate view models. Views have specific data needs for display that don’t belong in your domain models. For example, they may contain auxiliary information such as localization or paging. You may not need, or may not want, your view models (which can also be used as the arguments for POST actions) to contain all domain properties – for example, whether a user is in a role. Or you may expose a property but validate it in a completely different way, leading to differing validation requirements. Lastly, your views may flatten multiple models into a single, composite model, omitting details that are unimportant to that particular view.
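For illustration, here’s a hypothetical view model that flattens a domain User and carries display-only concerns like paging (the names are mine, not from any particular project):
public class UserProfileViewModel
{
    // Flattened from the domain User
    public string DisplayName { get; set; }
    public string Email { get; set; }

    // Display-only data that has no place on the domain model
    public int PageNumber { get; set; }
    public int TotalPages { get; set; }
}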

Many, if not most, of these concerns also apply to API models, even though they are, or seem to be, strictly about the data. While I don't claim the following to be an exhaustive list, here are the reasons I think you should use separate models for your API, mapping between your domain models and your API models rather than exposing your domain models directly.

Hiding Implementation Details

There are two reasons why you might want to hide the implementation details of your model. As in MVC, there are likely some record-keeping attributes on your domain models that are not necessary for the proper operation of the API. For example, you may track who created the item and when, and who updated it and when - in a HIPAA world you may even track who viewed it, when, and, perhaps, for what purpose. You may also have actual domain information that needs to remain private even if some of the information is exposed, or that is only conditionally available based on the API consumer's privileges. This, in itself, is probably enough of a reason to separate your API models from your domain models. Doing so allows you to keep your domain model simpler and cleaner, appropriate to the data rather than to the usage of the data. The alternative would, perhaps, be a multiplication of domain models representing purpose-specific views, and an accompanying complication of your API (or additional APIs) to support those models.
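As a sketch of the idea (the types here are hypothetical), the domain model keeps its record-keeping attributes while the API model exposes only what the consumer needs:
// Domain model: carries the record-keeping attributes
public class Document
{
    public int Id { get; set; }
    public string Title { get; set; }
    public string CreatedBy { get; set; }
    public DateTime CreatedDate { get; set; }
    public string UpdatedBy { get; set; }
    public DateTime? UpdatedDate { get; set; }
}

// API model: the record-keeping details stay private
public class DocumentApiModel
{
    public int Id { get; set; }
    public string Title { get; set; }
}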

Transforming Implementation Details

Similarly, there are aspects of your model that represent relationships in the data but may be irrelevant to the API consumer. You may model a relationship as many-to-many in your domain model. For example, a product may have many accessories while an accessory may be available for more than one product. From a catalog management perspective this makes perfect sense, but in an API that allows an external consumer to access your catalog you may want to expose it only as a one-to-many relationship: this product has these accessories, without exposing that the accessories belong to more than one product. You may not want to model the reverse relationship from accessory to product in the API at all. It might simply be immaterial.

Also, you may want to provide localization for product attributes so that your API can be language-aware. While your domain model must account for the relationships to alternative text for attributes, you're likely to substitute the attribute values appropriate to the requested language rather than provide all translations and force the consumer to pick the correct one for their purpose.
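A sketch of what these transformations might look like (hypothetical types) - the domain side models the many-to-many relationship and all of the translations, while the API model flattens both away:
public class ProductApiModel
{
    public int Id { get; set; }
    // Already substituted with the translation for the requested language
    public string Name { get; set; }
    // Presented as one-to-many from the consumer's point of view
    public List<AccessoryApiModel> Accessories { get; set; }
}

public class AccessoryApiModel
{
    public int Id { get; set; }
    public string Name { get; set; }
    // Note: no reverse navigation back to products
}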

Navigation

Navigation in your domain model is established through foreign key relationships in the database, which are translated to object references and collections of object references in your object model. Navigation in APIs is represented by URLs. "RESTful", "REST-like", "Hypermedia" - that's a discussion for a different post - but modern thinking is that your API on the web should leverage web protocols (the verbs and request patterns). This can be a challenge in designing your API as you trade off between model purity and efficiency.

In your object model the cost of navigation is very low (following an object reference), but on the web it is orders of magnitude higher - a full web request. What is clear, unless your data set is very small, is that you will have to make compromises to keep your responses from ballooning in size. While you may be able to keep your entire product catalog in memory, it's not likely that you're going to want to serialize all of it and deliver it in a single web request. Instead, you're going to break your API up into multiple methods, each perhaps corresponding to an aggregate root. You may choose to expose navigation between related elements via explicit navigation properties or simply as ids that the client uses in constructing the appropriate URLs to request more information.

If you embrace web protocols in your API, you'll need to account for the differences in navigation properties between your object model and your API.
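For example (a hypothetical shape), an API model might replace object references with ids or links that the client can use to make follow-up requests:
public class PostApiModel
{
    public int Id { get; set; }
    public string Title { get; set; }
    // Expose the owner as an id the client can use to construct a URL...
    public int OwnerId { get; set; }
    // ...or expose the navigation explicitly as a URL
    public string CommentsUrl { get; set; } // e.g. "/api/posts/42/comments"
}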

Versioning

Another strong argument in favor of separate API models is versioning. Because APIs represent a "public" product, tying disparate systems together, it's rarely possible to simply discard an older API when the underlying domain model is modified. In response, API designers need to account for this by introducing some sort of versioning into their API design. If you directly expose your domain models, this could mean that you need to introduce versioning into your domain model to support older versions of the API. Decoupling your domain models from your API through the use of purpose-specific API models insulates your domain model from this need. While you may need to maintain multiple versions of your API models to support multiple versions of your API, your domain model can evolve to support new features without compromise for the sake of the API.
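A sketch of the payoff (hypothetical types): both versions of the API map from the same, freely evolving domain model, and only the mapping code knows about versions:
public class UserApiModelV1
{
    public int Id { get; set; }
    public string Name { get; set; }        // v1 exposed a single name property
}

public class UserApiModelV2
{
    public int Id { get; set; }
    public string DisplayName { get; set; } // v2 renamed the property...
    public string Email { get; set; }       // ...and added email
}

public static class UserMappings
{
    public static UserApiModelV1 ToV1Model(User user)
    {
        return new UserApiModelV1 { Id = user.Id, Name = user.DisplayName };
    }

    public static UserApiModelV2 ToV2Model(User user)
    {
        return new UserApiModelV2 { Id = user.Id, DisplayName = user.DisplayName, Email = user.Email };
    }
}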

Summary

In short, you should use separate API models. Yes, it might seem simpler to use your domain models directly, and there are a lot of examples that take that expedient approach. But in the long run, and with a more complex API than the ones typically used in tutorials, separate API models are going to serve you well.

Saturday, February 22, 2014

Unit testing JsonResult with Nested Anonymous Types

Recently I had a need to test some controller actions that returned nested anonymous types. Since the Data property on JsonResult is of type object, this can involve jumping through some hoops with reflection to get access to the properties in the anonymous types. My first attempt was to simply serialize the object, then deserialize it to an anonymous type. This worked, but I found that my nested types were JObjects and I needed to cast and convert these to the actual type.
public class TestContext
{
    // remainder omitted
    
    public dynamic ToDynamicObject(object obj, object anonymousTemplate)
    {
        var serialized = JsonConvert.SerializeObject(obj);
        return JsonConvert.DeserializeAnonymousType(serialized, anonymousTemplate);
    }
}
Used as
var model = _c.ToDynamicObject(result.Data, new { success = true, data = new SubModel() });

var subModel = ((JObject)model.data).ToObject<SubModel>();
I found one solution at http://blog.m0sa.net/2011/02/mvc-unit-testing-jsonresult-actions.html which improved upon this by using a wrapper class that implements DynamicObject. This got me very close to what I wanted except, again, the nested anonymous objects themselves had to be wrapped individually. Here is the usage with my initial implementation of an extension method based on that wrapper.
var model = result.Data.AsDynamic();

var subModel = ((object)model.data).AsDynamic() as SubModel;
To fix this I added a bit of code to the wrapper class so that if the result of TryGetMember was an anonymous type, it created and returned a DynamicObject wrapper around it. To test whether the type was anonymous I used the code found at http://www.liensberger.it/web/blog/?p=191 (referenced from this Stack Overflow question: http://stackoverflow.com/questions/2483023/how-to-test-if-a-type-is-anonymous).

Here’s the final implementation as extensions to object.  Note: these are internal to my test project so the restriction to object doesn’t bother me too much. JsonResult.Data has that type and that was the problem I was trying to solve.
internal static class ObjectExtensions
{
    public class DynamicWrapper : DynamicObject
    {
        private readonly Type _subjectType;
        private readonly object _subject;

        public static dynamic Create(object subject)
        {
            return new DynamicWrapper(subject);
        }

        private DynamicWrapper(object subject)
        {
            _subject = subject;
            _subjectType = subject.GetType();
        }

        public override bool TryGetMember(GetMemberBinder binder, out object result)
        {
            try
            {
                var propertyInfo = _subjectType.GetProperty(binder.Name);

                var getter = propertyInfo.GetGetMethod();

                result = getter.Invoke(_subject, null);

                if (result.IsAnonymous())
                {
                    result = Create(result);
                }
                return true;
            }
            catch
            {
                result = null;
                return false;
            }
        }
    }

    public static bool IsAnonymous(this object obj)
    {
        if (obj == null)
        {
            return false;
        }

        var type = obj.GetType();

        return Attribute.IsDefined(type, typeof(CompilerGeneratedAttribute), false)
                  && type.IsGenericType && type.Name.Contains("AnonymousType")
                  && (type.Name.StartsWith("<>") || type.Name.StartsWith("VB$"))
                  && (type.Attributes & TypeAttributes.NotPublic) == TypeAttributes.NotPublic;
    }

    public static dynamic AsDynamic(this object obj)
    {
        return DynamicWrapper.Create(obj);
    }
}
And the above code sample now becomes much cleaner and I am much happier:
var model = result.Data.AsDynamic();

var subModel = model.data as SubModel;

Wednesday, February 12, 2014

Putting It All Together: TDD for Dapper Repositories

In the last several posts I've set up the infrastructure for doing Test-Driven Development for Dapper with lightweight integration tests. You might want to review those posts if you haven't already:

  • Lightweight Integration Testing with Dapper
  • Solving the Challenges of Dapper Integration Testing: Part 1
  • Solving the Challenges of Dapper Integration Testing: Part 2
  • Solving the Challenges of Dapper Integration Testing: Part 3
I'm going to keep it relatively simple for this example. Essentially my repositories are just going to implement CRUD operations for each aggregate root. There will be Create, Get and GetAll (Read), Update, and Delete methods. I'm also going to choose to let most exceptions bubble up. This might not be the best choice, but it will keep the example simple. In fact, you might want to catch most exceptions and transform them into application-level exceptions which you'd handle at higher levels. Testing exceptional behavior is more difficult to set up in integration tests, and we can more easily test exceptions with true unit tests in the classes that use our repositories.
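For the Users aggregate, that contract might look something like the following sketch (the final interfaces are in the GitHub repository linked below):
public interface IUserRepository
{
    void Create(User user);
    User Get(int id);
    IEnumerable<User> GetAll();
    void Update(User user);
    void Delete(int id);
}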

Because these are actually integration tests we run into a chicken-and-egg issue. Unless we write some other (duplicate) code, we can't test Get() without Create() or Create() without Get() in a self-contained test method.
My solution is pragmatic: start with Get(). The logic is relatively simple and it's easier to convince yourself of its correctness. You could write some code using a data reader to read and verify that something is created, but then you'd have to convince yourself that that code is correct. Since we're not so much testing Dapper - it already has tests - as we are our SQL, I prefer to start with Get() and not write the extra code. As they say, "your mileage may vary"; you might want to start with something else.

Note that you won't be able to create tests for Get() until you've done Create(), but once you've written your Get() implementation and convinced yourself of its correctness, you can move on to Create() and use TDD to create that code. After you can create and retrieve an object, you're ready for GetAll(), Delete(), Update(), and any specialized methods. Do go back and create tests for Get() - these are, after all, integration tests, so we can write them after the fact. As we'll see, that will come in handy later when we extend our Get() implementation to fetch some related data from another table.
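For example, a back-filled test for Get() that doesn't depend on Create() can be as small as this (it uses the test context and helpers shown later in this post):
[TestMethod]
public void When_no_user_exists_with_the_id_null_is_returned()
{
    var repository = _c.GetRepository();

    var user = repository.Get(-1);

    Assert.IsNull(user);
}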

We’re going to create two separate repositories, one for Users and one for Posts. Let's start with the DapperUserRepository. All of the code referenced below can be found, though in its final form, at https://github.com/tvanfosson/dapper-integration-testing

DapperUserRepository

First, we'll solve our "chicken and egg" problem by implementing Get().
public User Get(int id)
{
    const string sql = "SELECT * FROM [Users] WHERE [Id] = @userId";

    var user = Fetch(c => c.Query<User>(sql, new { userId = id }).SingleOrDefault());

    return user;
}
Now, if you’re paying attention, you should be saying “Wait a minute – that’s too much code there. You’re cheating.” Yes, I am. I’m showing you the results after some refactoring – refactoring done in another project, where I found it helpful to develop a base repository pattern. This base implementation provides some convenient methods for performing the Dapper Query and Execute operations. Below is the base class implementation for the repositories we'll create. Since I've tested this pattern in other projects, I feel comfortable starting with it as the basis for my new repositories.
public abstract class DapperRepositoryBase
{
    protected readonly IConnectionFactory _connectionFactory;
    protected readonly string _connectionStringName;

    protected DapperRepositoryBase(IConnectionFactory connectionFactory, string connectionStringName)
    {
        _connectionFactory = connectionFactory;
        _connectionStringName = connectionStringName;
    }

    protected DbConnection OpenConnection()
    {
        var connection = _connectionFactory.Create(_connectionStringName);
        connection.Open();
        return connection;
    }

    protected T Fetch<T>(Func<DbConnection, T> func)
    {
        if (func == null)
        {
            throw new ArgumentNullException("func");
        }

        using (var connection = OpenConnection())
        {
            return func(connection);
        }
    }

    protected int Execute(Func<DbConnection, int> func)
    {
        if (func == null)
        {
            throw new ArgumentNullException("func");
        }

        using (var connection = OpenConnection())
        {
            return func(connection);
        }
    }

    // remainder of class omitted
}
If you want to start without the base class implementation, your Get() implementation might look a little bigger, maybe like the following. For the rest of the examples, we'll use my base class and I hope it will be obvious where the refactorings came from.
public User Get2(int id)
{
    const string sql = "SELECT * FROM [Users] WHERE [Id] = @userId";

    using (var connection = _connectionFactory.Create(_connectionStringName))
    {
        connection.Open();

        var user = connection.Query<User>(sql, new { userId = id }).SingleOrDefault();

        return user;
    }
}
Now that we have (as we've convinced ourselves) a working Get() method, we can turn to developing Create(). First, we stub it in and have it throw a NotImplementedException.
public class DapperUserRepository : DapperRepositoryBase, IUserRepository
{
    public DapperUserRepository(IConnectionFactory connectionFactory, string connectionStringName)
        : base(connectionFactory, connectionStringName)
    {
    }

    public void Create(User user)
    {
        throw new NotImplementedException();
    }

    // Get omitted
}
Then we'll create our first test. Typically I'll start with the success case: here, we'll test that a User can be successfully created. We'll add some helper methods to the test context that allow us to get an instance of the repository and to create a user with some pre-filled properties. Note that since we're using autoincrement ids, our create code will need to fill this value in when performing the insert.
public class DapperUserRepositoryTests : TestBase
{
    private DapperUserRepositoryTestContext _c;

    [TestMethod]
    public void When_a_new_user_is_created_the_data_is_inserted()
    {
        var repository = _c.GetRepository();
        var user = _c.CreateStandardUser();

        repository.Create(user);

        var retrievedUser = repository.Get(user.Id);

        Assert.AreEqual(user.DisplayName, retrievedUser.DisplayName);
        Assert.AreEqual(user.Email, retrievedUser.Email);
        Assert.AreEqual(user.Active, retrievedUser.Active);
    }
    
    // test context omitted
}
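The omitted test context, sketched minimally (the real version is in the GitHub repository; the helper details here are illustrative, not the actual code):
internal class DapperUserRepositoryTestContext : TestContextBase
{
    private const string ConnectionStringName = "DapperTesting"; // hypothetical name

    public DapperUserRepository GetRepository()
    {
        return new DapperUserRepository(CreateConnectionFactory(ConnectionStringName), ConnectionStringName);
    }

    public User CreateStandardUser(int suffix = 0)
    {
        return new User
        {
            DisplayName = "user" + suffix,
            Email = "user" + suffix + "@example.com",
            Active = true
        };
    }

    public DateTime RoundToSecond(DateTime dateTime)
    {
        // Truncate sub-second precision so CLR and SQL Server dates can be compared
        return new DateTime(dateTime.Year, dateTime.Month, dateTime.Day,
                            dateTime.Hour, dateTime.Minute, dateTime.Second);
    }
}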
We run the test and it fails, as expected. Now let's provide the least code necessary, in an integration test context, to satisfy the test - that is, insert the record into the database so that the retrieved values match the stored values. We need to insert the DisplayName, Email, CreatedDate, and Active state.
public void Create(User user)
{
    const string sql = "INSERT INTO [Users] ([DisplayName], [Email], [CreatedDate], [Active]) OUTPUT inserted.[Id] VALUES (@displayName, @email, @createdDate, @active)";

    var id = Fetch(c => c.Query<int>(sql, new
    {
        displayName = user.DisplayName,
        email = user.Email,
        createdDate = DateTime.Now,
        active = user.Active
    }).Single());

    user.Id = id;
}
We run the test now and it passes. Next we might want to test that the created date is set when the user is created. This is a bit tricky because we can't simply compare dates between the CLR and SQL Server; since their resolutions differ, we'll get spurious errors. One way around this is to take a snapshot of the time right before and right after the insert and make sure that the inserted date falls within that range, to an acceptable tolerance - for me, accuracy to the nearest second is enough. We'll use another helper method to round our times to the nearest second for comparison. Thus our next test is:
[TestMethod]
public void When_a_new_user_is_created_the_current_time_is_set_as_the_created_date()
{
    var repository = _c.GetRepository();
    var user = _c.CreateStandardUser();

    var before = _c.RoundToSecond(DateTime.Now);
    repository.Create(user);
    var after = _c.RoundToSecond(DateTime.Now);

    var retrievedUser = repository.Get(user.Id);

    var created = _c.RoundToSecond(retrievedUser.CreatedDate);

    // we only care about the nearest second
    Assert.IsTrue(before <= created && created <= after);
}
Running this test fails because we're not retrieving the CreatedDate. In this case I think it's easier to capture the date and set the value on the object than to read the stored value back, so we'll implement it as follows.
public void Create(User user)
{
    var date = DateTime.Now;

    const string sql = "INSERT INTO [Users] ([DisplayName], [Email], [CreatedDate], [Active]) OUTPUT inserted.[Id] VALUES (@displayName, @email, @createdDate, @active)";

    var id = Fetch(c => c.Query<int>(sql, new
    {
        displayName = user.DisplayName,
        email = user.Email,
        createdDate = date,
        active = user.Active
    }).Single());

    user.Id = id;
    user.CreatedDate = date;
}
Note that we don't set any values on the object until the Fetch has succeeded - we don't want the object state to be updated if there is an exception. Even though the code appears to work that way, we'll create another test to ensure that it does.
[TestMethod]
public void When_an_exception_is_thrown_the_id_and_date_are_not_updated()
{
    var repository = _c.GetRepository();
    var user = _c.CreateStandardUser();
    var otherUser = _c.CreateStandardUser(1);
    otherUser.DisplayName = user.DisplayName;

    repository.Create(user);

    var expectedId = otherUser.Id = -1;
    var expectedDate = otherUser.CreatedDate = DateTime.MinValue;

    try
    {
        repository.Create(otherUser);
        Assert.Fail("Expected SqlException not thrown.");
    }
    catch (SqlException)
    {
        Assert.AreEqual(expectedId, otherUser.Id);
        Assert.AreEqual(expectedDate, otherUser.CreatedDate);
    }
}
And that test passes, as we hoped. We continue writing tests in this manner, testing small bits of functionality until we believe that our methods perform the way we think they should. While it isn't possible to develop these tests without interdependencies - short of developing a parallel implementation of our functions - this gives us a reasonably simple way to build up our code in a test-driven fashion.

Hopefully this small example, with the code included in the GitHub repository, has shown you how to get started doing TDD with Dapper. I've left the code in the repository incomplete so you can experiment with it. In particular, there are some interesting behaviors in the Post repository. You might want to experiment with getting the posts associated with a specific user, which returns a filtered collection, for example.

Tuesday, February 11, 2014

Solving the Challenges of Dapper Integration Testing: Part 3



This is the fourth in a series of posts on lightweight integration testing with Dapper. These lightweight tests will enable us to do Test-Driven Development when creating our Dapper-based repositories. In my previous posts I’ve discussed the goals for our tests and covered three of the four challenges that were introduced. The first post in the series introduced our goals: tests that are automated, fast, simple, independent, self-contained, safe, and require minimal extra tooling to implement. The second and third covered the challenges addressed so far: providing a data connection, isolating the execution of the tests, and cleaning up our test data between tests.

In this post we’ll cover the last remaining challenge: keeping the test database schema in sync with our production database. The code for this and all of the previous posts is available at https://github.com/tvanfosson/dapper-integration-testing

Synchronizing the Schema


To keep our database synchronized, and in fact, to provide a way to maintain our database schema, we’ll borrow a concept used in Ruby/Rails and Entity Framework, namely database migrations. While we could do this entirely in code, I’m going to leverage the SQL Server Management Studio (SSMS) tooling, maintain the migrations as a collection of SQL scripts, and automate their execution in code. This provides a good compromise, in my opinion, between the complexity of creating an entirely code-based migrations infrastructure and the burden of keeping the database schemas in sync manually.

Certainly, though, if there were one area of this solution that I could improve it would be this one. One possible alternative would be to use Entity Framework migrations with or without using Entity Framework as the ORM (object-relational mapper). Such a solution would, perhaps, be ideal if you were augmenting your Entity Framework-based application with targeted use of Dapper for specific queries requiring the speed advantage that Dapper provides. Because this series is Dapper focused, however, I’m going to use the solution described above: SQL scripts, executed automatically in our test framework. The solution I described is specific to Microsoft SQL Server and SSMS-generated scripts, but can be adapted as needed for other databases and tooling. If you’re not familiar with the Generate Scripts wizard in SSMS, you can find more information at http://technet.microsoft.com/en-us/library/bb895179.aspx.

Approach

I will be using a database-first approach. By that I mean that I will use SSMS to create the database tables, indexes, relations, etc. and then reverse-engineer the scripts that will be used to create and upgrade the production database. These scripts may be edited by hand to ensure that existing data is preserved and migrated as needed as the schema changes. There are three types of scripts that you might want to produce. Two of these, the “up” and “down” scripts, will be used in maintaining the database itself; they create/upgrade the database and back out upgrades, respectively. We’ll make use of the “up” scripts, which I typically name with “Create” or “Add” (as in “add this functionality”) in the name. The third type of script, the “delete” script, will be used solely for our testing purposes. Its creation could be automated in the same manner as our database clean up, but since these scripts are easy enough to generate with the wizard we’ll leave that for a future exercise.

Create the Database

For my example, I’m going to use the tried and true (and trite) blog post. Note: the model is not intended to be realistic, but rather illustrative of the concepts that we’ll use for testing.

We’ll create tables representing Users, Posts, and PostDetails. The table names follow the plural convention and will typically correspond with our models directly. All of our tables will use artificial, auto-increment integer identity columns, named Id, as their primary keys. A User will have a DisplayName (nvarchar), Email (login id) (nvarchar), a CreatedDate (datetime), and a bit column, Active, indicating whether the user is currently an active user or not.

A Post has an associated OwnerId, a foreign key to the Id in the Users table, a Title (nvarchar), Slug (nvarchar), a PostedDate (datetime), EditedDate (datetime), and a bit column indicating whether the post is Deleted or not. We’ll set the foreign key (FK) relationship up to cascade on delete. It has an additional non-unique index on the OwnerId column. I anticipate that our routes would incorporate the Post Id, so an index on the Slug column isn’t necessary.

Generate Create and Delete Scripts (the Tasks→Generate Scripts dialog in SSMS)
Each Post has related PostDetails through a PostId foreign key on the PostDetails table. A PostDetail consists of a SequenceNumber (int) and a Text column containing the actual contents of one part of a Post. SequenceNumber will be maintained by our application and will represent separate “pages” in a post. For simplicity we’ll set the Text column up as nvarchar(MAX); it might contain HTML or Markdown. The FK relationship will be set up to cascade on delete. We will add a unique index on PostId and SequenceNumber, enabling us to use index joins to get the related details for a post quickly.

This will be our simple database. Using Tasks→Generate Scripts we’ll create 000-Create-DapperTesting.sql and 000-Drop-DapperTesting.sql. These will be stored in a DatabaseScripts directory in our source code repository. To ensure success, make sure you configure the options to check for object existence when generating these scripts (under the Advanced options in the wizard), and that all of the necessary objects are included.

Automatically Provision the Test Database

Add the test database to the solution

Add a Service-based Database to the Unit Test Project (the Add New Item→Data→Service-based Database dialog)
As I noted in previous posts, we’ll be using a LocalDB database for our test database. Once we’ve created our unit test project, we’ll add a Service-based Database to it. Right-click on the test project, choose Add→New Item, select Data, then choose Service-based Database. I gave mine the name DapperTestingDB.mdf. Once you've completed this step, you will want to make sure that the log file is also added to the solution using Add→Existing Item, selecting the LDF file associated with the database. Use the properties dialog to set both the MDF and LDF items as Content items. To be safe, I also set mine to Copy Always. It's important that the MDF and LDF files be deployed at the same time, since an out-of-date log file will prevent your LocalDB instance from being used. Copying both fresh for each build avoids that problem, though you could probably get by with Copy If Newer. I changed mine to Copy Always in the process of debugging the log file synchronization issue and simply haven't changed it back, on the theory that you don't change things that aren't broken.

Update the TestContextBase

Now that we have our (empty) test database in the solution, we need a way to initialize it. In fact, we want to reinitialize it each run to capture any changes that may have been made. To do this we’ll update our TestContextBase class that I introduced in my previous posts.

Because we only want the database to be initialized once per test run, not once per test, we’ll create a static constructor that invokes the database initialization code. A static constructor is run once, before “the first instance is created or any static members are referenced.” Because our static constructor is on the base class, it will be executed once for any and all test contexts that derive from it. The constructor is relatively simple: it finds the “down” and “up” (or “delete” and “create”, as they are called internally) scripts, executes the delete scripts, and then executes the create scripts.

In this way, we make sure that each test run starts off with a fresh database that is synchronized to the latest schema represented in the scripts. By making sure that we check for object existence before we delete (or create) it, we can be sure that changes to the scripts don’t cause errors due to a new database object being introduced – that object won’t exist yet and so the existence check will fail and the associated delete will not be attempted.
internal abstract class TestContextBase : IDisposable
{
    private const string ScriptsDirectory = @"..\..\..\..\DatabaseScripts\";
    private readonly static string[] _createScripts = { "000-Create-DapperTesting.sql" };
    private static readonly string[] _deleteScripts = { "000-Drop-DapperTesting.sql" };
    private readonly static string _dbFile = Path.Combine(Environment.CurrentDirectory, "DapperTestingDB.mdf");
    private static readonly string _connectionString = @"Data Source=(LocalDB)\v11.0;Initial Catalog=DapperTesting;Integrated Security=True;AttachDbFilename=" + _dbFile;

    static TestContextBase()
    {
        var deleteScripts = _deleteScripts.Select(s => Path.Combine(ScriptsDirectory, s)).ToList();
        var createScripts = _createScripts.Select(s => Path.Combine(ScriptsDirectory, s)).ToList();

        Assert.IsTrue(deleteScripts.All(File.Exists));
        Assert.IsTrue(createScripts.All(File.Exists));

        ExecuteScripts(deleteScripts);
        ExecuteScripts(createScripts);
    }
    
    // remainder of class omitted
}
The implementation of ExecuteScripts for MS-SQL is interesting in that the scripts generated by SSMS cannot be run directly using SqlCommand, due to the inclusion of comments, USE statements, and the SSMS-specific GO command. To overcome this we need to filter out any commands that are unnecessary for maintaining the database and/or will cause issues if invoked as a SqlCommand. After some experimentation I settled on filtering lines that start with "USE" or "SET", or that start with "/*" and end with "*/". This is specific to the scripts created by SSMS. Further, the scripts are broken into batches of commands that can be executed together via a single SqlCommand by splitting on the GO keyword.
private static readonly Func<string, bool> _startsBatch = s => s.StartsWith("GO", StringComparison.OrdinalIgnoreCase);
private static readonly Func<string, bool> _isUse = s => s.StartsWith("USE", StringComparison.OrdinalIgnoreCase);
private static readonly Func<string, bool> _isSet = s => s.StartsWith("SET", StringComparison.OrdinalIgnoreCase);
private static readonly Func<string, bool> _isComment = s => s.StartsWith("/*") && s.EndsWith("*/");

private static void ExecuteScripts(IEnumerable<string> scripts)
{
    Func<string, bool> isValidCommand = c => !_isUse(c) && !_isSet(c) && !_isComment(c);
    using (var conn = GetConnection())
    {
        conn.Open();
        foreach (var script in scripts)
        {
            var commands = File.ReadAllLines(script).Where(isValidCommand).ToList();
            IEnumerable<IEnumerable<string>> batches = CreateBatches(commands).ToList();
            foreach (var batch in batches)
            {
                using (var query = conn.CreateCommand())
                {
                    query.CommandText = string.Join(Environment.NewLine, batch);
                    query.ExecuteNonQuery();
                }
            }
        }
    }
}

/// <summary>
/// In order to execute groups of commands that can be executed together (not all can),
/// we separate them by the GO command into distinct, executable sets.  This assumes
/// that the scripts have been generated by Tasks->Generate Scripts in SSMS but will
/// also work if each script only contains commands that can be grouped together.
/// </summary>
/// <param name="commands"></param>
/// <returns></returns>
private static IEnumerable<IEnumerable<string>> CreateBatches(IEnumerable<string> commands)
{
    var batch = new List<string>();
    foreach (var command in commands)
    {
        if (_startsBatch(command))
        {
            if (batch.Any())
            {
                yield return batch;
                batch = new List<string>();
            }
        }
        else if (!string.IsNullOrWhiteSpace(command))
        {
            batch.Add(command);
        }
    }

    if (batch.Any())
    {
        yield return batch;
    }
}
Now our TestContextBase is set up to (re)create our database afresh for each test run. This adds a little overhead to our test runs, but we minimize it by paying it once per run, and it is more than offset by our ability to do TDD on our Dapper-based repositories.

There are some obvious, or apparently obvious, opportunities for improvement in the above. First, we could maintain many of the settings in a configuration file rather than in code. I chose not to do this because this is test code. It really only changes when I’m actively developing the code (and tests), so maintaining changeable settings in a configuration file doesn’t actually save us much since we’re modifying the code already.

Another, probably better, opportunity would be to invest in naming conventions – we already have a start on this with the names we’ve used – and standardized locations for our scripts. This would enable us to enumerate them in code at runtime rather than hard-code them in our test context base class. This is something that I would probably do eventually, but in a simple case like this I don’t find it necessary. Clearly, we would want to refactor this as the database schema grows.

This wraps up the set up portion of this series. In the next post, I’ll start doing the development of the repositories using TDD based on what we’ve developed so far.

Monday, February 10, 2014

Solving the Challenges of Dapper Integration Testing: Part 2


This is the third in a series of posts on lightweight integration testing with Dapper. These tests are intended to enable us to do Test-Driven Development in creating our Dapper-based repositories. As a reminder our goals – see the first post for the complete explanation – are tests that are:
  • Automated
  • Fast
  • Simple
  • Minimal Tooling
  • Independent
  • Self-contained
  • Safe

In my last post I covered the two easier challenges: providing a database connection in a way that lets us easily substitute our testing database, and isolating our tests. Because these are integration tests, the latter is important: the database is a common interaction point, and running the tests in isolation keeps the data from one test from causing failures in others.

In this post I’m going to cover one of the two remaining challenges: cleaning up test data after each test. We'll leave the last challenge for the next post.

Data Clean Up


One of the difficulties with integration tests, especially the isolated kind that I am interested in, is the effort of making sure that your data is set up exactly as you want pre-test. Generally, this will involve writing some code that cleans up any data from previous tests or test runs. This is complicated by relationships in the data, especially in the case where deletes do not cascade. In those cases, a delete can fail when the data in the “child” relation hasn’t been deleted first.

One way to handle this is writing database-specific delete code that understands the relationships and performs deletes in the correct sequence to prevent failures due to non-cascading deletes. To me, this is painful – and certainly not simple.

Another way to handle it would be to simply delete the database tables and related objects between tests. As we’ll see later, this will be necessary when keeping the test schema in sync with the database, but per test it seems overkill – and certainly slower than we would like, deleting and re-creating the schema for every test.

It would be better if we could leverage the database structure itself to automatically create delete statements in the proper order. Rather than writing the code once per database, we could write it once per database provider, and it would suffice for most, if not all, databases that use that provider. At least it would suffice for the kinds of databases that I’m concerned with most of the time.

This idea is not original with me and I didn’t want to reinvent the wheel if I could possibly avoid it so I did a bit of research into automated integration testing. During my research, I came across a DatabaseDeleter class by Jimmy Bogard, who blogs at LosTechies.com, that I thought was the perfect solution to the problem of automatically cleaning up data generated in an integration test. The only problem – it was written for use with nHibernate. I adapted it to work using SqlConnection and SqlCommand for Microsoft SQL Server (the version I’m using is Microsoft SQL Server 2012).

The most interesting method is the BuildTableList method (full code is available on GitHub).

BuildTableList
private static IEnumerable<string> BuildTableList(ICollection<string> allTables,
                                                  ICollection<Relationship> allRelationships)
{
    var tablesToDelete = new List<string>();

    while (allTables.Any())
    {
        var leafTables = allTables.Except(allRelationships
                                  .Select(rel => rel.PrimaryKeyTable))
                                  .ToArray();

        tablesToDelete.AddRange(leafTables);

        foreach (var leafTable in leafTables)
        {
            allTables.Remove(leafTable);

            var relToRemove = allRelationships.Where(rel => rel.ForeignKeyTable == leafTable)
                                              .ToArray();

            foreach (var rel in relToRemove)
            {
                allRelationships.Remove(rel);
            }
        }
    }

    return tablesToDelete;
}

This method takes a collection of tables (all the tables in the database) and a collection of parent-child relationships between tables. If you think of the relationships as forming a set of trees, it performs a bottom-up traversal across those trees, adding the leaf nodes discovered in each pass to an ordered list of tables to delete data from. Simultaneously, it removes those leaf nodes from the trees - by removing the relationships and removing the tables from the set of tables to consider - so that in the next pass the parents of those leaf nodes become the new leaf nodes. If we delete data from the tables in this order, we are guaranteed not to violate any foreign key constraints. This gives us a safe order in which to run the delete commands and avoid failures.

This collection is used by BuildDeleteTables to generate the delete command associated with a particular database. This is constructed once per test run and subsequently reused per test clean up to remove any data that the test may have added.

BuildDeleteTables/BuildDeleteSql
private void BuildDeleteTables()
{
    if (_initialized.Contains(_database))
    {
        return;
    }

    lock (_lockObj)
    {
        if (!_initialized.Add(_database))
        {
            return;
        }

        using (var connection = new SqlConnection(_database))
        {
            connection.Open();

            var allTables = GetAllTables(connection);

            var allRelationships = GetRelationships(connection);

            _tablesToDelete = BuildTableList(allTables, allRelationships);

            _deleteSql[_database] = BuildTableSql(_tablesToDelete);
        }
    }
}

private static string BuildTableSql(IEnumerable<string> tablesToDelete)
{
    return string.Join(" ", tablesToDelete.Select(t => string.Format("delete from [{0}];", t)));
}
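GetAllTables, GetRelationships, and DeleteAllData are omitted above (the full class is on GitHub), but minimal sketches, assuming SQL Server's catalog views, might look like this:
private class Relationship
{
    public string PrimaryKeyTable { get; set; }
    public string ForeignKeyTable { get; set; }
}

private static ICollection<string> GetAllTables(SqlConnection connection)
{
    var tables = new List<string>();
    using (var command = connection.CreateCommand())
    {
        command.CommandText = "SELECT [name] FROM sys.tables";
        using (var reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                tables.Add(reader.GetString(0));
            }
        }
    }
    return tables;
}

private static ICollection<Relationship> GetRelationships(SqlConnection connection)
{
    var relationships = new List<Relationship>();
    using (var command = connection.CreateCommand())
    {
        // Each foreign key points from the referencing (child) table to the referenced (parent) table
        command.CommandText = @"SELECT OBJECT_NAME(referenced_object_id) AS PrimaryKeyTable,
                                       OBJECT_NAME(parent_object_id) AS ForeignKeyTable
                                FROM sys.foreign_keys";
        using (var reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                relationships.Add(new Relationship
                {
                    PrimaryKeyTable = reader.GetString(0),
                    ForeignKeyTable = reader.GetString(1)
                });
            }
        }
    }
    return relationships;
}

public void DeleteAllData()
{
    BuildDeleteTables();

    using (var connection = new SqlConnection(_database))
    {
        connection.Open();
        using (var command = connection.CreateCommand())
        {
            command.CommandText = _deleteSql[_database];
            command.ExecuteNonQuery();
        }
    }
}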
The DatabaseDeleter class is used in the Dispose method (following Chris Patterson’s IDisposable Done Right pattern) in our abstract TestContextBase class.

TestContextBase (Dispose implementation)
internal abstract class TestContextBase : IDisposable
{
    // Remainder of class omitted for brevity
    
    // IDisposable implementation. On dispose of the context, we clean up any data inserted by the test
    
    protected bool _disposed;

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    ~TestContextBase()
    {
        Dispose(false);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (_disposed)
        {
            return;
        }

        if (disposing)
        {
            var deleter = new DatabaseDeleter(_connectionString);
            deleter.DeleteAllData();
        }

        _disposed = true;
    }
}
The Dispose method is invoked in our test classes in the clean up method, which we’ve seen before but is now shown with the context-related code for a sample DapperUserRepository below. After each test the test context is disposed, DeleteAllData runs for the related database, and any data that was inserted by the test is cleaned up.
[TestInitialize]
public void Init()
{
    Start();
    _c = new DapperUserRepositoryTestContext();
}

[TestCleanup]
public void Cleanup()
{
    if (_c != null)
    {
        _c.Dispose();
    }
    End();
}
That wraps up our data clean up task. Using our base test context and its IDisposable implementation lets us make sure that each test starts with a clean database, as long as our clean up method runs (see the discussion in the first post in the series). Next time we'll see how to handle creating the database schema so that our test database is automatically set up for each test run as our database changes.

Sunday, February 9, 2014

Solving the Challenges of Dapper Integration Testing: Part 1

In my last post I outlined the series of events that led me to want to write some lightweight integration tests for Dapper that would allow me to develop my Dapper repositories using Test-Driven Development (TDD). These tests wouldn't replace application-level integration tests, but would allow me to "unit test" my Dapper code so that I could develop it in a test-driven manner. My implementation would have to meet several requirements (see my previous post for a fuller explanation):
  • Automated
  • Fast
  • Simple
  • Minimal Tooling
  • Independent
  • Self-contained
  • Safe
Achieving these requirements could be a challenge with integration tests. I outlined several challenges that I would face, including automatically setting up the database, cleaning up data between tests, isolating the operations of the tests, and programmatically providing a connection to the repository in a way that could be mocked out. In this and the following posts I will describe how I addressed each of these challenges, including the code I created or adapted for my scenario.

All of the code for these examples can be found at https://github.com/tvanfosson/dapper-integration-testing. The inline code snippets may omit some details to save space but you can find the complete code in the repository.

Let's take them from simplest to most complex.  In this post we’ll cover providing a connection and isolating the execution of the tests from each other.

Providing a Connection


Our repository will function as a "repository of queries." Each method encapsulates some query or command that interacts with the database. In some cases it might be several queries that together provide complex data or complete a complex transaction. Importantly, each method in our repository will function as an independent unit of work. Rather than provide a connection to the repository, then, we want to provide a mechanism for obtaining a connection that can be created and disposed within the context of a single method.

We will use dependency injection to provide an IConnectionFactory instance. In production, the connection factory will be hooked up in our dependency injection framework to provide connections to our database provider of choice. As long as that provider supports DbConnection, it will work for our purposes. Since Dapper requires DbConnection to function, that doesn't restrict our choice of provider any more than the choice of Dapper itself does. IConnectionFactory needs only one method, Create(string connectionStringName), which takes the name of a connection string and provides a DbConnection to the provider specified by that connection string. In practice, our implementations of this class will likely depend on some configuration wrapper that allows them to retrieve the named connection string from a configuration file.
We'll provide a base class for our repositories that uses the factory and makes it available to all repositories derived from it.

IConnectionFactory
public interface IConnectionFactory
{
    DbConnection Create(string connectionStringName);
}
And here is one possible implementation for a SqlServer-backed repository using a configuration wrapper (code not shown).
public class MsSqlConnectionFactory : IConnectionFactory
{
    private readonly IConfiguration _configuration;

    public MsSqlConnectionFactory(IConfiguration configuration)
    {
        _configuration = configuration;
    }

    public DbConnection Create(string connectionStringName)
    {
        return new SqlConnection(_configuration.GetConnectionString(connectionStringName));
    }
}
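The configuration wrapper isn't shown here; a minimal sketch, assuming the standard ConfigurationManager from System.Configuration, might be:
public interface IConfiguration
{
    string GetConnectionString(string connectionStringName);
}

public class AppConfigConfiguration : IConfiguration
{
    public string GetConnectionString(string connectionStringName)
    {
        // Reads the <connectionStrings> section of the application's config file
        return ConfigurationManager.ConnectionStrings[connectionStringName].ConnectionString;
    }
}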
DapperRepositoryBase (constructor)
public abstract class DapperRepositoryBase
{
    protected readonly IConnectionFactory _connectionFactory;
    protected readonly string _connectionStringName;

    protected DapperRepositoryBase(IConnectionFactory connectionFactory, string connectionStringName)
    {
        _connectionFactory = connectionFactory;
        _connectionStringName = connectionStringName;
    }
    
    // remainder omitted
}
In our tests we will use a SQL Server LocalDB instance added to the test project as a content item. Each of our test classes will have a test context that provides a mechanism to obtain the connection factory - a mock that returns connections to our LocalDB instance. We’ll see more of this test context base class later as it provides additional common behavior to support our test classes. Notice that it’s internal - it’s not needed outside our test project - and that it implements IDisposable. That will be important later, but I'm not going to give away how just yet.

Note: I’m using FakeItEasy, a dead-simple dynamic fake framework, to provide a mock connection factory. Since these are integration tests, this is the only mocking we’re likely to see. I’ve got it set up to provide a new connection instance every time it’s called, since that’s the way our real implementations will work.

TestContextBase
internal abstract class TestContextBase : IDisposable
{
    private readonly static string _dbFile = Path.Combine(Environment.CurrentDirectory,
                                                          "DapperTestingDB.mdf");
    private static readonly string _connectionString =
        @"Data Source=(LocalDB)\v11.0;Initial Catalog=DapperTesting;"
        + "Integrated Security=True;AttachDbFilename="
        + _dbFile;

    protected static DbConnection GetConnection()
    {
        return new SqlConnection(_connectionString);
    }

    protected static IConnectionFactory CreateConnectionFactory(string connectionStringName)
    {
        var connectionFactory = A.Fake<IConnectionFactory>();

        A.CallTo(() => connectionFactory.Create(connectionStringName))
         .ReturnsLazily(f => GetConnection());

        return connectionFactory;
    }
    
    // Disposable implementation omitted (we'll see it later)
}
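A concrete context in a test class can then derive from this base and hand the faked factory to the repository under test - for example (hypothetical names, matching the repositories developed later in this series):
internal class DapperUserRepositoryTestContext : TestContextBase
{
    public DapperUserRepository GetRepository()
    {
        // The repository sees only IConnectionFactory; it doesn't know it's talking to LocalDB
        return new DapperUserRepository(CreateConnectionFactory("DapperTesting"), "DapperTesting");
    }
}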

Isolating Test Execution


Because our tests touch the same database, there’s a strong likelihood that we could get inconsistent or incorrect results if more than one test runs at the same time. A test that inserts some data into the database, then runs a query against it, may get different data than expected if another test, running at the same time, also inserts data into the tables being queried. To prevent this, we want to make sure that our tests run serially, one after the other, instead of in parallel.

One way to do this is to use Ordered Tests. This forces the tests to run in a specific order and, if we check the Continue after failure option, should give us what we want: a way to run independent tests in an execution-isolated way. Doing this, however, adds some manual operations to maintaining our test suite and prevents us from simply running these tests along with our other (unit) tests in the same way that we are used to. My preference is to use the TestInitialize and TestCleanup methods with some synchronization code to enforce execution isolation. To do that, I create a TestBase class that all of my test classes derive from. This class contains Start and End methods that can be used by our setup/teardown methods to enforce a single thread of execution through the test suite. Using a Monitor with a shared lock object allows our tests to synchronize themselves through test setup/teardown.

Note: one downside to this method is that certain types of exceptions can fail our tests in a way that the teardown code does not get called. In that case, our tests will hang. This could be a problem if you do continuous integration and check in code that fails in that particular way, but it should only happen in failure conditions, not when your tests are passing. You do make sure your tests pass before you check code into your repository, don't you? The trade-off of being able to use my normal unit testing processes for these tests is worth the potential risk to me.

TestBase
public abstract class TestBase
{
    private static readonly object _lock = new object();

    protected void Start()
    {
        Monitor.Enter(_lock);
    }

    protected void End()
    {
        Monitor.Exit(_lock);
    }
}
Example set up and teardown
[TestInitialize]
public void Init()
{
    Start();
    // create our concrete test context
}

[TestCleanup]
public void Cleanup()
{
    // dispose of our concrete test context
    End();
}
That wraps it up for this post. In future posts I'll tackle data clean up, then database set up; these will require more complex solutions and some convention-based programming.

Saturday, February 8, 2014

Lightweight Integration Testing with Dapper

Motivation, Requirements, and Challenges


Motivation


Recently I worked on a project that used Dapper as the data access layer, creating repositories based on Dapper queries for each aggregate root in the domain model. Dapper (https://github.com/StackExchange/dapper-dot-net), if you aren't aware, is "a simple object mapper for .Net." Dapper was developed for and is used on StackOverflow.com. Because it's built as extensions to DbConnection, it isn't tied to any specific database and works with a variety of database providers. There are some micro-ORM implementations built on top of Dapper, but we used it directly. In its base form, Dapper consists of generic query/execute methods that take SQL statements as strings, run the SQL on the database, and map the resulting output either to a collection of objects (of a type specified in the call) or to a count of the results returned.

Dapper is pretty simple and simple to use. That's one of the reasons that it is also pretty fast. That was the reason that we chose to use it. I say "we", but really the decision had been made prior to my involvement in the project and my involvement was limited to extending the implementation for more use cases. Because it's simple, you might be able to convince yourself of the correctness of your implementations and may not design any specific tests for your repositories.

That was the situation I found when I joined the project. There were plenty of unit tests, but they stopped at the repository layer. All of the Dapper-based repositories had interfaces and those interfaces were mocked in the unit tests. There were no tests - unit or integration - for the repositories themselves. Again, the queries were pretty simple, so code inspection was used to verify correctness along with an extensive QA process.

That is, until I added a feature that required more complex behavior, specifically a relation that had some rather complex logic that involved left joins, multiple result sets, and transactional behavior. Dapper was able to handle all that I threw at it, but I felt uncomfortable that (1) the only testing that my queries had was my own manual, exploratory testing and (2) I'd need to be careful to manually test around any code changes that I wanted to make.

Because Dapper is implemented as extension methods, it doesn't lend itself to mocking, and I didn't really want to introduce another layer around Dapper that would have similar problems. Moreover, I felt that I needed to verify not only that my SQL commands were being executed but also that my SQL actually did what I expected. What I really wanted/needed was a set of lightweight integration tests that would be performed against a test database, ideally with a minimum amount of manual set up, that could be executed as if they were unit tests.

Requirements

Automated

For me, for tests to be useful, they need to be automated. I wanted these tests to be able to be run alongside my true unit tests.

Fast

Because I wanted the tests to run regularly, they need to be reasonably fast. The tests can't be too big or too extensive. Simple, usable tests against a local database are my ideal.

Simple

I'm not looking to replace acceptance tests or the real integration tests for the application. Keeping the tests simple will help keep them fast. Keeping them simple also means having to develop the minimum amount of code to create a new test.

Minimal Tooling

I want these to run on any developer's machine and, potentially, on a build or continuous integration server.

Independent

In the spirit of unit tests, each test should be independent. The data created for and by a test should be invisible to other tests and should not interfere with them. The tests should be able to run in any order. To keep things as simple as possible, though, we'll relax this constraint for catastrophic failures. Basically, when the per-test clean up task runs, we want to guarantee that each test gets a fresh start.

Self-contained

Tests, or the context they are run in, will perform any data set up required for the test. There should be no manual steps or external scripts that need to be invoked before the tests can be run. Any database or data set up that needs to be done, including any scripts that need to execute, should be performed by the test code.

Safe

Because these tests are going to be used to develop code, the environment that they run in should be isolated. The tests shouldn't connect directly to other application databases, even to read data, to maximize safety.

Based on these requirements I created the tests in a unit test project. Each repository gets its own test class. I opted to go with LocalDB as the database provider and added the MDF file (and log) to the test project as content items, copied if newer. Using a unit test project allows me to tie into the standard test running and ensures that my Dapper integration tests are performed along with the rest of my unit tests. Using LocalDB keeps my tests isolated.

Challenges

It may not seem obvious at first glance but these choices along with the requirements and good software practices raise some interesting challenges.
  1. Creating the database schema. To be self-contained, our test framework will need a way to ensure that the database has the correct tables and other database objects for each test run.
  2. Data clean up. We could have each test be responsible for cleaning up after itself but in order to keep them simple I think we can do better. We want an automated method to remove data, including relational data, created by any of our tests that can run in the test clean up phase.
  3. Test isolation. Because the tests need to be independent and yet share a database, we need to ensure that they run sequentially, including their initialization and clean up phases. Different unit test runners may or may not guarantee the behavior that we need, so we need a mechanism to guarantee this regardless of the test runner.
  4. Providing a connection to a repository for use during the test that doesn't rely on any manual configuration yet maintains good development practices.
In my next post I will detail the solutions to two of the challenges I've presented and get a start on the code required to implement the solution.