Tuesday, February 11, 2014

Solving the Challenges of Dapper Integration Testing: Part 3



A Dapper GentlemanThis is the fourth in a series of posts on lightweight integration testing with Dapper. These lightweight tests will enable us to do Test-Driven Development when creating our Dapper-based repositories. In my previous posts I’ve discussed the goals for our tests and covered three of the four challenges that were introduced. The first post in the series introduced our goals: tests that are automated, fast, simple, independent, self-contained, safe, and require minimal extra tooling to implement. The second and third have covered the challenges so far: providing a data connection, isolating the execution of the tests, and cleaning up our test data between tests.

In this post we’ll cover the last remaining challenge: keeping the test database schema in sync with our production database. This code and the code for all previous posts is available at https://github.com/tvanfosson/dapper-integration-testing

Synchronizing the Schema


To keep our database synchronized, and in fact, to provide a way to maintain our database schema we’ll borrow a concept used in Ruby/Rails and Entity Framework, namely database migrations. While we could do this entirely in code, I’m going to leverage the SQL Server Management Studio (SSMS) tooling and maintain the migrations as a collection of SQL scripts and automate their execution in code. This provides a good compromise, in my opinion, between the complexity of creating an entirely code-based migrations infrastructure and the burden of keeping the database schemas in sync manually.

Certainly, though, if there were one area of this solution that I could improve it would be this one. One possible alternative would be to use Entity Framework migrations with or without using Entity Framework as the ORM (object-relational mapper). Such a solution would, perhaps, be ideal if you were augmenting your Entity Framework-based application with targeted use of Dapper for specific queries requiring the speed advantage that Dapper provides. Because this series is Dapper focused, however, I’m going to use the solution described above: SQL scripts, executed automatically in our test framework. The solution I described is specific to Microsoft SQL Server and SSMS-generated scripts, but can be adapted as needed for other databases and tooling. If you’re not familiar with the Generate Scripts wizard in SSMS, you can find more information at http://technet.microsoft.com/en-us/library/bb895179.aspx.

Approach

I will be using a database-first approach. By that I mean that I will use SSMS to create the database tables, indexes, relations, etc. and then reverse-engineer the scripts that will be used to create and upgrade the production database.  These scripts may be edited by hand to ensure that existing data is preserved and migrated as needed as the schema changes. There are three types of scripts that you might want to produce. Two of these will be used in maintaining the database itself, the “up” and “down” scripts. These are used to create/upgrade and back out upgrades, respectively. We’ll make use of the “up” scripts, which I typically name with “Create” or “Add” (as in “add this functionality”) in the script name. The third type of script, the “delete” script will be used solely for our testing purposes. This could be automated in the same manner as our database clean up, but since these scripts are easy enough to generate with the wizard we’ll leave that for a future exercise.

Create the Database

For my example, I’m going to use the tried and true (and trite) blog post. Note: the model is not intended to be realistic, but rather illustrative of the concepts that we’ll use for testing.

We’ll create tables representing Users, Posts, and PostDetails. The table names follow the plural convention and will typically correspond with our models directly. All of our tables will use artificial, auto-increment integer identity columns, named Id, as their primary keys. A User will have a DisplayName (nvarchar), Email (login id) (nvarchar), a CreatedDate (datetime), and a bit column, Active, indicating whether the user is currently an active user or not.

A Post has an associated OwnerId, a foreign key to the Id in the Users table, a Title (nvarchar), Slug (nvarchar), a PostedDate (datetime), EditedDate (datetime), and a bit column indicating whether the post is Deleted or not. Well, set the foreign key (FK) relationship up to cascade on delete. It has an additional non-unique index on the OwnerId column. I anticipate that our routes would incorporate the Post Id and so an index on the Slug column isn’t necessary.

Generate scripts dialog
Generate Create and Delete Scripts
Each Post has a related PostDetail through the PostId, foreign key on PostDetails to Post. The PostDetail will consist of a SequenceNumber (int) and a Text column containing the actual contents of one part of a Post. SequenceNumber will be maintained by our application and will represent separate “pages” in a post. For simplicity we’ll set the Text column up as nvarchar(MAX). It might contain HTML or MarkDown. The FK relationship will be set up to cascade on delete. We will add a unique index on PostId and SequenceNumber, enabling us to use index joins to get the related details for a post quickly.

This will be our simple database. Using Tasks→Generate Scripts we’ll create 000-Create-DapperTesting.sql and 000-Drop-DapperTesting.sql. These will be stored in a DatabaseScripts directory in our source code repository. To ensure success make sure you configure the options to check for existence when generating these scripts under the Advanced options setting in the wizard. You will also want to make sure that all of the necessary objects are included.

Automatically Provision the Test Database

Add the test database to the solution

Add New Items - Data - Service-based Database Dialog
Add a Service-based Database to the Unit Test Project
As I noted in previous posts we’ll be using a LocalDB database for our test database. Once we’ve created our unit test project, we’ll add a Service-based Database to our project. Right-click on the test project, and choose Add Item→New. Select Data, then choose Service-based Database. I gave mine the name DapperTestingDB.mdf. Once you've completed this step, you will want to make sure that the log file is also added to the solution using Add→Existing item, then selecting the LDF file associated with the database. Use the properties dialog to set both the MDF and LDF items as Content items. To be safe, I also set mine up to Copy Always. It's important that the MDF and LDF files be deployed at the same time since an out-of-date log file will prevent your LocalDB instance from being used. By copying both new for each build, this avoids that problem, though you could probably get by with Copy If Newer. I changed mine to Copy Always in the process of debugging the log file synchronization issue and simply haven't changed it back on the theory that you don't change things that aren't broken.

Update the TestContextBase

Now that we have our (empty) test database in the solution, we need a way to initialize it. In fact, we want to reinitialize it each run to capture any changes that may have been made. To do this we’ll update our TestContextBase class that I introduced in my previous posts.

Because we only want the database to be initialized once per test run, not once per test, we’ll create a static constructor that invokes the database initialization code. A static constructor is run once, before “the first instance is created or any static members are referenced.”  Because our static constructor is on the base class, this means that it will be executed once for any and all test contexts that derive from it. This constructor is relatively simple. In it we find the “down” and “up” (or “delete” and “create” as they are called internally) scripts, then first execute the delete scripts, then the create scripts.

In this way, we make sure that each test run starts off with a fresh database that is synchronized to the latest schema represented in the scripts. By making sure that we check for object existence before we delete (or create) it, we can be sure that changes to the scripts don’t cause errors due to a new database object being introduced – that object won’t exist yet and so the existence check will fail and the associated delete will not be attempted.
internal abstract class TestContextBase : IDisposable
{
    private const string ScriptsDirectory = @"..\..\..\..\DatabaseScripts\";
    private readonly static string[] _createScripts = { "000-Create-DapperTesting.sql" };
    private static readonly string[] _deleteScripts = { "000-Drop-DapperTesting.sql" };
    private readonly static string _dbFile = Path.Combine(Environment.CurrentDirectory, "DapperTestingDB.mdf");
    private static readonly string _connectionString = @"Data Source=(LocalDB)\v11.0;Initial Catalog=DapperTesting;Integrated Security=True;AttachDbFilename=" + _dbFile;

    static TestContextBase()
    {
        var deleteScripts = _deleteScripts.Select(s => Path.Combine(ScriptsDirectory, s)).ToList();
        var createScripts = _createScripts.Select(s => Path.Combine(ScriptsDirectory, s)).ToList();

        Assert.IsTrue(deleteScripts.All(File.Exists));
        Assert.IsTrue(createScripts.All(File.Exists));

        ExecuteScripts(deleteScripts);
        ExecuteScripts(createScripts);
    }
    
    // remainder of class omitted
}
The implementation of ExecuteScripts for MS-SQL is interesting in that the scripts generated by SSMS cannot be run directly using SqlCommand due to the inclusion of comments, USE, and the SSMS-special GOcommand. To overcome this we need to first filter out any commands that are unnecessary to maintaining the database and/or will cause issues if invoked as a SqlCommand. After some experimentation I settled on filtering lines that started with "USE", "SET", or starts with "/*" and ends with "*/". This is specific to the scripts created by SSMS. Further the scripts are broken into batches of commands that can be executed together via a single SqlCommand by splitting on the GO keyword.
private static readonly Func<string, bool> _startsBatch = s => s.StartsWith("GO", StringComparison.OrdinalIgnoreCase);
private static readonly Func<string, bool> _isUse = s => s.StartsWith("USE", StringComparison.OrdinalIgnoreCase);
private static readonly Func<string, bool> _isSet = s => s.StartsWith("SET", StringComparison.OrdinalIgnoreCase);
private static readonly Func<string, bool> _isComment = s => s.StartsWith("/*") && s.EndsWith("*/");

private static void ExecuteScripts(IEnumerable<string> scripts)
{
    Func<string, bool> isValidCommand = c => !_isUse(c) && !_isSet(c) && !_isComment(c);
    using (var conn = GetConnection())
    {
        conn.Open();
        foreach (var script in scripts)
        {
            var commands = File.ReadAllLines(script).Where(isValidCommand).ToList();
            IEnumerable<IEnumerable<string>> batches = CreateBatches(commands).ToList();
            foreach (var batch in batches)
            {
                using (var query = conn.CreateCommand())
                {
                    query.CommandText = string.Join(Environment.NewLine, batch);
                    query.ExecuteNonQuery();
                }
            }
        }
    }
}

/// <summary>
/// In order to execute groups of commands that can be executed together (not all can),
/// we separate them by the GO command into distinct, executable sets.  This assumes
/// that the scripts have been generated by Tasks->Generate Scripts in SSMS but will
/// also work if each script only contains commands that can be grouped together.
/// </summary>
/// <param name="commands"></param>
/// <returns></returns>
private static IEnumerable<IEnumerable<string>> CreateBatches(IEnumerable<string> commands)
{
    var batch = new List<string>();
    foreach (var command in commands)
    {
        if (_startsBatch(command))
        {
            if (batch.Any())
            {
                yield return batch;
                batch = new List<string>();
            }
        }
        else if (!string.IsNullOrWhiteSpace(command))
        {
            batch.Add(command);
        }
    }

    if (batch.Any())
    {
        yield return batch;
    }
}
Now our TestContextBase is set up to (re)create our database afresh for each test run. This will add a little bit of overhead to our tests runs, but we minimize this by doing it once and it is more than offset by our ability to do TDD on our Dapper-based repositories.
There are some obvious or apparently obvious opportunities for improvement in the above. First we could maintain many of the settings in a configuration file rather than in code. I chose not to do this because this is test code. It’s really only changing when I’m actively developing the code (and tests) so maintaining changeable settings in a configuration actually doesn’t save us much since we’re modifying the code already.

Another, probably better, opportunity would be to invest in naming conventions – we already have a start on this with the names we’ve used – and standardized locations for our scripts. This would enable use to enumerate them in code at runtime rather than hard-code them in our test context base class. This is something that I would probably do but in a simple case like this I don’t find it necessary. Clearly, we would want to refactor this as the database schema grows.

This wraps up the set up portion of this series. In the next post, I’ll start doing the development of the repositories using TDD based on what we’ve developed so far.