Saturday, December 27, 2014

Converting a Legacy LINQ-to-SQL Data Context to Entity Framework

Recently I worked with a client to upgrade their web application to use the latest technology available for ASP.NET MVC. This included upgrading the .NET framework, upgrading existing NuGet packages (and using packages to replace libraries tracked in SCM), and replacing custom code that duplicated current framework features. The largest change, however, was to upgrade the data access layer to use a combination of Entity Framework and Dapper instead of LINQ-to-SQL (L2S). Because this is client code I'm only including a few isolated snippets and not a link to the repository in this article.

While there are many similarities between L2S's DataContext and Entity Framework's DbContext implementations, there are enough gotchas that it became more of an ordeal than I had anticipated.

Reverse Engineering the Model

I started out by using the Entity Framework Power Tools to reverse engineer the existing database. I did this into a separate solution to avoid conflicts with the existing model classes and data context. Once you've added the power tools you will get an Entity Framework option with some new sub-options when you right-click on a project. I used the Entity Framework -> Reverse Engineer Code First power tool to generate the model classes, fluent mappings, and DbContext implementation.

Once I had the model reverse engineered, I manually adapted the reverse-engineered classes to reflect the changes made to the L2S model over time. This included adding some shortcut properties – say, to find the first item in a collection – and convenience methods. I’ll note here that my goal was to do an “in-place” upgrade, not refactor the architecture. Even though there were places where refactoring would improve the code, I felt that it was too risky to attempt much refactoring AND swap the underlying data access technology at the same time. There were only a few places with obvious code duplication that I ended up changing in this way; mostly I simply translated the code from the L2S classes to the EF classes. Because I was doing this in a separate solution, this involved copying some code over from the existing solution where the existing models referenced types outside of the model. Mostly these were enumerated types (enums) used for some model properties. By and large the model classes were well-isolated from the rest of the solution. The fact that the previous developers followed good principles made the transition much easier.

Even so, there were a few “gotcha” areas where the design choices in EF didn’t match those in L2S, and these caused me problems.

Configuring Relations

This was probably the largest area of difficulty that I faced. There were 3 distinct types of mapping issues that I had with relations. First, not all of the relations in the previous model had been mapped. In those cases I simply removed the mapped relation from the reverse-engineered models. Since my goal was to replicate existing behavior this seemed like the obvious choice. Later, when we start doing optimizations on the EF code, some of these may be added back if needed to improve query performance.

Second, some relations that had been mapped one-to-one in L2S were mapped as one-to-many by the EF tooling. These were trickier to deal with because of some missing capabilities in the fluent mapping methods. With a one-to-many mapping, the fluent EF tooling allows you to include both the navigation property and the column housing the foreign key in the model. That’s not true – or at least I could not find a way to do it – for one-to-one/zero. In those cases, the methods that allow you to specify the foreign key correlation to a model property do not exist, and if you try to include the FK property you get a “column already exists” conflict on the model.
I handled these in two ways. If the FK property wasn’t used outside the model class, I defaulted to removing the FK property and setting the relation up as one-to-one/zero in the model using HasRequired(t => t.NavProperty).WithOptional(t => t.InverseNavProperty), specifying a custom key as necessary with .Map(). If the FK property was used outside the class – which you might do in order to avoid a subquery to get the related data – I left the property mapped one-to-many and added a convenience property that finds the FirstOrDefault() in the collection to replace the existing one-to-one/zero mapping. This necessitated some code changes to set up the relations, but these seemed straightforward and involved the least amount of disruption to existing code. It also introduced some properties that needed to be ignored explicitly on the model, but more on that later.
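For reference, the one-to-one/zero mapping ended up looking roughly like the sketch below. The entity and column names here are hypothetical stand-ins, not the client's actual model.

public class UserProfileMap : EntityTypeConfiguration<UserProfile>
{
    public UserProfileMap()
    {
        HasKey(t => t.Id);

        // A UserProfile requires a User; a User has zero or one UserProfile.
        // The FK column is specified with .Map() rather than exposed as a model property.
        HasRequired(t => t.User)
            .WithOptional(u => u.Profile)
            .Map(m => m.MapKey("UserId"));
    }
}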

Lastly, I ran into some issues with many-to-many mappings. These included many-to-many mappings whose join tables did not follow the Entity Framework conventions for foreign keys, as well as join tables that included extra data. These seemed particularly difficult to deal with, though I came to find eventually that what I thought were join table mapping issues were really caused by some navigation properties that had not been ignored. L2S uses a paradigm where the mappings have to be manually added to the model. EF uses a convention-based mechanism where it applies a default mapping unless you’ve explicitly overridden or ignored it. This mismatch caused a lot of confusion at first since the reverse engineering tool had set up all of the mappings, and in the process of resolving the issues above I had inadvertently introduced some model errors with the convenience properties.

With respect to those places where the keys did not match the expected conventions, I ended up explicitly mapping the foreign keys using .Map() and specifying the appropriate key values. This also involved some code changes where I had to remove the join table model that had existed in the L2S model since L2S did not handle many-to-many mappings directly but only as bidirectional one-to-many mappings from each side to the join table.  When there was additional data in the join table, I left the join table model in the solution and set it up as the bidirectional one-to-many association.
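Where the join table carried no extra data, the direct many-to-many mapping with explicit keys looked something like the sketch below; the table and column names are hypothetical, not the client's schema.

public class UserMap : EntityTypeConfiguration<User>
{
    public UserMap()
    {
        // Roles maps straight through the join table; no model class is needed
        // for the join table because it carries no extra data.
        HasMany(t => t.Roles)
            .WithMany(r => r.Users)
            .Map(m =>
            {
                m.ToTable("UserRoles");
                m.MapLeftKey("User_Key");    // non-conventional FK column names
                m.MapRightKey("Role_Key");
            });
    }
}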

Sample Many-to-Many Modeled as Two One-to-Many Relations

public class UserClientMap : EntityTypeConfiguration<UserClient>
{
    // model simplified
    public UserClientMap()
    {
        HasKey(t => t.Id);

        Property(t => t.Id).HasColumnName("Id");
        Property(t => t.IsPrimaryContact).HasColumnName("IsPrimaryContact");
        Property(t => t.UserId).HasColumnName("UserId");
        Property(t => t.ClientId).HasColumnName("ClientId");

        HasRequired(t => t.Client)
            .WithMany(t => t.UserClients)
            .HasForeignKey(t => t.ClientId);

        HasRequired(t => t.User)
            .WithMany(t => t.UserClients)
            .HasForeignKey(t => t.UserId);
    }
}

Excluding Properties


The existing model included some properties that were not persisted. I had made an initial assumption that if these weren't mapped in the fluent mappings, they would be automatically excluded since all of the properties in the database had been mapped in the reverse engineered mapping files. This turned out to be a false assumption; I haven't used the fluent mappings much and haven't internalized their idiosyncrasies. I had also made the assumption that properties without setters, the convenience properties from above, would also be ignored. That also turned out to be false.

I began to manually ignore the properties causing model errors using .Ignore(). This would work, but given the size of the model my laziness prevailed: I tried to find a way to use conventions that would let me write less code per mapping class and help prevent errors caused by failing to update the mapping file when a model changed to add a property that should be ignored.
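The manual version is just a call per ignored property in each mapping class, along these lines (the property name here is hypothetical):

public class ClientMap : EntityTypeConfiguration<Client>
{
    public ClientMap()
    {
        // ... column and relation mappings ...

        // Convenience properties that should not be persisted have to be
        // excluded one at a time.
        Ignore(t => t.PrimaryUserClient);
    }
}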

I landed on an extension method to the EntityTypeConfiguration class coupled with the use of NotMappedAttribute found in the System.ComponentModel.DataAnnotations.Schema namespace.

The extension class has a method, ExcludeIgnoredProperties, that enforces two conventions by adding an Ignore() for a property if it has no setter method or if it is marked with the NotMappedAttribute. Each mapping class adds a call to this.ExcludeIgnoredProperties(); and both of these conventions are enforced for that model. Because each property has a different type I ended up using "dynamic" to allow the correctly typed lambda expression to be returned when constructing the argument for the Ignore() method.
internal static class EntityTypeConfigurationExtensions
{
    public static void ExcludeIgnoredProperties<T>(this EntityTypeConfiguration<T> configuration)
        where T : class
    {
        foreach (var expression in GetIgnoredProperties<T>())
        {
            configuration.Ignore(expression);
        }
    }

    private static IEnumerable<dynamic> GetIgnoredProperties<T>()
        where T : class
    {
        var type = typeof(T);
        var properties = type.GetProperties(BindingFlags.Instance | BindingFlags.Public);


        foreach (var expression in GetPropertiesWithoutSetters<T>(properties))
        {
            yield return expression;
        }

        foreach (var expression in GetExcludedProperties<T>(properties))
        {
            yield return expression;
        }
    }

    private static IEnumerable<dynamic> GetPropertiesWithoutSetters<T>(IEnumerable<PropertyInfo> properties)
        where T : class
    {
        var type = typeof(T);
        foreach (var propertyInfo in properties)
        {
            var setter = propertyInfo.GetSetMethod();
            if (setter == null)
            {
                var parameter = Expression.Parameter(type, "t");
                var body = Expression.Property(parameter, propertyInfo);
                var lambda = Expression.Lambda(body, parameter);
                yield return lambda;
            }
        }
    }

    private static IEnumerable<dynamic> GetExcludedProperties<T>(IEnumerable<PropertyInfo> properties)
        where T : class
    {
        var type = typeof(T);
        foreach (var propertyInfo in properties)
        {
            var ignoreAttribute = propertyInfo.GetCustomAttribute<NotMappedAttribute>();
            if (ignoreAttribute != null)
            {
                var parameter = Expression.Parameter(type, "t");
                var body = Expression.Property(parameter, propertyInfo);
                var lambda = Expression.Lambda(body, parameter);
                yield return lambda;
            }
        }
    }
}
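Putting it together, a mapping class and model using both conventions might look something like this sketch (the entity and property names are hypothetical):

public class ClientMap : EntityTypeConfiguration<Client>
{
    public ClientMap()
    {
        HasKey(t => t.Id);
        Property(t => t.Name).HasColumnName("Name");

        // Ignores read-only convenience properties and anything marked [NotMapped]
        this.ExcludeIgnoredProperties();
    }
}

public class Client
{
    public int Id { get; set; }
    public string Name { get; set; }

    public virtual ICollection<UserClient> UserClients { get; set; }

    // excluded by convention: no setter
    public UserClient PrimaryUserClient
    {
        get { return UserClients.FirstOrDefault(uc => uc.IsPrimaryContact); }
    }

    // excluded explicitly
    [NotMapped]
    public string DisplayLabel { get; set; }
}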

Replicating Default LoadWith Behavior


The last major challenge I had was to replicate L2S's default LoadWith behavior. In L2S, you need to set up all of the included properties before making any queries using the data context; if you don't, an InvalidOperationException is thrown. As a result, the repository implementation defined some properties that automatically got loaded with each model type when the data context was instantiated.

Entity Framework is much more flexible and allows the included relations to be defined per query; however, I wanted to retain as much of the existing code and behavior as I could. In particular, I needed to retain as much of the efficiency of using included relations as I could without having to revisit every query performed against the repository.

This is probably the area in which my in-place upgrade resulted in the most compromises from what I would normally have implemented. Generally I use a repository-per-root pattern, but the existing repository was a single generic repository and the "roots" were defined via generic constraints on the model type. Each "root" model type extended an abstract (empty) Root model class and the constraints only allowed a Root class to be the base of a query.

Using a repository-per-root would have allowed me to easily define which related entities were loaded by default for each class. With a single, generic repository I would need to use some reflection to achieve the same effect. In addition, I wanted to be able to easily override the defaults for specific queries to make them more efficient as needed. Lastly, I wanted to avoid using if/else or switch statements to define the logic. In my mind the included properties should be more closely associated with the model than the repository, so I ended up opting for attribute-based definition of the included relations.

LoadWithAttribute

[AttributeUsage(AttributeTargets.Class, AllowMultiple = false, Inherited = false)]
public class LoadWithAttribute : Attribute
{
    internal IEnumerable<string> Includes { get; private set; }

    private string _include;

    public string Include
    {
        get { return _include; }
        set
        {
            _include = value;
            Includes = (value ?? "").Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(s => s.Trim());
        }
    }

    public LoadWithAttribute(string include)
    {
        Include = include;
    }
}

This attribute is added to each class that needs to have default included properties defined. It takes a comma-separated string of property names for the related data to include. In the repository, the attribute is obtained by reflection, but only if the developer doesn't supply a set of included properties specific to that query. Note that the signatures include both one without any included properties and one that takes any number of property names via a params array. Both of these defer the behavior to an implementation method that takes an IEnumerable of property names. There's also a signature (not shown) that takes, as params, any number of Func parameters if the developer wants to specify strongly-typed properties rather than property names.
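Usage on a model class looks something like the following sketch; the class and navigation property names are hypothetical.

[LoadWith("Client, Roles")]
public class User : Root
{
    public int Id { get; set; }
    public string DisplayName { get; set; }

    // loaded by default with every query for User
    public virtual Client Client { get; set; }
    public virtual ICollection<Role> Roles { get; set; }
}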

Sample Repository Methods

public class Repository : IRepository
{
    // class simplified
    private static readonly ConcurrentDictionary<Type, IEnumerable<string>> _defaultIncludeMap = new ConcurrentDictionary<Type, IEnumerable<string>>();

    public IQueryable<T> All<T>(Expression<Func<T, bool>> filter)
        where T : Root, new()
    {
        return AllImplementation(filter, GetDefaultIncludes<T>());
    }

    public IQueryable<T> All<T>(Expression<Func<T, bool>> filter, params string[] includedProperties)
        where T : Root, new()
    {
        return AllImplementation(filter, includedProperties);
    }


    private IQueryable<T> AllImplementation<T>(Expression<Func<T, bool>> filter, IEnumerable<string> includedProperties)
        where T : Root, new()
    {
        IQueryable<T> dbSet = _context.Set<T>();

        foreach (var property in includedProperties)
        {
            dbSet = dbSet.Include(property);
        }

        return dbSet.Where(filter);
    }

    private static IEnumerable<string> GetDefaultIncludes<T>()
    {
        var includes = _defaultIncludeMap.GetOrAdd(typeof(T), type =>
        {
            var attribute = type.GetCustomAttribute<LoadWithAttribute>();

            return attribute == null
                ? new List<string>()
                : attribute.Includes;
        });

        return includes;
    }
}
Now, with these things in place, I only needed to exercise the code and find the places where L2S-supported constructs were used that don't have translations to EF queries. I'm still working through this exercise, performing query optimizations as I can to take advantage of the greater flexibility that EF offers. Fortunately these are relatively easy to spot (they throw exceptions) and, so far, have been relatively easy to find alternative constructs for.
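To illustrate the kind of construct I mean (a made-up example, not the client's code, and FormatName is a hypothetical helper): LINQ-to-SQL will evaluate a local method call in the final projection on the client, while LINQ to Entities throws a NotSupportedException, so the query has to be reshaped to pull the raw data back first.

// L2S evaluates the outer projection locally; EF throws NotSupportedException here.
var names = repository.All<User>(u => u.Active)
                      .Select(u => FormatName(u.FirstName, u.LastName))
                      .ToList();

// Reshaped for EF: project the raw columns, switch to LINQ-to-Objects, then format.
var formattedNames = repository.All<User>(u => u.Active)
                               .Select(u => new { u.FirstName, u.LastName })
                               .AsEnumerable()
                               .Select(x => FormatName(x.FirstName, x.LastName))
                               .ToList();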

Thursday, November 6, 2014

Action Verification Patterns

I was reminded again about a verification pattern that I use frequently when doing TDD/unit tests. It's really two different patterns: the first applies when you want to ensure that some action occurs, the other when you want to make sure it hasn't.

My examples will be specific to FakeItEasy, but you should be able to extrapolate them to any mocking framework.

The use case for my specific example is the case when you're creating a new entity via WebAPI and you may be connecting that entity to an existing related entity or creating a new entity for the other end of the relationship as well. For example, you want to sign up a new user but may be associating that new user with an existing company or, if there is no existing company, create a new one at the same time as you create the user. In this case, among other tests, you'll want to test that the new company either was or was not created as appropriate.

The patterns come into play in the Assert step of the test. With the mocking framework you ensure that the call across the system boundary, in this case to the database, either did or did not happen. When testing that the call was made, use the verification signature that specifies exactly which entity was to have been added.

For example,

A.CallTo(() => _testContext.MockCompanies
                           .Add(A<Company>.That
                                          .Matches(c => c.Name == expectedCompanyName)))
 .MustHaveHappened();


The other pattern, when you want to exclude an action, is similar except you use the verification signature that accepts any matching entity.

A.CallTo(() => _testContext.MockCompanies.Add(A<Company>.Ignored))
 .MustNotHaveHappened();


In the first case, you make sure that the exact expected instance has really been added to the database. In the second, you ensure that no entity of that type has been added to the database. By matching any entity in the second case you avoid an over-specification in which an error in the match condition might give you a false positive by failing to match the call.

Wednesday, October 29, 2014

Autofac Extensions: It’s the Little Things

I get tired of reading, and writing, dependency registrations in Autofac. Very descriptive, but also very repetitive. Imagine a large application with a couple of dozen small, focused services each with a registration like:
builder.RegisterType<DescriptiveService>()
       .As<IDescriptiveService>()
       .InstancePerHttpRequest();

It's not a pretty sight.

To make this easy to read and, especially with syntax highlighting, relatively easy to spot errors in, I created a small extension method as part of my dependency configuration class.

public static class DependencyConfig
{
    private static void RegisterPerRequest<TImpl, T>(this ContainerBuilder builder)
        where TImpl : class, T
    {
         builder.RegisterType<TImpl>()
                .As<T>()
                .InstancePerHttpRequest();
    }

    public static void Configure(Action<IDependencyResolver> setResolver)
    {
         var builder = new ContainerBuilder();
         builder.RegisterPerRequest<DescriptiveService,IDescriptiveService>();

         // ...
    }
}


I'll grant you that it works better in a wide window or with shorter class names but I'm happy to not be repeating so much code.

Saturday, May 3, 2014

Introducing TelegraphJS

Long ago, on a very different internet, we used IFrames extensively to lay out content where that content might come from many different components on the same system. IFrames are still used but have a very different purpose these days, frequently to include code from various different servers into a seamless experience for the user rather than simply to lay out various components from the same system.

Recently on a project I had a need to use IFrames to provide the ability to embed and interact with content from a different server on a web page. The basic idea is that by including some JavaScript on the page, you create an IFrame “gadget” whose source lives on a different server, along with a button that allows you to open and close the “gadget” both from the parent page and from inside the “gadget.” I also had a need to trigger some complex behavior within the “gadget” based on the URL parameters found on the request to the parent window. To do this I developed some custom JavaScript that uses postMessage to communicate between the parent window and the IFrame.

After doing this I decided to abstract that code and make a more generalizable framework, which I’ve called TelegraphJS, that makes it relatively easy to communicate between windows and trigger actions in the other window via callbacks. Partly this was an exercise in creating a module that can be used with or without RequireJS, partly it was because I hadn’t found anything existing that did this. I confess that I didn’t look too hard.

The basic idea is that you would import the telegraph module (or use the globally available Telegraph module) to create a new message handler for a specific window. You would optionally register a set of handlers for events (messages) that get passed to the window. Events consist of a unique string – I used the convention “APP:msg” for my events – and a data object for additional information.

Simple Example

This is a simple example using the global Telegraph module. An example using RequireJS can be found at https://github.com/tvanfosson/telegraphjs. This bit creates the parent end of a connection to an IFrame with id client-frame and sets up a handler to receive APP:opened messages from the IFrame (well, really any IFrame since we haven’t added any security). It then sends an APP:open message to the IFrame. It's really only intended to demonstrate the syntax (and I haven't tried it). To see how it really works, look at the RequireJS example on GitHub.
I’ll note here that on initialization you may need to repeatedly send any startup messages as the IFrame’s handlers may not yet have been applied. In my use I ended up setting up an interval timer that sent startup messages until an acknowledgement was received from the child IFrame. My examples omit the retry behavior.
var iframe = document.getElementById('client-frame');
var contentWindow = iframe.contentWindow;

var handlers = {
    'APP:opened' : function(e, data) {
        alert('opened');
    }
};

var telegraph = new Telegraph(contentWindow, handlers);

telegraph.send('APP:open');
The client end of this code is set up in reverse. It receives the APP:open message and then sends back the APP:opened message.
var telegraph;
var handlers = {
    'APP:open' : function(e, data) {
        telegraph.send('APP:opened');
    }
};

telegraph = new Telegraph(window.parent, handlers);

Additional Features

The code today is pretty simple, but it also includes the ability to add and remove handlers via on/off, similar to how jQuery event handlers are added. At present it doesn’t accept space-separated events but that’s on my TODO list.

Monday, April 28, 2014

TDD and Me

It will come as no surprise to anyone who knows me, but I have a bit of a thing for Test Driven Development (TDD). Recently @dhh stirred the developer waters with a provocative post titled TDD is dead. Long live testing. That set off a firestorm on Twitter (what doesn’t these days?) which engendered a response from @unclebobmartin titled Monogamous TDD. Leaving aside the rhetoric about fundamentalism from both, I’m more inclined to agree with the latter. I’ve found TDD to be a practice which does, as Uncle Bob says, help me:
  1. Spend less time debugging
  2. Document the system at the lowest level in an “accurate, precise, and unambiguous” way.
  3. Write higher quality, decoupled code.
I’ve even, occasionally, gotten enough confidence from my unit tests that I do deploy without fear based on passing tests and do feel the confidence to refactor that Uncle Bob describes.

But Uncle Bob left out one, perhaps glaringly obvious and important, benefit that TDD provides that other testing strategies don’t. I suspect this is because both DHH and Uncle Bob agree on the need for tests.

TDD, because it insists on writing the tests first, ensures that the tests are written at all.

My fear, based on my experience with other developers, is that despite DHH’s careful phrasing, most developers will only hear “TDD is dead.” To many, if not most, developers the question isn’t, “Do we write the tests first or last?” but “Do we write tests at all?” In my experience, TDD was the practice that gave me enough of a reason to write the tests.

Though I would agree that tests were necessary and led to higher quality code, I always found writing tests afterwards something that I would rarely, if ever, do. Why? Because the code “obviously worked.” Never mind the fact that it actually probably didn’t, or at least, didn’t always work. Even now, say when I’ve written some experimental code to figure out how something works, I find that when I write the tests (first) for the production version I discover things about the experimental code that were broken.

Even within the last week I’ve had conversations with developers who insist that TDD slows you down and that “customers don’t pay us to write tests.” While true in part, these sorts of myths drive people to omit tests entirely, not to test last or to test using alternative methods. I fear that they will only be reinforced by DHH’s post.

To borrow a concept, Shu Ha Ri, from Alistair Cockburn, I feel like the “conversation” between DHH and Uncle Bob is taking place between two people at the Ri, or fluent stage that will be misinterpreted by the majority of us in the Shu, or follower, stage. Because the practices at this stage are still being ingrained and can take effort to master, they are easily abandoned when another “expert” weighs in and discredits them. In our sound-bite culture, this is especially pernicious as DHH isn’t advocating the abandonment of testing (which I think many will hear) but an alternative practice of testing.

My Practice

Or How I Make TDD Work for Me
First and foremost I consider myself to be committed to high quality code. Secondarily, I consider myself to be a pragmatist. When I started doing TDD, I tested everything. I also found myself writing prescriptive, rather than descriptive tests. I can still tend to do this if I’m not careful.

What do I mean by that? I mean that the tests were describing how things got done rather than what was being done. I think this is one of the key pitfalls of poorly done TDD and leads to tests being brittle and hard to maintain (though it's true about all tests, it's just that - mostly - those other tests are never written). Since this is one of the common criticisms of TDD, I think it's important to note that it's also something that needs to change if you're going to be successful at TDD. These days I try to write tests that are as unconstraining as possible and that define behavior, not implementations, as much as possible.

I also try to only test meaningful code that I write. "Meaningful" is a tricky concept, as sometimes I find myself going back and adding a test that I didn't think initially was meaningful...until it broke in QA. Logging is one of those things that I don't often test unless there are specific requirements around it. IOC container configuration is another. YMMV. "Code I write" is much easier. I avoid testing that the framework or language works. So, for example, no testing of C# properties (i.e., getter/setter methods) unless there is complex logic in them.

The other thing that was crippling to me when I first started with TDD was copy-paste disease, or "DRY-rot". By that I mean that I often simply copy-pasted test set-up into each test rather than treating my test code as a first-class entity and applying the same craftsmanship principles to it that I did to the production code. It seems obvious in retrospect, but this was test code, not production code (or so my thinking went). Of course, when things changed, that led to a cascading chain of test changes. Again, this is one of the key complaints I hear about TDD (and again, all tests).

Now I make sure to refactor my tests to keep my code DRY and employ good architectural principles in my test code as well as my code under test. In my mind it's all production code. Refactoring to DRY code, just like in your code under test, makes changing the code much less painful and easier to maintain.

I'm also frequently confronted with code that hasn't been written with TDD or whose tests haven't been maintained. As a pragmatist, I don't find it valuable to go back and write tests for existing code, especially when I'm first coming to a project and don't have a complete understanding of how that code is supposed to work.

Aside: the lack of tests makes it harder to understand such code since there isn't an "executable specification" that shows me how the code differs from the, typically, out-of-date or non-existent documentation. At best you can get an incomplete oral history from the previous developer, but even they may not be the original author and only have a limited understanding.

If I'm implementing a new feature I will write tests around that feature and refactor as necessary to support testability in the feature. In many cases this means introducing either wrappers around hard to test (read: static) implementations or alternative constructors with suitable default implementations which allow me to introduce dependency injection where it wasn't before. Less frequently, while changing features, I'll write enough tests around the feature to help ensure that I'm not breaking anything. This is easier when there are existing, albeit out-of-date, tests, as at least the architecture supports testability. As a last resort I do manual testing.

Almost invariably, if TDD hasn't been used, there are no or only minimal developer tests. I don't count QA scripts as tests; though they serve a valuable purpose, they don't give the developer enough feedback quickly enough to ensure safety during development. This is almost universal in my experience and leads to the fears I've detailed above.

The last thing I'll mention is, as I alluded to before, I don't typically write unit tests for experiments or spikes. If I do end up using that code as the basis for production code, I try to follow the practice of commenting out the pieces of it and writing the tests as if the code did not exist and then uncommenting the relevant bit of code to pass the test. Invariably, I find that my experimental code was broken in some significant way when I do this so I try to keep it to a minimum.

The Need for Speed

I want to say a word about one of the common complaints that I hear about TDD: that TDD slows development. I think that this is both true (and good) and false (and still good). First, yes, writing the tests in addition to writing the code, particularly before you write the code, can add additional effort to the task. If you otherwise don't write the test at all, by doing TDD you've incurred the entire cost of writing the test. If you otherwise write the test after, by doing TDD you've incurred any refactoring you've done while writing the code.

So, yes, doing TDD can take more time per unit of code under test than either not testing or writing tests after. BUT... that time isn't necessarily wasted because it also gives you more time and the impetus to think about the code you are going to write before you write it. In my experience, this leads to better code the first time and less re-writing and can actually save time even during initial development, though it typically does increase it somewhat.

Over the long term, I think the idea that TDD takes more time is patently false. There is a body of research into the topic that shows that testing, and specifically TDD, reduces code defects and re-work significantly (cf, Is There Hard Evidence of the ROI of Unit Testing).

Anecdotally, I've also found that on projects where I've used TDD and other developers haven't tested at all, we've spent a lot more effort around the parts of the code that haven't been tested. While I'd like to take credit for my superior development skills, the reality is that I just prevented a lot of the defects that I otherwise would have made by using TDD. When I'm not able to write tests, I make those same errors.

On projects where they have unit tests, I've found that when I come in later, I'm much more quickly able to get up to speed and I'm able to develop with more confidence, increasing my productivity, when I have the safety of a unit test suite. Again, in my experience, if you don't do TDD, you probably also don't do testing, period. At the very least, if you wait and write the tests afterwards, you're still incurring the costs of finding and fixing defects while the code is under development that TDD could have saved you.

Always a Newb

Finally, while I feel like I've learned a lot about testing, in general, and TDD, in particular, over the years by no means do I consider myself as knowing everything. I'm still discovering how to make it work better. I don't pretend that there aren't alternative ways that might work as well. Based on my experience, it's the best thing that I've found and I heartily recommend it to anyone who asks. I will happily abandon TDD for any practice that works better. So far I haven't found one.

I will say, though, that I know enough to be able to safely say that if you're not writing tests, unit or otherwise, don't bother weighing in on whether TDD or some other means of testing is better...just start testing. If you are testing but haven't tried TDD, give it a chance. Study the pitfalls (maybe start with TDD, Where Did It All Go Wrong by Ian Cooper) and avoid them, but don't blindly dismiss it based on someone else's experience, because there are plenty of us that have a different, markedly better one.

Edited to correct misspellings and grammar. Now, if I could only find a way of doing TDD on my blog posts.

Saturday, March 1, 2014

Runtime-typed Generic Collection Extensions

Another post inspired by a Stack Overflow question (see my answer).

The specific situation encountered by the asker of the question was one in which the asker knew the specific type that was being returned, but only at runtime. The asker wanted to be able to invoke a method delegate that accepted an IEnumerable<Foo> or IEnumerable<Bar> that was, presumably, passed as an argument along with the type, Foo or Bar, to a framework method that is unaware of the specific type at compile time. The problem is that when the ToList() method is used, it was returning List<object> instead of List<Foo> or List<Bar> as required by the delegate that the asker was attempting to invoke. This resulted in an ArgumentException for the delegate's Invoke method as the underlying type was not convertible.

While I suspect that there is probably a better way to create the framework there wasn’t enough information in the question for me to comment on that. Instead I decided to try and construct some extension methods that would produce a collection of the specific type that could be used by the delegate using ideas from the accepted answer and an answer on a similar question by the indubitable Jon Skeet.

To provide a test case for the solution I mocked up some classes similar to those in the question. First, there is a Dog, which knows how to Bark(), and a Bird, which knows how to Sing().
public class Dog
{
    private readonly int _id;

    public Dog(int id)
    {
        _id = id;
    }

    public string Bark()
    {
        return string.Format("Woof...{0}", _id);
    }
}

public class Bird
{
    private readonly int _id;

    public Bird(int id)
    {
        _id = id;
    }

    public string Sing()
    {
        return string.Format("Squawk...{0}", _id);
    }
}
Then we have a Framework class that does a query that returns a collection of IEntry objects, each being an Entry object with a Data property that is either a Dog or a Bird.
public class Framework
{
    public IEnumerable<IEntry> QueryOfAllType(Type type)
    {
        var range = Enumerable.Range(0, 10);

        if (type.IsAssignableFrom(typeof(Bird)))
        {
            return range.Select(i => new Entry
                        {
                            Data = new Bird(i)
                        })
                        .ToList();
        }

        return range.Select(i => new Entry
                    {
                        Data = new Dog(i)
                    })
                    .ToList();
    }
}

public interface IEntry
{
    object Data { get; }
}

public class Entry : IEntry
{
    public object Data { get; set; }
} 
Lastly, we have our program which will query the Framework for IEntry objects of the proper type, select the Data property, and use the new extensions to convert the collection to a collection of the proper type to be used by a delegate to perform the appropriate action for that object. Note: the code below has been updated from the previous version to more accurately replicate the original problem - that is, that the type information can't be inferred from the delegate or the invoking method.
class Program
{
    static void Main(string[] args)
    {
        DoThatThing(typeof(Bird) , Vocalize);
        DoThatThing(typeof(Dog), Vocalize);
    }

    private static void DoThatThing(Type type, Action<IEnumerable> thingToDo)
    {
        var framework = new Framework();

        var result = framework.QueryOfAllType(type)
                              .Select(e => e.Data)
                              .ToListOfType(type);

        thingToDo.DynamicInvoke(new [] { result });

    }

    private static void Vocalize(IEnumerable animals)
    {
        foreach (var animal in animals)
        {
            if (animal is Dog)
            {
               Console.WriteLine(((Dog)animal).Bark());
            }
            else if (animal is Bird)
            {
                Console.WriteLine(((Bird)animal).Sing());
            }
        }
    }
}
Below is my solution, using reflection to cast the returned collection to the appropriate type and convert the IEnumerable to a list, retaining the behavior that the actual collection rather than an iterator is produced.
public static class EnumerableExtensions
{
    private static readonly Type _enumerableType = typeof(Enumerable);

    public static IEnumerable CastAsType(this IEnumerable source, Type targetType)
    {
        if (source == null)
        {
            throw new ArgumentNullException("source");
        }

        var castMethod = _enumerableType.GetMethod("Cast").MakeGenericMethod(targetType);

        return (IEnumerable)castMethod.Invoke(null, new object[] { source });
    } 

    public static IList ToListOfType(this IEnumerable source, Type targetType)
    {
        var enumerable = CastAsType(source, targetType);

        var listMethod = _enumerableType.GetMethod("ToList").MakeGenericMethod(targetType);

        try
        {
            return (IList)listMethod.Invoke(null, new object[] { enumerable });
        }
        catch (TargetInvocationException e)
        {
            ExceptionDispatchInfo.Capture(e.InnerException).Throw();
            return null; // to satisfy the compiler, never reached
        }
    } 
}
This is what I do for fun on Saturdays. The extension code and tests are available on GitHub.

Wednesday, February 26, 2014

These Aren’t The API Models You’re Looking For

I recently answered a question on Stack Overflow that started me thinking about the differences between domain models and the models you expose through your Web API. While the question was only tangentially related, I thought it would be helpful to think through why I believe that domain models shouldn’t be exposed directly through your API.

I’ve long argued for model-per-view in MVC and I’ve carried that thinking over to WebAPI as well. In an MVC context, I think that separation of concerns is the strongest argument in favor of using separate view models. Views have specific data needs for display that don’t belong in your domain models. For example, they may contain auxiliary information for localization or paging. Your views may not need, or you may not desire, your view models (which can also be used as the arguments for POST actions) to contain all domain properties – for example, whether a user is in a role. Or you may expose this as part of the model but validate it in a completely different way, leading to differing validation requirements. Lastly, your views may flatten multiple models into a single, composite model, omitting details that are unimportant from that particular view’s perspective.

Many or most of these also apply to API models, even though they are or seem to be strictly about the data. While I don't claim the following to be an exhaustive list, here are the reasons I think that you should use separate models for your API and map between your domain and your API models rather than directly expose your domain models.

Hiding Implementation Details

There are two reasons why you might want to hide the implementation details of your model. As in MVC, there are likely some record-keeping attributes on your domain models that are not necessary for the proper operation of the API. For example, you may track who created the item and when, who updated it and when - in a HIPAA world you may even track who viewed it, when, and, perhaps, for what purpose. You may also have actual domain information that needs to remain private even if some of the information is exposed; or it may only conditionally be available based on the API consumer's privileges. This, in itself, is probably enough of a reason to separate your API models from your domain models. Doing this allows you to keep your domain model simpler and cleaner, more appropriate to the data rather than the usage of the data. The alternative would, perhaps, be a multiplication of domain models, representing purpose-specific views, and an accompanying complication of your API (or additional APIs) to support those models.
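As a hypothetical illustration (not from a real code base), the domain entity keeps its record-keeping fields while the API model exposes only what the consumer needs:

// Domain model: carries audit and internal data
public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
    public decimal UnitCost { get; set; }       // internal pricing detail
    public string CreatedBy { get; set; }       // record-keeping
    public DateTime CreatedDate { get; set; }
    public string UpdatedBy { get; set; }
    public DateTime? UpdatedDate { get; set; }
}

// API model: only the consumer-facing data
public class ProductModel
{
    public int Id { get; set; }
    public string Name { get; set; }
}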

Transforming Implementation Details

Similarly, there are aspects of your model that represent relationships in the data that may be irrelevant to the API consumer. You may model a relationship as many-to-many in your domain model. For example, a product may have many accessories while an accessory may be available for more than one product. From a catalog management perspective this makes perfect sense, but for an API that allows an external consumer to access your catalog you may only want to expose it as a one-to-many relationship: this product has these accessories, without exposing the fact that an accessory belongs to more than one product. You may not even want to model the reverse relationship from accessory to product in the API at all. It might simply be immaterial.
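Continuing the hypothetical Product example, the API models might expose only the direction the consumer cares about:

// The API-side ProductModel (extending the sketch above) gains a one-way collection
public class ProductModel
{
    public int Id { get; set; }
    public string Name { get; set; }
    public IList<AccessoryModel> Accessories { get; set; }
}

// The accessory model deliberately has no reverse navigation back to products
public class AccessoryModel
{
    public int Id { get; set; }
    public string Name { get; set; }
}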

Also you may want to provide localization for product attributes so that your API can be language aware. While your domain model must account for the relationships to alternative text for attributes, you're likely to perform substitution for those attributes appropriate to the language requested rather than provide all translations, forcing the consumer to pick the correct one for their purpose.

Navigation

Navigation in your domain model is established through foreign key relationships in the database, which are translated to object references and collections of object references in your object model. Navigation in APIs is represented by URLs. "RESTful", "REST-like", "Hypermedia" - that's a discussion for a different post - but modern thinking is that your API on the web should leverage web protocols (the verbs and request patterns). This can be a challenge in designing your API as you trade off between model purity and efficiency.

In your object model the cost of navigation is very low (an object reference) but on the web it is exponentially higher - a full web request. What is clear, though, unless your data set is very small, is that you will have to make compromises to keep your requests from ballooning in size. While you may be able to keep your entire product catalog in memory, it's not likely that you're going to want to serialize all of it and deliver it in a single web request. Instead, you're going to break your API up into multiple methods, each perhaps corresponding to an aggregate root. You may choose to expose navigation between related elements via explicit navigation properties or simply ids that the client uses in constructing the appropriate URLs to request more information.

If you embrace web protocols in your API, you'll need to account for the differences in navigation properties between your object model and your API.

Versioning

Another strong argument in favor of using separate view models is versioning. Because APIs represent a "public" product, tying disparate systems together, it's rarely possible to simply discard an older API when the underlying domain model is modified. In response, API designers need to account for this by introducing some sort of versioning into their API design. If you directly expose your domain models, this could also mean that you would need to introduce versioning into your domain model to support older versions of an API. Decoupling your domain models from your API through the use of purpose-specific API models insulates your domain model from this need. While you may need to maintain multiple versions of your API models to support multiple versions of your API, your domain model can evolve to support new features without compromise for the sake of the API.

Summary

In short, you should use separate API models. Yes, it might seem simpler to directly use your domain models and there are a lot of examples that take this expedient. In the long run and with a more complex API than typically used for tutorials, separate API models are going to serve you well.

Saturday, February 22, 2014

Unit testing JsonResult with Nested Anonymous Types

Recently I had a need to test some controller actions that returned nested anonymous types. Since the Data property on the JsonResult is of type object, this can involve jumping through some hoops with reflection to get access to the properties in the anonymous types. My first attempt was to simply serialize the object, then deserialize it to an anonymous type. This worked, but I found that my nested types were JObjects and I needed to cast and convert these to the actual type.
public class TestContext
{
    // remainder omitted
    
    public dynamic ToDynamicObject(object obj, object anonymousTemplate)
    {
        var serialized = JsonConvert.SerializeObject(obj);
        return JsonConvert.DeserializeAnonymousType(serialized, anonymousTemplate);
    }
}
Used as
var model = _c.ToDynamicObject(result.Data, new { success = true, data = new SubModel() });

var subModel = ((JObject)model.data).ToObject<SubModel>();
I found one solution at http://blog.m0sa.net/2011/02/mvc-unit-testing-jsonresult-actions.html which seemed to improve upon this by using a wrapper class that implements DynamicObject. This got me very close to what I wanted except, again, the nested anonymous objects themselves had to be wrapped individually. Here's the usage with my initial implementation of an extension method that used it.
var model = result.Data.AsDynamic();

var cardModel = ((object)model.data).AsDynamic() as SubModel;
To fix this I added a bit of code to the wrapper class so that, if the result of TryGetMember was an anonymous type, it created and returned a DynamicObject wrapper around it. To test if the type was anonymous I used the code found at http://www.liensberger.it/web/blog/?p=191 (referenced from this Stack Overflow question, http://stackoverflow.com/questions/2483023/how-to-test-if-a-type-is-anonymous).

Here’s the final implementation as extensions to object.  Note: these are internal to my test project so the restriction to object doesn’t bother me too much. JsonResult.Data has that type and that was the problem I was trying to solve.
internal static class ObjectExtensions
{
    public class DynamicWrapper : DynamicObject
    {
        private readonly Type _subjectType;
        private readonly object _subject;

        public static dynamic Create(object subject)
        {
            return new DynamicWrapper(subject);
        }

        private DynamicWrapper(object subject)
        {
            _subject = subject;
            _subjectType = subject.GetType();
        }

        public override bool TryGetMember(GetMemberBinder binder, out object result)
        {
            try
            {
                var propertyInfo = _subjectType.GetProperty(binder.Name);

                var getter = propertyInfo.GetGetMethod();

                result = getter.Invoke(_subject, null);

                if (result.IsAnonymous())
                {
                    result = Create(result);
                }
                return true;
            }
            catch
            {
                result = null;
                return false;
            }
        }
    }

    public static bool IsAnonymous(this object obj)
    {
        if (obj == null)
        {
            return false;
        }

        var type = obj.GetType();

        return Attribute.IsDefined(type, typeof(CompilerGeneratedAttribute), false)
                  && type.IsGenericType && type.Name.Contains("AnonymousType")
                  && (type.Name.StartsWith("<>") || type.Name.StartsWith("VB$"))
                  && (type.Attributes & TypeAttributes.NotPublic) == TypeAttributes.NotPublic;
    }

    public static dynamic AsDynamic(this object obj)
    {
        return DynamicWrapper.Create(obj);
    }
}
And the above code sample now becomes much cleaner and I am much happier:
var model = result.Data.AsDynamic();

var subModel = model.data as SubModel;
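To put the wrapper in context, a test for a hypothetical controller action might look like the following; the controller, the action, and the SubModel properties are invented for illustration.

public class CardController : Controller
{
    public JsonResult Details(int id)
    {
        var subModel = new SubModel { Id = id, Title = "Sample" };
        return Json(new { success = true, data = subModel }, JsonRequestBehavior.AllowGet);
    }
}

[TestClass]
public class CardControllerTests
{
    [TestMethod]
    public void Details_returns_success_with_the_expected_data()
    {
        var controller = new CardController();

        var result = controller.Details(42);
        var model = result.Data.AsDynamic();

        Assert.IsTrue(model.success);
        var subModel = model.data as SubModel;
        Assert.AreEqual(42, subModel.Id);
    }
}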

Wednesday, February 12, 2014

Putting It All Together: TDD for Dapper Repositories

In the last several posts I've set up the infrastructure for doing Test Driving Development for Dapper with lightweight integration tests. You might want to review these if you haven't already looked at them.


I'm going to keep it relatively simple for this example. Essentially my repositories are just going to implement CRUD operations for each aggregate root. There will be Create, Get and GetAll (Read), Update, and Delete methods. I’m also going to make the choice to have most exceptions bubble up. This might not be the best choice but it will keep the example simple. In fact, you might want to catch most exceptions and transform them into application-level exceptions which you’d handle at higher levels. Testing exceptional behavior is more difficult to set up in integration tests and we can more easily test exceptions with true unit tests in the classes that use our repositories.
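Concretely, the user repository I'm driving toward is plain CRUD, something like the sketch below (the exact signatures in the linked repository may differ slightly):

public interface IUserRepository
{
    void Create(User user);
    User Get(int id);
    IEnumerable<User> GetAll();
    void Update(User user);
    void Delete(int id);
}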

Because these are actually integration tests we run into a chicken and egg issue. Unless we write some other (duplicate) code, we can't test Get() without Create() or Create() without Get() in a self-contained test method.
My solution is pragmatic: start with Get(). The logic is relatively simple, and it's easier to convince yourself of its correctness. You could write some code using a data reader to read and verify that something is created, but you'd still have to convince yourself that code is correct. Since we're not so much testing Dapper - it already has tests - as we are our SQL, I prefer to start off with Get() and not write the extra code. As they say, "your mileage may vary"; you might want to start with something else.

Note you won't be able to create tests for Get() until you've done Create(), but once you've created your Get() implementation and convinced yourself of its correctness, you can move on to Create() and use TDD to create that code. After you can create and retrieve an object, you're ready for GetAll(), Delete(), Update(), and any specialized methods. Do go back and create tests for Get(). These are, after all, integration tests; we can write those after the fact. As we'll see, that will come in handy later when we extend our Get() implementation to get some related data from another table.

We’re going to create two separate repositories, one for Users and one for Posts. Let's start with the DapperUserRepository first. All of the code referenced below can be found, though in its final form, at https://github.com/tvanfosson/dapper-integration-testing

DapperUserRepository

First, we'll solve our "chicken and egg" problem by implementing Get().
public User Get(int id)
{
    const string sql = "SELECT * FROM [Users] WHERE [Id] = @userId";

    var user = Fetch(c => c.Query<User>(sql, new { userId = id }).SingleOrDefault());

    return user;
}
Now, if you’re paying attention, you should be saying “Wait a minute – that’s too much code there. You’re cheating.” Yes, I am. I’m showing you the results after some refactoring – refactoring done in another project, where I found it helpful to develop a base repository pattern. This base implementation provides some convenient methods for performing the Dapper Query and Execute operations. Below is the base class implementation for the repositories we'll create. Since I've tested this pattern in other projects, I feel comfortable starting with it as the basis for my new repositories.
public abstract class DapperRepositoryBase
{
    protected readonly IConnectionFactory _connectionFactory;
    protected readonly string _connectionStringName;

    protected DapperRepositoryBase(IConnectionFactory connectionFactory, string connectionStringName)
    {
        _connectionFactory = connectionFactory;
        _connectionStringName = connectionStringName;
    }

    protected DbConnection OpenConnection()
    {
        var connection = _connectionFactory.Create(_connectionStringName);
        connection.Open();
        return connection;
    }

    protected T Fetch<T>(Func<DbConnection, T> func)
    {
        if (func == null)
        {
            throw new ArgumentNullException("func");
        }

        using (var connection = OpenConnection())
        {
            return func(connection);
        }
    }

    protected int Execute(Func<DbConnection, int> func)
    {
        if (func == null)
        {
            throw new ArgumentNullException("func");
        }

        using (var connection = OpenConnection())
        {
            return func(connection);
        }
    }

    // remainder of class omitted
}
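The IConnectionFactory the base class depends on was set up in the earlier infrastructure posts; as used here it boils down to something like this sketch:

// Only the shape matters for these examples: given a connection string name,
// return an unopened DbConnection for the appropriate database.
public interface IConnectionFactory
{
    DbConnection Create(string connectionStringName);
}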
If you want to start without the base class implementation, your Get() implementation might look a little bigger, maybe like the following. For the rest of the examples, we'll use my base class and I hope it will be obvious where the refactorings came from.
public User Get2(int id)
{
    const string sql = "SELECT * FROM [Users] WHERE [Id] = @userId";

    using (var connection = _connectionFactory.Create(_connectionStringName))
    {
        connection.Open();

        var user = connection.Query<User>(sql, new { userId = id }).SingleOrDefault();

        return user;
    }
}
Now that we have (as we've convinced ourselves) a working Get() method, we can turn to developing Create(). First, we stub it in and have it throw a NotImplementedException.
public class DapperUserRepository : DapperRepositoryBase, IUserRepository
{
    public DapperUserRepository(IConnectionFactory connectionFactory, string connectionStringName)
        : base(connectionFactory, connectionStringName)
    {
    }

    public void Create(User user)
    {
        throw new NotImplementedException();
    }

    // Get omitted
}
Then, we'll create our first test. Typically I'll start with the success case first. In this case, we'll test that a User can be successfully created. We'll add some helper methods to the test context that allow us to get an instance of the repository and create a user with some pre-filled properties. Note that since we're using autoincrement ids, our create code will need to fill this value in when performing the insert.
public class DapperUserRepositoryTests : TestBase
{
    private DapperUserRepositoryTestContext _c;

    [TestMethod]
    public void When_a_new_user_is_created_the_data_is_inserted()
    {
        var repository = _c.GetRepository();
        var user = _c.CreateStandardUser();

        repository.Create(user);

        var retrievedUser = repository.Get(user.Id);

        Assert.AreEqual(user.DisplayName, retrievedUser.DisplayName);
        Assert.AreEqual(user.Email, retrievedUser.Email);
        Assert.AreEqual(user.Active, retrievedUser.Active);
    }
    
    // test context omitted
}
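Since the test context is omitted above, here's a sketch of the helpers used in these tests. The base classes, the connection string name, and the canned user values are placeholders; the actual wiring lives in the linked repository.

private class DapperUserRepositoryTestContext : TestContextBase
{
    public DapperUserRepository GetRepository()
    {
        return new DapperUserRepository(new ConnectionFactory(), "test-database");
    }

    public User CreateStandardUser(int seed = 0)
    {
        return new User
        {
            DisplayName = "user" + seed,
            Email = string.Format("user{0}@example.com", seed),
            Active = true
        };
    }

    public DateTime RoundToSecond(DateTime dateTime)
    {
        // drop sub-second precision so CLR and SQL Server times can be compared
        return new DateTime(dateTime.Ticks - (dateTime.Ticks % TimeSpan.TicksPerSecond), dateTime.Kind);
    }
}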
We run the test and it fails, as expected. Now let's provide the least code necessary, in an integration test context, to satisfy the test, that is, insert the record into the database so that when it's retrieved the stored values are matched by the retrieved values. We need to insert the DisplayName, Email, CreatedDate, and Active state.
public void Create(User user)
{
    const string sql = "INSERT INTO [Users] ([DisplayName], [Email], [CreatedDate], [Active]) OUTPUT inserted.[Id] VALUES (@displayName, @email, @createdDate, @active)";

    var id = Fetch(c => c.Query<int>(sql, new
    {
        displayName = user.DisplayName,
        email = user.Email,
        createdDate = DateTime.Now,
        active = user.Active
    }).Single());

    user.Id = id;
}
We run the test now and it passes. Next we might want to test that the created date is also set when the user is created. This is a bit tricky because we can't simply compare dates between the CLR and SQL Server; their resolutions differ, so we'd get spurious failures. One way around this is to take a snapshot of the time right before and right after the insert and make sure that the inserted date falls between those two times, within an acceptable bound – for my purposes, accuracy to the nearest second is enough. We'll use another helper method to round our times to the nearest second for comparison (a sketch of it follows the test). Thus our next test is:
[TestMethod]
public void When_a_new_user_is_created_the_current_time_is_set_as_the_created_date()
{
    var repository = _c.GetRepository();
    var user = _c.CreateStandardUser();

    var before = _c.RoundToSecond(DateTime.Now);
    repository.Create(user);
    var after = _c.RoundToSecond(DateTime.Now);

    var retrievedUser = repository.Get(user.Id);

    var created = _c.RoundToSecond(retrievedUser.CreatedDate);

    // we only care about the nearest second
    Assert.IsTrue(before <= created && created <= after);
}
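The RoundToSecond helper lives on the test context and isn't shown in the post; a sketch that simply truncates sub-second precision (an assumption on my part – any consistent rounding would do) is:

public DateTime RoundToSecond(DateTime value)
{
    // Drop everything below one-second resolution so CLR and SQL Server datetimes compare cleanly
    return new DateTime(value.Year, value.Month, value.Day, value.Hour, value.Minute, value.Second);
}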
Running this test fails because we're not retrieving the CreatedDate. In this case I think it's easier to capture the date in code and set it on the object, rather than read the stored value back from the database, so we'll implement it as follows.
public void Create(User user)
{
    var date = DateTime.Now;

    const string sql = "INSERT INTO [Users] ([DisplayName], [Email], [CreatedDate], [Active]) OUTPUT inserted.[Id] VALUES (@displayName, @email, @createdDate, @active)";

    var id = Fetch(c => c.Query<int>(sql, new
    {
        displayName = user.DisplayName,
        email = user.Email,
        createdDate = date,
        active = user.Active
    }).Single());

    user.Id = id;
    user.CreatedDate = date;
}
Note that we don't set any values on the object until the Fetch has succeeded – we don't want the object state to be updated if there is an exception. Even though the code appears to work that way, we'll create another test to ensure it. (The test relies on the second insert failing; the Users table presumably has a unique constraint on DisplayName, which is why otherUser reuses the first user's name.)
[TestMethod]
public void When_an_exception_is_thrown_the_id_and_date_are_not_updated()
{
    var repository = _c.GetRepository();
    var user = _c.CreateStandardUser();
    var otherUser = _c.CreateStandardUser(1);
    otherUser.DisplayName = user.DisplayName;

    repository.Create(user);

    var expectedId = otherUser.Id = -1;
    var expectedDate = otherUser.CreatedDate = DateTime.MinValue;

    try
    {
        repository.Create(otherUser);
        Assert.Fail("Expected SqlException not thrown.");
    }
    catch (SqlException)
    {
        Assert.AreEqual(expectedId, otherUser.Id);
        Assert.AreEqual(expectedDate, otherUser.CreatedDate);
    }
}
And that test passes, as we hoped. We continue writing tests in this manner, exercising small bits of functionality until we believe our methods behave the way we think they should. It isn't possible to make the tests completely free of interdependencies – short of developing a parallel implementation of our functions to test against – but this gives us a reasonably simple way to build up our code in a Test-Driven fashion.
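As one example of a follow-on test (hypothetical – not one of the tests shown in this post), we might verify the read side's behavior for a missing user; because Get uses SingleOrDefault, we'd expect null:

[TestMethod]
public void When_a_user_does_not_exist_null_is_returned()
{
    var repository = _c.GetRepository();

    // No user with a negative id should ever exist, so Get should come back empty
    var retrievedUser = repository.Get(-1);

    Assert.IsNull(retrievedUser);
}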

Hopefully this small example, with the code included in the GitHub repository, has shown you how to get started doing TDD with Dapper. I've left the code in the repository incomplete so you can experiment with it. In particular, there are some interesting behaviors in the Post repository – you might start with getting the posts associated with a specific user, which returns a filtered collection.

Tuesday, February 11, 2014

Solving the Challenges of Dapper Integration Testing: Part 3



A Dapper Gentleman
This is the fourth in a series of posts on lightweight integration testing with Dapper. These lightweight tests will enable us to do Test-Driven Development when creating our Dapper-based repositories. In my previous posts I’ve discussed the goals for our tests and covered three of the four challenges that were introduced. The first post in the series introduced our goals: tests that are automated, fast, simple, independent, self-contained, safe, and require minimal extra tooling to implement. The second and third have covered the challenges so far: providing a data connection, isolating the execution of the tests, and cleaning up our test data between tests.

In this post we’ll cover the last remaining challenge: keeping the test database schema in sync with our production database. This code, and the code for all previous posts, is available at https://github.com/tvanfosson/dapper-integration-testing.

Synchronizing the Schema


To keep our database synchronized – and, in fact, to provide a way to maintain our database schema at all – we’ll borrow a concept used in Ruby/Rails and Entity Framework: database migrations. While we could do this entirely in code, I’m going to leverage the SQL Server Management Studio (SSMS) tooling, maintain the migrations as a collection of SQL scripts, and automate their execution in code. In my opinion this is a good compromise between the complexity of creating an entirely code-based migrations infrastructure and the burden of keeping the database schemas in sync manually.

Certainly, though, if there were one area of this solution I could improve, it would be this one. One possible alternative would be to use Entity Framework migrations, with or without using Entity Framework as the ORM (object-relational mapper). Such a solution would, perhaps, be ideal if you were augmenting your Entity Framework-based application with targeted use of Dapper for specific queries that need the speed advantage Dapper provides. Because this series is Dapper-focused, however, I’m going to use the solution described above: SQL scripts, executed automatically by our test framework. The solution I describe is specific to Microsoft SQL Server and SSMS-generated scripts, but it can be adapted as needed for other databases and tooling. If you’re not familiar with the Generate Scripts wizard in SSMS, you can find more information at http://technet.microsoft.com/en-us/library/bb895179.aspx.

Approach

I will be using a database-first approach. By that I mean that I will use SSMS to create the database tables, indexes, relations, etc., and then reverse-engineer the scripts that will be used to create and upgrade the production database. These scripts may be edited by hand to ensure that existing data is preserved and migrated as needed as the schema changes. There are three types of scripts you might want to produce. Two of these, the “up” and “down” scripts, are used in maintaining the database itself: they create/upgrade the schema and back out upgrades, respectively. We’ll make use of the “up” scripts, which I typically name with “Create” or “Add” (as in “add this functionality”). The third type, the “delete” script, will be used solely for our testing purposes. It could be generated automatically in the same manner as our database clean-up, but since these scripts are easy enough to produce with the wizard we’ll leave that for a future exercise.

Create the Database

For my example, I’m going to use the tried and true (and trite) blog post. Note: the model is not intended to be realistic, but rather illustrative of the concepts that we’ll use for testing.

We’ll create tables representing Users, Posts, and PostDetails. The table names follow the plural convention and will typically correspond with our models directly. All of our tables will use artificial, auto-increment integer identity columns, named Id, as their primary keys. A User will have a DisplayName (nvarchar), Email (login id) (nvarchar), a CreatedDate (datetime), and a bit column, Active, indicating whether the user is currently an active user or not.

A Post has an associated OwnerId – a foreign key to the Id in the Users table – a Title (nvarchar), Slug (nvarchar), a PostedDate (datetime), an EditedDate (datetime), and a bit column indicating whether the post is Deleted or not. We'll set the foreign key (FK) relationship up to cascade on delete. The table has an additional non-unique index on the OwnerId column. I anticipate that our routes will incorporate the Post Id, so an index on the Slug column isn’t necessary.

Generate scripts dialog
Generate Create and Delete Scripts
Each Post has related PostDetails through a PostId foreign key on the PostDetails table. A PostDetail consists of a SequenceNumber (int) and a Text column containing the actual contents of one part of a Post. SequenceNumber will be maintained by our application and will represent separate “pages” in a post. For simplicity we’ll set the Text column up as nvarchar(MAX). It might contain HTML or MarkDown. The FK relationship will be set up to cascade on delete. We will add a unique index on PostId and SequenceNumber, enabling us to use index joins to get the related details for a post quickly.
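Since the tables are meant to correspond directly with our models, the C# classes implied by the descriptions above would look roughly like this. This is a sketch – the nullability of EditedDate and the exact property shapes are assumptions, and the real classes are in the repository.

public class User
{
    public int Id { get; set; }
    public string DisplayName { get; set; }
    public string Email { get; set; }
    public DateTime CreatedDate { get; set; }
    public bool Active { get; set; }
}

public class Post
{
    public int Id { get; set; }
    public int OwnerId { get; set; }
    public string Title { get; set; }
    public string Slug { get; set; }
    public DateTime PostedDate { get; set; }
    public DateTime? EditedDate { get; set; }   // assumed nullable until the post is first edited
    public bool Deleted { get; set; }
}

public class PostDetail
{
    public int Id { get; set; }
    public int PostId { get; set; }
    public int SequenceNumber { get; set; }
    public string Text { get; set; }
}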

This will be our simple database. Using Tasks→Generate Scripts we’ll create 000-Create-DapperTesting.sql and 000-Drop-DapperTesting.sql. These will be stored in a DatabaseScripts directory in our source code repository. To ensure success, configure the wizard’s Advanced options to check for object existence when generating these scripts, and make sure that all of the necessary objects are included.

Automatically Provision the Test Database

Add the test database to the solution

Add New Items - Data - Service-based Database Dialog
Add a Service-based Database to the Unit Test Project
As I noted in previous posts, we’ll be using a LocalDB database for our test database. Once we’ve created our unit test project, we’ll add a Service-based Database to it: right-click on the test project, choose Add→New Item, select Data, then choose Service-based Database. I gave mine the name DapperTestingDB.mdf. Once you've completed this step, make sure the log file is also added to the solution using Add→Existing Item and selecting the LDF file associated with the database. Use the properties dialog to set both the MDF and LDF items as Content items. To be safe, I also set mine to Copy Always. It's important that the MDF and LDF files be deployed together, since an out-of-date log file will prevent your LocalDB instance from being used. Copying both files fresh for each build avoids that problem, though you could probably get by with Copy If Newer. I switched to Copy Always while debugging the log file synchronization issue and simply haven't changed it back, on the theory that you don't change things that aren't broken.

Update the TestContextBase

Now that we have our (empty) test database in the solution, we need a way to initialize it. In fact, we want to reinitialize it each run to capture any changes that may have been made. To do this we’ll update our TestContextBase class that I introduced in my previous posts.

Because we only want the database to be initialized once per test run, not once per test, we’ll create a static constructor that invokes the database initialization code. A static constructor runs once, before “the first instance is created or any static members are referenced.” Because our static constructor is on the base class, it will be executed once for any and all test contexts that derive from it. The constructor is relatively simple: it locates the “down” and “up” (or “delete” and “create”, as they are called internally) scripts, executes the delete scripts, and then executes the create scripts.

In this way, we make sure that each test run starts off with a fresh database that is synchronized to the latest schema represented in the scripts. By making sure that we check for object existence before we delete (or create) it, we can be sure that changes to the scripts don’t cause errors due to a new database object being introduced – that object won’t exist yet and so the existence check will fail and the associated delete will not be attempted.
internal abstract class TestContextBase : IDisposable
{
    private const string ScriptsDirectory = @"..\..\..\..\DatabaseScripts\";
    private static readonly string[] _createScripts = { "000-Create-DapperTesting.sql" };
    private static readonly string[] _deleteScripts = { "000-Drop-DapperTesting.sql" };
    private static readonly string _dbFile = Path.Combine(Environment.CurrentDirectory, "DapperTestingDB.mdf");
    private static readonly string _connectionString = @"Data Source=(LocalDB)\v11.0;Initial Catalog=DapperTesting;Integrated Security=True;AttachDbFilename=" + _dbFile;

    static TestContextBase()
    {
        var deleteScripts = _deleteScripts.Select(s => Path.Combine(ScriptsDirectory, s)).ToList();
        var createScripts = _createScripts.Select(s => Path.Combine(ScriptsDirectory, s)).ToList();

        Assert.IsTrue(deleteScripts.All(File.Exists));
        Assert.IsTrue(createScripts.All(File.Exists));

        ExecuteScripts(deleteScripts);
        ExecuteScripts(createScripts);
    }
    
    // remainder of class omitted
}
The implementation of ExecuteScripts for MS-SQL is interesting in that the scripts generated by SSMS cannot be run directly using SqlCommand, due to the inclusion of comments, USE statements, and the GO batch separator (which SSMS understands but SqlCommand does not). To overcome this we need to first filter out any commands that are unnecessary for maintaining the database and/or will cause issues if invoked as a SqlCommand. After some experimentation I settled on filtering out lines that start with "USE" or "SET", or that start with "/*" and end with "*/". This is specific to the scripts created by SSMS. The scripts are then broken into batches of commands that can be executed together in a single SqlCommand by splitting on the GO keyword.
private static readonly Func<string, bool> _startsBatch = s => s.StartsWith("GO", StringComparison.OrdinalIgnoreCase);
private static readonly Func<string, bool> _isUse = s => s.StartsWith("USE", StringComparison.OrdinalIgnoreCase);
private static readonly Func<string, bool> _isSet = s => s.StartsWith("SET", StringComparison.OrdinalIgnoreCase);
private static readonly Func<string, bool> _isComment = s => s.StartsWith("/*") && s.EndsWith("*/");

private static void ExecuteScripts(IEnumerable<string> scripts)
{
    Func<string, bool> isValidCommand = c => !_isUse(c) && !_isSet(c) && !_isComment(c);
    using (var conn = GetConnection())
    {
        conn.Open();
        foreach (var script in scripts)
        {
            var commands = File.ReadAllLines(script).Where(isValidCommand).ToList();
            IEnumerable<IEnumerable<string>> batches = CreateBatches(commands).ToList();
            foreach (var batch in batches)
            {
                using (var query = conn.CreateCommand())
                {
                    query.CommandText = string.Join(Environment.NewLine, batch);
                    query.ExecuteNonQuery();
                }
            }
        }
    }
}

/// <summary>
/// In order to execute groups of commands that can be executed together (not all can),
/// we separate them by the GO command into distinct, executable sets.  This assumes
/// that the scripts have been generated by Tasks->Generate Scripts in SSMS but will
/// also work if each script only contains commands that can be grouped together.
/// </summary>
/// <param name="commands"></param>
/// <returns></returns>
private static IEnumerable<IEnumerable<string>> CreateBatches(IEnumerable<string> commands)
{
    var batch = new List<string>();
    foreach (var command in commands)
    {
        if (_startsBatch(command))
        {
            if (batch.Any())
            {
                yield return batch;
                batch = new List<string>();
            }
        }
        else if (!string.IsNullOrWhiteSpace(command))
        {
            batch.Add(command);
        }
    }

    if (batch.Any())
    {
        yield return batch;
    }
}
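GetConnection isn't shown above; given the _connectionString field on TestContextBase it is presumably just a thin wrapper, along these lines:

private static SqlConnection GetConnection()
{
    // _connectionString attaches the LocalDB MDF file defined earlier in TestContextBase
    return new SqlConnection(_connectionString);
}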
Now our TestContextBase is set up to (re)create our database afresh for each test run. This adds a little overhead to our test runs, but we minimize it by doing the work once per run, and it is more than offset by the ability to do TDD on our Dapper-based repositories.

There are some obvious – or apparently obvious – opportunities for improvement in the above. First, we could maintain many of the settings in a configuration file rather than in code. I chose not to because this is test code: it really only changes when I’m actively developing the code (and tests), so moving settings into configuration doesn’t save us much when we’re modifying the code anyway.

Another, probably better, opportunity would be to invest in naming conventions – we already have a start on this with the names we’ve used – and standardized locations for our scripts. This would enable us to enumerate them in code at runtime rather than hard-code them in our test context base class (see the sketch below). It’s something I would probably do eventually, but in a simple case like this I don’t find it necessary. Clearly, we would want to refactor this as the database schema grows.
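As a sketch of that direction (assuming the NNN-Create-/NNN-Drop- naming used above), the static constructor could discover the scripts instead of listing them:

private static IEnumerable<string> FindScripts(string pattern)
{
    // The numeric prefix keeps the scripts sorted in the order they should be applied
    return Directory.GetFiles(ScriptsDirectory, pattern)
                    .OrderBy(Path.GetFileName);
}

// In the static constructor:
// ExecuteScripts(FindScripts("*-Drop-*.sql"));
// ExecuteScripts(FindScripts("*-Create-*.sql"));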

This wraps up the set up portion of this series. In the next post, I’ll start doing the development of the repositories using TDD based on what we’ve developed so far.

Monday, February 10, 2014

Solving the Challenges of Dapper Integration Testing: Part 2


This is the third in a series of posts on lightweight integration testing with Dapper. These tests are intended to enable us to do Test-Driven Development when creating our Dapper-based repositories. As a reminder, our goals – see the first post for the complete explanation – are tests that are:
  • Automated
  • Fast
  • Simple
  • Minimal Tooling
  • Independent
  • Self-contained
  • Safe

In my last post I covered the two easier challenges: providing a database connection in a way that lets us easily substitute our testing database, and isolating our tests from each other. The latter is important because these are integration tests: the database is a shared interaction point, and running the tests in isolation keeps one test’s data from causing failures in another.

In this post I’m going to cover one of the two remaining challenges: cleaning up test data after each test. We'll leave the last challenge for the next post.

Data Clean Up


One of the difficulties with integration tests, especially the isolated kind that I am interested in, is the effort of making sure that your data is set up exactly as you want pre-test. Generally, this will involve writing some code that cleans up any data from previous tests or test runs. This is complicated by relationships in the data, especially in the case where deletes do not cascade. In those cases, a delete can fail when the data in the “child” relation hasn’t been deleted first.

One way to handle this is writing database-specific delete code that understands the relationships and performs deletes in the correct sequence to prevent failures due to non-cascading deletes. To me, this is painful – and certainly not simple.

Another way to handle it would be to simply delete the database tables and related objects between tests. As we’ll see later, this will be necessary once per run to keep the schema current, but deleting and re-creating the schema per test seems like overkill – and would certainly be slower than we’d like.

It would be better if we could leverage the database structure itself to automatically create the delete statements in the proper order. Rather than writing the code once per database, we could write it once per database provider and it would suffice for most, if not all, databases that use that provider – at least for the kinds of databases I’m concerned with most of the time.

This idea is not original with me, and I didn’t want to reinvent the wheel if I could possibly avoid it, so I did a bit of research into automated integration testing. During that research I came across a DatabaseDeleter class by Jimmy Bogard, who blogs at LosTechies.com, which I thought was the perfect solution to the problem of automatically cleaning up data generated in an integration test. The only problem: it was written for use with NHibernate. I adapted it to work with SqlConnection and SqlCommand for Microsoft SQL Server (the version I’m using is Microsoft SQL Server 2012).

The most interesting method is the BuildTableList method (the full code is available on GitHub):

BuildTableList
private static IEnumerable<string> BuildTableList(ICollection<string> allTables,
                                                  ICollection<Relationship> allRelationships)
{
    var tablesToDelete = new List<string>();

    while (allTables.Any())
    {
        var leafTables = allTables.Except(allRelationships
                                  .Select(rel => rel.PrimaryKeyTable))
                                  .ToArray();

        tablesToDelete.AddRange(leafTables);

        foreach (var leafTable in leafTables)
        {
            allTables.Remove(leafTable);

            var relToRemove = allRelationships.Where(rel => rel.ForeignKeyTable == leafTable)
                                              .ToArray();

            foreach (var rel in relToRemove)
            {
                allRelationships.Remove(rel);
            }
        }
    }

    return tablesToDelete;
}

This method takes a collection of tables (all the tables in the database) and a collection of parent-child relationships between those tables. If you think of the relationships as forming a set of trees, it performs a bottom-up traversal across those trees, adding the leaf nodes discovered in each pass to an ordered list of tables to delete data from. Simultaneously, it removes those leaf nodes from the trees – by removing their relationships and removing the tables from the set still under consideration – so that in the next pass the parents of those leaf nodes become the new leaf nodes. If we delete data from the tables in this order, we are guaranteed not to violate any foreign key constraints. For the Users/Posts/PostDetails schema used in this series, for example, the first pass picks up PostDetails, the second Posts, and the third Users, so the deletes run in exactly the order needed to avoid failures.

This collection is used by BuildDeleteTables to generate the delete command associated with a particular database. This is constructed once per test run and subsequently reused per test clean up to remove any data that the test may have added.

BuildDeleteTables/BuildTableSql
private void BuildDeleteTables()
{
    if (_initialized.Contains(_database))
    {
        return;
    }

    lock (_lockObj)
    {
        if (!_initialized.Add(_database))
        {
            return;
        }

        using (var connection = new SqlConnection(_database))
        {
            connection.Open();

            var allTables = GetAllTables(connection);

            var allRelationships = GetRelationships(connection);

            _tablesToDelete = BuildTableList(allTables, allRelationships);

            _deleteSql[_database] = BuildTableSql(_tablesToDelete);
        }
    }
}

private static string BuildTableSql(IEnumerable<string> tablesToDelete)
{
    return string.Join(" ", tablesToDelete.Select(t => string.Format("delete from [{0}];", t)));
}
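The GetAllTables and GetRelationships helpers (and the Relationship type) referenced in BuildDeleteTables aren't reproduced in this post; the full code is on GitHub. A sketch of the idea, querying INFORMATION_SCHEMA with Dapper (the real version may well use raw SqlCommand instead), would be:

private class Relationship
{
    public string PrimaryKeyTable { get; set; }
    public string ForeignKeyTable { get; set; }
}

private static ICollection<string> GetAllTables(SqlConnection connection)
{
    // Every base table in the database, regardless of schema
    return connection.Query<string>(
        "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_TYPE = 'BASE TABLE'").ToList();
}

private static ICollection<Relationship> GetRelationships(SqlConnection connection)
{
    // For each foreign key, find the referenced (primary key) table and the referencing (foreign key) table
    const string sql = @"SELECT pk.TABLE_NAME AS PrimaryKeyTable, fk.TABLE_NAME AS ForeignKeyTable
                         FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS rc
                         JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS fk ON rc.CONSTRAINT_NAME = fk.CONSTRAINT_NAME
                         JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS pk ON rc.UNIQUE_CONSTRAINT_NAME = pk.CONSTRAINT_NAME";
    return connection.Query<Relationship>(sql).ToList();
}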
The DatabaseDeleter class is used in the Dispose method (following Chris Patterson’s IDisposable Done Right pattern) in our abstract TestContextBase class.

TestContextBase (Dispose implementation)
internal abstract class TestContextBase : IDisposable
{
    // Remainder of class omitted for brevity
    
    // IDisposable implementation. On dispose of the context, we clean up any data inserted by the test
    
    protected bool _disposed;

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    ~TestContextBase()
    {
        Dispose(false);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (_disposed)
        {
            return;
        }

        if (disposing)
        {
            var deleter = new DatabaseDeleter(_connectionString);
            deleter.DeleteAllData();
        }

        _disposed = true;
    }
}
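DeleteAllData itself isn't shown in this post (it's in the repository), but given the pieces above it presumably builds the cached delete SQL once and then executes it, roughly like this:

public void DeleteAllData()
{
    BuildDeleteTables();

    using (var connection = new SqlConnection(_database))
    {
        connection.Open();

        using (var command = connection.CreateCommand())
        {
            // One batch of ordered "delete from [Table];" statements, cached per connection string
            command.CommandText = _deleteSql[_database];
            command.ExecuteNonQuery();
        }
    }
}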
The Dispose method is invoked from the clean-up method in our test classes, which we’ve seen before but which is shown below with the context-related code for a sample DapperUserRepository. After each test the test context is disposed, DeleteAllData runs against the related database, and any data that the test inserted is cleaned up.
[TestInitialize]
public void Init()
{
    Start();
    _c = new DapperUserRepositoryTestContext();
}

[TestCleanup]
public void Cleanup()
{
    if (_c != null)
    {
        _c.Dispose();
    }
    End();
}
That wraps up our data clean-up task. Using our base test context and its IDisposable implementation lets us make sure that each test starts with a clean database, as long as our clean-up method runs (see the discussion in the first post in the series). Next time we'll see how to handle creating the database schema so that our test database is automatically set up for each test run as our database changes.