Saturday, December 27, 2014

Converting a Legacy LINQ-to-SQL Data Context to Entity Framework

Recently I worked with a client to upgrade their web application to use the latest technology available for ASP.NET MVC. This included upgrading the .NET framework, upgrading existing NuGet packages (and using packages to replace libraries tracked in SCM), and replacing custom code that duplicated current framework features. The largest change, however, was to upgrade the data access layer to use a combination of Entity Framework and Dapper instead of LINQ-to-SQL (L2S). Because this is client code I'm only including a few isolated snippets and not a link to the repository in this article.

While there are many similarities between L2S's DataContext and Entity Framework's DbContext implementations, there are enough gotchas that it became more of an ordeal than I had anticipated.

Reverse Engineering the Model

I started out by using the Entity Framework Power Tools to reverse engineer the existing database. I did this into a separate solution to avoid conflicts with the existing model classes and data context. Once you've added the power tools you will get an Entity Framework option with some new sub-options when you right-click on a project. I used the Entity Framework -> Reverse Engineer Code First power tool to generate the model classes, fluent mappings, and DbContext implementation.

Once I had the model reverse engineered, I manually adapted the reverse engineered classes to reflect the changes made to the L2S model over time.  This included adding some short cut properties – say to find the first item in a collection – and convenience methods.  I’ll note here that my goal was to do an “in-place” upgrade, not refactor the architecture. Even though there were places that refactoring would improve the code I felt that it was too risky to attempt much refactoring AND swap the underlying data access technology at the same time. There were only  a few places where there was some obvious code duplication that I ended up changing in this way; mostly I simply translated the code from the L2S classes to the EF classes. Because I was doing this in a separate solution this involved copying some code over from the existing solution where the existing models referenced types outside of the model.  Mostly these were enumerated types (enums) that were used for some model properties. By and large the model classes were well-isolated from the rest of the solution.  The fact that the previous developers followed good principles made the transition much easier.

Even so, there were a few areas of “gotchas” where the design choices in EF didn’t match those in L2S that caused me problems.

Configuring Relations

This was probably the largest area of difficulty that I faced. There were 3 distinct types of mapping issues that I had with relations. First, not all of the relations in the previous model had been mapped. In those cases I simply removed the mapped relation from the reverse-engineered models. Since my goal was to replicate existing behavior this seemed like the obvious choice. Later, when we start doing optimizations on the EF code, some of these may be added back if needed to improve query performance.

Second, some relations were mapped as one-to-many using the EF tooling that had been mapped one-to-one in L2S. These were trickier to deal with because of some missing capabilities in the fluent mapping methods.  With a one-to-many mapping, the fluent EF tooling allows you to include both the navigation property and the column housing the foreign key in the model. That’s not true – or at least I could not find a way to do it – for one-to-one/zero. In those cases, the methods that allow you to specify the foreign key correlation to an model property do not exist and if you try to include the FK property, you get a column exists conflict on the model.
I handled these in two ways. If the FK property wasn’t used outside the model class, I defaulted to removing the FK property and setting the property up as one-to-one/zero in the model using WithRequired(t => t.NavProperty).WithOptional(t => t.InverseNavProperty), specifying a custom key as necessary with .Map().  If the FK property was used outside the class – which you might do in order to avoid a subquery to get the related data – I left the property mapped one-to-many and added a convenience property to find the FirstOrDefault() in the collection to replace the existing one-to-one/zero mapping. This necessitated some code changes to set up the relations but these seemed straightforward and involved the least amount of disruption to existing code.  This introduced some properties that needed to be ignored explicitly on the model, but more on that later.

Lastly, I ran into some issues with many-to-many mappings. These included many-to-many mappings that did not follow the Entity Framework convention for foreign keys. Entity Framework expects foreign keys on join tables and join tables that included extra data. These seemed particularly difficult to deal with, though I came to find eventually that what I thought were join table mapping issues were really caused by some navigation properties that had not been ignored.   L2S uses a paradigm where the mappings have to be manually added to the model. EF uses a convention-based mechanism where it applies a default mapping unless you’ve explicitly overridden or ignored it.  This mismatch caused a lot of confusion at first as the reverse engineering tool had set up all of the mappings. In the process of resolving the issues above I had inadvertently introduced some model errors with the convenience properties.

With respect to those places where the keys did not match the expected conventions, I ended up explicitly mapping the foreign keys using .Map() and specifying the appropriate key values. This also involved some code changes where I had to remove the join table model that had existed in the L2S model since L2S did not handle many-to-many mappings directly but only as bidirectional one-to-many mappings from each side to the join table.  When there was additional data in the join table, I left the join table model in the solution and set it up as the bidirectional one-to-many association.

Sample Many-to-Many Modeled as Two One-to-Many Relations

public class UserClientMap : EntityTypeConfiguration<UserClient>
{
    // model simplified

    HasKey(t => t.Id);

    Property(t => t.Id).HasColumn("Id");
    Property(t => t.IsPrimaryContact).HasColumn("IsPrimaryContact");
    Property(t => t.UserId).HasColumn("UserId");
    Property(t => t.ClientId).HasColumn("ClientId");

    HasRequired(t => t.Client)
        .WithMany(t => t.UserClients)
        .HasForeignKey(t => t.ClientId);

    HasRequired(t => t.User)
        .WithMany(t => t.UserClients)
        .HasForeignKey(t => t.UserId);
}

Excluding Properties


The existing model included some properties that were not persisted. I had made an initial assumption that if these weren't mapped in the fluent mappings, they would be automatically excluded since all of the properties in the database had been mapped in the reverse engineered mapping files. This turned out to be a false assumption; I haven't used the fluent mappings much and haven't internalized their idiosyncrasies. I had also made the assumption that properties without setters, the convenience properties from above, would also be ignored. That also turned out to be false.

I began to set up the properties causing model errors manually using .Ignore(). This would work, but given the size of the model my laziness prevailed and I tried to find a way that I could use conventions that would allow me to use less code per mapping class and help prevent errors caused by failing to update the mapping file when a model changed to add a property that should be ignored.

I landed on an extension method to the EntityTypeConfiguration class coupled with the use of NotMappedAttribute found in the System.ComponentModel.DataAnnotations.Schema namespace.

The extension class has a method, ExludeIgnoredProperties, that enforces two conventions by adding an Ignore() for the property if it contains no setter method or if it is marked with the NotMappedAttribute. Each mapping class adds a call to this.ExcludeIgnoredProperties(); and both of these conventions are enforced for this model. Because each property has a different type I ended up using "dynamic" to allow the correctly typed lambda expression to be returned when constructing the argument for the Ignore() method.
internal static class EntitytTypeConfigurationExtensions
{
    public static void ExcludeIgnoredProperties<T>(this EntityTypeConfiguration<T> configuration)
        where T : class
    {
        foreach (var expression in GetIgnoredProperties<T>())
        {
            configuration.Ignore(expression);
        }
    }

    private static IEnumerable<dynamic> GetIgnoredProperties<T>()
        where T : class
    {
        var type = typeof(T);
        var properties = type.GetProperties(BindingFlags.Instance | BindingFlags.Public);


        foreach (var expression in GetPropertiesWithoutSetters<T>(properties))
        {
            yield return expression;
        }

        foreach (var expression in GetExcludedProperties<T>(properties))
        {
            yield return expression;
        }
    }

    private static IEnumerable<dynamic> GetPropertiesWithoutSetters<T>(IEnumerable<PropertyInfo> properties)
        where T : class
    {
        var type = typeof(T);
        foreach (var propertyInfo in properties)
        {
            var setter = propertyInfo.GetSetMethod();
            if (setter == null)
            {
                var parameter = Expression.Parameter(type, "t");
                var body = Expression.Property(parameter, propertyInfo);
                var lambda = Expression.Lambda(body, parameter);
                yield return lambda;
            }
        }
    }

    private static IEnumerable<dynamic> GetExcludedProperties<T>(IEnumerable<PropertyInfo> properties)
        where T : class
    {
        var type = typeof(T);
        foreach (var propertyInfo in properties)
        {
            var ignoreAttribute = propertyInfo.GetCustomAttribute<NotMappedAttribute>();
            if (ignoreAttribute != null)
            {
                var parameter = Expression.Parameter(type, "t");
                var body = Expression.Property(parameter, propertyInfo);
                var lambda = Expression.Lambda(body, parameter);
                yield return lambda;
            }
        }
    }
}

Replicating Default LoadWith Behavior


The last major challenge I had was to replicate L2S's default LoadWith behavior. In L2S, you need to set up all of the included properties before making any queries using the data context. If you don't an InvalidOperationException is thrown. As a result, the repository implementation defined some properties that automatically got loaded with each model type when the data context was instantiated.

Entity Framework is much more flexible and allows the included relations to be defined per-query, however, I wanted to retain as much of the existing code and behavior as I could. In particular, I needed to retain as much of the efficiency of using included relations as I could without having to re-visit every query performed against the repository

This is probably the area in which my in-place upgrade resulted in the most compromises from what I would normally have implemented. Generally I use a repository per root pattern, but the existing repository was a single generic repository and the "roots" where defined via generic constraints on the model type. Each "root" model type extended an abstract (empty) Root model class and the constraints only allowed a Root class to be the base of a query.

Using a repository-per-root would have allowed me to to easily defined which related entities were loaded by default for each class. With a single, generic repository I would need to use some reflection to achieve the same effect. In addition I wanted to be able to easily override the defaults for specific queries to make them more efficient as needed. Lastly, I wanted to avoid using if/else or switch statements to define the logic. In my mind the included properties should be more closely associated with the model than the repository so I ended up opting for attribute-based definition of the included relations.

LoadWithAttribute

[AttributeUsage(AttributeTargets.Class, AllowMultiple = false, Inherited = false)]
public class LoadWithAttribute : Attribute
{
    internal IEnumerable<string> Includes { get; private set; }

    private string _include;

    public string Include
    {
        get { return _include; }
        set
        {
            _include = value;
            Includes = (value ?? "").Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(s => s.Trim());
        }
    }

    public LoadWithAttribute(string include)
    {
        Include = include;
    }
}

This attribute is added to each class that needs to have default included properties defined. It takes a comma-separated string of property names for the related data to include. In the repository, the attribute is obtained by reflection but only if the developer doesn't supply a set of included properties specific to that query. Note that the signatures include both one without any included properties and one that takes a number of property names via a params. Both of these defer the behavior to an implementation method that takes an IEnumerable of property names. There's also a signature (not shown) that takes, as params, any number of Func parameters if the developer wants to specify using strongly-typed properties rather than property names.

Sample Repository Methods

public class Repository : IRepository
{
    // class simplified
    private static readonly ConcurrentDictionary<Type, IEnumerable<string>> _defaultIncludeMap = new ConcurrentDictionary<Type, IEnumerable<string>>();

    public IQueryable<T> All<T>(Expression<Func<T, bool>> filter)
        where T : Root, new()
    {
        return AllImplementation(filter, GetDefaultIncludes<T>());
    }

    public IQueryable<T> All<T>(Expression<Func<T, bool>> filter, params string[] includedProperties)
        where T : Root, new()
    {
        return AllImplementation(filter, includedProperties);
    }


    private IQueryable<T> AllImplementation<T>(Expression<Func<T, bool>> filter, IEnumerable<string> includedProperties)
        where T : Root, new()
    {
        IQueryable<T> dbSet = _context.Set<T>();

        foreach (var property in includedProperties)
        {
            dbSet = dbSet.Include(property);
        }

        return dbSet.Where(filter);
    }

    private static IEnumerable<string> GetDefaultIncludes<T>()
    {
        var includes = _defaultIncludeMap.GetOrAdd(typeof(T), type =>
        {
            var attribute = type.GetCustomAttribute<LoadWithAttribute>();

            return attribute == null
                ? new List<string>()
                : attribute.Includes;
        });

        return includes;
    }
}
Now, with these things in place I only needed to exercise the code and find the places where L2S supported constructs were used that don't have translations to EF queries. I'm still working through this exercise, performing query optimizations as I can to take advantage of the greater flexibility that EF offers. Fortunately these are relatively easy to spot (they throw exceptions) and, so far, have been relatively easy to find alternative constructs for.