Unhandled Exceptions

04 Oct

The Four Stages of Object-Relational Mapping (a progression)

In my career I have seen many a developer embark on the journey that eventually leads them to what I would term ‘Object-Relational Mapping as it was originally intended’.  After watching this process over and over again, I have reached the conclusion that the path is always the same.  Different people on the journey may be in different stages and some may even stay ‘stuck’ in one stage for several years (or even, sadly, forever in some cases) but the stages along the journey always seem to happen in roughly the same progression.

I know this because it’s the same journey I have made myself and I see it repeating in other developers regardless of their backgrounds.  Like any journey, it helps to know where you’re headed, so I wanted to share my observations on these different stages.

Stage I: the Dataset (the ‘O/R- what?’ stage)

It (usually) begins with the .NET Dataset.  Developers are seduced by the (relative) ease of draggy-droppy™ designer support that makes creating data-access code painless.  The technology offers a RAD experience that couples a visual design surface with the familiarity of settable properties to control behavior, plus a wizard that lets you write comfortable SQL statements in small edit boxes at design-time and even preview the results of your queries before you commit them to code, which interestingly you do by clicking ‘OK’ rather than by actually coding any queries, of course ;)

Statements like “Hey, I just built my whole data-access-layer in 15 minutes!” are a common occurrence during this stage.  You rejoice in the (relative) ease with which you can do really ‘cool’ stuff like databind to form controls just by setting a few properties.  You benefit (at least during the initial development stages) from the framework for your application being part of a single, (reasonably) coherent, self-reinforcing ecosystem of interconnected parts designed with each other (mostly) in mind.

“This is the way data access was always supposed to be!”, you cheer.

Stage II: Generated Code (the ‘A Class for every table and every table with its class’ stage)

But then one day, usually when your underlying database changes a lot or when someone suggests a significant new set of functionality be added to your application, the Dataset hangover starts to set in.  You begin to get concerned that all that Dataset code that used to ‘just compile perfectly’ has started to raise errors and return arcane error messages only a mother could love from parts of the application that you didn’t even know had code in them in the first place.  “My app won’t compile because of some error in my ‘designer.cs’ file.  I didn’t even know I *had* a ‘designer.cs’ file in my project!” is a familiar refrain often heard at the beginning of this stage.  You’re in .NET Dataset Maintenance Hell, a place you have to spend some time in yourself to truly appreciate.

So you start to look around for a data-access approach that will shield you and your code from what seem to you like reasonably minor changes in the database.  You look for a solution that will save you when the Customer table gets renamed to Customers without your needing to lift a finger.

Your quest for a better approach leads you to explore the world of Object-Relational Mapping frameworks that so many people seem to be talking about.  But even if you limit your research to the most-popular O/RM frameworks, there are at least 5-10 of them offering a huge and varying array of features, they all seem really complicated, and most of them are open source so the documentation tends to be lacking compared to the professional (though often also unapproachable) documentation that surrounded the .NET Dataset world with which you are already familiar.  But you still need to solve your .NET Dataset pain so you keep at it.

You discover the crack-like high that comes from code-generation tools that can slave your class-design to the database schema, regenerating your code every time your database changes.  You notice that the code generated from this approach isn’t nearly as obtuse and opaque as the .NET Dataset code (and it doesn’t come with a “this code generated by a tool, don’t edit it by hand” annotation intended to frighten you away from looking at it in the first place).  You even learn to extend the generated code by hand using tricks like partial classes, although even as you’re doing this you can hardly believe they actually added a whole keyword to the language just to support the needs of the visual tooling.

Some of these code-generation approaches are completely integrated into the O/RM frameworks, others are third-party or OSS engines with templates designed to support your selected O/RM, but they all work in mostly the same way: classes named the same as your tables, and properties named the same as your fields are created with the press of a single button by interrogating and reverse-engineering your database schema.
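To make the ‘class for every table’ idea concrete, here is a minimal sketch of what such a generator might emit and how a partial class lets you extend it by hand.  The Customers table, its columns, and the FullName property are all hypothetical names chosen for illustration; this is not the output of any particular tool.

```csharp
using System;

// Hypothetical generator output for a "Customers" table:
// one class per table, one property per column.
// This half is overwritten every time the schema is re-read.
public partial class Customers
{
    public int CustomerId { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

// Hand-written half of the same class. It would normally live in its own
// file so that regenerating the half above does not destroy it.
public partial class Customers
{
    public string FullName
    {
        get { return FirstName + " " + LastName; }
    }
}

public static class Program
{
    public static void Main()
    {
        var row = new Customers { CustomerId = 1, FirstName = "Ada", LastName = "Lovelace" };
        Console.WriteLine(row.FullName); // prints "Ada Lovelace"
    }
}
```

Notice that the class is still shaped entirely by the table: rename a column and the generated property is renamed with it, taking your hand-written FullName down at compile time.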

“Hey, I just built my whole data-access-layer in 1 second!” is commonly heard during this stage; that is much faster than the 15 minutes the previous approach required.  And with the code-gen in place, you find you hardly miss the visual tooling that made the Dataset so attractive.

“This is the way data access was always supposed to be!”, you cheer.

Stage III: Mapping DTOs to Rich Domain Models (the O/O Mapping stage)

But then one day, usually as your projects start to get more complex, you start to realize that you need to have classes in your object model that don’t relate directly to the tables in your database.  You start to build a richer object model designed around the needs of your application rather than the needs of your database.  Slowly, a more fully-featured, more intricately-designed and finer-grained object model emerges from your software designs, one that includes classes that make sense for solving your software problems but don’t match well with your database design.  Sometimes data from two or three different tables needs to find its way into a single object in your model, and you’ve just discovered what was really meant by the “Object-Relational Impedance Mismatch” all those people kept talking about in those discussion groups you read when researching O/RMs in the first place.

But you’re addicted to the crack-like high of code-generation, so you try to have the best of both worlds: you code-gen a set of classes against your database schema, call them ‘Data Transfer Objects’ or ‘DTOs’, and then start the arduous process of aggregating and composing multiple DTOs into the Domain objects that need them.  You begin this process slowly by hand, thinking “how hard could this be?” and before you know it you start to write reams of brittle object-property-assignment code like…

DomainCustomer.Address = AddressDTO.Address;
DomainCustomer.CustomerType = TypeDTO.Type;

 

Sure, you’re spending a lot of time now mapping objects-to-objects (and via tedious, hand-written code that the code-generator was supposed to save you from in the first place, no less), but you are still able to leverage your code-generation approach and so it feels like the best of both worlds to you: auto-generated classes that map to the database structure and yet also a whole separate collection of classes that you can point to and say “This is my domain model, driven by the needs of the user-base and the problem-space.  They consume the DTOs when needed to persist their values to the database.”
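Spelled out a little more fully, the hand-written assembly code of this stage tends to look something like the sketch below.  The three DTO classes stand in for generated code and, like the CustomerAssembler helper and every property name here, are assumptions made up for illustration.

```csharp
using System;

// Hypothetical code-generated DTOs, one per table.
public class CustomerDTO { public int Id { get; set; } public string Name { get; set; } }
public class AddressDTO { public int CustomerId { get; set; } public string Address { get; set; } }
public class TypeDTO { public int CustomerId { get; set; } public string Type { get; set; } }

// Hand-designed domain class that needs data from all three tables.
public class DomainCustomer
{
    public string Name { get; set; }
    public string Address { get; set; }
    public string CustomerType { get; set; }
}

public static class CustomerAssembler
{
    // Every property is copied by hand. If a column is renamed and the DTOs
    // are regenerated, each line that touches the old name stops compiling.
    public static DomainCustomer FromDtos(CustomerDTO customer, AddressDTO address, TypeDTO type)
    {
        var domainCustomer = new DomainCustomer();
        domainCustomer.Name = customer.Name;
        domainCustomer.Address = address.Address;
        domainCustomer.CustomerType = type.Type;
        return domainCustomer;
    }
}

public static class Program
{
    public static void Main()
    {
        var domainCustomer = CustomerAssembler.FromDtos(
            new CustomerDTO { Id = 1, Name = "Ada" },
            new AddressDTO { CustomerId = 1, Address = "12 Main St" },
            new TypeDTO { CustomerId = 1, Type = "Retail" });

        Console.WriteLine(domainCustomer.Name + ", " + domainCustomer.Address + ", " + domainCustomer.CustomerType);
    }
}
```

Three properties is manageable; multiply it by every domain class in a real application, in both directions (load and save), and the maintenance burden becomes obvious.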

“This is the way data access was always supposed to be!”, you cheer.

Stage IV: Mapping Rich Domain Models to the database (the ‘eureka moment’ stage)

One day you notice that there is a lot of repetition in the object-property-assignment code you keep writing over and over again to hydrate data into your Domain Model objects.  You also notice that every time your database changes even slightly and you regenerate your DTO classes in response, the brittle property-assignment code you wrote breaks.  And there’s so much of it!  And it’s scattered all over the place in your class definitions!  “What a mess!”, you discover.

So like all good developers faced with an annoying infrastructure challenge, you start to dream of a framework you could build that would make all this repetitive code just ‘go away’.  “What I really need now,” you reason, “is an Object-Object-Mapper, a sort of ‘O/OM’ that will let me configure the relations between my DTOs and my Domain Model objects without having to write and maintain all the messy assignment code.”  But like all framework and library developers, when you start to really think about the complexity of what you’re considering building, the challenges start to mount.  Collections mapping, read-only properties on Domain Objects, identity values, and a whole host of other challenges adorn your list of impediments for your new framework.

So you look around at the landscape and slowly begin to realize that the class of software you actually need has already been built but you haven’t been using it as it was intended at all.  What you really need to solve this problem is an Object-Relational Mapper, the very tool you have already been using for years!

But instead of hampering the power of your O/RM framework by needlessly shackling it to a code-generation tool, you realize that you need to start to throw off the artificial security-blanket of generated code slaved to your database schema and instead use the O/RM tool to map your database directly to your rich Domain Object Model classes.

Free from the encumbrance of code-generated DTO classes, your single object model can express the needs of your software in a much more robust and flexible manner, just as it did in the previous stage.  But now you are also free of the need to do O/O Mapping between your Domain Objects and your DTO classes.  The O/RM tool takes care of persisting data from your objects to the database and hydrating these same objects with data from the database when needed.  The tool is doing for you what it was originally designed to accomplish: freeing you from maintaining brittle plumbing code for data access anywhere in your own applications.
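The shift is easier to see in code.  What follows is a deliberately tiny, hand-rolled stand-in for what an O/RM does on your behalf; it is not NHibernate’s API (or any other framework’s), and the ColumnMap type, the column names, and the dictionary used as a ‘row’ are all assumptions invented for this sketch.  The point is only that the column-to-property relationships are declared once, in one place, against the domain class itself, with no DTO layer in between.

```csharp
using System;
using System.Collections.Generic;

// Domain class designed around the application, not around any one table.
public class Customer
{
    public string Name { get; set; }
    public string Street { get; set; }
    public string CustomerType { get; set; }
}

// One declarative entry per property: which column feeds which property.
public class ColumnMap<T>
{
    public readonly string Column;
    public readonly Action<T, object> Assign;

    public ColumnMap(string column, Action<T, object> assign)
    {
        Column = column;
        Assign = assign;
    }
}

public static class CustomerMapping
{
    // Hypothetical columns drawn from three different tables.
    public static readonly List<ColumnMap<Customer>> Columns = new List<ColumnMap<Customer>>
    {
        new ColumnMap<Customer>("Customers.Name", (c, v) => c.Name = (string)v),
        new ColumnMap<Customer>("Addresses.Street", (c, v) => c.Street = (string)v),
        new ColumnMap<Customer>("CustomerTypes.Description", (c, v) => c.CustomerType = (string)v),
    };

    // Generic plumbing: walks the mapping and fills in the domain object.
    // A real O/RM would also issue the SQL, track changes, and write back.
    public static Customer Hydrate(IDictionary<string, object> row)
    {
        var customer = new Customer();
        foreach (var map in Columns)
        {
            map.Assign(customer, row[map.Column]);
        }
        return customer;
    }
}

public static class Program
{
    public static void Main()
    {
        // Stand-in for one row returned by a query joining the three tables.
        var row = new Dictionary<string, object>
        {
            { "Customers.Name", "Ada" },
            { "Addresses.Street", "12 Main St" },
            { "CustomerTypes.Description", "Retail" },
        };

        var customer = CustomerMapping.Hydrate(row);
        Console.WriteLine(customer.Name + ", " + customer.Street + ", " + customer.CustomerType);
    }
}
```

When a column is renamed, exactly one mapping entry changes and the Customer class itself is untouched, which is the property you were chasing all along.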

“This is the way data access was always supposed to be!”, you cheer.

And you would be right.  Welcome to the end of your journey.

6 Responses to “The Four Stages of Object-Relational Mapping (a progression)”

  1. nightshade427 Says:

    Yes that pretty much sums up how I progressed…good observations ;).

  2. Chris Says:

    Steve, what is your opinion of the MyGeneration tool and template you used in your videos? I’m in the process of designing a brownfield data access layer via NHibernate and would like to skip the stage 2 and stage 3 issues you mention here. What’s the best way to get to stage 4 bliss if the database structure is already fairly cemented? There are approximately 100 tables or so in this database.

  3. sbohlen Says:

    @Chris:

    That’s a really great question and I think that it points to something that I didn’t really touch on in my post but that I firmly believe: that each of these stages may be entirely appropriate in different contexts (yes, even the .NET dataset stage has its place in certain solutions) :)

    Realistically, all of the examples that I used to illustrate each of these stages sort of assume a greenfield situation rather than a brownfield one. IMHO if your database is already built and you are also in a brownfield application situation (e.g., there is an already-built application that depends upon this existing database) then the ‘code-gen DTO’ approach seems perfectly reasonable to me as a way to jump-start the introduction of an O/RM tool.

    In such a context, you already have an existing application (presumably with existing logic, objects, structure, etc.) that is already well-fixed in terms of its modeling of your problem domain (and hopefully also SOLVES the problems it’s designed to address); the ‘bliss’ (nice term, BTW) provided by Stage 4 is not really going to be achieved in this kind of a situation since you are likely not re-engineering a new rich domain model into your application at the same time that you’re re-engineering your data-access-layer.

    If it’s just your database that’s brownfield and not your application, then the pre-existence of a database doesn’t in any way preclude your approaching the problem from the ‘Stage 4’ perspective. In fact, this is one of the areas where many O/RM tools (NHib included, OF COURSE :) ) can truly shine: mapping an existing database schema to a new object model where the database schema was designed with one set of concerns in mind and the object model is designed with a different set of concerns.

    By listing the Stages as a progression, I think I may have left the impression that only ‘enlightened’ developers achieve Stage 4 (like it’s some kind of Buddhist ladder or something) and that the higher stages are somehow ‘better’ in every context. Like I said, ALL of these approaches are valid in different contexts, and so considering your situation before deciding how to approach a solution is of course critical here.

    As for the MyGeneration template from the screencast, we have used it in any number of projects (where the code-gen approach made sense) with great success, so if this approach makes sense for your situation, I certainly wouldn’t take this post of mine as an effort to suggest that your approach is invalid.

  4. Eric Says:

    So I guess I am in a hybrid stage 2/3 and heading to stage 4. I have this email in my inbox with a discount for upgrading our CodeSmith installs to V5 and I don’t think that we need it anymore. NHibernate is on our horizon.

  5. Wa Says:

    Excellent post!

    I have been working with NHib for about two weeks now and think I have managed to complete most of the journey and am now looking down at the green pastures of Stage IV. What I fear is that when I reach said pastures, while the grass may be greener, I will be greeted by the not so pleasant potpourri of the need for significant ORM-specific subject matter expertise.

    Moving away from my crappy metaphor (pun intended), what I am afraid of is that the level of expertise required to generate a rich and meaningful domain model (particularly in a brownfield situation) will require such expertise with NHib or another ORM that your tool becomes a new career path instead of a means to an end.

    I have already stumbled across a few “thou shalt not” scenarios in my short experience with NHib (mostly due to ignorance and bad design I’m sure), that seem to indicate that the further you get from the bread and butter One Class One Table world, the more likely you are to be confronted by the potentially steep learning curve of ORM-specific expertise and by maintainability / extensibility issues when your ORM guru leaves.

    Am I wrong in assuming that the greater the disparity between your data model and the domain model the greater the required ORM expertise? Have you had issues making your chosen ORM bend to the needs of your desired domain model or am I worried about nada…

  6. sbohlen Says:

    @Wa:

    I actually think that your observations are indeed on-target. As your Domain Model diverges more and more from your relational model, the corresponding disconnect increases and thus the effort needed to map the two back to each other (usually via an O/RM) can get quite significant.

    If your database already exists in your application and you have good reason to radically diverge your Domain Model from the database, then a tool like NHibernate may not be the right one. iBATIS (and iBATIS.NET) is a tremendously powerful alternate approach that offers a much more flexible relational <--> object mapping experience than even Hibernate/NHibernate provides. If your Domain Model really bears *ZERO* resemblance to your relational model, then this might make more sense for your situation.

    It’s a spectrum though, along which you need to decide on a target ROI and a solid cost-benefit target. Like all good software design questions, the answer is ‘it depends’ and it’s up to you to decide what’s right for your project, of course. But in general, yes, you can design yourself into a real corner if you’re not careful with developing a Domain Model in complete isolation from any idea about how it will be persisted.

    Remember what I always tell people: just because your *objects* are persistence-ignorant doesn’t mean that *you* or your *design* should be too :)


© 2008 Unhandled Exceptions
