Thursday, July 19, 2012

The Emperor's New Data Model

That was the title of the post to a LinkedIn group ("Healthcare Data Warehousing Association") by my friend and colleague, Dr. Jonathan Einbinder, of Partners Healthcare.  Here's a copy and paste of his post and my response.

Dr. Einbinder:  "In real-world terms, how is a comprehensive healthcare data model (e.g. like the ones provided by Oracle or IBM), helpful in moving forward an analytics agenda? Seems to me that potential value lies in providing a standard target for ETL and in building (or acquiring) content/reports on top of the model. That is the idea, anyway. In pragmatic terms, I wonder if it makes more sense to focus on standardizing a few key dimensions... (anyone remember IHC's Common Data Bus). I would like to hear how other provider organizations approach this."

Dale Sanders: The comprehensive enterprise data model sounds great, in theory, but it fails miserably in practice. I've lived it first-hand and watched it second-hand numerous times now. Of course you know my bias for the Intermountain (IHC) bus architecture, but that bias comes from a good place...pragmatic common sense. 

The origins of the bus approach go back much further than Intermountain (IHC), actually, back to an Air Force project that I worked on in the early 1990s. The bus architecture emerged after the failure of a $60M data warehouse project to pull 450 source systems into a common analytic model for nuclear weapon systems management. It took three years and 200 consultants to develop the enterprise model and ETL for that project. The ETL was so complicated that we spent all of our time maintaining it and fixing problems. When a new source system was introduced, the ETL scripts took months to write; there was no agility. When we exposed the enterprise data model to the analysts, they stared blankly because they had lost all familiarity with the content of their source systems. We had a very robust metadata repository showing all the mappings, but the analysts simply didn't recognize the "enterprise" representation of their data. Finally, fewer than 2% of the analytic use cases actually required an enterprise model. We tracked, measured and surveyed the need after the fact.

Scott Birbeck, an electrical and computer engineer who was a friend and colleague but not assigned to this particular project, was watching the disaster from a distance. One day he said to me, "Why don't you guys stop trying to make all the data look the same and just make it look the same only where you have to, like we (EEs) do when we build a bus for a computer? We couldn't care less about the details of the devices on the bus, we just care about how they connect to the bus so they can communicate with everything else. Isn't this basically the same problem?" We went to the whiteboard and started sketching.

The next day, a few of us who were desperately trying to pull this data warehouse disaster from the fire sat down and defined the format and naming conventions for the "core data elements" required to communicate across the "data bus". It took less than a full day to do that for 450+ source systems. The day after that, two of us started adding attributes to the source system data models in the staging area to reflect these core data elements. (Previously, we had not allowed analysts to access their data in the staging area, which was another mistake in retrospect.) In about a week, we had the ability to query across a large number of the most important source systems using these new foreign keys in the data bus.
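For readers who think better in code, here is a minimal sketch of the idea using Python's built-in sqlite3 module. It is not the original Air Force implementation. The table names, column names, the `bus_asset_id` element, and the trim-and-uppercase convention are all hypothetical, invented only to illustrate the pattern: each staging table keeps its native layout and gains one conformed "core data element" that can act as a join key across source systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two hypothetical staging tables, each kept in its source system's own layout.
cur.execute("CREATE TABLE stg_maintenance (wo_num TEXT, asset_tag TEXT, wo_status TEXT)")
cur.execute("CREATE TABLE stg_inventory (part_no TEXT, equip_serial TEXT, qty_on_hand INTEGER)")

cur.executemany(
    "INSERT INTO stg_maintenance VALUES (?, ?, ?)",
    [("WO-1", "a-100", "OPEN"), ("WO-2", "A-200", "CLOSED")],
)
cur.executemany(
    "INSERT INTO stg_inventory VALUES (?, ?, ?)",
    [("P-9", "A-100 ", 4), ("P-7", "A-300", 1)],
)

# Add the same core data element to both tables. The column name and the
# formatting rule (trimmed, upper case) stand in for the agreed conventions.
for table, native_column in (
    ("stg_maintenance", "asset_tag"),
    ("stg_inventory", "equip_serial"),
):
    cur.execute(f"ALTER TABLE {table} ADD COLUMN bus_asset_id TEXT")
    cur.execute(f"UPDATE {table} SET bus_asset_id = UPPER(TRIM({native_column}))")

# Query across both source systems through the bus column only.
# Every other column keeps its original, analyst-familiar name.
rows = cur.execute(
    """
    SELECT m.bus_asset_id, m.wo_num, m.wo_status, i.part_no, i.qty_on_hand
    FROM stg_maintenance AS m
    JOIN stg_inventory AS i ON i.bus_asset_id = m.bus_asset_id
    ORDER BY m.wo_num
    """
).fetchall()

print(rows)
```

The point of the sketch is what it leaves alone: nothing is remodeled into an enterprise schema, and the only shared agreement is the name and format of the bus column.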

Going against the policies of the project manager, we secretly gave analysts direct access to the source system data in the staging area and told them how to take advantage of the new data bus...and away they went. They loved it. Soon, the analysts started asking for extracts of the data from the source systems, that is, data mart subsets that pertained to their particular analytic needs. So we built those, too. The analysts neither needed nor wanted to constantly pull data from an enterprise model.

We purposely kept this all very secret until there was enough momentum and support from the analysts behind the new approach that the success could not be denied. We didn't want to take this secretive route, but had no choice because the project manager was dogmatically, egocentrically tied to the enterprise model strategy. He soon left the company and still despises those of us who were involved in the recovery, but the Air Force Colonel who was the executive sponsor loved us, as did the analysts.

In three months, the project went from disaster to triumph. The design is still operating and evolving nicely, now almost 20 years later. Oracle told us that it was their largest data warehouse in the world at the time.


Steve Catmull said...

Great post. Can you think of an environment where this data-bus pattern would not work well?

Glen McCallum said...


Have you followed any of the ONC work with the Query Health initiative? (see:

How would you compare this to the bus architecture with a set of common data elements?
