Fundamentalism in any form is always of concern to me. It can be religious, political or even as trivial as Data Warehouse Architecture. So I thought I would set myself the goal of describing different architectures that are possible, but without using their industry known names. I have deliberately excluded discussing ‘big data’ or hosting strategies (e.g. cloud, on-premise or hybrid ) as they are too large and ambiguous for this topic, I also think they are outside what I want to tackle.
Design Options
So let’s take a look at some of the architectural options available, all reduced to the following three layers:
Layer 1 is used to temporarily store data that has been extracted from the source. It is modeled in the same manner as the source data is (i.e. the mapping is 1:1).
Layer 2 is where we start storing the data persistently, it can be for use immediately or kept as a contingency for future needs. We have also the opportunity to capture different versions of the data through Change Data Capture, change the normalisation beyond Third Normal Form, if required, and to allow some degree of end user access if deemed appropriate.
Layer 3 is the end game where data is really modeled with the end user in mind. The intricacies of joining tables together has been removed, where possible, so that is intuitive and resembles the processes that exist within the business.
In order to structure my own thinking I have reduced the ‘main’ options to a flow chart. The first thing you’ll notice is I have not included a cost or reward component to the decision making, I will come to that in the conclusion.
Clearly there is little or no room for discussion in Layers 1 and 3, the Business Intelligence (BI) industry are pretty much in agreement what they are for and how they are to be built. Where the hot discussion occurs and people become ‘black and white’ in their opinions relates to Layer 2.
Cost and Reward
The flow chart describes the technical and functional reasons for choosing one Layer 2 approach over another, however, this is when we need to start talking cost and reward(s).
I have taken some serious liberties with this graph, Stephen Few would turn
in his grave. I am mixing measures, have not shown the axis values and created
the underlying data based on my own prejudices. Having said this, these are
the relative magnitudes, the more sophisticated you make layer 2, the greater
the rewards but also the cost to build, this needs to be weighed up against
the savings when a significant change occurs in the source system(s).
Time to Delivery
The final part of this blog is to consider how quickly the business will receive a tangible outcome via Layer 3. Traditional thinking is that if a highly normalised, business wide, Layer 2 is built it means that Layer 3 cannot be built until it is finished. This is a nonsense, there are enough agile delivery techniques available and design options that will ensure that a phase approach can be taken.