I am at the tail end of a 2-year Enterprise Data Warehouse project and a reoccurring
question has raised its’ head:
“why bother with a mapping document?”
Purpose
So, let’s look at why people generally want one:
- To Capture the analysis done during the discovery phase. Typically, an Analyst is researching the source data and documenting transformations, DQ issues, joining strategies, etc.
- To provide a specification for Data Engineers to build against. They will focus on the joining strategies, source column to destination column mapping., etc.
- To provide guidance to operational staff when they are diagnosing an issue in production.
When To Use
I accept these justifications in the following cases:
- The source data is very complex, or highly normalised. For example, a CRM system. You don’t want Data Engineers all separately try to reverse engineer a model, each coming up with a different joining strategy.
- The Data Engineers are remote, (e.g. the off-shore model) or they need a very detailed specification due to their level of experience.
- The technology, or approach taken, requires a large amount of complex SQL or script writing that, despite best intentions, is very hard to read.
- Any inbuilt Data Lineage feature within the tool has been rendered useless due to the inclusion of complex SQL or scripting within the data pipeline.
When Not To
I don’t use a mapping document or technology if:
- It is a 1:1 transformation between source and target.
- The tool being used has a Graphical Development Environment that is in effect self-documenting.
- The team contains experienced Data Engineers that are collaborative and have strong analytical skills that can be used for understanding the source.
Conclusion
So what am I suggesting?
- I will own up that mapping documents/technologies are not my default approach, I see them as a necessary evil that is costly, cumbersome and has a short-lived value.
- If you need to invest in Data Mapping, the first question will be the tool. The ones I have used are typically Excel based, prone to sharing issues and often become unwieldy. There are many templates available via Google, or you can start from scratch, just keep it simple! Alternatively, you can invest in a dedicated tool, but be aware that many combine data mapping and data integration. These are probably not suitable for a Business Analyst to use.
- Don’t confuse Data Lineage with Data Mapping. They are similar and with a Mapping Document you can achieve some degree of Data Lineage. Data Mapping is more proactive (e.g. Map these columns to build integration A) whereas Data Lineage is about someone looking at a built integration and asking, “where does this data come from?”.
- Finally, if you do use one, then be prepared to have your heart broken when you discover that ever since the project went live, no one has maintained it or even looked at it. I am yet to find Operational Staff who either see the value or have the time to maintain it.