I am at the tail end of a 2-year Enterprise Data Warehouse project and a reoccurring
question has raised its’ head:
“why bother with a mapping document?”
So, let’s look at why people generally want one:
- To Capture the analysis done during the discovery phase. Typically,
an Analyst is researching the source data and documenting transformations,
DQ issues, joining strategies, etc.
- To provide a specification for Data Engineers to build against. They
will focus on the joining strategies, source column to destination column
- To provide guidance to operational staff when they are diagnosing an
issue in production.
I accept these justifications in the following cases:
- The source data is very complex, or highly normalised. For example,
a CRM system. You don’t want Data Engineers all separately try to reverse
engineer a model, each coming up with a different joining strategy.
- The Data Engineers are remote, (e.g. the off-shore model) or they need
a very detailed specification due to their level of experience.
- The technology, or approach taken, requires a large amount of complex
SQL or script writing that, despite best intentions, is very hard to read.
- Any inbuilt Data Lineage feature within the tool has been rendered useless
due to the inclusion of complex SQL or scripting within the data pipeline.
I don’t use a mapping document or technology if:
- It is a 1:1 transformation between source and target.
- The tool being used has a Graphical Development Environment that is
in effect self-documenting.
- The team contains experienced Data Engineers that are collaborative
and have strong analytical skills that can be used for understanding the
So what am I suggesting?
- I will own up that mapping documents/technologies are not my default
approach, I see them as a necessary evil that is costly, cumbersome and
has a short-lived value.
- If you need to invest in Data Mapping, the first question will be the
tool. The ones I have used are typically Excel based, prone to sharing issues
and often become unwieldy. There are many templates available via Google,
or you can start from scratch, just keep it simple! Alternatively, you can
invest in a dedicated tool, but be aware that many combine data mapping
and data integration. These are probably not suitable for a Business Analyst
- Don’t confuse Data Lineage with Data Mapping. They are similar and with
a Mapping Document you can achieve some degree of Data Lineage. Data Mapping
is more proactive (e.g. Map these columns to build integration A) whereas
Data Lineage is about someone looking at a built integration and asking,
“where does this data come from?”.
- Finally, if you do use one, then be prepared to have your heart broken
when you discover that ever since the project went live, no one has maintained
it or even looked at it. I am yet to find Operational Staff who either see
the value or have the time to maintain it.