Being a business intelligence junkie, I read most everything I can on
the subject. The item that I find most hilarious is "Do you have
multiple single versions of truth?". My answer is "Yes, I do".
The reason I find this so hilarious is the background of the question.
The background goes like this.
Let's say you are the CEO of an insurance company. You ask your VP of
Sales and VP of Finance what the sales were for last quarter. Each of
the VPs goes off to their respective systems and gladly reports back.
The VP of Sales proclaims that we had $34 million in sales last quarter.
The VP of Finance reports that we had $27 million in sales last
quarter. Which one is right?
This is one of the reasons that you need a data warehouse. You need to
have a single version of truth. But you can have multiple versions of
truth.
I submit to you that both of the answers are correct and that neither is
correct. Before you start thinking that I am loosing my mind (although
my wife would agree with you!), let's think about this.
It may be that in Finance (perhaps the more conservative group), sales
are not truly sales unless the risk has been accepted by the company and
the policy is in force (effective date of the policy has passed). In
Sales, sales may be defined as the collection of money at the
application for insurance, since that is what commissions are based on.
Both are equally valid definitions. Both are equally correct.
However, what is the CEO's definition of sales? Maybe the CEO thinks
sales are defined as the collection of the money, acceptance of the risk
– whether or not the policy is in force. In this case, both are equally
valid definitions, but both are equally incorrect.
So what do we do?
I create three columns (for this example, but believe me it can be
worse) - FINANCE_DEFINITION_OF_SALES, SALES_DEFINITION_OF_SALES,
MGMT_DEFINITION_OF_SALES. Then, I create the metadata to reflect how
each of the columns are defined and created. This does three things.
1) It provides each group with their meaning of what a "sale" is,
2) It allows IT and management see that there may be a problem with the
organization if there isn't agreement on what a "sale" is, and
3) It makes IT look like heroes because we created what everyone wants.
It may not be good, but it is what they want.
So there are many versions of truth when it comes to data and we need to
reflect this in our data warehouse. But the data warehouse is the
single version of truth, because it contains the data as each group
defines it.