Untrustworthy External Data

Call it what you will, but a time will eventually come when your customer asks for ‘manually created’ data loaded into the Data Warehouse (DW) from a non-trustworthy source. When they see the look on you face they may feel you are being fussy, after all, they’ve got a spread sheet of monthly figures that gets reported to the board. So it it’s good enough for the board, it should be good enough for the Business Intelligence (BI) team!

So I guess this throws a few questions:

Examples of Manual Data

Firstly, here are a few typical examples of manually created data that often finds its way into the DW:

  • Mapping data to consolidate disparate but equivalent records
  • Grouping values to extend/create a hierarchy that doesn’t exist natively
  • Monthly budgets, annual forecasts and sales targets.

Challenges

There are many, these are a few:

  • The business users who are expected to create the manual data are not trained in the process, and what the consequences of making a change are.
  • The data changes made by a business user are not tagged with appropriate metadata so that they can be audited. Remember, ‘with great power comes great responsibility’, if you are going to let users create data, make sure that you can clearly identify where the business user has done so.
  • It is not always possible to rollback data changes made by a user. What happens if it all goes badly?
  • The manual data can exist outside of normal Information Management (IM) processes within the organisation. It could be residing on a share drive in a Excel format with questionable data quality, poorly managed security and unclear ownership/stewardship.
  • User can change that data without consideration of the impact on the data integration process in place (e.g. a nightly load, month end report cycle, etc.).
  • The manual data can be captured in an ‘unprotected’ Excel/CSV document which does not prevent a business user from making changes that will break the integration. These include:
    • Changing column names
    • Adding or removing columns
    • Adding ‘bad’ data
    • Changing the file name
  • A custom, data capture user interface (UI) is built by a BI resource who is not experienced in doing this. An experienced UI developer should be used who will deliver an intuitive interface that handles maintenance of a single record, bulk maintenance of many records, validation, error handling, filtering, sorting, transactions, etc.

Solutions

The following are a few of the data sources that you may be ‘encouraged’ to use, or wish to propose as options:

  • SharePoint lists
  • Custom Web Page (aspx or similar)
  • Cube with a ‘write-back’ feature
  • Budgeting and forecasting solution
  • Master Data Management solution (MDS or similar)
  • Data capture forms:
    • MS Access
    • Oracle Forms
    • Outlook Forms
  • Etc.

They not sorted in any particular of preference and if implemented correctly can all avoid the challenges above providing they are implemented with the necessary:

  • Documentation
  • Training
  • Change management

I have excluded the use of Excel. This is due to the difficulty in locking the Excel down. It is too easy for users to remove passwords make changes that result in the integration failing.

Leave a Reply

Your email address will not be published. Required fields are marked *