DATA WAREHOUSING BASICS

9:08 AM

Metadata

Metadata is all the information in the data warehouse environment that is not the actual data itself. Metadata is akin to an encyclopedia for the data warehouse. Data warehouse teams often spend an enormous amount of time talking about, worrying about, and feeling guilty about metadata. Since most developers have a natural aversion to the development and orderly filing of documentation, metadata often gets cut from the project plan despite everyone’s acknowledgment that it is important.

Metadata comes in a variety of shapes and forms to support the disparate needs of the data warehouse’s technical, administrative, and business user groups. We have operational source system metadata including source schemas and copybooks that facilitate the extraction process. Once data is in the staging area, we encounter staging metadata to guide the transformation and loading processes, including staging file and target table layouts, transformation and cleansing rules, conformed dimension and fact definitions, aggregation definitions, and ETL transmission schedules and run-log results.

Even the custom programming code we write in the data staging area is metadata. Metadata surrounding the warehouse DBMS accounts for such items as the system tables, partition settings, indexes, view definitions, and DBMS-level security privileges and grants. Finally, the data access tool metadata identifies business names and definitions for the presentation area’s tables and columns as well as constraint filters, application template specifications, access and usage statistics, and other user documentation. And of course, if we haven’t included it already, don’t forget all the security settings, beginning with source transactional data and extending all the way to the user’s desktop.
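To make the DBMS-level metadata concrete, here is a minimal sketch that reads a database's own catalog. It assumes SQLite purely for illustration (a warehouse DBMS exposes similar information through its own system tables), and the table, index, and view names are hypothetical.

```python
import sqlite3


def dbms_metadata(conn):
    """Group the objects listed in SQLite's catalog table by object type."""
    rows = conn.execute(
        "SELECT type, name FROM sqlite_master ORDER BY type, name"
    ).fetchall()
    catalog = {}
    for obj_type, name in rows:
        catalog.setdefault(obj_type, []).append(name)
    return catalog


# Hypothetical presentation-area objects, created only for this example.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales_fact (date_key INTEGER, product_key INTEGER, amount REAL);
    CREATE INDEX idx_sales_date ON sales_fact (date_key);
    CREATE VIEW daily_sales AS
        SELECT date_key, SUM(amount) AS total
        FROM sales_fact
        GROUP BY date_key;
    """
)

catalog = dbms_metadata(conn)
for obj_type, names in catalog.items():
    print(obj_type, names)
```

The point of the sketch is that this kind of metadata already exists inside the DBMS; the work lies in collecting it alongside the source, staging, and data access tool metadata described above.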

The ultimate goal is to corral, catalog, integrate, and then leverage these disparate varieties of metadata, much like the resources of a library. Suddenly, the effort to build dimensional models appears to pale in comparison. However, just because the task looms large, we can’t simply ignore the development of a metadata framework for the data warehouse. We need to develop an overall metadata plan while prioritizing short-term deliverables, including the purchase or construction of a repository for keeping track of all the metadata.
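As a rough illustration of what "construction of a repository" might look like at its very simplest, the sketch below catalogs metadata entries by the four areas discussed in this section. The class names, category labels, and sample entries are all assumptions made for the example, not a reference to any particular repository product.

```python
from dataclasses import dataclass, field

# Assumed category labels mirroring the areas described in the text.
CATEGORIES = ("source", "staging", "dbms", "access")


@dataclass(frozen=True)
class MetadataEntry:
    category: str     # one of CATEGORIES
    name: str         # e.g. a table, rule, or schedule name
    description: str  # free-text documentation


@dataclass
class MetadataRepository:
    entries: list = field(default_factory=list)

    def register(self, entry):
        """Add an entry, rejecting categories the catalog does not know."""
        if entry.category not in CATEGORIES:
            raise ValueError(f"unknown category: {entry.category!r}")
        self.entries.append(entry)

    def by_category(self, category):
        """Return all entries filed under one category."""
        return [e for e in self.entries if e.category == category]

    def search(self, term):
        """Case-insensitive search over entry names and descriptions."""
        term = term.lower()
        return [
            e for e in self.entries
            if term in e.name.lower() or term in e.description.lower()
        ]


# Hypothetical sample entries.
repo = MetadataRepository()
repo.register(MetadataEntry("source", "orders_copybook", "Layout of the orders extract file"))
repo.register(MetadataEntry("staging", "cleanse_customer", "Cleansing rules for customer names"))
repo.register(MetadataEntry("access", "Sales Amount", "Business definition of the amount column"))

print([e.name for e in repo.search("customer")])
```

A real repository would also need to track versions, ownership, and links between entries; the sketch only shows the cataloging and lookup idea.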