DATA WAREHOUSING BASICS

Best online resource for Data Warehousing Basics Tutorial Tutorials

Big Data Analytics

9:04 AM

Data Presentation

The data presentation area is where data is organized, stored, and made available for direct querying by users, report writers, and other analytical applications.Since the backroom staging area is off-limits, the presentation area is the data warehouse as far as the business community is concerned. It is all the business community sees and touches via data access tools. The pre release working title for the first edition of The Data Warehouse Toolkit originally was Getting the Data Out. This is what the presentation area with its dimensional models is all about.

We typically refer to the presentation area as a series of integrated data marts.A data mart is a wedge of the overall presentation area pie. In its most simplistic form, a data mart presents the data from a single business process. These business processes cross the boundaries of organizational functions.We have several strong opinions about the presentation area. First of all, we insist that the data be presented, stored, and accessed in dimensional schemas. Fortunately, the industry has matured to the point where we’re no longer debating this mandate. The industry has concluded that dimensional modeling is the most viable technique for delivering data to data warehouse users.

Dimensional modeling is a new name for an old technique for making databases simple and understandable. In case after case, beginning in the 1970s, IT organizations, consultants, end users, and vendors have gravitated to a simple dimensional structure to match the fundamental human need for simplicity. Imagine a chief executive officer (CEO) who describes his or her business as, “We sell products in various markets and measure our performance over time.” As dimensional designers, we listen carefully to the CEO’s emphasis on product, market, and time. Most people find it intuitive to think of this business as a cube of data, with the edges labeled product, market, and time. We can imagine slicing and dicing along each of these dimensions. Points inside
the cube are where the measurements for that combination of product, market,and time are stored. The ability to visualize something as abstract as a set of data in a concrete and tangible way is the secret of understandability. If this perspective seems too simple, then good! A data model that starts by being simple has a chance of remaining simple at the end of the design.

A model that starts by being complicated surely will be overly complicated at the end. Overly complicated models will run slowly and be rejected by business users.
Dimensional modeling is quite different from third-normal-form (3NF) modeling. 3NF modeling is a design technique that seeks to remove data redundancies. Data is divided into many discrete entities, each of which becomes a table in the relational database. A database of sales orders might start off with a record for each order line but turns into an amazingly complex spiderweb diagram as a 3NF model, perhaps consisting of hundreds or even thousands of normalized tables.


The industry sometimes refers to 3NF models as ER models. ER is an acronym for entity relationship. Entity-relationship diagrams (ER diagrams or ERDs) are drawings of boxes and lines to communicate the relationships between tables. Both 3NF and dimensional models can be represented in ERDs because both consist of joined relational tables; the key difference between 3NF and dimensional models is the degree of normalization.

Since both model types can be presented as ERDs, we’ll refrain from referring to 3NF models as ER models; instead, we’ll call them normalized models to minimize confusion. Normalized modeling is immensely helpful to operational processing performance because an update or insert transaction only needs to touch the database in one place. Normalized models, however, are too complicated for data warehouse queries. Users can’t understand, navigate, or remember normalized
models that resemble the Los Angeles freeway system. Likewise, relational database management systems (RDBMSs) can’t query a normalized model efficiently; the complexity overwhelms the database optimizers, resulting in disastrous performance. The use of normalized modeling in the data warehouse presentation area defeats the whole purpose of data warehousing, namely, intuitive and high-performance retrieval of data.