100% found this document useful (1 vote)
199 views

Govindarajan Data Vault PDF

The document discusses the Data Vault modeling approach for data warehousing. It introduces Data Vault as an evolution of dimensional modeling techniques. The Data Vault approach uses hub, link, and satellite entities to build a temporal, audit-ready structure. The document provides an example comparing a typical dimensional model to representing the same data using Data Vault components like hub, satellite, and link entities.

Uploaded by

newelljj
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
100% found this document useful (1 vote)
199 views

Govindarajan Data Vault PDF

The document discusses the Data Vault modeling approach for data warehousing. It introduces Data Vault as an evolution of dimensional modeling techniques. The Data Vault approach uses hub, link, and satellite entities to build a temporal, audit-ready structure. The document provides an example comparing a typical dimensional model to representing the same data using Data Vault components like hub, satellite, and link entities.

Uploaded by

newelljj
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 29

Data Vault Modeling

The Next Generation DW Approach

Presented by: Siva Govindarajan

http://www.globytes.com
[email protected]

1
Data Vault Modeling
Agenda
Introduction
Data Vault Place in Evolution
Data Warehousing Architecture
Data Vault Components
Case Study
Typical 3NF / Star Schema
Data Vault Approach
Temporal Data + Auditability
Implementation of Data Vault
Conclusion 2
Introduction

Data Vault concept originally conceived


by Dan Linstedt
Enterprise Data Warehouse Modeling
approach.
Hybrid approach - 3NF & Star Schema
Adaptable to changes
Auditable
Timed right with Technology

3
Data Vault Place in Evolution

1960s - Codd, Date et. al Normal Forms


1980s - Normal Forms adapted to DWs
1985+ - Star Schema for OLAP
1990s - Data Vault concept developed
Dan Linstedt
2000+ - Data Vault concept published by
Dan Linstedt

4
Data Warehousing Architecture

Bill Inmon’s Data Warehouse Architecture

Data Mart
OLTP Staging Enterprise Staging Data Mart
(optional) Data
DataMarts
Marts
Systems & ODS DW
ive
a pt arts
Ad M

Kimball’s Data Warehouse Bus Architecture

Data Mart Data Mart

OLTP
Systems Staging
Data Mart Data Mart

5
Data Vault Components

Hub Entities
Candidate Keys + Load Time + Source
Link Entities
FKs from Hub + Load Time + Source
Satellite Entities
Descriptive Data + Load Time + Source + End Time
Dimension = Hub+ Satellite

Fact = Satellite + Link [+ Hub ]

6
Case Study

Modeling Approaches
Typical 3NF / Star Schema
Data Vault approach
Scenario
Typical Fact and Dimension
New Dimension data
Reference Data change - 1:M to M:M
Additional attributes to Dimension

7
Typical 3 F / Star Schema
1. Typical Fact and Dimension

TR ORDER DIM PRODUCT


tr_order_id dim_product_id
dim_product_id product_id
system order id product_code
system root order id product desc
global root order id product_name
order quantity le ng th
total executed qty
event timestamp

8
Typical 3 F / Star Schema
2. ew Dimension Data

R E F P R OD U CT CA T E GOR Y
TR ORDER DIM PRODUCT p ro d uc t_c a te g o ry _id
tr_order_id dim_product_id
p ro d uc t_c a te g o ry _na me
dim_product_id product_id p ro d uc t_c a te g o ry _c o d e
system order id product_code p ro d uc t_c a te g o ry _d e s c
system root order id product desc is _s tra te g y
global root order id product_name
order quantity p ro d uc t_c a te g o ry _id
total executed qty p ro d uc t_c a te g o ry _na me
event timestamp p ro d uc t_c a te g o ry _c o d e
p ro d uc t_c a te g o ry _d e s c
is _s tra te g y

9
Typical 3 F / Star Schema
2. ew Dimension Data

R E F P R OD U CT CA T E GOR Y
TR ORDER DIM PRODUCT p ro d uc t_c a te g o ry _id
tr_order_id dim_product_id
p ro d uc t_c a te g o ry _na me
dim_product_id product_id p ro d uc t_c a te g o ry _c o d e
system order id product_code p ro d uc t_c a te g o ry _d e s c
system root order id product desc is _s tra te g y
global root order id product_name
order quantity le ng th
total executed qty D IM P R OD U CT CA T E GOR Y
event timestamp
d im_p ro d uc t_c a te g o ry _id d im_p ro d uc t_c a te g o ry _id
product_category_code
product_category_name
product_category_desc
is_strategy
namespace

10
Typical 3 F / Star Schema
3. Reference 1:M to M:M Change

R E F P R OD U CT CA T E GOR Y
TR ORDER DIM PRODUCT p ro d uc t_c a te g o ry _id
tr_order_id dim_product_id
p ro d uc t_c a te g o ry _na me
dim_product_id product_id p ro d uc t_c a te g o ry _c o d e
system order id
system root order id
global root order id
product_code
product desc
product_name
? p ro d uc t_c a te g o ry _d e s c
is _s tra te g y

order quantity le ng th
total executed qty D IM P R OD U CT CA T E GOR Y
event timestamp
d im_p ro d uc t_c a te g o ry _id
order owner firm id ? d im_p ro d uc t_c a te g o ry _id
product_category_code
product_category_name
product_category_desc
is_strategy
namespace

11
Typical 3 F / Star Schema
4. Additional attributes to Dimension

TR ORDER DIM PRODUCT


tr_order_id dim_product_id
dim_product_id product_id
system order id product_code
system root order id product desc
global root order id product_name
order quantity le ng th
total executed qty he ig ht
event timestamp we ig ht
d im_p ro d uc t_c a te g o ry _id p a c k a g ing
order owner firm id c o lo r
s ize

12
Data Vault Approach
1. Typical Hub, Satellite and Link
(Replacing Fact/Dimension)

SAT _PR OD U CT 01
HUB_PRODUCT
s a t_p ro d uc t01_id
hub _p ro d uc t_id
hub _p ro d uc t_id
hub _re c o rd _s o urc e s a t_lo a d _d a te
hub _lo a d _d a te s a t_e nd _d a te
product_id s a t_re c o rd _s o urc e
product_code product desc
product_name
symbol
HUB_ORDER
SA T OR D ER 01
LNK_ORDER hub _o rd e r_id
s a t_o rd e r01_id
lnk _o rd e r_id hub _re c o rd _s o urc e
hub _o rd e r_id
hub _lo a d _d a te
lnk _lo a d _d a te s a t_lo a d _d a te
tr_order_id
lnk _re c o rd _s o urc e s a t_e nd _d a te
system order id
hub _p ro d uc t_id s a t_re c o rd _s o urc e
system root order id
hub _o rd e r_id order quantity
global root order id
total executed qty
event timestamp

13
Data Vault Approach
2. ew Dimension Data
SAT_PRODUCT01
HUB_PRODUCT
s a t_p ro d uc t01_id
hub _p ro d uc t_id
hub _p ro d uc t_id
hub _re c o rd _s o urc e s a t_lo a d _d a te
hub _lo a d _d a te s a t_e nd _d a te
product_id s a t_re c o rd _s o urc e
product_code product desc
product_name
symbol
HUB_ORDER
SAT ORDER01
LNK_ORDER hub _o rd e r_id
s a t_o rd e r01_id
lnk _o rd e r_id hub _re c o rd _s o urc e
hub _o rd e r_id
hub _lo a d _d a te
lnk _lo a d _d a te s a t_lo a d _d a te
tr_order_id
lnk _re c o rd _s o urc e s a t_e nd _d a te
system order id
hub _p ro d uc t_id s a t_re c o rd _s o urc e
system root order id
hub _o rd e r_id order quantity
global root order id
total executed qty
event timestamp

S A T _CA T E GOR Y 01
H U B _CA T E GOR Y s a t_c a te g o ry 01_id
LN K_P R OD U CT _X_CA T E GOR Y
hub _c a te g o ry _id hub _c a te g o ry _id
lnk _p ro d uc t_x_c a te g o ry _id
s a t_lo a d _d a te
hub _re c o rd _s o urc e
lnk _lo a d _d a te s a t_e nd _d a te
hub _lo a d _d a te
lnk _re c o rd _s o urc e s a t_re c o rd _s o urc e
product_category_id
hub _p ro d uc t_id product_category_name
product_category_code
hub _c a te g o ry _id product_category_desc
is_strategy

14
Data Vault Approach
3. Reference 1:M to M:M Change
SAT_PRODUCT01
HUB_PRODUCT
s a t_p ro d uc t01_id
hub _p ro d uc t_id
hub _p ro d uc t_id
hub _re c o rd _s o urc e s a t_lo a d _d a te
hub _lo a d _d a te s a t_e nd _d a te
product_id s a t_re c o rd _s o urc e
product_code product desc
product_name
symbol
HUB_ORDER
SAT ORDER01
LNK_ORDER hub _o rd e r_id
s a t_o rd e r01_id
lnk _o rd e r_id hub _re c o rd _s o urc e
hub _o rd e r_id
hub _lo a d _d a te
lnk _lo a d _d a te s a t_lo a d _d a te
tr_order_id
lnk _re c o rd _s o urc e s a t_e nd _d a te
system order id
hub _p ro d uc t_id s a t_re c o rd _s o urc e
system root order id
hub _o rd e r_id order quantity
global root order id
total executed qty
event timestamp

S A T _CA T E GOR Y 01
H U B _CA T E GOR Y s a t_c a te g o ry 01_id
LN K_P R OD U CT _X_CA T E GOR Y
hub _c a te g o ry _id hub _c a te g o ry _id
lnk _p ro d uc t_x_c a te g o ry _id
s a t_lo a d _d a te
hub _re c o rd _s o urc e
lnk _lo a d _d a te s a t_e nd _d a te
hub _lo a d _d a te
lnk _re c o rd _s o urc e s a t_re c o rd _s o urc e
product_category_id
hub _p ro d uc t_id product_category_name
product_category_code
hub _c a te g o ry _id product_category_desc
is_strategy

15
Data Vault Approach
4. Additional attributes to Dimension

SAT_PRODUCT01
s a t_p ro d uc t01_id
hub _p ro d uc t_id
s a t_lo a d _d a te
s a t_e nd _d a te
s a t_re c o rd _s o urc e
product desc
HUB_PRODUCT product_name

hub _p ro d uc t_id
S A T _P R OD U CT 02
hub _re c o rd _s o urc e
hub _lo a d _d a te s a t_p ro d uc t02_id
product_id
hub _p ro d uc t_id
product_code
s a t_lo a d _d a te
s a t_e nd _d a te
s a t_re c o rd _s o urc e
length
height
weight
packaging
color

16
Temporal Data + Auditability
S AT _PR OD U CT 01
s a t_p ro d uc t01_id

hub _p ro d uc t_id
s a t_lo a d _d a te
s a t_e nd _d a te
s a t_re c o rd _s o urc e
product desc
HUB_PRODUCT product_name

hub _p ro d uc t_id
S AT _PR OD U CT 02
hub _re c o rd _s o urc e
hub _lo a d _d a te s a t_p ro d uc t02_id
product_id
hub _p ro d uc t_id
product_code
s a t_lo a d _d a te
s a t_e nd _d a te
s a t_re c o rd _s o urc e
length
height
weight
packaging
color

LNK_PRODUCT_X_CATEGORY
LNK_ORDER HUB_ORDER S AT OR D ER 01
lnk _p ro d uc t_x_c a te g o ry _id
lnk _o rd e r_id hub _o rd e r_id s a t_o rd e r01_id
lnk _lo a d _d a te
lnk _lo a d _d a te hub _re c o rd _s o urc e hub _o rd e r_id
lnk _re c o rd _s o urc e
lnk _re c o rd _s o urc e hub _lo a d _d a te s a t_lo a d _d a te
hub _p ro d uc t_id
hub _o rd e r_id tr_order_id s a t_e nd _d a te
hub _c a te g o ry _id
hub _p ro d uc t_id system order id s a t_re c o rd _s o urc e
hub _c a te g o ry _id system root order id order quantity
global root order id total executed qty
event timestamp
source_system

SA T _CAT E GOR Y01


HUB_CATEGORY s a t_c a te g o ry 01_id
hub _c a te g o ry _id hub _c a te g o ry _id
s a t_lo a d _d a te
hub _re c o rd _s o urc e
s a t_e nd _d a te
hub _lo a d _d a te
s a t_re c o rd _s o urc e
product_category_id
product_category_name
product_category_code
product_category_desc
is_strategy

17
Implementation of Data Vault

Make it Simple
View for Dimension
View For Fact
Data Loading
Take advantage of Technology
DW Appliances
High-throughput storage devices
RDBMS Features

18
Make it Simple

Create views for Dimension

From our Demo model, Join Product related Hubs and


Satellites to build View for Product Dimension as shown
below :

HUB_PRODUCT

SAT_PRODUCT01

SAT_PRODUCT02

19
Make it Simple

Create Views For Fact

Join Order related Hubs, Satellites and Links to build


View for ORDER Fact using ORDER related Hubs/
Satellites and Links.

HUB_ORDER
SAT_ORDER01
LNK_ORDER

20
Data Loading
Typically data loading for a Data Vault is in the
following sequence.
Hubs for Dimensions
Links for Dimensions
Satellites for Dimensions
Hubs for Fact (if any )
Links for Fact
Satellites for Fact
Refer to Data Vault Series article 5 of Linstedt for
further details

21
DW Appliances

Next wave of IT Data Warehouse hardware solutions are


the Data Warehouse Appliances. Take advantage of the
technology where possible. Few DW Appliances in the
market :
Oracle Exadata

Netezza

Teradata

RedBrick

GreenPlum

22
High-throughput storage devices

Utilize Storage devices designed for DW / OLAP


applications. Following are some examples only and
would change over time. Please do your research based
on your sizing requirements.
EMC CLARiiON Storage family

Texas Memory systems RamSan – Solid State


devices
Hitachi USP / VSP Storage family

Other Hybrid solutions with High performance


storage and Solid State devices

23
RDBMS Features

Some Oracle Features catering DW / OLAP


applications:
Exadata Smart Scans

OLAP Based Materialized views

Partitioning Reference partition

Advanced data compression

Automatic degree of parallelism (ADOP).

Star query optimization

Oracle OLAP/DWH Features.

24
Conclusion
In this presentation we have addressed high level overview of Data
Vault Architecture. Topics discussed are
Data Vault concept and architecture

Data Vault Components such as Hubs, Satellites and Link


tables
Typical modeling challenges with traditional modeling
approaches
How those challenges could be handled using Data Vault
Modeling Approach.
Auditing and Temporal data capture using DV Approach.

And finally, some implementation details

If interested in learning more, try


Dan Linstedt's special coaching area at:
http://danLinstedt.com/my-coach.

25
Questions ?

? 26
References

Data Vault Series articles by Dan E. Linstedt --


http://www.tdan.com
Referred few articles from Genesee Academy --
http://geneseeacademy.com
Articles and books by Ralph Kimball and Bill Inmon
from multiple sources.
Technical Documents from
http://www.oracle.com/technetwork

27
Special Thanks to

Dan Linstedt for review and corrections to the article.


My colleague Mark Bruscke for his assistance in preparing
the article.

28
The End

Contact Email :
[email protected]

29

You might also like