ETL-PROCESS

The document provides an overview of the Extract, Transform, Load (ETL) process, which integrates and organizes data from various sources into a consistent dataset for storage and analysis. It discusses the evolution of ETL from the 1970s to the rise of cloud computing, highlighting its benefits such as automation, improved data quality, and faster decision-making. Additionally, it outlines how ETL works in three main steps—extracting data, transforming it for analytical use, and loading it into a target data warehouse—along with examples of popular ETL tools.

RICHWELL COLLEGES INC.

EXTRACT,
TRANSFORM,
LOAD (ETL)
PREPARED BY: MR. MARCO A. CRUZ
WHAT IS ETL?
A data integration process that
combines, cleans and organizes data
from multiple sources into a single,
consistent data set for storage in a
data warehouse, data lake or other
target system.
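The definition above can be sketched as three small functions chained together. This is a minimal illustration, not a real pipeline; the record fields and the dict standing in for a warehouse are invented for the example.

```python
def extract(source):
    """Extract: copy raw records out of the source system."""
    return list(source)

def transform(records):
    """Transform: drop incomplete rows and standardize name formatting."""
    return [
        {"id": r["id"], "name": r["name"].strip().title()}
        for r in records
        if r.get("id") is not None and r.get("name")
    ]

def load(records, warehouse):
    """Load: write the consistent data set into the target, keyed by id."""
    for r in records:
        warehouse[r["id"]] = r
    return warehouse

source = [{"id": 1, "name": "  ada lovelace "}, {"id": None, "name": "x"}]
warehouse = load(transform(extract(source)), {})
print(warehouse)  # {1: {'id': 1, 'name': 'Ada Lovelace'}}
```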
EVOLUTION OF ETL
Businesses have been
generating data since the age of
the abacus, but modern analytics
only became possible with the
arrival of the digital computer
and data storage.
EVOLUTION OF ETL
A major step forward arrived in the
1970s, with a move to larger
centralized databases. ETL was then
introduced as a process for
integrating and loading data for
computation and analysis,
eventually becoming the primary
method to process data for data
warehousing projects.
EVOLUTION OF ETL
The era of big data arrived in the
1990s as computing speeds and storage
capacity continued to grow rapidly, with
large volumes of data being pulled from
new sources, such as social media and
the Internet of Things (IoT). A limiting
factor remained, with data often stored
in on-premises data warehouses.
EVOLUTION OF ETL
The next major step in both
computing and ETL was cloud
computing, which became popular in
the late 1990s. With cloud platforms and
data warehouses such as Amazon Web
Services (AWS), Microsoft Azure and
Snowflake, data can now be accessed from
around the globe and quickly scaled,
enabling ETL solutions to deliver remarkably
detailed insights and new-found competitive
advantage.
BENEFITS OF USING ETL
 It automates the data integration
process
 It reduces the time and effort required
to build and maintain data pipelines
 It ensures data accuracy and
consistency
 It improves data quality
 It enables faster decision-making.
HOW ETL
WORKS
EXTRACT
During data extraction, raw data is
copied or exported from source
locations to a staging area. Data
management teams can extract data
from a variety of different sources,
which can be structured or
unstructured. Those data types include,
but are not limited to:
 SQL or NoSQL servers
 CRM and ERP systems
 Email
 Web pages
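The extraction step can be sketched with Python's built-in sqlite3 module. Here an in-memory SQLite database stands in for a production SQL server; the table and column names are illustrative.

```python
import sqlite3

# Hypothetical source system: an in-memory SQLite database standing in
# for a SQL server. Table and column names are invented for the example.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE customers (id INTEGER, email TEXT)")
src.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "a@example.com"), (2, "b@example.com")])

# Extraction: copy the raw rows into a staging area (here, a list of
# dicts) without transforming them yet.
staging = [
    {"id": row[0], "email": row[1]}
    for row in src.execute("SELECT id, email FROM customers ORDER BY id")
]
```

In a real pipeline the staging area would typically be a separate database or object store rather than an in-process list.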
TRANSFORM
In the staging area, the raw data
undergoes data processing. Here, the data is
transformed and consolidated for its intended
analytical use case. This phase of the
transformation process can include:

 Filtering, cleansing, aggregating, de-duplicating,
validating and authenticating the data.
 Performing calculations, translations or
summarizations based on the raw data. This can include
changing row and column headers for consistency,
converting currencies or other units of measurement,
editing text strings and more.
TRANSFORM

 Conducting audits to ensure data quality and
compliance, and computing metrics.
 Removing, encrypting or protecting data governed by
industry or governmental regulators.
 Formatting the data into tables or joined tables to
match the schema of the target data warehouse.
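Several of the transformations above (filtering, de-duplicating, converting currencies, renaming columns to match a target schema) can be sketched in a few lines. The field names, validity flag and exchange rate are all invented for the example.

```python
# Illustrative exchange rate, not a real quote.
EUR_TO_USD = 1.10

raw = [
    {"order_id": 1, "amount_eur": 100.0, "valid": True},
    {"order_id": 1, "amount_eur": 100.0, "valid": True},   # duplicate
    {"order_id": 2, "amount_eur": 50.0, "valid": False},   # fails validation
]

seen, transformed = set(), []
for row in raw:
    if not row["valid"] or row["order_id"] in seen:  # filter + de-duplicate
        continue
    seen.add(row["order_id"])
    transformed.append({                             # rename + convert units
        "ORDER_ID": row["order_id"],
        "AMOUNT_USD": round(row["amount_eur"] * EUR_TO_USD, 2),
    })
print(transformed)  # [{'ORDER_ID': 1, 'AMOUNT_USD': 110.0}]
```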
LOAD
In this last step, the transformed data is
moved from the staging area into a target
data warehouse. Typically, this involves an
initial loading of all data, followed by
periodic loading of incremental data
changes and, less often, full refreshes to
erase and replace data in the warehouse.
For most organizations that use ETL, the process
is automated, well-defined, continuous and
batch-driven. Typically, the ETL load process
takes place during off-hours when traffic on the
source systems and the data warehouse is at its
lowest.
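The initial-load-then-incremental pattern described above can be sketched with SQLite: an INSERT OR REPLACE upsert lets periodic batches overwrite changed rows without a full refresh. The table name and rows are illustrative.

```python
import sqlite3

# Hypothetical target warehouse table. The PRIMARY KEY lets
# INSERT OR REPLACE act as a simple upsert for incremental loads.
wh = sqlite3.connect(":memory:")
wh.execute("CREATE TABLE sales (order_id INTEGER PRIMARY KEY, amount REAL)")

def load_batch(rows):
    """Load a batch of (order_id, amount) rows, overwriting changed keys."""
    wh.executemany("INSERT OR REPLACE INTO sales VALUES (?, ?)", rows)
    wh.commit()

load_batch([(1, 110.0), (2, 55.0)])   # initial loading of all data
load_batch([(2, 60.0), (3, 20.0)])    # periodic incremental data changes

rows = sorted(wh.execute("SELECT order_id, amount FROM sales"))
print(rows)  # [(1, 110.0), (2, 60.0), (3, 20.0)]
```

A full refresh would instead delete the table contents before reloading; the upsert keeps the warehouse current with far less data movement.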
ILLUSTRATION OF ETL
ETL TOOLS
In the past, organizations wrote their
own ETL code. There are now many
open source and commercial ETL tools
and cloud-based services to choose
from. Typical capabilities of these
products include:

 Comprehensive automation and ease of use - Leading
ETL tools automate the entire data flow, from data
sources to the target data warehouse. This saves data
engineers from the tedious tasks of moving and
formatting data, for faster results and more efficient
operations.
ETL TOOLS
 A visual, drag-and-drop interface - This
functionality can be used for specifying rules and
data flows.

 Support for complex data management - This includes
assistance with complex calculations, data
integrations and string manipulations.
 Security and compliance - The best ETL tools encrypt
data both in motion and at rest, and are certified
compliant with industry or government regulations.

In addition, many ETL tools have evolved to include ELT
capability and to support integration of real-time and
streaming data for artificial intelligence (AI)
applications.
ETL TOOLS
 Informatica PowerCenter
 Apache Airflow
 IBM InfoSphere DataStage
 Oracle Data Integrator
 Microsoft SQL Server Integration Services
(SSIS)
 Talend Open Studio (TOS)
 Pentaho Data Integration (PDI)
 Hadoop
 AWS Glue
 AWS Data Pipeline
THANK YOU!