0% found this document useful (0 votes)
35 views

Implementing Teradata Utilities in A Powercenter Mapping

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
35 views

Implementing Teradata Utilities in A Powercenter Mapping

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 11

Implementing Teradata Utilities in a

PowerCenter Mapping

© 2013 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means
(electronic, photocopying, recording or otherwise) without prior consent of Informatica Corporation. All other company and
product names may be trade names or trademarks of their respective owners and/or copyrighted materials of such
owners.
Abstract
Teradata provides utilities, like FastLoad, FastExport, TPump, and MultiLoad, that enable you to
quickly load data into a Teradata database or export data from a Teradata database to a client
application. This article describes some Teradata utilities and provides an example of how to use
them in a PowerCenter mapping.

Supported Versions
 PowerCenter 8.x - 9.5.x

Table of Contents
Abstract ............................................................................................................................................ 2
Supported Versions ......................................................................................................................... 2
Table of Contents ............................................................................................................................ 2
Overview .......................................................................................................................................... 2
Teradata Utilities .............................................................................................................................. 3
FastLoad ...................................................................................................................................... 3
FastExport .................................................................................................................................... 4
TPump.......................................................................................................................................... 4
MultiLoad ...................................................................................................................................... 4
Example: Using TPump and FastLoad in a PowerCenter Mapping ................................................ 4
Script that Determines the Loader Type ...................................................................................... 5
Session Properties ....................................................................................................................... 6
Step 1. Specify the Path and Name of the Parameter File ...................................................... 6
Step 2. Specify the Directory and Name of the Source Flat File .............................................. 6
Step 3. Set the Loader Connection Properties......................................................................... 7
Data Load and Usage of the Loader Log File .............................................................................. 7
Sample Loader Log File ........................................................................................................... 7
Sample Error Table .................................................................................................................. 8
Conclusion ....................................................................................................................................... 8
LoaderConnection_Script.txt ........................................................................................................... 8
Presession_Cmd_Script.txt ........................................................................................................... 11

Overview
Implementing Teradata utilities, like TPump and FastLoad, in some ETL tools can be challenging.
This document describes how to implement Teradata utilities in a PowerCenter mapping.
Informatica supports various enterprise-wide data integration and data quality solutions, including
data warehousing, data migration, data consolidation, data synchronization, data governance,
master data management, and cross-enterprise data integration.
Teradata is a parallel processing system that runs on a share-nothing architecture. The Teradata
database management system scales linearly and predictably based on all dimensions of a

2
database system workload, such as data volume, breadth, number of users, and complexity of
queries.
Teradata acts as a single data store that can accept a large number of concurrent requests from
multiple client applications.
Teradata includes the following key features:
 Unconditional parallelism where several servers share the load distribution.
 Complex ad hoc queries with up to 64 joins.
 Parallel efficiency such that the effort for creating 100 records is same as that for creating
100,000 records.
 Scalability where performance scales linearly when you increase the number of processors.
 No performance degradation when you increase the number of users.

Teradata Utilities
Teradata offers utilities that support the management and maintenance of a data warehouse on a
Teradata database.
Some Teradata utilities include:
 FastLoad
 FastExport
 TPump
 MultiLoad

FastLoad

FastLoad is a parallel-load utility that is designed to move large volumes of data from data
sources on channel-attached and network-attached clients to empty tables in a Teradata
database.
FastLoad typically provides better performance than a standard application program that loads
data to an empty database because data allocation, data conversion, data movement, and data
loading run automatically and in parallel.
Features:
 Performs the initial table load.
 Supports delimited or fixed-width flat file input.
 Processes large data volumes.
Constraints:
 The target table must be empty before the load.
 No read or write access is allowed on the target table while the load occurs.
 Each job can load only one table.

3
FastExport

FastExport is a command-driven utility that uses multiple sessions to quickly transfer large
amounts of data from Teradata tables and views to a client application. You can export data from
any table or view on which you have the SELECT access privilege.
You can export the data to the following destinations:
 A file on your channel-attached or network-attached client system
 An Output Modification (OUTMOD) routine that you wrote to select, validate, and preprocess
the exported data

TPump
TPump is a highly parallel utility designed to continuously move smaller data volumes from data
sources into Teradata tables without locking the affected tables. TPump enables you to load near
real-time data into the data warehouse, which enables you to make decisions based on current,
accurate data. You can use TPump to insert, update, upsert, and delete data in a Teradata
database.
Features:
 Performs fast, scalable, continuous data loads.
 Can load data into the following types of target tables:
- Targets that have secondary indexes, referential integrity, and constraints.
- Targets that are empty or that contain data.

MultiLoad
MultiLoad is a command-driven parallel load utility for high-volume batch maintenance on multiple
tables and views in a Teradata database. It is specially designed for high-speed batch creation
and maintenance of large databases. It supports up to five populated tables and performs block-
level operations on populated tables.

Example: Using TPump and FastLoad in a PowerCenter


Mapping
You have a mapping that writes flat file data to a Teradata target. The mapping uses Expression
transformations to convert strings to dates.

4
The following figure shows the PowerCenter mapping that contains the flat file source and the
Teradata target:

Script that Determines the Loader Type


We can use TPump or FastLoad Teradata utilities to load a Teradata table. Based on the features
of the utilities, TPump is suitable for smaller data volumes and FastLoad is suitable for large data
volumes.
You can use the LoadConnection_Script.txt script to define the loader connection at runtime. The
script counts the rows in the source flat file. If the count exceeds 50,000 rows, the script sets the
loader connection to FastLoad. Otherwise, it sets the loader connection to TPump.
The Loader Connection script specifies a load limit of 50000.
The script uses the following parameters:

Parameter Description

SESSION_NAME Name of the session that runs the mapping.

PARAM_FILE File that stores the loader connection type.

LRF_NAME Name of the input flat file.

TPUMP_CONNECTION Name of the relational connection for TPump.

FLOAD_CONNECTION Name of the relational connection for FastLoad.


The script can be called from a command session or a pre-session-command in the session.
In the Presession_Cmd_Script.text script, $PMRootDir/ScriptFiles/dataload/cr_TD_conn_type.ksh
is the directory and file path of the script that determines the loader connection. The script
contains the code that calculates the number of records in the flat file and sets the load
connection based on the row count.
s_m_ISP_NATIVE_S_ASSET_LRF_INC is the session name. This session runs the mapping.
The session properties are set based on the information in the Session Properties section.

5
$PMRootDir/BWParam/svc/ISP_NATIVE/wf_SVC_ISP_NATIVE_LRF_INC.parm is the directory
and file path of the parameter file. The parameter file dynamically stores the loader connection for
the s_m_ISP_NATIVE_S_ASSET_LRF_INC session based on the cr_TD_conn_type.ksh loader
connection script. The session s_m_ISP_NATIVE_S_ASSET_LRF_INC calls the parameter file to
get the loader connection using the loader connection variable
$LoaderConnection_s_m_ISP_NATIVE_S_ASSET_LRF_INC.
TPUMP_SERVICE_ISP_ETL_01 and FLOAD_SERVICE_ISP_ETL_01 are the TPump and Fload
connections, respectively. The cr_TD_conn_type.ksh script sets the loader connection for the
s_m_ISP_NATIVE_S_ASSET_LRF_INC session in the
$LoaderConnection_s_m_ISP_NATIVE_S_ASSET_LRF_INC variable of the
wf_SVC_ISP_NATIVE_LRF_INC.parm parameter file.
$PMRootDir/TgtFiles/svc/ISP_NATIVE/lrf_S_ASSET.dat is the directory and file path of the flat
file.

Session Properties
Perform the following steps to set the session properties.

Step 1. Specify the Path and Name of the Parameter File


Specify the path and the name of the parameter file in the Properties tab.

Step 2. Specify the Directory and Name of the Source Flat File
Specify the directory and name of the source flat file in the source attribute properties of the
Mapping tab.

6
Step 3. Set the Loader Connection Properties
In the target connection properties, set the type to ‘Loader’ and value to
‘LoaderConnection_{Session name}.’ The loader connection value is fetched from the parameter
file. The parameter file stores the connection value dynamically using a shell script.

Data Load and Usage of the Loader Log File


When you run the session, Teradata generates a Loader log file in the source file directory. It
generates one Loader log file for each Teradata load.
You can use the Loader log file to verify the number of rows read from the source, number of
rows loaded into the target, and number of rows loaded into the error table.
Teradata automatically generates an error table for each load. The PowerCenter Integration
Service loads rejected rows into the error table. The PowerCenter Integration Service also stores
the reason for the rejection in a column of the error table.

Sample Loader Log File


The following sample loader log file shows that 20,000 rows are fetched from the source and 0
rows loaded into the error table:

7
The name of the error table is ERROR_T.ET_S_ASSET0.

Sample Error Table


th
The following query on the error table shows that the 11 row in the source was rejected because
of an invalid timestamp:

Conclusion
This document described some Teradata utilities, described how to implement the TPump and
FastLoad Teradata load utilities in a PowerCenter mapping, and described how to use Loader log
file and error tables.

LoaderConnection_Script.txt
tdate=`date '+%Y-%m-%d %T'`
LOAD_LIMIT=50000
LOG_FILE=$PMRootDir/TgtFiles/svc/ISP_NATIVE/conn_type.log$$

if [ $# -ne 5 ]
then
echo "Usage: cr_TD_conn_type.ksh <SESSION_NAME> <PARAM_FILE_WITH_PATH>
<LRF_NAME_WITH_PATH> <TPUMP_CONNECTION> <FLOAD_CONNECTION>"

8
echo "Give Complete Path for PARAM_FILE and Lrf"
echo "Example: \$PMRootDir/ScriptFiles/dataload/cr_TD_conn_type.ksh
s_m_LRF_to_TD_fcst_plan
\$PMRootDir/BWParam/ebi20/wf_src_HPT_tgt_AUDIT_set_01.parm
\$PMRootDir/TgtFiles/ebi20/fin/tablename.dat TPUMP_HPT_FIN_01 FLOAD_HPT_FIN_01"

echo "Usage: cr_TD_conn_type.ksh <SESSION_NAME> <PARAM_FILE_WITH_PATH>


<LRF_NAME_WITH_PATH> <TPUMP_CONNECTION> <FLOAD_CONNECTION>" >> ${LOG_FILE}
echo "Give Complete Path for PARAM_FILE and Lrf" >> ${LOG_FILE}
echo "Example: \$PMRootDir/ScriptFiles/dataload/cr_TD_conn_type.ksh
s_m_LRF_to_TD_fcst_plan
\$PMRootDir/BWParam/ebi20/wf_src_HPT_tgt_AUDIT_set_01.parm
\$PMRootDir/TgtFiles/ebi20/fin/tablename.dat TPUMP_HPT_FIN_01 FLOAD_HPT_FIN_01"
>> ${LOG_FILE}
exit 1
fi

SESSION_NAME=$1
PARAM_FILE=$2
LRF_NAME=$3
TPUMP_CONNECTION=$4
FLOAD_CONNECTION=$5

LOADER_CONNECTION=LoaderConnection_${SESSION_NAME}
LOG_FILE=$PMRootDir/TgtFiles/svc/ISP_NATIVE/conn_type_${SESSION_NAME}.log
echo > ${LOG_FILE}

if [ -f "${PARAM_FILE}" ]; then
if [ -f "${LRF_NAME}" ]; then
FILE_COUNT=$(echo `wc -l ${LRF_NAME} | awk -F' ' '{ print $1
}'`)
else
echo "Missing LRF File : ${LRF_NAME}. Please check" >>
${LOG_FILE}
exit 1
fi
else
echo "Missing Paramter File : ${PARAM_FILE}. Please check" >>
${LOG_FILE}
exit 1
fi
echo "Count for ${LRF_NAME} = ${FILE_COUNT}" >> ${LOG_FILE}

PARM_FILE_CNT=$(echo `wc -l ${PARAM_FILE} | awk -F' ' '{ print $1 }'`)


echo "Count for PARM_FILE = ${PARM_FILE_CNT}" >> ${LOG_FILE}
if [ ${PARM_FILE_CNT} -eq 0 ]; then
echo "### Parameter file generated for Choosing loader Connection in TD
.................................................................." >>
${PARAM_FILE}
echo "###
...............................................................................
................................................." >> ${PARAM_FILE}
echo "###
...............................................................................
................................................." >> ${PARAM_FILE}
echo "###
...............................................................................
................................................." >> ${PARAM_FILE}
echo "###
...............................................................................
................................................." >> ${PARAM_FILE}

9
echo "###
...............................................................................
................................................." >> ${PARAM_FILE}
echo "###
...............................................................................
................................................." >> ${PARAM_FILE}
echo "###
...............................................................................
................................................." >> ${PARAM_FILE}
echo "###
...............................................................................
................................................." >> ${PARAM_FILE}
echo "###
...............................................................................
................................................." >> ${PARAM_FILE}
echo "### place holder characters to avoid the occasional error in position
1024 ........................................................." >>
${PARAM_FILE}
echo "###
...............................................................................
................................................." >> ${PARAM_FILE}
echo "###
...............................................................................
................................................." >> ${PARAM_FILE}
echo "###
...............................................................................
................................................." >> ${PARAM_FILE}
echo "###
...............................................................................
................................................." >> ${PARAM_FILE}
echo "###
...............................................................................
................................................." >> ${PARAM_FILE}
echo "###
...............................................................................
................................................." >> ${PARAM_FILE}
echo "###
...............................................................................
................................................." >> ${PARAM_FILE}
echo "###
...............................................................................
................................................." >> ${PARAM_FILE}
echo "###
...............................................................................
................................................." >> ${PARAM_FILE}
fi

if [ ${FILE_COUNT} -gt ${LOAD_LIMIT} ]; then


CONNECTION_TYPE=${FLOAD_CONNECTION}
else
CONNECTION_TYPE=${TPUMP_CONNECTION}
fi
echo "Based on count CONNECTION_TYPE = ${CONNECTION_TYPE}" >> ${LOG_FILE}

CNT=`cat ${PARAM_FILE} | grep ${LOADER_CONNECTION} | wc -l`


echo "Count of Loader Connection if present in the Param File = $CNT" >>
${LOG_FILE}
PARM_TO_REPLACE=`cat ${PARAM_FILE} | grep ${LOADER_CONNECTION} | awk -F= '{
print $2 }'`
echo "Parameter to replace = ${PARM_TO_REPLACE}" >> ${LOG_FILE}

if [ $CNT -eq 0 ]; then

10
echo "Loader connection ${LOADER_CONNECTION} is not present. Creating
new connection type" >> ${LOG_FILE}
echo "" >> ${PARAM_FILE}
echo "[${SESSION_NAME}]" >> ${PARAM_FILE}
echo "\$${LOADER_CONNECTION}=${CONNECTION_TYPE}" >> ${PARAM_FILE}
else
echo "Loader connection ${LOADER_CONNECTION} already present. Replacing
connection type" >> ${LOG_FILE}
echo "sed
s/${LOADER_CONNECTION}=${PARM_TO_REPLACE}/${LOADER_CONNECTION}=${CONNECTION_TYP
E}/" >> ${LOG_FILE}
sed
s/${LOADER_CONNECTION}=${PARM_TO_REPLACE}/${LOADER_CONNECTION}=${CONNECTION_TYP
E}/ < ${PARAM_FILE} > ${PARAM_FILE}$$
mv ${PARAM_FILE}$$ ${PARAM_FILE}
fi

Presession_Cmd_Script.txt
$PMRootDir/ScriptFiles/dataload/cr_TD_conn_type.ksh
s_m_ISP_NATIVE_S_ASSET_LRF_INC
$PMRootDir/BWParam/svc/ISP_NATIVE/wf_SVC_ISP_NATIVE_LRF_INC.parm
$PMRootDir/TgtFiles/svc/ISP_NATIVE/lrf_S_ASSET.dat TPUMP_SERVICE_ISP_ETL_01
FLOAD_SERVICE_ISP_ETL_01

Authors
Anil Praneeth
Professional Services Consultant

11

You might also like