Ferhat's Blog

There will be only one database

External table to load compressed data residing on remote locations

Posted by fsengonul on April 8, 2011

The PREPROCESSOR option of external tables has opened up a lot of opportunities for DWH systems. Below you will find Allen Brumm's and Mehant Baid's notes on how to load remote files via external tables, plus my minor changes to the script to add gunzip support and flexibility for different proxy and target file names.

In our environment, loading a single table means 16 sqlldr jobs starting from 8 ETL servers (parallel & direct path). The input files sit on the ETL servers in gzip-compressed format; the ETL tool unzips them on the fly and feeds sqlldr. The target tables always have 16 hash partitions, so for each table this design leaves me with 256 temp segments on the database side. The difference in the hashing algorithm adds more spice to the situation, and the network cards on both the DB nodes and the ETL servers are 1 GbE (do not ask why they're not 10 GbE; they will be).

The best part of this method is the flexibility: you may choose to run the gunzip command on the target or on the source. In my case, running gunzip on the Oracle machine decreases the data flow on the network and also decreases the CPU usage on the ETL machine. I'm still searching for a quiet period on both sides so I can collect CPU and network usage statistics for both cases.
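When that window shows up, something as simple as sar running on a DB node and on an ETL server during a test load should be enough to compare the two options; a rough sketch (the sampling window and the eth0 interface name are placeholders, not from my setup):

#!/bin/sh
# Sample CPU and per-NIC network throughput while a test load runs.
# Assumes the sysstat package (sar) is installed; 60-second samples
# taken 10 times are illustrative values.
sar -u 60 10 > cpu_usage_`hostname`.txt &
sar -n DEV 60 10 > net_usage_`hostname`.txt &
wait
grep eth0 net_usage_`hostname`.txt   # keep only the interface of interest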

So, the steps:

First, passwordless ssh connectivity has to be established between the remote locations and the database servers.
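A minimal sketch of that setup, run as the OS owner of the database software (the user and IP below are taken from the map file further down, purely as an illustration):

# Generate a key pair for the oracle OS user on each DB node (skip if one exists)
# and push the public key to the etl_user account on every ETL server.
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
ssh-copy-id etl_user@10.XXX.XXX.11      # repeat for each ETL server
# No password prompt should appear; the preprocessor depends on that.
ssh -q etl_user@10.XXX.XXX.11 'echo ok'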

Then I changed the script to add support for different proxy and file names. In my case the input file names are all the same, but the locations and directory names differ.

The create script for the external table:

CREATE TABLE INVOICE.EXT_ABC
( INVxxx NUMBER(16),
  ... )
ORGANIZATION EXTERNAL
( TYPE oracle_loader
  DEFAULT DIRECTORY dp_data_dir
  ACCESS PARAMETERS
  ( RECORDS DELIMITED BY NEWLINE
    PREPROCESSOR dp_exec_dir:'ora_xtra_pp.sh'
    FIELDS TERMINATED BY ':'
    MISSING FIELD VALUES ARE NULL
    REJECT ROWS WITH ALL NULL FIELDS
    ( FIELDS ... )
  )
  LOCATION ('file01.gz','file02.gz','file03.gz','file04.gz',
            'file05.gz','file06.gz','file07.gz','file08.gz',
            'file09.gz','file10.gz','file11.gz','file12.gz',
            'file13.gz','file14.gz','file15.gz','file16.gz')
)
PARALLEL 16;
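For completeness, the two directory objects referenced above have to exist and be usable by the loading schema; a minimal sketch, with the OS paths purely illustrative:

CREATE OR REPLACE DIRECTORY dp_data_dir AS '/u01/app/oracle/xt_data'; -- ora_xtra_map.txt and the proxy location files live here
CREATE OR REPLACE DIRECTORY dp_exec_dir AS '/u01/app/oracle/xt_exec'; -- ora_xtra_pp.sh lives here and must be executable
GRANT READ, WRITE ON DIRECTORY dp_data_dir TO invoice;
GRANT READ, EXECUTE ON DIRECTORY dp_exec_dir TO invoice;

In this setup the files listed in LOCATION (file01.gz .. file16.gz) are just placeholders in dp_data_dir; the preprocessor uses only their names to look up the real remote paths in ora_xtra_map.txt.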

Mehant Baid's pdf: ExternalTablesRemoteDataAccess

The mapping file: ora_xtra_map.txt

a.gz:etl_user:10.XXX.XXX.11:/mfs_16way_000/Applications/Datamart/INV_DM/main/a.gz:cat:YES
a:etl_user:10.XXX.XXX.11:/mfs_16way_000/Applications/Datamart/INV_DM/main/a:cat:NO
file01.gz:etl_user:10.XXX.XXX.11:/DIR1/mfs_16way_000/Applications/Datamart/INV_DM/main/input.gz:cat:YES
file02.gz:etl_user:10.XXX.XXX.12:/DIR2/mfs_16way_001/Applications/Datamart/INV_DM/main/input.gz:cat:YES
file03.gz:etl_user:10.XXX.XXX.13:/DIR3/mfs_16way_002/Applications/Datamart/INV_DM/main/input.gz:cat:YES
file04.gz:etl_user:10.XXX.XXX.14:/DIR4/mfs_16way_003/Applications/Datamart/INV_DM/main/input.gz:cat:YES
file05.gz:etl_user:10.XXX.XXX.15:/DIR5/mfs_16way_004/Applications/Datamart/INV_DM/main/input.gz:cat:YES
file06.gz:etl_user:10.XXX.XXX.16:/DIR6/mfs_16way_005/Applications/Datamart/INV_DM/main/input.gz:cat:YES
file07.gz:etl_user:10.XXX.XXX.17:/DIR7/mfs_16way_006/Applications/Datamart/INV_DM/main/input.gz:cat:YES
file08.gz:etl_user:10.XXX.XXX.18:/DIR8/mfs_16way_007/Applications/Datamart/INV_DM/main/input.gz:cat:YES
file09.gz:etl_user:10.XXX.XXX.11:/DIR1/mfs_16way_008/Applications/Datamart/INV_DM/main/input.gz:cat:YES
file10.gz:etl_user:10.XXX.XXX.12:/DIR2/mfs_16way_009/Applications/Datamart/INV_DM/main/input.gz:cat:YES
file11.gz:etl_user:10.XXX.XXX.13:/DIR3/mfs_16way_010/Applications/Datamart/INV_DM/main/input.gz:cat:YES
file12.gz:etl_user:10.XXX.XXX.14:/DIR4/mfs_16way_011/Applications/Datamart/INV_DM/main/input.gz:cat:YES
file13.gz:etl_user:10.XXX.XXX.15:/DIR5/mfs_16way_012/Applications/Datamart/INV_DM/main/input.gz:cat:YES
file14.gz:etl_user:10.XXX.XXX.16:/DIR6/mfs_16way_013/Applications/Datamart/INV_DM/main/input.gz:cat:YES
file15.gz:etl_user:10.XXX.XXX.17:/DIR7/mfs_16way_014/Applications/Datamart/INV_DM/main/input.gz:cat:YES
file16.gz:etl_user:10.XXX.XXX.18:/DIR8/mfs_16way_015/Applications/Datamart/INV_DM/main/input.gz:cat:YES

The modified ora_xtra_pp.sh:

#! /bin/sh
PATH=/bin:/usr/bin
export PATH
#set -x
# ora_xtra_pp: ORAcle eXternal Table Remote file Access Pre- Processor
# fs version
#Format for the Map file
#Consists of six fields separated by a :
#Syntax
#file:rusr:rhost:rdir_file_name:rcmd1:comp
#comp indicates whether the input file is gzip compressed: YES/NO
#
#Example
#foo.dat:abrumm:abrumm-dev:/home/abrumm/xt_Data/foo.dat:cat:NO
#f1.dat:fsengonul:fsengonul-dev:/home/fsengonul/xt_Data/foo.dat:cat:NO
#f1.gz:fsengonul:fsengonul-dev:/home/fsengonul/xt_Data/foo.gz:cat:YES
#this gives the freedom to use same named files on different locations
#
#get filename component of LOCATION, the access driver
#provides the LOCATION as the first argument to the preprocessor
proxy_file_name=`basename "$1"`
data_dir_name=`dirname "$1"`
#Flag is set if the file name in the Map file matches the proxy file name
#where our data file is stored
flag_dirf=0
#loops through the map file and fetches details for ssh
#username,hostname and remote directory
file_not_found_err='ora_xtra_pp:
Map file missing. Need ora_xtra_map.txt in the data directory'
if [ -e "$data_dir_name/ora_xtra_map.txt" ]
then
    while read line
    do
    map_file_name=`echo "$line" | cut -d: -f1`
    if [ "$map_file_name" = "$proxy_file_name" ]
    then
        rdir_file_name=`echo "$line" | cut -d: -f4`
        rusr=`echo "$line" | cut -d: -f2`
        rhost=`echo "$line" | cut -d: -f3`
        rcmd1=`echo "$line" | cut -d: -f5`
        comp=`echo "$line" | cut -d: -f6`
        flag_dirf=1
        break
    fi
    done < "$data_dir_name/ora_xtra_map.txt"
else
    echo "$file_not_found_err" 1>&2
    echo "$data_dir_name" 1>&2
    exit 1
fi
if [ "$flag_dirf" = 1 ]
then
    if [ "$comp" = 'NO' ]
    then
        ssh -q "$rusr@$rhost" "$rcmd1" "$rdir_file_name"
    fi
    if [ "$comp" = 'YES' ]
    then
        ssh -q "$rusr@$rhost" "$rcmd1" "$rdir_file_name" | gunzip -c
    fi
fi
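With the map file and the preprocessor in place, the load itself can be a plain parallel, direct-path insert from the external table into the hash-partitioned target; a minimal sketch (the target table name is illustrative):

ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ APPEND PARALLEL(t,16) */ INTO invoice.abc t
SELECT /*+ PARALLEL(s,16) */ *
FROM invoice.ext_abc s;

COMMIT;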

2 Responses to “External table to load compressed data residing on remote locations”

  1. congrats ferhat brother, your blog is on the oracle blog :)

  2. Another solution to Mehant Baid’s problem of not being able to install NFS on the remote host may be to mount a remote directory locally using SSHFS.
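For reference, such an SSHFS mount could look roughly like this (mount point and options are illustrative):

# Mount the remote ETL directory locally over ssh; no NFS server is needed on the remote side.
sshfs etl_user@10.XXX.XXX.11:/mfs_16way_000/Applications/Datamart/INV_DM/main /mnt/etl01 -o ro,reconnect
# ...and unmount when finished:
fusermount -u /mnt/etl01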
