Ferhat's Blog

There will be only one database

Archive for April, 2011

The eagle has landed

Posted by fsengonul on April 19, 2011

After 3 months of planning and logistics, the migration of uncompressed 600 TB’s of data from Europe to Asia has finished last weekend. Now our 2 X2-2 Racks is hosting 4 databases.

Thanks for everybody who has  involved in this project with both their supports and critics.

Advertisements

Posted in Exadata | 4 Comments »

External table to load compressed data residing on remote locations

Posted by fsengonul on April 8, 2011

The PREPROCESSOR option in the external tables  has provided the DWH systems  a lot of opportunities. Below you may find Allen Brumm’s and Mehant Baid’s  notes on how to load remote files via external tables and my minor changes on the script in order to add gunzip support and flexibility for different proxy and target file names.

In our environment , in order to load a single table, 16 sqlldr jobs start from 8 ETL servers. (parallel   & direct path) . The input files are on the ETL servers in gzip compressed format. The ETL tool , on the fly, unzips the files and feeds sqlldr.  The target tables has always 16 hash partitions.  As a result of this design for each table , I have 256 temp segments created on the database side. The difference of the hashing algorithm adds more spice to the situation. And also the network cards on both db nodes and ETL servers are 1 GbE. (do not ask why they’re not 10GbE. They will be.)

The best part of this method is the flexibility. You may choose to run the gunzip command on the target or on the source.In my case running the gunzip on the oracle machine decreases the data flow on the network and also decrease the cpu usage on the etl machine.  I’m still searching for an empty period on both sides in order to collect cpu and network usage statisticks on both cases.

so the steps:

The first thing : passwordless ssh connectivity has to be established between the remote locations and database servers.

Then I’ve changed the script to add support for different proxy and file names. On my case the input file names are all the same but the locations and directory names are different.

The create script for external table:

CREATE TABLE INVOICE.EXT_ABC(  INVxxx   NUMBER(16)    , ...))
ORGANIZATION EXTERNAL(TYPE oracle_loader   DEFAULT DIRECTORY dp_data_dir
ACCESS PARAMETERS(     RECORDS DELIMITED BY NEWLINE
PREPROCESSOR dp_exec_dir:'ora_xtra_pp.sh'
FIELDS TERMINATED BY ':'
MISSING FIELD VALUES ARE NULL
REJECT ROWS WITH ALL NULL FIELDS        (        FIELDS.... ))
LOCATION ('file01.gz','file02.gz','file03.gz','file04.gz','file05.gz','file06.gz','file07.gz','file08.gz',            'file09.gz','file10.gz','file11.gz','file12.gz','file13.gz','file14.gz','file15.gz','file16.gz'            ))
parallel 16;

Mehant Baid’s pdf : ExternalTablesRemoteDataAccess

The mapping file: ora_xtra_map.txt

a.gz:etl_user:10.XXX.XXX.11:/mfs_16way_000/Applications/Datamart/INV_DM/main/a.gz:cat:YES
a:etl_user:10.XXX.XXX.11:/mfs_16way_000/Applications/Datamart/INV_DM/main/a:cat:NO
file01.gz:etl_user:10.XXX.XXX.11:/DIR1/mfs_16way_000/Applications/Datamart/INV_DM/main/input.gz:cat:YES
file02.gz:etl_user:10.XXX.XXX.12:/DIR2/mfs_16way_001/Applications/Datamart/INV_DM/main/input.gz:cat:YES
file03.gz:etl_user:10.XXX.XXX.13:/DIR3/mfs_16way_002/Applications/Datamart/INV_DM/main/input.gz:cat:YES
file04.gz:etl_user:10.XXX.XXX.14:/DIR4/mfs_16way_003/Applications/Datamart/INV_DM/main/input.gz:cat:YES
file05.gz:etl_user:10.XXX.XXX.15:/DIR5/mfs_16way_004/Applications/Datamart/INV_DM/main/input.gz:cat:YES
file06.gz:etl_user:10.XXX.XXX.16:/DIR6/mfs_16way_005/Applications/Datamart/INV_DM/main/input.gz:cat:YES
file07.gz:etl_user:10.XXX.XXX.17:/DIR7/mfs_16way_006/Applications/Datamart/INV_DM/main/input.gz:cat:YES
file08.gz:etl_user:10.XXX.XXX.18:/DIR8/mfs_16way_007/Applications/Datamart/INV_DM/main/input.gz:cat:YES
file09.gz:etl_user:10.XXX.XXX.11:/DIR1/mfs_16way_008/Applications/Datamart/INV_DM/main/input.gz:cat:YES
file10.gz:etl_user:10.XXX.XXX.12:/DIR2/mfs_16way_009/Applications/Datamart/INV_DM/main/input.gz:cat:YES
file11.gz:etl_user:10.XXX.XXX.13:/DIR3/mfs_16way_010/Applications/Datamart/INV_DM/main/input.gz:cat:YES
file12.gz:etl_user:10.XXX.XXX.14:/DIR4/mfs_16way_011/Applications/Datamart/INV_DM/main/input.gz:cat:YES
file13.gz:etl_user:10.XXX.XXX.15:/DIR5/mfs_16way_012/Applications/Datamart/INV_DM/main/input.gz:cat:YES
file14.gz:etl_user:10.XXX.XXX.16:/DIR6/mfs_16way_013/Applications/Datamart/INV_DM/main/input.gz:cat:YES
file15.gz:etl_user:10.XXX.XXX.17:/DIR7/mfs_16way_014/Applications/Datamart/INV_DM/main/input.gz:cat:YES
file16.gz:etl_user:10.XXX.XXX.18:/DIR8/mfs_16way_015/Applications/Datamart/INV_DM/main/input.gz:cat:YES

The modified ora_xtra_pp.sh

#! /bin/sh
PATH=/bin:/usr/bin
export PATH
#set -x
# ora_xtra_pp: ORAcle eXternal Table Remote file Access Pre- Processor
# fs version
#Format for the Map file
#Consists of five fields seperated using a :
#Syntax
#file:rusr:rhost:rdir_file_name:rcmd1:comp
#comp is used for gzipped input files.YES/NO
#
#Example
#foo.dat:abrumm:abrumm-dev:/home/abrumm/xt_Data/foo.dat:cat:NO
#f1.dat:fsengonul:fsengonul-dev:/home/fsengonul/xt_Data/foo.dat:cat:NO
#f1.gz:fsengonul:fsengonul-dev:/home/fsengonul/xt_Data/foo.gz:cat:YES
#this gives the freedom to use same named files on different locations
#
#get filename component of LOCATION, the access driver
#provides the LOCATION as the first argument to the preprocessor
proxy_file_name=`basename $1`
data_dir_name=`dirname $1`
#Flag is set if the file name in the Map file matches the proxy file name
#where our data file is stored
flag_dirf=0
#loops through the map file and fetches details for ssh
#username,hostname and remote directory
file_not_found_err='ora_xtra_pp:
Map file missing.Needed ora_xtra_map.txt in data directory'
if [ -e $data_dir_name/ora_xtra_map.txt ]
then
    while read line
    do
    map_file_name=`echo $line | cut -d: -f1`
    if [ $map_file_name = $proxy_file_name ]
    then
        rdir_file_name=`echo $line | cut -d: -f4`
        rusr=`echo $line | cut -d: -f2`
        rhost=`echo $line | cut -d: -f3`
        rcmd1=`echo $line | cut -d: -f5`
        comp=`echo $line | cut -d: -f6`
        flag_dirf=1
        break
    fi
    done  < $data_dir_name/ora_xtra_map.txt
else
    echo $file_not_found_err 1>&2
    echo $data_dir_name 1>&2
    exit 1
fi
if  [ $flag_dirf = 1 ]
then
    if [ $comp = 'NO' ]
    then
        ssh -q $rusr@$rhost $rcmd1 $rdir_file_name
    fi
    if [ $comp = 'YES' ]
    then
        ssh -q $rusr@$rhost $rcmd1 $rdir_file_name | gunzip -c
    fi
fi

Posted in oracle | 2 Comments »

TROUG Day 2011

Posted by fsengonul on April 8, 2011


 

Türk Oracle Kullanıcıları TROUG gününde buluşuyor!

2010 yılında International Oracle
Users Group Community (IOUC) tarafından kabul görerek kurulan Türk Oracle Kullanıcıları Grubu (http://troug.org), her yıl düzenlemeyi hedeflediği “TROUG Day” etkinliğinin ilki ile 21 Nisan’da Türk Oracle kullanıcılarını buluşturuyor. Katılımın ücretsiz olacağı bu etkinlikte, konusunun uzmanlarından teknik sunumlar ve katılımcıların sorularıyla yönlendireceği “TROUG Panel” oturumu yer alacak. Ayrıca günün sonunda
Bilginç IT akademi tarafından çekilişle birer kişiye Oracle 11g OCP ve Oracle/Sun Java sertifika eğitimleri hediye edilecektir.

Dünyaca ünlü Oracle gurusu Jonathan Lewis da sizlerle birlikte TROUG gününde!

Tüm dünyada Oracle profesyonellerinin yakından takip ettiği, Oracle gurusu Jonathan Lewis, açılış konuşması ve sonrasında “Thinking About Joins” başlıklı
teknik sunumu ile, ayrıca Oracle ACE Kamran Agayev de RMAN konulu bir sunum ile aramızda olacak.

Etkinliğimiz aynı zamanda internet üzerinden canlı yayınlanacak!

Etkinlimize lokasyon ya da kapasite nedenleriyle yerinde katılma fırsatı bulamayanlar için, tüm içerik
internet üzerinden canlı olarak yayınlanacak. Web üzerinden bu etkinliğimizi takip etmek isteyenler icin yayın adresimiz:

http://www.theformspider.com/troug/index.php

21 Nisan’da görüşmek üzere !


Program

 

09:00-10:00

Açılış & Thinking About Joins

Jonathan Lewis

10:00-10:50

20+ Soruda Exadata

Ferhat Şengönül & Hüsnü Şensoy

Kahve Molası

11:00-12:00

Enterprise Manager 11g

Grid Control Gökhan Atıl

Öğle Yemeği Molası

13:00-14:00

Oracle Dataguard: Nasıl

Daha Efektif Kullanırız?

Emre Baransel & Ogan Özdoğan

14:00-14:50

ASM Best Practices

Orhan Bıyıklıoğlu

Kahve Molası

15:00-16:00

11g Backup & Recovery

New Features

Kamran Agayev & Zekeriya Beşiroğlu

16:00-16:50

PL/SQL ile Web 2.0: JavaScript ve Javacılar nasıl kıskançlıktan çatlatılır?

Yalım K. Gerger

Kahve Molası

17:00-18:00

TROUG Panel

H.Tonguç Yılmaz, Kamran Agayev,

O. Yasin Saygılı, Talip Hakan Öztürk, Gökhan Atıl, Emre Baransel

 

 

> ÜCRETSİZ KAYIT OL

Tarih: 21 Nisan Perşembe
Yer: Bahçeşehir
Üniversitesi
 

Beşiktaş Kampüsü – İstanbul


 



Posted in TROUG | Leave a Comment »