Ferhat's Blog

There will be only one database

Getting the most from hybrid columnar compression

Posted by fsengonul on August 9, 2010

I wasn’t very willing to make tests on the effects of sort order to the compression ratio in hybrid columnar compression. Having consequent same values should increase the compress ratio but we could only sort the data in order of just one column.  I thought that the gain could not be significant.  But after the tests it’s seen that nearly half of the disk space can be saved, if you have an idea on the structure of your data.

You may find the avg_col_length and number of distinct values  (NoDV) from DBA_TAB_COL_STATISTICS or DBA_PART_COL_STATISTICS . A column with high average length and reasonable distinct values is a good candidate. (There is no meaning to use NoDV 1 or the primary key for sorting)

Here are my results:

137 GB has reduced to 21,21 GB with query_high compression without sorting. If I sort on column A the size reduced to 12,18 GB and if I sort on column A and column B together the size was 11,64 GB which is 45 % better than the unsorted one.

It would be much more useful if  you have the chance to sort the data before entering your database; on ETL machine for example.

SORT COMPRESSION SINGLE TABLE
NOSORT NOCOMP 137,59 GB
NOSORT QUERY_HIGH 21,21 GB
SORT_A QUERY_HIGH 12,18 GB
SORT_B QUERY_HIGH 15,37 GB
SORT_A_B QUERY_HIGH 11,64 GB
About these ads

6 Responses to “Getting the most from hybrid columnar compression”

  1. kocakahin said

    Great post Ferhat. Thanks for your share.
    Normally QUERY HIGH compression performs an internal sorting within the compression chunk size (it is defind on the fly). But it seems that the table has a few distinct values on A & B columns so that you have this positive effect on compression ratios.
    If you can just try the same with QUERY LOW you might see a larger gap between compressed & uncompressed loads. And guess what ?!? :)

  2. kocakahin said

    Sorry for using the wrong idiom “a few distinct”. Let me correct it with “a few distinct values compared to number of total rows”

  3. [...] Мигрировали с Sun M9000 (процессоры Sun Sparc 7 2.52 GHz) на Oracle Exadata V2 (Intel Xeon® E5540 2.53 GHz), использование Exadata Hybrid Columnar Compression позволило снизить объём данных со 100 (10g compressed) до 25 ТБ (HCC, ссылка по теме Getting the most from hybrid columnar compression) [...]

  4. [...] Getting the most from hybrid columnar compression August 20102 comments 3 [...]

  5. Dmitry said

    In OLAP task. I have 30GB table without compression, 15 coluns, after sorting and compression 14 columns size is 760 Mb

  6. fsengonul said

    Today I’ve tried the same method on a pure CDR table. The results are even better:

    SEGMENT_NAME ROUND(BYTES/1024/1024/1024,2)
    NOCOMPRESS…………………6,45
    COMPRESS_FOR_OLTP……….4,41
    COMPRESS_FOR_QH………….1,33
    COMP_QH_WITH_SORT_A_B…0,46

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
Follow

Get every new post delivered to your Inbox.

Join 160 other followers

%d bloggers like this: