Getting the most from hybrid columnar compression
Posted by fsengonul on August 9, 2010
I wasn’t very willing to make tests on the effects of sort order to the compression ratio in hybrid columnar compression. Having consequent same values should increase the compress ratio but we could only sort the data in order of just one column. I thought that the gain could not be significant. But after the tests it’s seen that nearly half of the disk space can be saved, if you have an idea on the structure of your data.
You may find the avg_col_length and number of distinct values (NoDV) from DBA_TAB_COL_STATISTICS or DBA_PART_COL_STATISTICS . A column with high average length and reasonable distinct values is a good candidate. (There is no meaning to use NoDV 1 or the primary key for sorting)
Here are my results:
137 GB has reduced to 21,21 GB with query_high compression without sorting. If I sort on column A the size reduced to 12,18 GB and if I sort on column A and column B together the size was 11,64 GB which is 45 % better than the unsorted one.
It would be much more useful if you have the chance to sort the data before entering your database; on ETL machine for example.
| SORT | COMPRESSION | SINGLE TABLE |
| NOSORT | NOCOMP | 137,59 GB |
| NOSORT | QUERY_HIGH | 21,21 GB |
| SORT_A | QUERY_HIGH | 12,18 GB |
| SORT_B | QUERY_HIGH | 15,37 GB |
| SORT_A_B | QUERY_HIGH | 11,64 GB |

kocakahin said
Great post Ferhat. Thanks for your share.
Normally QUERY HIGH compression performs an internal sorting within the compression chunk size (it is defind on the fly). But it seems that the table has a few distinct values on A & B columns so that you have this positive effect on compression ratios.
If you can just try the same with QUERY LOW you might see a larger gap between compressed & uncompressed loads. And guess what ?!?
kocakahin said
Sorry for using the wrong idiom “a few distinct”. Let me correct it with “a few distinct values compared to number of total rows”
Turkcell: Миграция на Exadata « Oracle mechanics said
[...] Мигрировали с Sun M9000 (процессоры Sun Sparc 7 2.52 GHz) на Oracle Exadata V2 (Intel Xeon® E5540 2.53 GHz), использование Exadata Hybrid Columnar Compression позволило снизить объём данных со 100 (10g compressed) до 25 ТБ (HCC, ссылка по теме Getting the most from hybrid columnar compression) [...]
2010 in review « Ferhat's Blog said
[...] Getting the most from hybrid columnar compression August 20102 comments 3 [...]
Dmitry said
In OLAP task. I have 30GB table without compression, 15 coluns, after sorting and compression 14 columns size is 760 Mb
fsengonul said
Today I’ve tried the same method on a pure CDR table. The results are even better:
SEGMENT_NAME ROUND(BYTES/1024/1024/1024,2)
NOCOMPRESS…………………6,45
COMPRESS_FOR_OLTP……….4,41
COMPRESS_FOR_QH………….1,33
COMP_QH_WITH_SORT_A_B…0,46