Getting the most from hybrid columnar compression
Posted by fsengonul on August 9, 2010
I wasn’t very willing to make tests on the effects of sort order to the compression ratio in hybrid columnar compression. Having consequent same values should increase the compress ratio but we could only sort the data in order of just one column. I thought that the gain could not be significant. But after the tests it’s seen that nearly half of the disk space can be saved, if you have an idea on the structure of your data.
You may find the avg_col_length and number of distinct values (NoDV) from DBA_TAB_COL_STATISTICS or DBA_PART_COL_STATISTICS . A column with high average length and reasonable distinct values is a good candidate. (There is no meaning to use NoDV 1 or the primary key for sorting)
Here are my results:
137 GB has reduced to 21,21 GB with query_high compression without sorting. If I sort on column A the size reduced to 12,18 GB and if I sort on column A and column B together the size was 11,64 GB which is 45 % better than the unsorted one.
It would be much more useful if you have the chance to sort the data before entering your database; on ETL machine for example.