Compression in teradata Here are some rules and guidelines of Teradata's data compression via the COMPRESS phrase:
1) COMPRESS:COMPRESS is not a complex data reducing algorithm. It does not reduce repeating characters (or bits), or character (or bit) patterns within rows, columns, blocks or cylinders.
COMPRESS practically eliminates the data storage of nulls (or 256 constant values) for fixed-length, non-primary index columns.
· Column must be fixed-length, 255 characters or less, and not part of the primary index.1) COMPRESS:COMPRESS is not a complex data reducing algorithm. It does not reduce repeating characters (or bits), or character (or bit) patterns within rows, columns, blocks or cylinders.
COMPRESS practically eliminates the data storage of nulls (or 256 constant values) for fixed-length, non-primary index columns.
2) Rules for compression:
· The following types of data can be compressed:
Nulls, Zeros, Blanks, Any numeric data type,
DATE (expressed as COMPRESS (DATE 'yyyy-mm-dd')),
CHARACTER(up to 255 characters).
· The default setting is no compression.
· Additional rules have been specified in the documentation.
· Only FIXED WIDTH data types can be compressed. VARIABLE WIDTH (VARCHAR) data types cannot be compressed.
· The compress column values are case specific. The value "PRODUCT" does not get compressed, if "product" is the specified compression value.
· Compression values are stored in the table header, and in V2R5, there is a limit of 255 values, and a header size limit of 64KB.
· There are 2 forms of compression:
- Teradata RDBMS V2R4.1 (current version): The compress attribute
compresses the nulls, and 1 column value.
- Teradata RDBMS V2R5 (next version): The compress attribute compresses
the nulls, plus up to 256 column values.
· The syntax for the compression attribute is as follows:
CREATE SET TABLE tbl1,
( col1 CHAR(1) COMPRESS, /* Compresses nulls */
col2 CHAR(1) COMPRESS 'product' ) /* Compresses nulls plus the
value 'product' */
PRIMARY INDEX ...;
3) Guidelines for Compression:· Only compress columns when compression affects enough of the rows in the table to make compression worthwhile.
· Measure the potential savings with the following calculation:
CC * CW , where CC is the count of potential compressed nulls (or values)
for the column, and CW is the column width.
Sqls for calculating CC:
- Get actual null count:
select count(*) from tbl1 where col1 IS NULL;
- Get actual value list by value count:
select col1, count(*) from tbl1 group by 1 order by 1 desc;
· Measure the actual savings with the following calculation:
(TB - TA) / TA, where TB is the tablesize before the compression, and TA
is the tablesize after the compression.
· Presence bits: For every data column, there will be either 0, 1, or 2 Presence Bits:
1) For each 8 columns compressable, add 1 byte of compression presence bits per row.
2) And, for each 8 columns nullable, add 1 byte of nullable presence bits per row.
· Convert VARCHAR to Fixed:
Since a VARCHAR cannot be compressed, then it could be converted to a fixed length CHAR, and then compressed. Weigh these differences: A VARCHAR column contains a 2 byte VLI (variable length indicator) for each value, and a CHAR column contains the same number of characters per value.
· Implement the compressions on the test system first (DW TEST (t62tst)), and then migrate the changes to the production system (DW EDW (t61edw)).
Advantages of Compression
The below are the advantages of data compression in Teradata.
· Compression results in shorter rows which are generally more performant.
· Column compression provides the following capacity and performance benefits:
Reduced capacity
Reduced I/O traffic
Moderate CPU savings
· The I/O savings correlates to the percentage of data compressed out of a row.
· Table backups use the compressed format, so backups will also be faster.
No comments:
Post a Comment