Teradata SQL Query Optimization Or Performance Tuning
SQL and Indexes :
Reducing Table Update Time :
1) Primary indexes: Use primary indexes for joins whenever possible, and specify in the where clause all the columns for the primary indexes.
2) Secondary indexes (10% rule rumor): The optimizer does not actually use a 10% rule to determine if a secondary index will be used. But, this is a good estimation: If less than 10% of a table will be accessed if the secondary index is used, then assume the sql will use the secondary index. Otherwise, the sql execution will do a full table scan.
The optimizer actually uses a “least cost” method: The optimizer determines if the cost of using a secondary index is cheaper than the cost of doing a full table scan. The cost involves the cpu usage, and diskio counts.
The optimizer actually uses a “least cost” method: The optimizer determines if the cost of using a secondary index is cheaper than the cost of doing a full table scan. The cost involves the cpu usage, and diskio counts.
3) Constants: Use constants to specify index column contents whenever possible, instead of specifying the constant once, and joining the tables. This may provide a small savings on performance.
4) Mathematical operations: Mathematical operations are faster than string operations (i.e. concatenation), if both can achieve the same result.
5) Variable length columns: The use of variable length columns should be minimized, and should be by exception. Fixed length columns should always be used to define tables.
6) Union: The “union” command can be used to break up a large sql process or statement into several smaller sql processes or statements, which would run in parallel. But these could then cause spoolspace limit problems. “Union all” executes the sql’s single threaded.
7) Where in/where not in (subquery): The sql “where in” is more efficient than the sql “where not in”. It is more efficient to specify constants in these, but if a subquery is specified, then the subquery has a direct impact on the sql time.
If there is a sql time problem with the subquery, then the sql subquery could be separated from the original query. This would require 2 sql statements, and an intermediate table. The 2 sql statements would be: 1) New sql statement, which does the previous subquery function, and inserts into the temporary table, and 2) Modified original sql statement, which doesn’t have the subquery, and reads the temporary table.
If there is a sql time problem with the subquery, then the sql subquery could be separated from the original query. This would require 2 sql statements, and an intermediate table. The 2 sql statements would be: 1) New sql statement, which does the previous subquery function, and inserts into the temporary table, and 2) Modified original sql statement, which doesn’t have the subquery, and reads the temporary table.
8) Strategic Semicolon: At the end of every sql statement, there is a semicolon. In some cases, the strategic placement of this semicolon can improve the sql time of a group of sql statements. But this will not improve an individual sql statement’s time. These are a couple cases: 1) The group’s sql time could be improved if a group of sql statements share the same tables (or spool files), 2) The group’s sql time could be improved if several sql statements use the same unix input file.
Reducing Large SQL’s :
The following methods can be used to scope down the size of sql’s.
1) Table denormalization: Duplicating data in another table. This provides faster access to the duplicated data, but requires more update time.
2) Table summarization: The data from one/many table(s) is summarized into commonly used summary tables. This provides faster access to the summarized data, but requires more update time.
3) SQL union: The DBC/SQL Union can be used to break up a large SQL process or statement into several smaller SQL processes or statements, which would run in parallel.
4) Unix split: A large input unix files could be split into several smaller unix files, which could then be input in series, or in parallel, to create smaller SQL processing steps.
5) Unix concatenation: A large query could be broken up into smaller independent queries, whose output is written to several smaller unix files. Then these smaller files are unix concatenated together to provide a single unix file.
6) Trigger tables: A group of tables, each contains a subset of the keys of the index of an original table. the tables could be created based on some value in the index of the original table. This provides an ability to break up a large SQL statement into multiple smaller SQL statements, but creating the trigger tables requires more update time.
7) Sorts (order by): Although sorts take time, these are always done at the end of the query, and the sort time is directly dependent on the size of the solution. Unnecessary sorts could be eliminated.
8) Export/Load: Table data could be exported (Bteq, Fastexport) to a unix file, and updated, and then reloaded into the table (Bteq, fastload, Multiload).
9) C PROGRAM/UNIX SCRIPTS: Some data manipulation is very difficult and time consuming in sql. These could be replaced with c programs/unix scripts. See the “C/Embedded sql” tip.
1) Table update time can be improved by dropping the table’s indexes first, and then doing the updates. After the completion of the updates, then rebuild the indexes, and recollect the table’s statistics on the indexes. The best improvement is obtained when the volume of table updates is large in relation to the size of the table. If more then 5% of a large table is changed.
2) Try to avoid dropping a table, instead, delete the table. Table related statements (i.e. create table, drop table) are single threaded thru a system permissions table and become a bottleneck. They can also cause deadlocks on the dictionary tables. Also, any user permissions specific to the table are dropped when the table is dropped, and these permissions must be recreated.
No comments:
Post a Comment