Sunday 17 November 2013

Sample BTEQ Export

# Redirect all script output (stdout and stderr) to a log file
exec 1> logfileName.log 2>&1
bteq <<EOF
.SET SESSION TRANSACTION BTET;
/* LoginFile holds the .LOGON tdpid/username,password command */
.run file LoginFile;
database DB_Name;
/* Report formatting: line width, and no dashed lines under column titles */
.SET WIDTH 500;
.SET TITLEDASHES OFF;
/* Rows returned by the next SQL statement go to the export file.        */
/* Use a literal path placeholder here: $PATH would be expanded by the   */
/* shell inside the heredoc to the executable search path.               */
.EXPORT REPORT FILE=/ExportPath/FileName.txt
SQL Query;
.EXPORT RESET;
.QUIT 0
EOF
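
If a downstream process needs a binary data file (for example, to feed a later .IMPORT DATA) rather than a printable report, the EXPORT block above can be switched to DATA mode. A hedged variant, with the path again a placeholder:

.EXPORT DATA FILE=/ExportPath/FileName.dat
SQL Query;
.EXPORT RESET;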

Tuesday 29 October 2013

Sample BTEQ Script

# Build the log file name from the script name plus a timestamp
# (the cut -f8 assumes a fixed path depth for the script)
exec 1>/LogPath/logs/`echo $0 | cut -d '/' -f8 | sed 's/\.sh//g'`_$(date +"%Y%m%d_%H%M%S").log 2>&1
bteq <<EOF
.SET WIDTH 150;
.SET SESSION TRANSACTION BTET;
.run file LOGON_ID;
SELECT SESSION;
SET QUERY_BAND = 'ApplicationName=$0;Frequency=Daily;' FOR SESSION;
DATABASE DB_NAME;
.IF ERRORCODE <> 0 THEN .GOTO ERRORS
SQL Query;
.IF ERRORCODE <> 0 THEN .GOTO ERRORS
.QUIT 0
.LABEL ERRORS
.QUIT ERRORCODE
EOF
# Capture BTEQ's exit status and propagate it to the caller
RETURN_CODE=$?
echo "script return code = $RETURN_CODE"
exit $RETURN_CODE

Wednesday 24 July 2013

Basic BTEQ commands

BTEQ Commands (below are the BTEQ commands and their purpose)
The BTEQ commands in Teradata are designed for flexibility. These commands are not used directly on the data inside the tables; rather, the 60 different BTEQ commands are utilized in four areas:
  • Session Control Commands
  • File Control Commands
  • Sequence Control Commands
  • Format Control Commands

 Session Control Commands

ABORT: Abort any and all active running requests and transactions for a session, but do not exit BTEQ.

DEFAULTS: Reset all BTEQ format command options to their default configurations.

EXIT: Immediately end the current session or sessions and exit BTEQ.

HALT EXECUTION: Abort any and all active running requests and transactions and EXIT BTEQ.

LOGOFF: End the current session or sessions, but do not exit BTEQ.

LOGON: Starts a BTEQ Session. Every user, application, or utility must LOGON to Teradata to establish a session.

QUIT: End the current session or sessions and exit BTEQ.

SECURITY: Specifies the security level of messages between a network-attached system and the Teradata Database.

SESSIONS: Specifies the number of sessions to use with the next LOGON command.

SESSION CHARSET: Specifies the name of a character set for the current session or sessions.

SESSION SQLFLAG: Specifies a disposition of warnings issued in response to violations of ANSI syntax. The SQL will still run, but a warning message will be provided. The four settings are FULL, INTERMEDIATE, ENTRY, and NONE.

SESSION TRANSACTION: Specifies whether transaction boundaries are determined by Teradata SQL or ANSI SQL semantics.

SHOW CONTROLS: Displays all of the BTEQ control command options currently configured.

SHOW VERSIONS: Displays the BTEQ software release versions.

TDP: Used to specify the correct Teradata server for logons for a particular session.
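
A minimal sketch tying several of these session commands together, in the same shell-heredoc style as the sample scripts above; the TDP id, user name, and password are placeholders. SESSIONS must be set before LOGON, SHOW CONTROLS then dumps the current option settings, and LOGOFF ends the sessions without leaving BTEQ:

bteq <<EOF
.SET SESSIONS 4;
.LOGON tdpid/username,password;
.SHOW CONTROLS;
SELECT SESSION;
.LOGOFF;
.EXIT;
EOF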



File Control Commands

These BTEQ commands are used to specify the formatting parameters of incoming and outgoing information. This includes identifying sources and determining I/O streams.

CMS: Execute a VM CMS command inside the BTEQ environment.

ERROROUT: Write error messages to a specific output file.

EXPORT: Open a file with a specific format to transfer information directly from the Teradata database.

FORMAT: Enable/inhibit the page-oriented format command options.

IMPORT: Open a file with a specific format to import information into Teradata.

INDICDATA: One of multiple data mode options for data selected from Teradata. The modes are INDICDATA, FIELD, or RECORD MODE.

OS: Execute an MS-DOS, PC-DOS, or UNIX command from inside BTEQ.

QUIET: Limit BTEQ output to error messages and request-processing statistics.

RECORDMODE: One of multiple data mode options for data selected from Teradata. (INDICDATA, FIELD, or RECORD).

REPEAT: Submit the next request a specified number of times.

RUN: Execute Teradata SQL requests and BTEQ commands directly from a specified run file.

TSO: Execute an MVS TSO command from inside the BTEQ environment.
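
A hedged sketch of the classic .IMPORT/.REPEAT pattern, with an .OS call included; the file, table, and login file names are placeholders, and the input file is assumed to be in Teradata DATA (FastLoad) format, e.g. produced by an earlier .EXPORT DATA:

bteq <<EOF
.run file LoginFile;
.OS ls -l /tmp/in.dat;
.IMPORT DATA FILE=/tmp/in.dat;
.REPEAT *
USING (c1 INTEGER, c2 INTEGER)
INSERT INTO DB_Name.TableName VALUES (:c1, :c2);
.QUIT 0
EOF

.REPEAT * resubmits the next request once per record until the import file is exhausted.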




Sequence Control Commands
These commands control the sequence in which Teradata commands operate.
ABORT: Abort any active transactions and requests.
ERRORLEVEL: Assign severity levels to particular error numbers.
EXIT: End the current session or sessions and exit BTEQ.
GOTO: Skip all intervening commands and resume after branching forward to the specified label.
HANG: Pause BTEQ processing for a specific amount of time.
IF…THEN: Test a stated condition, and then resume processing based on the test results.
LABEL: Mark a position in the script; the GOTO command branches directly to a particular label.
MAXERROR: Specifies a maximum allowable error severity level.
QUIT: End the current session or sessions and exit BTEQ.
REMARK: Place a comment on the standard output stream.
REPEAT: Submit the next request a specified number of times.
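
A small sketch of the sequence-control commands cooperating: error 3803 ("table already exists") is downgraded to severity 0 so the script can continue past the CREATE TABLE; any other error makes the IF…THEN branch to the ERRORS label. Names are placeholders:

bteq <<EOF
.run file LoginFile;
.SET ERRORLEVEL 3803 SEVERITY 0;
CREATE TABLE DB_Name.TableName (COL1 INT, COL2 INT);
.IF ERRORLEVEL > 0 THEN .GOTO ERRORS
.REMARK "Table is ready";
.QUIT 0
.LABEL ERRORS
.QUIT 8
EOF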
 

Tuesday 26 March 2013

Partitioned Primary Index (PPI)

Partitioned Primary Index is one of the unique features of Teradata: it allows a query to access just a portion of a large table. This reduces the overhead of scanning the complete table, thus improving performance.

A PPI works by hashing rows to the AMPs just as a normal PI does. A PPI does not alter data distribution; it only creates partitions within the data already distributed based on the PI.

PPIs are usually defined on a table to increase query efficiency by avoiding full-table scans, without the overhead and maintenance costs of secondary indexes.

Partitions are usually defined based on Range or Case as follows.
Two functions, RANGE_N and CASE_N, can be used to simplify the specification of a partitioning expression.

1. Partition by CASE
CREATE TABLE ORDER_Table
(
ORD_number      INTEGER NOT NULL,
customer_number INTEGER NOT NULL,
order_date      DATE,
order_total     INTEGER
)
PRIMARY INDEX (customer_number)
PARTITION BY CASE_N (
    order_total < 10000,
    order_total < 20000,
    order_total < 30000,
    NO CASE OR UNKNOWN);

2. Partition by Range - example using date range
CREATE TABLE ORDER_Table
(
ORD_number      INTEGER NOT NULL,
customer_number INTEGER NOT NULL,
order_date      DATE,
order_total     INTEGER
)
PRIMARY INDEX (customer_number)
PARTITION BY RANGE_N (
    order_date BETWEEN DATE '2010-01-01' AND DATE '2010-12-01'
    EACH INTERVAL '1' MONTH,
    NO RANGE OR UNKNOWN);

P.S.: If we use NO RANGE or NO CASE, then all values not covered by the defined ranges/conditions land in a single catch-all partition.
If we specify UNKNOWN, then all rows whose partitioning expression evaluates to NULL are placed in the UNKNOWN partition.
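
For example, a query that restricts on the partitioning column of the RANGE_N table above lets the optimizer touch only the qualifying monthly partitions (the EXPLAIN output would show the partition elimination):

SELECT ORD_number, order_total
FROM ORDER_Table
WHERE order_date BETWEEN DATE '2010-03-01' AND DATE '2010-05-31';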


PPI improves performance as follows:
  • Automatic optimization occurs for queries that specify a restrictive condition on the partitioning column.
  • Partition elimination improves the efficiency of range searches when, for example, the searches align with the range partitioning.
  • Only the rows of the partitions qualified by a query need to be accessed, avoiding full-table scans.
  • It provides an access path to the rows in the base table while still allowing efficient join strategies.
  • If the same partition is consistently targeted, the part of the table being updated may fit largely in cache, significantly boosting performance.
  • PPI tables have an advantage during purging, since deleting data partition by partition is very fast (see the sketch below).
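
As a sketch of the purging point, deleting one month from the RANGE_N table above touches exactly one partition, so Teradata can discard whole data blocks instead of removing rows one by one:

DELETE FROM ORDER_Table
WHERE order_date BETWEEN DATE '2010-01-01' AND DATE '2010-01-31';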

PPI also comes with some disadvantages:
  • The PI-access disadvantage occurs only when the partitioning column is not part of the PI. In this situation, a query specifying a PI value but no value for the partitioning column must look in each partition for that value, instead of positioning directly at the first row for the PI value.
  • Another disadvantage appears when a table without a PPI is joined to a PPI table on a PI = PI condition. Because the partitioned table's rows are not ordered the same way, the task in effect becomes a set of sub-joins, one for each partition of the PPI table. This type of join is called a sliding-window join.

Limitations of Partitioned Primary Index (PPI):
  • The primary index of a PPI table must be non-unique (NUPI) if the partitioning column is not part of the index, since enforcing a unique PI would require checking for a duplicate key value in every partition, which would be very expensive.
  • The primary index of a PPI table can be unique if the partitioning column is part of the UPI; the uniqueness check then stays within a single partition.
  • A PPI cannot be defined on global temporary tables, volatile tables, or compressed join indexes.
  • PPI table rows occupy two extra bytes compared with NPPI table rows, as these bytes store each row's partition number.
  • PPI table rows are four bytes wider if value compression is specified for the table.

Since a PPI results in a lot of partitions, there is a little overhead for the user/DBA, who has to run COLLECT STATS on the PPI column regularly.

It is beneficial to collect statistics on the system-derived PARTITION column. Collecting stats on PARTITION is fast because, rather than reading all the base-table rows, Teradata usually just scans the cylinder index for the PPI table; by doing so it skips all unnecessary partitions, speeding up the access.

HELP STATISTICS tablename COLUMN PARTITION;    -- lists the partitions in the table and their details
COLLECT STATS ON tablename COLUMN PARTITION;   -- collects/refreshes the partition statistics

Tuesday 19 March 2013

Journals in Teradata

1. Transient Journal - Maintains the history of the current transaction. Once the transaction succeeds, its entries are deleted; if the transaction fails, the data is rolled back from this journal.
2. Permanent Journal - Defined when a table (or database) is created. It can store the BEFORE or AFTER image of table rows, and both images can be maintained together as a DUAL journal. A permanent journal keeps the complete history of a table; it is typically maintained for critical tables, and a database can have one permanent journal. It is used mainly for point-in-time recovery after accidental data loss, and resembles the redo mechanism in Oracle. (A DDL sketch follows this list.)
3. Down-AMP Recovery Journal (DARJ) - Activates automatically when an AMP goes down. It is used with fallback-protected tables to record any change made while the AMP is unavailable. When the failed AMP is brought back online, the restart process applies all the logged changes to the recovered AMP using the DARJ.
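
A minimal DDL sketch of how a permanent journal might be set up; the database name, PERM size, and table definition are illustrative assumptions, not a definitive recipe:

CREATE DATABASE Sales_DB FROM DBC AS
    PERM = 100e6,
    DEFAULT JOURNAL TABLE = Sales_DB.Sales_Jrnl;  -- journal table lives in the new database

CREATE TABLE Sales_DB.Critical_Orders,
    FALLBACK,
    DUAL BEFORE JOURNAL,   -- keep two before-images per changed row
    AFTER JOURNAL          -- keep a single after-image
(
 ord_id    INTEGER NOT NULL,
 ord_total INTEGER
)
PRIMARY INDEX (ord_id);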

Monday 4 March 2013

Getting Previous Row Data in Teradata using OLAP Functions

"ROWS UNBOUNDED PRECEDING" is used in an ordered analytical function to tell it to include all of the preceding rows in the partition in the calculation being performed.
 “ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING” is used in an ordered analytical function to tell it to include only preceding row in the partition in the calculation being performed.
“ROWS BETWEEN 1 FOLLOWINGAND 1 FOLLOWING” is used in an ordered analytical function to tell it to include only following row in the partition in the calculation being performed.
Then Following example illustrates OLAP Functions Usage:
CREATE MULTISET TABLE TEMP1 (COL1 INT, COL2 INT);

INSERT INTO TEMP1 VALUES(100 , 2000);
INSERT INTO TEMP1 VALUES(100 , 3000);
INSERT INTO TEMP1 VALUES(100 , 4000);
INSERT INTO TEMP1 VALUES(300 , 5000);
INSERT INTO TEMP1 VALUES(300 , 6000);
INSERT INTO TEMP1 VALUES(300 , 7000);

SELECT  COL1
,COL2
,COALESCE(MIN(col2) OVER (PARTITION BY col1  ORDER BY col2 ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING), 0) A1
,COALESCE(MAX(col2) OVER (PARTITION BY col1  ORDER BY col2 ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING), 0) A2
,COALESCE(MIN(col2) OVER (PARTITION BY col1  ORDER BY col2 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), 0) A3
,COALESCE(MAX(col2) OVER (PARTITION BY col1  ORDER BY col2 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), 0) A4
,COALESCE(SUM(col2) OVER (PARTITION BY col1  ORDER BY col2 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), 0) A5
,COALESCE(AVG(col2) OVER (PARTITION BY col1  ORDER BY col2 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), 0) A6
,COALESCE(COUNT(col2) OVER (PARTITION BY col1  ORDER BY col2 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), 0) A7
FROM TEMP1;
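
For reference, the expected result for the data above: A1 is the previous row's COL2 (0 where none exists), A2 the next row's COL2 (0 where none exists), and A3 through A7 the partition-wide MIN, MAX, SUM, AVG and COUNT:

COL1  COL2    A1    A2    A3    A4     A5    A6  A7
 100  2000     0  3000  2000  4000   9000  3000   3
 100  3000  2000  4000  2000  4000   9000  3000   3
 100  4000  3000     0  2000  4000   9000  3000   3
 300  5000     0  6000  5000  7000  18000  6000   3
 300  6000  5000  7000  5000  7000  18000  6000   3
 300  7000  6000     0  5000  7000  18000  6000   3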

Monday 25 February 2013

Which is faster? select * from table or select <all columns> from table?

Many of us have come across the scenario where listing the column names in a SELECT turns out to be faster than using "select * from table". This is indeed interesting.

The reason is that with "select * from table", an extra step is added in which Teradata resolves * into the actual column names before fetching the data.

Using "select <all columns> from table" eliminates this extra step of resolving the column list from the table definition.

Hence it is generally recommended to use "select <all columns> from table".

Teradata HELP commands

Teradata provides user-friendly features such as the HELP commands.

Let us see the various help commands and their uses.

HELP SESSION;
This command is used to display the user name, account name, logon date and time, current database name, collation code set and character set in use, as well as the transaction semantics, time zone, and character-set data.

HELP DATABASE <database-name> ;
This command is used to display the names of all the tables (T), views (V), macros (M), and triggers (G) stored in a database, along with any table comments.

HELP USER <user-name> ;
This command is used to display the names of all the tables (T), views (V), macros (M), and triggers (G) stored in a user area, along with any table comments.

HELP TABLE <table-name> ;   
This command is used to display the column names, type identifier, and any user written comments on the columns within a table.

HELP VOLATILE TABLE ;
This command is used to display the names of all volatile temporary tables active for the current user session.

HELP VIEW <view-name> ;   
This command is used to display the column names, type identifier, and comments on the columns within a view.

HELP MACRO <macro-name> ;   
This command is used to display the characteristics of parameters passed to it at execution time.

HELP TRIGGER <trigger-name> ;   
This command is used to display details created for a trigger, like action time and sequence.

HELP PROCEDURE <procedure-name> ;   
This command is used to display the characteristics of parameters passed to it at execution time.

HELP COLUMN <table-name>.*  OR  HELP COLUMN <view-name>.*  OR  HELP COLUMN <table-name>.<column-name> [, <column-name>…] ;
This command is used to display detailed data describing the column-level characteristics.

HELP INDEX <table-name> ;
This command is used to display the indexes and their characteristics, such as unique or non-unique, and the column or columns involved in the index. This information is used by the Optimizer to create a plan for the SQL.

HELP STATISTICS <table-name> ;   
This command is used to display values associated with the data demographics collected on the table. This information is used by the Optimizer to create a plan for SQL.

HELP CONSTRAINT <table-name>.<constraint-name> ;
This command is used to display the checks to be made on the data when it is inserted or updated, and the columns involved.

HELP 'SQL';   
This command is used to display a list of all  available SQL commands and functions.

HELP 'SQL  <command>';   
This command is used to display the basic syntax and options for the SQL command used in place of the <command>.

HELP 'SPL';   
This command is used to display a list of available SPL commands.

HELP 'SPL <command>';   
This command is used to display the basic syntax and options for the SPL command used in place of the <command>.

DISTINCT vs GROUP BY in Teradata

There is always a debate about how to find the unique values in a given table: should we use GROUP BY or DISTINCT?
Both return the same number of rows, but with some difference in execution time between them. Execution time is always a very important factor, performance being one of the major concerns in a Teradata warehouse.

So which is more efficient - DISTINCT or GROUP BY?

DISTINCT redistributes the rows immediately, so more data may move between the AMPs, whereas GROUP BY sends only unique values between the AMPs.
So we can say that GROUP BY sounds more efficient.
But when the data in a table is nearly unique, GROUP BY spends time attempting to eliminate duplicates that barely exist. It wastes a first pass checking for duplicates, and must then redistribute almost the same amount of data anyway.

Let us see the steps each option uses to eliminate duplicates
(these can be seen in the EXPLAIN plan).

DISTINCT
1. Reads each row on the AMP.
2. Hashes the column value identified in the DISTINCT clause of the SELECT statement.
3. Redistributes the rows by that hash value to the appropriate AMP.
4. Once redistribution is completed, it
    a. sorts the data to group duplicates on each AMP;
    b. removes all the duplicates on each AMP and sends the original/unique values.

P.S.: There are cases when DISTINCT fails with "Error 2646: No more spool space". In such cases, try using GROUP BY.

GROUP BY
1. Reads all the rows that are part of the GROUP BY.
2. Removes all duplicates within each AMP for the given set of values, using the "buckets" concept.
3. Hashes the unique values on each AMP.
4. Redistributes them to the appropriate AMPs.
5. Once redistribution is completed, it
    a. sorts the data to group duplicates on each AMP;
    b. removes all the duplicates on each AMP and sends the original/unique values.


Hence it is better to go for:
GROUP BY  - when there are many duplicates
DISTINCT  - when there are few or no duplicates
GROUP BY  - when SPOOL space is exceeded
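
Both forms below return the same unique customer numbers from the ORDER_Table used in the PPI examples; prefixing either statement with EXPLAIN shows the two duplicate-elimination strategies described above:

SELECT DISTINCT customer_number FROM ORDER_Table;

SELECT customer_number FROM ORDER_Table GROUP BY customer_number;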
 

Difference between COUNT(*), COUNT(column) and COUNT(1)?

COUNT is one of the most important functions used in any database. But what many don't know is that the count returned by the COUNT function can differ based on how it is used. Let us consider COUNT(*), COUNT(column), and COUNT(1) and understand the difference in their usage.
CREATE MULTISET TABLE TEMP ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO
     (
      X CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      Y CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC)
NO PRIMARY INDEX ;


/* In Teradata, an omitted item in a VALUES list is inserted as NULL */
INSERT INTO TEMP VALUES(,);
INSERT INTO TEMP VALUES(,'3');
INSERT INTO TEMP VALUES('2',);
INSERT INTO TEMP VALUES('1','1');
SELECT * FROM TEMP;

X  Y
-  -
2  ?
?  3
?  ?
1  1

(BTEQ displays NULL as ?.)


SELECT COUNT(*), COUNT(x), COUNT(1) FROM TEMP;

Result Set:

Count(*)  Count(X)  Count(1)
       4         2         4

COUNT(*) and COUNT(1) count every row, whereas COUNT(X) counts only the rows where X is not NULL, which is why it returns 2 here.