
Teradata Architecture

Introduction to Teradata RDBMS
Teradata RDBMS is a complete relational database management system. The system is based on
off-the-shelf Symmetric Multiprocessing (SMP) technology combined with a communication
network connecting the SMP systems to form a Massively Parallel Processing (MPP) system.
BYNET is a hardware inter-processor network that links SMP nodes. All processors in the same SMP
node are connected by a virtual BYNET. We use the following figure to explain how each
component in this DBMS works together.
PDE (Parallel Database Extensions):
This component is an interface layer on top of the operating system. Its functions
include executing vprocs (virtual processors), providing a parallel environment,
scheduling sessions, and debugging.

Teradata File System:
It allows Teradata RDBMS to store and retrieve data independently of the low-level operating system interface.
PE (Parsing Engine):
_ Communicates with the client
_ Manages sessions
_ Parses SQL statements
_ Communicates with the AMPs
_ Returns results to the client
AMP (Access Module Processor):
_ Interfaces with the BYNET
_ Manages the database
_ Interfaces with the disk subsystem
CLI (Call Level Interface):
_ An SQL query is submitted and transferred in CLI packet format
TDP (Teradata Director Program):
_ Routes the packets to the specified Teradata RDBMS server
Teradata RDBMS has the following components that support all data communication
management:
_ Call Level Interface (CLI)
_ WinCLI & ODBC
_ Teradata Director Program (TDP for channel-attached clients)
_ Micro TDP (TDP for network-attached clients)
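The two client communication paths can be summarized as a small lookup table. This is a minimal sketch: the hop order follows the component descriptions in this post (the Channel Driver serves channel-attached clients and the Teradata Gateway serves network-attached ones), while `request_path`, the `PATHS` dictionary, and the attachment keys are hypothetical names used only for illustration.

```python
# Hypothetical model of the two client-to-PE paths; not a Teradata API.
CHANNEL_ATTACHED = "channel"
NETWORK_ATTACHED = "network"

PATHS = {
    # Mainframe (channel-attached) client: CLI packet -> TDP -> Channel Driver on the node -> PE
    CHANNEL_ATTACHED: ["CLI", "TDP", "Channel Driver", "PE"],
    # LAN (network-attached) client: CLI packet -> Micro TDP -> Teradata Gateway on the node -> PE
    NETWORK_ATTACHED: ["CLI", "Micro TDP", "Teradata Gateway", "PE"],
}


def request_path(attachment: str) -> list[str]:
    """Return the hops a request takes from the client application to a PE."""
    if attachment not in PATHS:
        raise ValueError(f"unknown attachment type: {attachment!r}")
    return list(PATHS[attachment])  # copy so callers cannot modify the table


print(" -> ".join(request_path(NETWORK_ATTACHED)))
```

The only difference between the two paths is the routing software on each end; the PE is the entry point into the database in both cases.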
 

Node hardware and software components
  1. CPUs are not physically associated with vprocs.  Performance is best when you use the UNIX affinity scheduler to keep a logical association between a CPU and a vproc.
  2. Memory - Vprocs share a free memory pool within a node.  A segment of memory is allocated to a vproc for use, then returned to the memory pool for use by another vproc.
  3. MCA - Slots in the MCA (Micro Channel Adapter) are used for the following connections:
    1. Local Peripheral Board (LPB)
    2. External disk arrays
    3. LAN connections
    4. Mainframe channel connections
  4. MCCA - MCCA boards (Micro Channel Cable Adapter) enable communication between a channel-attached node and the Tailgate box.  MCCA boards are located in MCA slots.
  5. Ethernet Card - Each LAN connection to a node requires an Ethernet card, which communicates with the Teradata Gateway software.  Ethernet cards are located in MCA slots.
  6. Twisted Pair Shielded Cable - Connects the MCCA card to the Tailgate box for a mainframe channel connection.
  7. LAN Cable - Connects the Ethernet cards in the MCA to the LAN.
  8. Tailgate Box - An adapter between the node cabinet and the mainframe in a channel-connected system.
  9. Bus and Tag Cables - Connect the Tailgate box to the mainframe.
  10. Virtual Disk (vdisk) - The logical disk that is managed by an AMP.  Each AMP is associated with a single vdisk.
  11. UNIX - The Teradata RDBMS is built on the UNIX operating system for an open environment.  NCR added MP-RAS extensions to UNIX to facilitate a multiple CPU environment.
  12. Parallel Database Extensions (PDE) - Software that runs on UNIX MP-RAS.  It was created by NCR to support the parallel environment.
  13. Trusted Parallel Application (TPA) - Implements virtual processors and runs on the foundation of UNIX MP-RAS and PDE.
  14. The Teradata RDBMS for UNIX is classified as a TPA.
  15. Access Module Processors (AMPs) are vprocs that receive steps from PEs and perform database functions to retrieve or update data.  Each AMP is associated with one vdisk.
  16. PE - Vprocs that receive SQL requests from the client and break the requests into steps.  The PEs send the steps to the AMPs and subsequently return the answer to the client.
  17. Teradata Gateway - Software that handles communication between applications running on LAN-attached clients and the PEs on a node in the system.  The Teradata Gateway has a session limit of 600 sessions.
  18. Channel Driver - Software that communicates between the PEs and applications running on channel-attached clients.
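The shared memory pool in item 2 works as an allocate/return cycle: a vproc takes a segment from the node's free pool and later returns it for another vproc to use. The toy model below only illustrates that cycle. `NodeMemoryPool`, the segment counts, and the vproc names are hypothetical, and the sketch says nothing about how PDE actually manages memory.

```python
class NodeMemoryPool:
    """Toy model of a node's shared free memory pool (segment counts are illustrative)."""

    def __init__(self, total_segments: int):
        self.free = total_segments       # segments currently in the free pool
        self.held: dict[str, int] = {}   # vproc name -> segments it currently holds

    def allocate(self, vproc: str, segments: int) -> bool:
        """Give `segments` to `vproc` if the pool has enough; return True on success."""
        if segments > self.free:
            return False
        self.free -= segments
        self.held[vproc] = self.held.get(vproc, 0) + segments
        return True

    def release(self, vproc: str) -> None:
        """Return everything `vproc` holds to the free pool."""
        self.free += self.held.pop(vproc, 0)


pool = NodeMemoryPool(total_segments=10)
pool.allocate("AMP 0", 6)
print(pool.allocate("AMP 1", 6))  # False: only 4 segments are free
pool.release("AMP 0")
print(pool.allocate("AMP 1", 6))  # True: AMP 0's segments went back to the pool
```

Because no segment is permanently tied to one vproc, memory freed by an idle vproc becomes available to a busy one on the same node.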
Platforms
  1. Single-Node System:  All of the node components together comprise a node.  A single-node system is typically implemented on an SMP platform.  The vprocs in an SMP system communicate over the vnet (virtual BYNET).
  2. Nodes working together create a multiple-node Teradata RDBMS system, which is implemented on an MPP platform.  The nodes and vprocs communicate over the BYNET (Banyan Network).
BYNET
  1. The BYNET is a high-speed interconnect that is responsible for:
    1. Sending messages
    2. Merging data
    3. Sorting answers
  2. The BYNET messaging capability enables vprocs to send different types of messages:
    1. Point-to-Point - A vproc can send a message to another vproc:
      1. In the same node: only BYNET software is used, and the message is reassigned in memory to the target vproc.
      2. In another node: the message uses both BYNET hardware and software.
    2. Multicast - A vproc can send a message to multiple vprocs by sending a broadcast message to all nodes.  The BYNET software on each receiving node determines whether a vproc on that node should receive or discard the message.
    3. Broadcast - A vproc can broadcast a message to all the vprocs in the system.
  3. There are two BYNETs per system, for the following reasons:
    1. Performance
    2. Fault Tolerance
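The three message types and the two-BYNET design can be illustrated with a toy Python model. This is a sketch under stated assumptions, not NCR's implementation: the node layout and vproc names are made up, and the round-robin choice between the two networks is an assumed stand-in for "performance" (spreading traffic) and "fault tolerance" (skipping a failed network).

```python
from dataclasses import dataclass, field


@dataclass
class Bynet:
    # Hypothetical layout: node name -> vprocs hosted on that node.
    nodes: dict[str, list[str]]
    # Two BYNETs per system; False marks a failed network.
    nets_up: list[bool] = field(default_factory=lambda: [True, True])
    # Round-robin cursor used to spread traffic across both networks (assumption).
    _next: int = 0

    def node_of(self, vproc: str) -> str:
        for node, vprocs in self.nodes.items():
            if vproc in vprocs:
                return node
        raise KeyError(vproc)

    def pick_net(self) -> int:
        """Alternate between BYNETs that are up; skip a failed one."""
        for _ in range(len(self.nets_up)):
            i = self._next % len(self.nets_up)
            self._next += 1
            if self.nets_up[i]:
                return i
        raise RuntimeError("both BYNETs are down")

    def point_to_point(self, src: str, dst: str) -> str:
        if self.node_of(src) == self.node_of(dst):
            return "software only (in-memory handoff)"
        return f"hardware + software on BYNET {self.pick_net()}"

    def multicast(self, targets: set[str]) -> list[str]:
        # The message goes to every node; each node's BYNET software keeps it
        # only for local vprocs in the target set and discards it otherwise.
        self.pick_net()
        received = []
        for vprocs in self.nodes.values():
            received.extend(v for v in vprocs if v in targets)
        return received

    def broadcast(self) -> list[str]:
        # Every vproc in the system receives the message.
        self.pick_net()
        return [v for vprocs in self.nodes.values() for v in vprocs]


net = Bynet({"node1": ["PE 0", "AMP 0", "AMP 1"], "node2": ["AMP 2", "AMP 3"]})
print(net.point_to_point("PE 0", "AMP 1"))  # software only (in-memory handoff)
print(net.point_to_point("PE 0", "AMP 2"))  # hardware + software on BYNET 0
net.nets_up[0] = False                      # simulate losing one BYNET
print(net.point_to_point("PE 0", "AMP 3"))  # hardware + software on BYNET 1
print(net.multicast({"AMP 1", "AMP 3"}))    # ['AMP 1', 'AMP 3']
```

With one network marked down, cross-node traffic keeps flowing over the other, which is the fault-tolerance reason for having two.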
Clique
  1. A clique is a group of nodes that share access to the same disk arrays.  The nodes have a daisy-chain connection to each disk array controller.
  2. Cliques provide data accessibility if a node fails for any reason (e.g., a UNIX reset).
  3. Vprocs are distributed across all nodes in the system.  Each multi-node system has at least one clique.
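A clique keeps data accessible after a node failure because the other nodes in the clique can reach the same disk arrays; the usual mechanism is that the failed node's vprocs restart on the surviving nodes of its clique. The sketch below illustrates that idea only. `migrate_vprocs`, the node and vproc names, and the round-robin spreading policy are assumptions, not the actual Teradata restart logic.

```python
def migrate_vprocs(
    cliques: list[set[str]],
    placement: dict[str, list[str]],
    failed_node: str,
) -> dict[str, list[str]]:
    """Return a new placement with the failed node's vprocs moved inside its clique.

    Only nodes in the same clique are candidates, because only they share
    access to the failed node's disk arrays (and so to its AMPs' vdisks).
    """
    clique = next((c for c in cliques if failed_node in c), None)
    if clique is None:
        raise ValueError(f"{failed_node} is not in any clique")
    survivors = sorted(n for n in clique if n != failed_node)
    if not survivors:
        raise RuntimeError(f"no surviving node in the clique of {failed_node}")
    # Copy every surviving node's vproc list so the input is left unchanged.
    new = {n: list(v) for n, v in placement.items() if n != failed_node}
    for i, vproc in enumerate(placement[failed_node]):
        target = survivors[i % len(survivors)]  # round-robin spread (assumption)
        new.setdefault(target, []).append(vproc)
    return new


cliques = [{"n1", "n2", "n3"}, {"n4", "n5"}]
placement = {"n1": ["AMP 0", "AMP 1"], "n2": ["AMP 2"], "n3": ["AMP 3"],
             "n4": ["AMP 4"], "n5": ["AMP 5"]}
after = migrate_vprocs(cliques, placement, "n1")
print(after["n2"], after["n3"])  # ['AMP 2', 'AMP 0'] ['AMP 3', 'AMP 1']
```

Nodes n4 and n5 are untouched: they sit in a different clique and cannot see n1's disk arrays.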
Software Components
  1. UNIX operating system - The Teradata RDBMS runs on UNIX SVR4 with MP-RAS.
  2. Parallel Database Extensions (PDE) - PDE was added to the UNIX kernel by NCR to support the parallel software environment.
  3. Trusted Parallel Application (TPA) - A TPA uses PDE to implement virtual processors.  The Teradata RDBMS is classified as a TPA.
  4. Channel Driver - The Channel Driver software is the means of communication between the application and the PEs assigned to channel-attached clients.
  5. Teradata Gateway - The Gateway software is the means of communication between the application and the PEs assigned to network-attached clients.  There is one Gateway per node.
  6. AMP - The AMP is a type of vproc that has software to manage data.
    1. AMP Worker Tasks (AWTs) in the AMP perform a number of operations, including:
      1. Locking Tables
      2. Executing Tables
      3. Joining Tables
      4. Executing end transaction steps
    2. The file system software accesses the data on the virtual disks.  Each AMP uses the file system software to read from and write to the virtual disks.
    3. Console Utilities - The AMP software includes utilities to perform sophisticated, low-level functions such as:
      1. Configure and reconfigure the system
      2. Rebuild tables
      3. Reveal details about locks and space status
  7. PE - a PE is a type of vproc that has software components to break SQL into steps, and send the steps to the AMPs.
    1. Session Control - When you log on to the Teradata RDBMS through your application, the session control software on the PE establishes that session.  Session control also manages and terminates sessions on the PE.
    2. Parser/Optimizer - The parser interprets your Teradata SQL request and checks the syntax.  The parser decomposes the request into AMP steps, using the optimizer to determine the most efficient way to access the data on the virtual disks.  Then the parser sends the steps to the dispatcher.
    3. Dispatcher - The dispatcher is responsible for a number of tasks, depending on the operation it is performing:
      1. Processing Requests
      2. Processing Responses
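When the dispatcher processes a request, it sends the parser's steps in sequence, each marked as being for one AMP, an AMP group, or all AMPs (as the SQL flow below describes). The sketch models only that scope expansion. The `Step` fields, the step texts, and the idea that a group step lists its AMPs explicitly are assumptions; this post does not describe how the real dispatcher chooses target AMPs.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class Step:
    text: str                              # hypothetical step description
    scope: Literal["one", "group", "all"]  # which AMPs the step is for
    amps: tuple[int, ...] = ()             # target AMP numbers for "one" and "group"


def dispatch(steps: list[Step], all_amps: list[int]) -> list[tuple[str, list[int]]]:
    """Send steps in sequence, expanding each step's scope into its target AMPs."""
    sent = []
    for step in steps:
        if step.scope == "all":
            targets = list(all_amps)
        elif step.scope == "one":
            if len(step.amps) != 1:
                raise ValueError("a single-AMP step needs exactly one AMP")
            targets = list(step.amps)
        else:  # "group": only the listed AMPs, in configuration order
            targets = [amp for amp in all_amps if amp in step.amps]
        sent.append((step.text, targets))
    return sent


plan = [
    Step("lock table", "all"),
    Step("read row", "one", (2,)),
    Step("join rows", "group", (3, 1)),
]
for text, amps in dispatch(plan, all_amps=[0, 1, 2, 3]):
    print(text, amps)
# lock table [0, 1, 2, 3]
# read row [2]
# join rows [1, 3]
```

Targeting only the AMPs a step needs keeps the other AMPs free for concurrent work.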
Flow of an SQL Statement
  1. A user generates an SQL query on the channel-attached client.  The query can come from a BTEQ session at an interactive terminal, from a compatible fourth-generation language, or from within an application program coded in a host language.
  2. The CLI request handler packages the request and sends it to the Teradata Director Program (TDP) for routing to the server.
  3. The TDP establishes a session, then routes the request across the communications channel to the parsing engine (PE).
  4. The parser component of the PE opens the request package and parses the SQL for processing: it interprets the request, checks its syntax, and optimizes the access plan.
    1. Without errors - The parser decomposes the request into a series of work steps and passes them to the dispatcher.
    2. With errors - The dispatcher receives the appropriate error message and returns it to the requester.  Processing terminates.
  5. The dispatcher sequences the steps and passes them to the BYNET with instructions about whether the steps are for one Access Module Processor (AMP), an AMP group, or all AMPs.
  6. The BYNET (or virtual BYNET on a single-node system) distributes the execution steps to the appropriate AMPs for processing.
  7. The AMPs process the execution steps by performing operations on the database.  The AMPs perform these operations by making calls to the file system.
  8. The file system performs primitive physical data block operations by locating the data blocks to be manipulated and then passing control to the disk subsystem.
  9. The disk subsystem retrieves the requested blocks for the file system.
  10. The disk manager returns the requested blocks to the file system.
  11. The file system returns the requested data to the database manager.
  12. The database manager sends a message back to the dispatcher stating that the data is ready to be returned to the requesting user, then sorts the data and transmits it to the parsing engine (PE) over the BYNET.
  13. The BYNET merges the sorted responses and returns them to the requesting PE for packaging.
  14. The dispatcher builds the response message and routes it to the requesting client system.
  15. The TDP receives and unpacks the response messages and makes them available to the CLI.
  16. The CLI passes the received data back to the requesting application in blocks.
  17. The requesting application receives the response data in the form of a relational table.
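The end of the flow, where each AMP's database manager sorts its own share of the answer and the BYNET merges the sorted responses, is a k-way merge of sorted streams. A minimal sketch, assuming each AMP's answer is a list of comparable values; `amp_answer` and `bynet_merge` are hypothetical names, and Python's `heapq.merge` stands in for whatever the BYNET actually does internally.

```python
import heapq


def amp_answer(rows: list[int]) -> list[int]:
    # Each AMP sorts its own share of the answer set before sending it.
    return sorted(rows)


def bynet_merge(*sorted_streams: list[int]) -> list[int]:
    # Merge the already-sorted streams into one ordered answer;
    # heapq.merge does a lazy k-way merge instead of re-sorting everything.
    return list(heapq.merge(*sorted_streams))


amp0 = amp_answer([42, 7, 19])
amp1 = amp_answer([3, 88])
amp2 = amp_answer([25, 1])
print(bynet_merge(amp0, amp1, amp2))  # [1, 3, 7, 19, 25, 42, 88]
```

Since every AMP sorts in parallel, the merge only has to compare the heads of the incoming streams, which is much cheaper than sorting the combined answer in one place.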
