Thursday, October 30, 2008

SQL*Loader Part #1

SQL*Loader loads data from external files into tables of an Oracle database. It has a powerful data parsing engine that puts little limitation on the format of the data in the datafile. You can use SQL*Loader to do the following:

  1. Load data across a network. This means that you can run the SQL*Loader client on a different system from the one that is running the SQL*Loader server.
  2. Load data from multiple datafiles during the same load session.
  3. Load data into multiple tables during the same load session.
  4. Specify the character set of the data.
  5. Selectively load data (you can load records based on the records' values).
  6. Manipulate the data before loading it, using SQL functions.
  7. Generate unique sequential key values in specified columns.
  8. Use the operating system's file system to access the datafiles.
  9. Load data from disk, tape, or named pipe.
  10. Generate sophisticated error reports, which greatly aid troubleshooting.
  11. Load arbitrarily complex object-relational data.
  12. Use secondary datafiles for loading LOBs and collections.
  13. Use either conventional or direct path loading. While conventional path loading is very flexible, direct path loading provides superior loading performance.
A typical SQL*Loader session takes as input a control file, which controls the behavior of SQL*Loader, and one or more datafiles. The output of SQL*Loader is an Oracle database (where the data is loaded), a log file, a bad file, and potentially, a discard file.


SQL*Loader Parameters

SQL*Loader is invoked when you specify the sqlldr command and, optionally, parameters that establish session characteristics. In situations where you always use the same parameters for which the values seldom change, it can be more efficient to specify parameters using the following methods, rather than on the command line:

  1. Parameters can be grouped together in a parameter file. You could then specify the name of the parameter file on the command line using the PARFILE parameter.
  2. Certain parameters can also be specified within the SQL*Loader control file by using the OPTIONS clause.

Parameters specified on the command line override any parameter values specified in a parameter file or OPTIONS clause.


SQL*Loader Control File

The control file is a text file written in a language that SQL*Loader understands. The control file tells SQL*Loader where to find the data, how to parse and interpret the data, where to insert the data, and more. Although not precisely defined, a control file can be said to have three sections. The first section contains session-wide information, for example:

  1. Global options such as bindsize, rows, records to skip, and so on.
  2. INFILE clauses to specify where the input data is located.
  3. Data to be loaded.

The second section consists of one or more INTO TABLE blocks. Each of these blocks contains information about the table into which the data is to be loaded, such as the table name and the columns of the table. The third section is optional and, if present, contains input data. Some control file syntax considerations to keep in mind are:

  1. The syntax is free-format (statements can extend over multiple lines).
  2. It is case insensitive; however, strings enclosed in single or double quotation marks are taken literally, including case.
  3. In control file syntax, comments extend from the two hyphens (--) that mark the beginning of the comment to the end of the line. The optional third section of the control file is interpreted as data rather than as control file syntax; consequently, comments in this section are not supported.
  4. The keywords CONSTANT and ZONE have special meaning to SQL*Loader and are therefore reserved. To avoid potential conflicts, Oracle recommends that you do not use either CONSTANT or ZONE as a name for any tables or columns.

Data Fields

Once a logical record is formed, field setting on the logical record is done. Field setting is a process in which SQL*Loader uses control-file field specifications to determine which parts of logical record data correspond to which control-file fields. It is possible for two or more field specifications to claim the same data. Also, it is possible for a logical record to contain data that is not claimed by any control-file field specification. Most control-file field specifications claim a particular part of the logical record. This mapping takes the following forms:

  1. The byte position of the data field's beginning, end, or both, can be specified. This specification form is not the most flexible, but it provides high field-setting performance. The strings delimiting (enclosing and/or terminating) a particular data field can be specified. A delimited data field is assumed to start where the last data field ended, unless the byte position of the start of the data field is specified.
  2. The byte offset and/or the length of the data field can be specified. This way each field starts a specified number of bytes from where the last one ended and continues for a specified length.
  3. Length-value datatypes can be used. In this case, the first n number of bytes of the data field contain information about how long the rest of the data field is.


Data Conversion and Datatype Specification

During a conventional path load, data fields in the datafile are converted into columns in the database (direct path loads are conceptually similar, but the implementation is different). There are two conversion steps:

  1. SQL*Loader uses the field specifications in the control file to interpret the format of the datafile, parse the input data, and populate the bind arrays that correspond to a SQL INSERT statement using that data.
  2. The Oracle database accepts the data and executes the INSERT statement to store the data in the database.

The Oracle database uses the datatype of the column to convert the data into its final, stored form. Keep in mind the distinction between a field in a datafile and a column in the database. Remember also that the field datatypes defined in a SQL*Loader control file are not the same as the column datatypes.


Discarded and Rejected Records

Records read from the input file might not be inserted into the database. Such records are placed in either a bad file or a discard file.

The Bad File

The bad file contains records that were rejected, either by SQL*Loader or by the Oracle database. If you do not specify a bad file and there are rejected records, then SQL*Loader automatically creates one. It will have the same name as the data file, with a.bad extension. Some of the possible reasons for rejection are discussed in the next sections.

SQL*Loader Rejects

Datafile records are rejected by SQL*Loader when the input format is invalid. For example, if the second enclosure delimiter is missing, or if a delimited field exceeds its maximum length, SQL*Loader rejects the record. Rejected records are placed in the bad file.

Oracle Database Rejects

After a datafile record is accepted for processing by SQL*Loader, it is sent to the Oracle database for insertion into a table as a row. If the Oracle database determines that the row is valid, then the row is inserted into the table. If the row is determined to be invalid, then the record is rejected and SQL*Loader puts it in the bad file. The row may be invalid, for example, because a key is not unique, because a required field is null, or because the field contains invalid data for the Oracle datatype.

Please go SQL*Loader Part #2 for more details


Related Topic:
  1. SQL*Loader Part #1
  2. SQL*Loader Part #2

No comments: