Wednesday, December 24, 2008

ORACLE TEXT

Oracle Text is a technology that enables you to build text query applications and document classification applications. Oracle Text provides indexing, word and theme searching, and viewing capabilities for text.

To design your Oracle Text application, you must determine the type of queries you expect to execute. We can divide application queries into four different categories:

  1. Text Queries on Document Collections: A text query application enables users to search document collections such as Web sites, digital libraries, or document warehouses. Searching is enabled by first indexing the document collection. The collection is typically static with no significant change in content after the initial indexing run. Documents can be of any size and of different formats such as HTML, PDF, or Microsoft Word. These documents are stored in a document table. Queries usually consist of words or phrases. Other query operations such as stemming, proximity searching, and wild carding can be used to improve the search results. The queries for this type of application are best served with a CONTEXT index on your document table. To query this index, your application uses the SQL CONTAINS operator in the WHERE clause of a SELECT statement.
  2. Queries on Catalog Information: Catalog information consists of inventory type information such as online book store or auction site. The stored information consists of text information such as book titles and related structured information such as price. The information is usually updated regularly to keep the online catalog up to date with the inventory. Queries are usually a combination of a text component and a structured component, such as price or author. Results are almost always sorted by a structured component such as date or price. Catalog applications are best served by a CTXCAT index. You query this index with the CATSEARCH operator in the WHERE clause of a SELECT statement.
  3. Document Classification: In a document classification application, an incoming stream or a set of documents is compared to a predefined set of rules. When a document matches one or more rules, the application performs some action. For example, assume we have an incoming stream of news articles. We can define a rule to represent the category of Finance. The rule is essentially one or more queries that select document about the subject of Finance. The rule might have the form 'stocks or bonds or earnings'. When a document arrives about a Wall Street earnings forecast and satisfies the rules for this category, the application takes an action such as tagging the document as Finance or emailing one or more users. To create a document classification application, you create a table of rules and then create a CTXRULE index. To classify an incoming stream of text, use the MATCHES operator in the WHERE clause of a SELECT statement.
  4. XML Searching: An XML search application performs searches over XML documents. In a regular document search, you usually search across a set of documents to return documents that satisfy a text predicate; in an XML search, you often use the structure of the XML document to restrict the search. Typically, only that part of the document that satisfies the search is returned. For example, instead of finding all purchase orders that contain the word electric , the user might need only purchase orders in which the comment field contains electric . Oracle Text enables you to perform XML searching using the following approaches: A. Using Oracle Text B. Using the Oracle XML DB Framework and C. Combining Oracle Text features with Oracle XML DB.

Related Topics:

  1. ORACLE TEXT Installation.
  2. Example of Text Queries on Document Collections.
  3. Example of Queries on Catalog Information.
  4. Example of Document Classification.

ORACLE TEXT: Document Classification

The function of a classification application is to perform some action based on document content. These actions can include assigning a category id to a document or sending the document to a user. The result is classification of a document. Documents are classified according to pre-defined rules. These rules select for a category. For instance, a query rule of 'presidential elections' might select documents for a category about politics. Oracle Text provides several types of classification. One type is simple , or rule-based classification, discussed here, in which you create both document categories and the rules for categorizing documents. With supervised classification , Oracle Text derives the rules from a set of training documents you provide. With clustering , Oracle Text does all the work for you, deriving both rules and categories.

Create the Rule Table


CREATE TABLE queries (
query_id NUMBER,
query_string VARCHAR2(80)
);

INSERT INTO queries VALUES (1, 'oracle'); INSERT INTO queries VALUES (2, 'larry or ellison');
INSERT INTO queries VALUES (3, 'oracle and text'); INSERT INTO queries VALUES (4, 'market share');

Create Your CTXRULE Index


CREATE INDEX queryx ON queries(query_string) INDEXTYPE IS CTXRULE;

Classify with MATCHES


COLUMN query_string FORMAT a35;
SELECT query_id,query_string FROM queries WHERE MATCHES (query_string, 'Oracle announced that its market share in databases increased over the last year.')>0;

Related Topics:

  1. Example of Text Queries on Document Collections.
  2. Example of Queries on Catalog Information.

ORACLE TEXT: Queries on Catalog Information

This example creates a catalog index for an auction site that sells electronic equipment such as cameras and CD players. New inventory is added everyday and item descriptions, bid dates, and prices must be stored together. The application requires good response time for mixed queries. The key is to determine what columns users frequently search so that we can create a suitable CTXCAT index. Queries on this type of index are issued with the CATSEARCH operator.

Create Your Table

Set up an auction table to store your inventory:

CREATE TABLE auction(
item_id NUMBER,
title VARCHAR2(100),
category_id NUMBER,
price NUMBER,
bid_close DATE);


Populate Your Table

Now populate the table with various items, each with an id, title, price and bid_date:

INSERT INTO AUCTION VALUES(1, 'NIKON CAMERA', 1, 400, '24-OCT-2002'); INSERT INTO AUCTION VALUES(2, 'OLYMPUS CAMERA', 1, 300, '25-OCT-2002'); INSERT INTO AUCTION VALUES(3, 'PENTAX CAMERA', 1, 200, '26-OCT-2002'); INSERT INTO AUCTION VALUES(4, 'CANON CAMERA', 1, 250, '27-OCT-2002');


Determine your Queries

You need to determine what criteria are likely to be retrieved. In this example, you determine that all queries search the title column for item descriptions, and most queries order by price. When using the CATSEARCH operator later, we'll specify the terms for the text column and the criteria for the structured clause. Step 2 Create the Sub-Index to Order by Price For Oracle Text to serve these queries efficiently, we need a sub-index for the price column, since our queries will order by price. Therefore, create an index set called auction_set and add a sub-index for the price column:

EXEC CTX_DDL.CREATE_INDEXT_SET('auction_iset'); EXEC CTX_DDL.ADD_INDEX('auction_iset','price'); /* sub-index A*/


Create the CTXCAT Index

Create the combined catalog index on the AUCTION table with CREATE INDEX as follows:

CREATE INDEX auction_titlex ON AUCTION(title) INDEXTYPE IS CTXSYS.CTXCAT PARAMETERS ('index set auction_iset');

Querying Your Table with CATSEARCH : When you have created the CTXCAT index on the AUCTION table, you can query this index with the CATSEARCH operator. First set the output format to make the output readable:

COLUMN title FORMAT a40;
SELECT title, price FROM auction WHERE CATSEARCH(title, 'CAMERA', 'order by price')> 0;
SELECT title, price FROM auction WHERE CATSEARCH(title, 'CAMERA', 'price <= 300')>0;

Update Your Table

You can update your catalog table by adding new rows. When you do so, the CTXCAT index is automatically synchronized to reflect the change. For example, add the following new rows to our table and then reexecute the query:

INSERT INTO AUCTION VALUES(5, 'FUJI CAMERA', 1, 350, '28-OCT-2002');
INSERT INTO AUCTION VALUES(6, 'SONY CAMERA', 1, 310, '28-OCT-2002');

SELECT title, price FROM auction WHERE CATSEARCH(title, 'CAMERA', 'order by price')> 0;

Related Topics:

  1. Example of Text Queries on Document Collections.
  2. Example of Document Classification.

ORACLE TEXT: Text Queries on Document Collections

In a basic text query application, users enter query words or phrases and expect the application to return a list of documents that best match the query. Such an application involves creating a CONTEXT index and querying it with CONTAINS.


Create your Text Table

The following example creates a table called docs with two columns, id and text, by using the CREATE TABLE statement. This example makes the id column the primary key. The text column is VARCHAR2.

CREATE TABLE docs (id NUMBER PRIMARY KEY, text VARCHAR2(200));

Load Documents into Table: You can use the SQL INSERT statement to load text to a table. To populate the docs table, use the INSERT statement as follows:

INSERT INTO docs VALUES(1, 'California is a state in the US.');
INSERT INTO docs VALUES(2, 'Paris is a city in France.');
INSERT INTO docs VALUES(3, 'France is in Europe.');



Create the CONTEXT index

Index the HTML files by creating a CONTEXT index on the text column as follows. Since you are indexing HTML, this example uses the NULL_FILTER preference type for no filtering and uses the HTML_SECTION_GROUP type:

CREATE INDEX idx_docs ON docs(text) INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS ('FILTER CTXSYS.NULL_FILTER SECTION GROUP CTXSYS.HTML_SECTION_GROUP');

Use the NULL_FILTER because you do not need to filter HTML documents during indexing. However, if you index PDF, Microsoft Word, or other formatted documents, use the CTXSYS.INSO_FILTER (the default) as your FILTER preference. This example also uses the HTML_SECTION_GROUP section group which is recommended for indexing HTML documents. Using HTML_SECTION_GROUP enables you to search within specific HTML tags, and eliminates from the index unwanted markup such as font information.


Querying Your Table with CONTAINS

You query the table with the SELECT statement with CONTAINS to retrieve the document ids that satisfy the query. Before doing so, set the format of the SELECT statement's output so that it is easily readable. To do so, set the width of the text column to 40 characters:

COLUMN text FORMAT a40;
SELECT SCORE(1), id, text FROM docs WHERE CONTAINS(text, 'France', 1) > 0;


Present the Document

In a real application, you might want to present the selected document to the user with query terms highlighted. Oracle Text enables you to mark up documents with the CTX_DOC package. We can demonstrate HTML document markup with an anonymous PL/SQL block in SQL*Plus. However, in a real application you might present the document in a browser. This PL/SQL example uses the in-memory version of CTX_DOC.MARKUP to highlight the word France in document 3. It allocates a temporary CLOB (Character Large Object datatype) to store the markup text and reads it back to the standard output. The CLOB is then de-allocated before exiting:

SET SERVEROUTPUT ON;

DECLARE
mklob CLOB;
amt NUMBER := 40;
line VARCHAR2(80);
BEGIN

CTX_DOC.MARKUP('idx_docs','3','France', mklob);
DBMS_LOB.READ(mklob, amt, 1, line);
DBMS_OUTPUT.PUT_LINE('FIRST 40 CHARS ARE:'||line); DBMS_LOB.FREETEMPORARY(mklob);

END;
/

Synchronize the Index After Data Manipulation

When you create a CONTEXT index, you need to explicitly synchronize your index to keep it up to date with any inserts, updates, or deletes to the text table. Oracle Text enables you to do so with the CTX_DDL.SYNC_INDEX procedure. Add some rows to the docs table:

INSERT INTO docs VALUES(4, 'Los Angeles is a city in California.');
INSERT INTO docs VALUES(5, 'Mexico City is big.');

Since the index is not synchronized, these new rows are not returned with a query on city :

SELECT SCORE(1), id, text FROM docs WHERE CONTAINS(text, 'city', 1) > 0;

Therefore, synchronize the index with 2Mb of memory, and reexecute the query:

EXEC CTX_DDL.SYNC_INDEX('idx_docs', '2M');

COLUMN text FORMAT a50;
SELECT SCORE(1), id, text FROM docs WHERE CONTAINS(text, 'city', 1) > 0;


Related Topics:

  1. Example of Queries on Catalog Information.
  2. Example of Document Classification.

Monday, December 15, 2008

installation of Oracle Text on 10g

This article lists the steps for manual installation of Oracle Text 10gR1 and 10gR2, verification and uninstalling Oracle Text installation.If you create an Oracle database using the Database Configuration Assistant (DBCA) Text is installed by default and you do not need to perform the installation steps described in the Manual installation section.

Instead installation you need to grant permission to the oracle user that use oracle text.

Create User:

CREATE USER myuser IDENTIFIED BY myuser_password.

Grant Roles:

GRANT RESOURCE, CONNECT, CTXAPP TO MYUSER;

Grant EXECUTE Privileges on CTX PL/SQL Packages:

There are ten Oracle Text packages that enable you to perform actions ranging from synchronizing an Oracle Text index to highlighting documents. For example, the CTX_DDL.SYNC_INDEX package enables you to synchronize your index.

GRANT EXECUTE ON CTX_CLS TO myuser;
GRANT EXECUTE ON CTX_DDL TO myuser;
GRANT EXECUTE ON CTX_DOC TO myuser;
GRANT EXECUTE ON CTX_OUTPUT TO myuser;
GRANT EXECUTE ON CTX_QUERY TO myuser;
GRANT EXECUTE ON CTX_REPORT TO myuser;
GRANT EXECUTE ON CTX_THES TO myuser;

Oracle Text is available for no extra Licensing in all four database editions:
  1. Oracle Database Standard Edition One.
  2. Oracle Database Standard Edition (SE).
  3. Oracle Database Enterprise Edition (EE).
  4. Oracle Database Personal Edition.

Manual installation of Text 10gR1 (10.1.0.x) and 10gR2 (10.2.0.x)

  1. You must install Oracle Data Mining (ODM), feature available in Oracle Database Enterprise Edition (EE), before you install Oracle Text in order to use the SVM classifier and the KMEANS clustering. All other functions should work without ODM installed, including RULE classifier and TEXTK clustering.
  2. Oracle Text makes use of a "Knowledge Base" for English and French, which is loaded from different files provided on the second CD, known as the "Oracle Database 10g Companion CD". The Knowledge Base is required to use any of the theme-based features in Oracle Text. Therefore, if you install from the main CD only, theme functionality will not be available. For information about how to install products from the Companion CD, refer to the "Installing Oracle Database 10g Products from the Companion CD" section.

If you created your database manually or you want to install Text later, then follow these steps.


Note: In SQL*Plus we use '?' instead of $ORACLE_HOME

1. Text dictionary, schema name CTXSYS, is created by calling following script from SQL*Plus connected as SYSDBA:

SQL> connect SYS/password as SYSDBA
SQL> spool text_install.txt
SQL> @?/ctx/admin/catctx.sql CTXSYS SYSAUX TEMP NOLOCK

Where:
CTXSYS - is the ctxsys user password
SYSAUX - is the default tablespace for ctxsys
TEMP - is the temporary tablespace for ctxsys
LOCK|NOLOCK - ctxsys user account will be locked or not

2. The next step is to install appropriate language-specific default preferences. There is script which creates language-specific default preferences for every language Oracle text supports in /ctx/admin/defaults directory, such as English(US), Danish(DK), Dutch(NL), Finnish(SF), French(F), German(D), Italian(IT), Portuguese(PT), Spanish(E), and Swedish(S). They are named in the form drdefXX.sql, where XX is the international license plate code.
To manually install US default preferences, for example, log into sqlplus as CTXSYS, and run 'drdefus.sql' as described below:

SQL> connect CTXSYS/password@tns_alias
SQL> @?/ctx/admin/defaults/drdefus.sql
SQL> spool off

If you have installed Oracle Data Mining (ODM) before Text you will see in the text_install.txt logfile ORA-955 errors for public synonyms, e.g. dm_svm_build, which can be ignored. We have a dummy package that mimics the API in CTXSYS schema, and we attempt to create public synonyms to it. Now, if ODM has been installed, these public synonym creates fail and the public synonyms point to ODM objects, which is what we want.


Text 10gR1 (10.1.0.x) and Text 10gR2 (10.2.0.x) Installation verification


1. Check to make sure that all Text objects were created in CTXSYS schema and correct version is installed
2. Check to make sure that there are not invalid objects for CTXSYS.
You should get: "no rows selected". If there are then you can compile each invalid object manually.


connect SYS/password as SYSDBA

set pages 1000
col object_name format a40
col object_type format a20
col comp_name format a30
column library_name format a8
column file_spec format a60 wrap
spool text_install_verification.log

select comp_name, status, substr(version,1,10) as version from dba_registry where comp_id = 'CONTEXT';
select * from ctxsys.ctx_version;
select substr(ctxsys.dri_version,1,10) VER_CODE from dual;

select count(*) from dba_objects where owner='CTXSYS';

-- Get a summary count
select object_type, count(*) from dba_objects where owner='CTXSYS' group by object_type;

-- Any invalid objects
select object_name, object_type, status from dba_objects where owner='CTXSYS' and status != 'VALID' order by object_name;

spool off


Steps to Deinstall Oracle Text Manually


Before deinstalling Oracle Text, it is best to first drop all Text Indexes built in schemas other than CTXSYS. Text dictionary, schema name CTXSYS, is removed by calling following script from SQL*Plus connected as SYSDBA:

SQL> connect SYS/password as SYSDBA
SQL> spool text_deinstall.log
SQL> @?/ctx/admin/catnoctx.sql
SQL> drop procedure sys.validate_context;
SQL> spool off

Review the output file text_deinstall.log for errors.
Deinstallation of Oracle Text is complete.

*** Note ***
ORA-04043 for dropping sys.validate_context can be ignored, as in base release version this procedure was owned by ctxsys and needs to be owned by sys.

Related Posts:
  1. Introduction of ORACLE TEXT
  2. Find Text index size

Sunday, December 14, 2008

Constraint Information

While performing DML operations, Some times we experienced constraint violation. Then we searching the details of that culprit constraint :)

Here is a simple script that help us a lot to handle such situation:

SELECT uc.owner , uc.constraint_name
, DECODE(uc.constraint_type,'C','CHECK','P' ,'Primary key','R','Referential','U'
,'Unique Key') constraint_type , uc.table_name
, ucc.column_name , uc.r_owner , uc.r_constraint_name
, ( SELECT 'Table : '||uc1.table_name||' Column: '|| ucc1.column_name
FROM user_constraints uc1,USER_CONS_COLUMNS ucc1
WHERE uc1.constraint_name=ucc1.constraint_name
AND uc1.constraint_name=uc.r_constraint_name
) R_description
FROM user_constraints uc,USER_CONS_COLUMNS ucc
WHERE uc.constraint_name=ucc.constraint_name
AND uc.constraint_name IN('FK625EF6937B99960');

The only thing you need to change the constraint name 'FK625EF6937B99960'