Wednesday, December 24, 2008

ORACLE TEXT

Oracle Text is a technology that enables you to build text query applications and document classification applications. Oracle Text provides indexing, word and theme searching, and viewing capabilities for text.

To design your Oracle Text application, you must determine the type of queries you expect to execute. We can divide application queries into four different categories:

  1. Text Queries on Document Collections: A text query application enables users to search document collections such as Web sites, digital libraries, or document warehouses. Searching is enabled by first indexing the document collection. The collection is typically static, with no significant change in content after the initial indexing run. Documents can be of any size and in different formats, such as HTML, PDF, or Microsoft Word. These documents are stored in a document table. Queries usually consist of words or phrases. Other query operations such as stemming, proximity searching, and wildcarding can be used to improve the search results. The queries for this type of application are best served by a CONTEXT index on your document table. To query this index, your application uses the SQL CONTAINS operator in the WHERE clause of a SELECT statement.
  2. Queries on Catalog Information: Catalog information consists of inventory-type information, such as that of an online bookstore or auction site. The stored information consists of text information, such as book titles, and related structured information, such as price. The information is usually updated regularly to keep the online catalog up to date with the inventory. Queries are usually a combination of a text component and a structured component, such as price or author. Results are almost always sorted by a structured component such as date or price. Catalog applications are best served by a CTXCAT index. You query this index with the CATSEARCH operator in the WHERE clause of a SELECT statement.
  3. Document Classification: In a document classification application, an incoming stream or a set of documents is compared to a predefined set of rules. When a document matches one or more rules, the application performs some action. For example, assume we have an incoming stream of news articles. We can define a rule to represent the category of Finance. The rule is essentially one or more queries that select documents about the subject of Finance. The rule might have the form 'stocks or bonds or earnings'. When a document about a Wall Street earnings forecast arrives and satisfies the rules for this category, the application takes an action such as tagging the document as Finance or emailing one or more users. To create a document classification application, you create a table of rules and then create a CTXRULE index on it. To classify an incoming stream of text, use the MATCHES operator in the WHERE clause of a SELECT statement.
  4. XML Searching: An XML search application performs searches over XML documents. In a regular document search, you usually search across a set of documents to return those that satisfy a text predicate; in an XML search, you often use the structure of the XML document to restrict the search. Typically, only the part of the document that satisfies the search is returned. For example, instead of finding all purchase orders that contain the word "electric", the user might need only the purchase orders in which the comment field contains "electric". XML searching can be done in three ways: (A) using Oracle Text, (B) using the Oracle XML DB framework, or (C) combining Oracle Text features with Oracle XML DB.
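
As a rough illustration of the first three categories, the sketch below pairs each index type with its query operator. It is not taken from a real application: every table, column, index, and index set name (docs, auction, rules, auction_iset, and so on) is made up for the example. It assumes that Oracle Text is installed, that the user has been granted the CTXAPP role, and that the statements are run from SQL*Plus or a similar client.

```sql
-- Illustrative sketch only: all object names below are hypothetical.

-- 1. Document collection: CONTEXT index, queried with CONTAINS.
CREATE TABLE docs (
  id   NUMBER PRIMARY KEY,
  text CLOB
);

CREATE INDEX docs_idx ON docs (text)
  INDEXTYPE IS CTXSYS.CONTEXT;

-- The label 1 ties SCORE(1) to the CONTAINS call carrying the same label.
SELECT SCORE(1) AS relevance, id
FROM   docs
WHERE  CONTAINS(text, 'oracle', 1) > 0
ORDER  BY SCORE(1) DESC;

-- 2. Catalog information: CTXCAT index, queried with CATSEARCH.
CREATE TABLE auction (
  item_id NUMBER PRIMARY KEY,
  title   VARCHAR2(100),
  price   NUMBER
);

-- The index set names the structured columns usable in CATSEARCH.
BEGIN
  CTX_DDL.CREATE_INDEX_SET('auction_iset');
  CTX_DDL.ADD_INDEX('auction_iset', 'price');
END;
/

CREATE INDEX auction_titlex ON auction (title)
  INDEXTYPE IS CTXSYS.CTXCAT
  PARAMETERS ('index set auction_iset');

-- Text component ('camera') plus a structured component (sort by price).
SELECT title, price
FROM   auction
WHERE  CATSEARCH(title, 'camera', 'order by price') > 0;

-- 3. Document classification: CTXRULE index, queried with MATCHES.
CREATE TABLE rules (
  rule_id      NUMBER PRIMARY KEY,
  category     VARCHAR2(30),
  query_string VARCHAR2(200)
);

INSERT INTO rules VALUES (1, 'Finance', 'stocks or bonds or earnings');
COMMIT;

CREATE INDEX rules_idx ON rules (query_string)
  INDEXTYPE IS CTXSYS.CTXRULE;

-- The second argument is the incoming document text, not a query.
SELECT category
FROM   rules
WHERE  MATCHES(query_string,
               'Wall Street analysts raised their earnings forecast') > 0;
```

Note the reversal in the third case: with CONTAINS and CATSEARCH the documents are indexed and a query is passed in, whereas with MATCHES the queries (rules) are indexed and a document is passed in. Also keep in mind that a CONTEXT index is not updated automatically when rows change, so documents added after the index is created typically need a call such as CTX_DDL.SYNC_INDEX('docs_idx') before they become searchable. This fits the mostly static collections described in the first category.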

Related Topics:

  1. ORACLE TEXT Installation.
  2. Example of Text Queries on Document Collections.
  3. Example of Queries on Catalog Information.
  4. Example of Document Classification.
