Text search parsers are responsible for splitting raw document text into tokens and identifying each token's type, where the set of possible types is defined by the parser itself. Note that a parser does not modify the text at all — it simply identifies plausible word boundaries. Because of this limited scope, there is less need for application-specific custom parsers than there is for custom dictionaries. At present PostgreSQL provides just one built-in parser, which has been found to be usefu

The current limitations of PostgreSQL's text search features are: The length of each lexeme must be less than 2K bytes The length of a tsvector (lexemes + positions) must be less than 1 megabyte The number of lexemes must be less than 264 Position values in tsvector must be greater than 0 and no more than 16,383 The match distance in a <N> (FOLLOWED BY) tsquery operator cannot be more than 16,384 No more than 256 positions per lexeme The number of nodes (lexemes + operators)

Full Text Search: GIN and GiST Index Types

There are two kinds of indexes that can be used to speed up full text searches. Note that indexes are not mandatory for full text searching, but in cases where a column is searched on a regular basis, an index is usually desirable. CREATE INDEX name ON table USING GIN (column); Creates a GIN (Generalized Inverted Index)-based index. The column must be of tsvector type. CREATE INDEX name ON table USING GIST (column); Creates a GiST (Generalized Search Tree)-based index. The column can be of

Full Text Search: Dictionaries

Dictionaries are used to eliminate words that should not be considered in a search (stop words), and to normalize words so that different derived forms of the same word will match. A successfully normalized word is called a lexeme. Aside from improving search quality, normalization and removal of stop words reduce the size of the tsvector representation of a document, thereby improving performance. Normalization does not always have linguistic meaning and usually depends on application semantic

Full Text Search: Controlling Text Search

To implement full text searching there must be a function to create a tsvector from a document and a tsquery from a user query. Also, we need to return results in a useful order, so we need a function that compares documents with respect to their relevance to the query. It's also important to be able to display the results nicely. PostgreSQL provides support for all of these functions. 12.3.1. Parsing Documents PostgreSQL provides the function to_tsvector for converting a document to the tsvec

Full Text Search: Configuration Example

A text search configuration specifies all options necessary to transform a document into a tsvector: the parser to use to break text into tokens, and the dictionaries to use to transform each token into a lexeme. Every call of to_tsvector or to_tsquery needs a text search configuration to perform its processing. The configuration parameter default_text_search_config specifies the name of the default configuration, which is the one used by text search functions if an explicit configuration param

Full Text Search: Additional Features

This section describes additional functions and operators that are useful in connection with text search. 12.4.1. Manipulating Documents Section 12.3.1 showed how raw textual documents can be converted into tsvector values. PostgreSQL also provides functions and operators that can be used to manipulate documents that are already in tsvector form. tsvector || tsvector The tsvector concatenation operator returns a vector which combines the lexemes and positional information of the two vectors

Full Text Search

Full Text Searching (or just text search) provides the capability to identify natural-language documents that satisfy a query, and optionally to sort them by relevance to the query. The most common type of search is to find all documents containing given query terms and return them in order of their similarity to the query. Notions of query and similarity are very flexible and depend on the specific application. The simplest search considers query as a set of words and similarity as the frequen

Formatting Functions

The PostgreSQL formatting functions provide a powerful set of tools for converting various data types (date/time, integer, floating point, numeric) to formatted strings and for converting from formatted strings to specific data types. Table 9-23 lists them. These functions all follow a common calling convention: the first argument is the value to be formatted and the second argument is a template that defines the output or input format. Table 9-23. Formatting Functions Function Return Type De

Foreign Data

PostgreSQL implements portions of the SQL/MED specification, allowing you to access data that resides outside PostgreSQL using regular SQL queries. Such data is referred to as foreign data. (Note that this usage is not to be confused with foreign keys, which are a type of constraint within the database.) Foreign data is accessed with help from a foreign data wrapper. A foreign data wrapper is a library that can communicate with an external data source, hiding the details of connecting to the da