Internals//GIN Indexes: Limitations

GIN assumes that indexable operators are strict. This means that extractValue will not be called at all on a null item value (instead, a placeholder index entry is created automatically), and extractQuery will not be called on a null query value either (instead, the query is presumed to be unsatisfiable). Note however that null key values contained within a non-null composite item or query value are supported. Prev Next GIN Tips and Tricks Up Examples

Internals//GIN Indexes: Implementation

Internally, a GIN index contains a B-tree index constructed over keys, where each key is an element of one or more indexed items (a member of an array, for example) and where each tuple in a leaf page contains either a pointer to a B-tree of heap pointers (a "posting tree"), or a simple list of heap pointers (a "posting list") when the list is small enough to fit into a single index tuple along with the key value. As of PostgreSQL 9.1, null key values can be included in the index. Also, placeho

Internals//GIN Indexes: GIN Tips and Tricks

Create vs. insert Insertion into a GIN index can be slow due to the likelihood of many keys being inserted for each item. So, for bulk insertions into a table it is advisable to drop the GIN index and recreate it after finishing bulk insertion. As of PostgreSQL 8.4, this advice is less necessary since delayed indexing is used (see Section 63.4.1 for details). But for very large updates it may still be best to drop and recreate the index. maintenance_work_mem Build time for a GIN index is ve

Internals//GIN Indexes: Extensibility

The GIN interface has a high level of abstraction, requiring the access method implementer only to implement the semantics of the data type being accessed. The GIN layer itself takes care of concurrency, logging and searching the tree structure. All it takes to get a GIN access method working is to implement a few user-defined methods, which define the behavior of keys in the tree and the relationships between keys, indexed items, and indexable queries. In short, GIN combines extensibility with

Internals//GIN Indexes: Examples

The PostgreSQL source distribution includes GIN operator classes for tsvector and for one-dimensional arrays of all internal types. Prefix searching in tsvector is implemented using the GIN partial match feature. The following contrib modules also contain GIN operator classes: btree_gin B-tree equivalent functionality for several data types hstore Module for storing (key, value) pairs intarray Enhanced support for int[] pg_trgm Text similarity using trigram matching Prev Next Limit

Internals//GIN Indexes: Built-in Operator Classes

The core PostgreSQL distribution includes the GIN operator classes shown in Table 63-1. (Some of the optional modules described in Appendix F provide additional GIN operator classes.) Table 63-1. Built-in GIN Operator Classes Name Indexed Data Type Indexable Operators _abstime_ops abstime[] && <@ = @> _bit_ops bit[] && <@ = @> _bool_ops boolean[] && <@ = @> _bpchar_ops character[] && <@ = @> _bytea_ops bytea[] && <@ =

Internals//GIN Indexes

GIN stands for Generalized Inverted Index. GIN is designed for handling cases where the items to be indexed are composite values, and the queries to be handled by the index need to search for element values that appear within the composite items. For example, the items could be documents, and the queries could be searches for documents containing specific words. We use the word item to refer to a composite value that is to be indexed, and the word key to refer to an element value. GIN always st

Internals//Genetic Query Optimizer: Query Handling as a Complex Optimization Problem

Among all relational operators the most difficult one to process and optimize is the join. The number of possible query plans grows exponentially with the number of joins in the query. Further optimization effort is caused by the support of a variety of join methods (e.g., nested loop, hash join, merge join in PostgreSQL) to process individual joins and a diversity of indexes (e.g., B-tree, hash, GiST and GIN in PostgreSQL) as access paths for relations. The normal PostgreSQL query optimizer pe

Internals//Genetic Query Optimizer: Genetic Query Optimization (GEQO) in PostgreSQL

The GEQO module approaches the query optimization problem as though it were the well-known traveling salesman problem (TSP). Possible query plans are encoded as integer strings. Each string represents the join order from one relation of the query to the next. For example, the join tree /\ /\ 2 /\ 3 4 1 is encoded by the integer string '4-1-3-2', which means, first join relation '4' and '1', then '3', and then '2', where 1, 2, 3, 4 are relation IDs within the PostgreSQL optimizer. Speci

Internals//Genetic Query Optimizer: Genetic Algorithms

The genetic algorithm (GA) is a heuristic optimization method which operates through randomized search. The set of possible solutions for the optimization problem is considered as a population of individuals. The degree of adaptation of an individual to its environment is specified by its fitness. The coordinates of an individual in the search space are represented by chromosomes, in essence a set of character strings. A gene is a subsection of a chromosome which encodes the value of a single p