Internals: Executor

The executor takes the plan created by the planner/optimizer and recursively processes it to extract the required set of rows. This is essentially a demand-pull pipeline mechanism. Each time a plan node is called, it must deliver one more row, or report that it is done delivering rows. To provide a concrete example, assume that the top node is a MergeJoin node. Before any merge can be done two rows have to be fetched (one from each subplan). So the executor recursively calls itself to process t

Internals//Physical Storage: Visibility Map

Each heap relation has a Visibility Map (VM) to keep track of which pages contain only tuples that are known to be visible to all active transactions; it also keeps track of which pages contain only unfrozen tuples. It's stored alongside the main relation data in a separate relation fork, named after the filenode number of the relation, plus a _vm suffix. For example, if the filenode of a relation is 12345, the VM is stored in a file called 12345_vm, in the same directory as the main relation f

Internals//SP-GiST Indexes: Built-in Operator Classes

The core PostgreSQL distribution includes the SP-GiST operator classes shown in Table 62-1. Table 62-1. Built-in SP-GiST Operator Classes Name Indexed Data Type Indexable Operators kd_point_ops point << <@ <^ >> >^ ~= quad_point_ops point << <@ <^ >> >^ ~= range_ops any range type && &< &> -|- << <@ = >> @> box_ops box << &< && &> >> ~= @> <@ &<| <<| |&

Internals//Physical Storage: Initialization Fork

Each unlogged table, and each index on an unlogged table, has an initialization fork. The initialization fork is an empty table or index of the appropriate type. When an unlogged table must be reset to empty due to a crash, the initialization fork is copied over the main fork, and any other forks are erased (they will be recreated automatically as needed). Prev Next Visibility Map Up Database Page Layout

Internals//Physical Storage: Free Space Map

Each heap and index relation, except for hash indexes, has a Free Space Map (FSM) to keep track of available space in the relation. It's stored alongside the main relation data in a separate relation fork, named after the filenode number of the relation, plus a _fsm suffix. For example, if the filenode of a relation is 12345, the FSM is stored in a file called 12345_fsm, in the same directory as the main relation file. The Free Space Map is organized as a tree of FSM pages. The bottom level FSM

Internals//Physical Storage: TOAST

This section provides an overview of TOAST (The Oversized-Attribute Storage Technique). PostgreSQL uses a fixed page size (commonly 8 kB), and does not allow tuples to span multiple pages. Therefore, it is not possible to store very large field values directly. To overcome this limitation, large field values are compressed and/or broken up into multiple physical rows. This happens transparently to the user, with only small impact on most of the backend code. The technique is affectionately know

Internals//SP-GiST Indexes

SP-GiST is an abbreviation for space-partitioned GiST. SP-GiST supports partitioned search trees, which facilitate development of a wide range of different non-balanced data structures, such as quad-trees, k-d trees, and radix trees (tries). The common feature of these structures is that they repeatedly divide the search space into partitions that need not be of equal size. Searches that are well matched to the partitioning rule can be very fast. These popular data structures were originally de

Internals//Index Access Method: Index Scanning

In an index scan, the index access method is responsible for regurgitating the TIDs of all the tuples it has been told about that match the scan keys. The access method is not involved in actually fetching those tuples from the index's parent table, nor in determining whether they pass the scan's time qualification test or other conditions. A scan key is the internal representation of a WHERE clause of the form index_key operator constant, where the index key is one of the columns of the index

Internals//Index Access Method: Index Locking Considerations

Index access methods must handle concurrent updates of the index by multiple processes. The core PostgreSQL system obtains AccessShareLock on the index during an index scan, and RowExclusiveLock when updating the index (including plain VACUUM). Since these lock types do not conflict, the access method is responsible for handling any fine-grained locking it might need. An exclusive lock on the index as a whole will be taken only during index creation, destruction, or REINDEX. Building an index t

Internals//Index Access Method: Index Uniqueness Checks

PostgreSQL enforces SQL uniqueness constraints using unique indexes, which are indexes that disallow multiple entries with identical keys. An access method that supports this feature sets amcanunique true. (At present, only b-tree supports it.) Because of MVCC, it is always necessary to allow duplicate entries to exist physically in an index: the entries might refer to successive versions of a single logical row. The behavior we actually want to enforce is that no MVCC snapshot could include tw