RUM in PostgreSQL is a specialized inverted index access method for search workloads. It is based on the GIN code, but unlike plain GIN it stores additional information in the posting tree: in particular, the data needed for ranking, phrase/proximity matching, and ordered retrieval. That is the core idea behind RUM.
What RUM is
A normal GIN index is already very good at answering questions like “which rows contain these lexemes?” for tsvector/tsquery full-text search. The weakness is that GIN does not keep enough positional and attached information inside the index to make ranking, phrase checks, and certain ordered searches as efficient as they could be. RUM addresses that by storing extra per-match information directly in the index structure.
So, conceptually: GIN tells you which rows match; RUM can additionally tell you how well they match, where the matches occur, and in what order to return them, using data kept in the index itself.
Main benefits of RUM
According to the RUM documentation, compared to GIN, RUM has three headline advantages:

- faster ranking of search results, because ranking data is available directly in the index;
- faster phrase search, because lexeme positions are stored in the index;
- faster ordering by an attached value, such as a timestamp, together with the text match.
That is why RUM is mainly chosen for search systems, not for ordinary OLTP equality lookups.
What RUM is not
RUM is not a standard core access method in vanilla PostgreSQL installations. It is provided through the rum extension. In Postgres Pro documentation, CREATE INDEX lists RUM among available index methods in that distribution, and the dedicated rum module docs describe it as an extension module. Availability therefore depends on your server build and installed packages.
So this usually starts with:
CREATE EXTENSION rum;
If that fails, the extension is not installed on the server.
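You can check availability before (or after) a failed CREATE EXTENSION through the standard catalog view pg_available_extensions, which lists every extension whose files are installed on the server:

```sql
-- Shows a row only if the rum extension files are present on this server;
-- installed_version is non-NULL once CREATE EXTENSION has been run.
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name = 'rum';
```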
Internal idea
RUM is an inverted index, like GIN. In an inverted index, the system maps a token or key element to the rows that contain it. RUM extends this by keeping extra information with those postings. In full-text search, that extra information commonly includes positions of lexemes, and optionally attached values used for ordering. That design is what allows RUM to avoid some post-index heap work that GIN often still needs.
This is the key reason phrase search is faster: for phrase or proximity search, you do not only need to know that both words exist; you need to know where they occur relative to each other. RUM keeps the information needed for that much closer to the index scan itself.
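The positional information in question is visible in any tsvector. As a quick illustration (not RUM-specific), the lexeme positions recorded by to_tsvector are exactly the kind of data RUM keeps close to the index scan:

```sql
-- Each lexeme carries its position; the stopword 'the' is dropped.
SELECT to_tsvector('english', 'the quick brown fox');
-- Result: 'brown':3 'fox':4 'quick':2
```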
Typical use cases
RUM is best when your workload looks like one of these:

- full-text search where results must come back ranked by relevance;
- phrase or proximity search over tsvector data;
- full-text search combined with ordering by another value, such as a timestamp.

Examples: a news archive returning the most relevant articles first, mail or message search ordered by date, or document search with exact phrase queries.
Tradeoffs
RUM is not “GIN but always better.” The docs are clear that its strengths come with tradeoffs. Because it stores more information, RUM is generally heavier than GIN for maintenance. In practice that means slower index build and slower inserts/updates than plain GIN, while searches that need ranking/phrases/order can be significantly better.
So the usual tradeoff is: noticeably better read performance for ranked, phrase, and ordered searches, in exchange for a larger index, slower builds, and slower writes.
Operator classes
The RUM documentation has a dedicated operator-classes section, and in practice the most important ones are centered on full-text search, especially tsvector support. The most commonly discussed opclasses are:
rum_tsvector_ops
This is the standard RUM operator class for full-text indexing of tsvector. It supports matching tsvector against tsquery, and it is the basic choice when you want faster phrase/ranking behavior than GIN can provide.
Example:
CREATE INDEX ix_docs_rum
ON docs
USING rum (fts rum_tsvector_ops);
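For context, a minimal table that this index could sit on might look like the following. The docs table, fts column, and generated-column approach are assumptions for illustration (generated columns require PostgreSQL 12 or later):

```sql
-- Hypothetical schema matching the examples in this section.
CREATE TABLE docs (
    id         bigserial PRIMARY KEY,
    title      text NOT NULL,
    body       text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),
    -- to_tsvector with an explicit configuration is immutable,
    -- so it is allowed in a generated column.
    fts        tsvector GENERATED ALWAYS AS
               (to_tsvector('english', title || ' ' || body)) STORED
);

CREATE INDEX ix_docs_rum
    ON docs
    USING rum (fts rum_tsvector_ops);
```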
rum_tsvector_addon_ops
This opclass is used when you want to attach an additional sortable value to the tsvector entries, such as a timestamp. That is what enables efficient “search and order by recency” behavior.
Pattern:
CREATE INDEX ix_docs_rum
ON docs
USING rum (fts rum_tsvector_addon_ops, created_at)
WITH (attach = 'created_at', to = 'fts');
This is one of the signature RUM features.
Query behavior
With RUM, full-text search still uses the normal PostgreSQL text-search model: a tsvector column (or expression) represents the document, a tsquery built with functions such as to_tsquery or phraseto_tsquery describes the search, and the @@ operator tests for a match.
Example:
SELECT id, title
FROM docs
WHERE fts @@ to_tsquery('english', 'postgresql & indexing');
Where RUM becomes especially valuable is when the query asks not only for matching rows but also for a particular order of results.
For example, the Postgres Pro explanation describes RUM as being able to return results in the needed order, similarly to how GiST can support nearest-neighbor retrieval.
A ranking-style query can look like:
SELECT id,
       title,
       fts <=> to_tsquery('english', 'postgresql & indexing') AS dist
FROM docs
WHERE fts @@ to_tsquery('english', 'postgresql & indexing')
ORDER BY fts <=> to_tsquery('english', 'postgresql & indexing')
LIMIT 20;
That <=> behavior is one of the practical reasons people use RUM.
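For comparison, the pre-RUM pattern with GIN computes the rank per matching row with ts_rank and sorts outside the index; this is the extra heap-and-sort work that <=> ordering is meant to avoid (same hypothetical docs table as in the earlier examples):

```sql
-- GIN-style ranking: find matches, compute rank per row, then sort.
SELECT id, title,
       ts_rank(fts, to_tsquery('english', 'postgresql & indexing')) AS rank
FROM docs
WHERE fts @@ to_tsquery('english', 'postgresql & indexing')
ORDER BY rank DESC
LIMIT 20;
```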
Phrase and proximity search
Phrase search is where RUM is much more naturally suited than GIN, because phrase search depends on lexeme positions. The docs explicitly call out faster phrase search as a core benefit.
Examples:
SELECT id, body
FROM docs
WHERE fts @@ phraseto_tsquery('english', 'secured loan');
or
SELECT id, body
FROM docs
WHERE fts @@ to_tsquery('english', 'secured <-> loan');
These kinds of queries benefit from RUM because the index already carries the positional information needed to verify the phrase relationship efficiently.
Ordering by attached values
One of the strongest real-world RUM scenarios is this:
RUM can attach an additional value, such as a timestamp, to the indexed lexeme information. This can make ordering by that attached value much faster than a standard GIN approach that finds matches and then performs more heap work to sort.
That makes RUM attractive for systems like news feeds, mail and message search, forums, and log or event search, where users expect matching results ordered by recency.
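Combined with a rum_tsvector_addon_ops index of the kind shown earlier, the "match and return newest first" query can be written as a distance ordering on the attached timestamp. The pattern below follows the RUM documentation's examples, with the table and column names being the hypothetical ones used in this article:

```sql
-- Matching rows ordered by closeness to the reference time,
-- i.e. newest-first when the reference is 'now'.
SELECT id, title, created_at
FROM docs
WHERE fts @@ to_tsquery('english', 'postgresql & indexing')
ORDER BY created_at <=> now()
LIMIT 20;
```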
Multicolumn and advanced behavior
RUM supports more advanced indexing patterns, including multicolumn cases. Recent Postgres Pro release notes mention fixes related to scanning multi-column RUM indexes and order_by_attach, which confirms these capabilities are actively used and maintained.
That same release family also added low-level inspection functions for RUM pages, showing the access method continues to evolve in current Postgres Pro releases.