PostgreSQL Text Search Index Types (Short Overview)
PostgreSQL full-text search is implemented using three index types:
- GIN (Generalized Inverted Index)
- GiST (Generalized Search Tree)
- RUM (Real-time Universal Metric, extension)
These are the only index types that natively support tsvector / tsquery full-text search.
1. GIN (Generalized Inverted Index)
Idea
Store a direct mapping from each word (lexeme) to all rows that contain it.
Main Logic
- Convert text → tsvector (set of lexemes)
- For each lexeme:
- store list of row IDs (posting list)
- Query:
- break into lexemes
- fetch row sets
- combine (AND / OR)
Result: exact matches, no recheck needed
2. GiST (Generalized Search Tree)
Idea
Store a compressed signature (fingerprint) of document content instead of exact words.
Main Logic
- Convert text → signature (bit representation of lexemes)
- Store signatures in tree structure
- Query:
- compare query signature with stored signatures
- return candidate rows
- recheck actual data
Result: approximate match → requires verification
3. RUM (Extension Index)
Idea
Extend inverted index by storing extra information (positions, ranking data) together with lexemes.
Main Logic
- Similar to GIN:
- Additionally stores:
- positions of words
- frequency / ranking data
- Query:
- retrieve rows
- compute ranking directly from index
- return ordered results
Result: exact match + efficient ranking
Final Conclusion
- GIN → exact inverted index (standard choice)
- GiST → signature-based filtering (approximate)
- RUM → inverted index with ranking support
Core difference:
- GIN = precise lookup
- GiST = probabilistic filter
- RUM = lookup + ranking engine