GIN (Generalized Inverted Index) — Full Description

Idea
GIN is based on the concept of an inverted index.
Instead of storing rows → values, it stores:
value (lexeme) → rows that contain it
This makes it extremely efficient for searching text, where the goal is to find which rows contain specific words.
Main Logic
1. Text Transformation
- Input text is converted into a tsvector
- This process:
- tokenizes text into words
- normalizes them (stemming, lowercasing)
- removes stop words
2. Index Structure
For each lexeme, GIN stores:
- Posting List → list of row identifiers (TIDs)
- If large → converted into a Posting Tree
Structure conceptually:
3. Search Execution
- Query is converted into
tsquery (lexemes + operators)
- For each lexeme:
- retrieve matching row sets
- Combine results:
- AND → intersection
- OR → union
Result: exact matching rows
4. No Recheck Needed
Unlike GiST:
- GIN returns only valid matches
- No additional verification step required
Key Characteristics
✔ Strengths
- Very fast for full-text search
- Scales well with large datasets
- Precise results (no false positives)
- Efficient for multi-term queries
Weaknesses
- Slower inserts/updates (index maintenance overhead)
- Larger index size
- Uses pending list (delayed indexing for performance)
Conceptual Summary
GIN works like a reverse dictionary:
- Instead of asking “what words are in this row?”
- It answers “which rows contain this word?”
Final Insight
GIN is the default and most important index for PostgreSQL full-text search.
If your goal is:
- accurate search
- large-scale text querying
- predictable performance