sql-server

7 Пост

oracle

7 Пост

postgresql

10 Пост

my-sql

2 Пост

common-sql

1 Пост

News

5 Новости

Подтема существует

PostgreSQL Text Search Index Types (Short Overview)

PostgreSQL full-text search is implemented using three index types:

  • GIN (Generalized Inverted Index)
  • GiST (Generalized Search Tree)
  • RUM (Real-time Universal Metric, extension)

These are the only index types that natively support tsvector / tsquery full-text search.


 1. GIN (Generalized Inverted Index)

Idea

Store a direct mapping from each word (lexeme) to all rows that contain it.

Main Logic

  • Convert text → tsvector (set of lexemes)
  • For each lexeme:
    • store list of row IDs (posting list)
  • Query:
    • break into lexemes
    • fetch row sets
    • combine (AND / OR)

Result: exact matches, no recheck needed


2. GiST (Generalized Search Tree)

Idea

Store a compressed signature (fingerprint) of document content instead of exact words.

Main Logic

  • Convert text → signature (bit representation of lexemes)
  • Store signatures in tree structure
  • Query:
    • compare query signature with stored signatures
    • return candidate rows
    • recheck actual data

Result: approximate match → requires verification


3. RUM (Extension Index)

Idea

Extend inverted index by storing extra information (positions, ranking data) together with lexemes.

Main Logic

  • Similar to GIN:
    • lexeme → row IDs
  • Additionally stores:
    • positions of words
    • frequency / ranking data
  • Query:
    • retrieve rows
    • compute ranking directly from index
    • return ordered results

Result: exact match + efficient ranking

Final Conclusion

  • GIN → exact inverted index (standard choice)
  • GiST → signature-based filtering (approximate)
  • RUM → inverted index with ranking support

Core difference:

  • GIN = precise lookup
  • GiST = probabilistic filter
  • RUM = lookup + ranking engine