sql-server

7 Пост

oracle

7 Пост

postgresql

10 Пост

my-sql

2 Пост

common-sql

1 Пост

News

5 Новости

Основная тема

GIN (Generalized Inverted Index) — Full Description

 

Idea

GIN is based on the concept of an inverted index.

Instead of storing rows → values, it stores:
value (lexeme) → rows that contain it

This makes it extremely efficient for searching text, where the goal is to find which rows contain specific words.

Main Logic

1. Text Transformation

  • Input text is converted into a tsvector
  • This process:
    • tokenizes text into words
    • normalizes them (stemming, lowercasing)
    • removes stop words

2. Index Structure

For each lexeme, GIN stores:

  • Posting List → list of row identifiers (TIDs)
  • If large → converted into a Posting Tree

Structure conceptually:

 
"bank" → [row1, row5, row20]
"loan" → [row2, row5]
 

3. Search Execution

  • Query is converted into tsquery (lexemes + operators)
  • For each lexeme:
    • retrieve matching row sets
  • Combine results:
    • AND → intersection
    • OR → union

Result: exact matching rows

4. No Recheck Needed

Unlike GiST:

  • GIN returns only valid matches
  • No additional verification step required

Key Characteristics

✔ Strengths

  • Very fast for full-text search
  • Scales well with large datasets
  • Precise results (no false positives)
  • Efficient for multi-term queries

Weaknesses

  • Slower inserts/updates (index maintenance overhead)
  • Larger index size
  • Uses pending list (delayed indexing for performance)

Conceptual Summary

GIN works like a reverse dictionary:

  • Instead of asking “what words are in this row?”
  • It answers “which rows contain this word?”

Final Insight

GIN is the default and most important index for PostgreSQL full-text search.

If your goal is:

  • accurate search
  • large-scale text querying
  • predictable performance