sql-server

7 Пост

oracle

7 Пост

postgresql

10 Пост

my-sql

2 Пост

common-sql

1 Пост

News

5 Новости

Columnstore Indexes (SQL Server)

Columnstore indexes are the standard for storing and querying large data warehousing fact tables.
This index uses column-based data storage and query processing to achieve gains up to 10× query performance and 10× compression over traditional row-oriented storage.

Core Idea (Shortly)

A Columnstore Index is a storage and execution model where:

  • Data is stored column-by-column
  • Queries are executed in batch (vectorized) mode
  • Data is heavily compressed
  • Optimized for analytical workloads

It is not just an index — it’s a different storage engine behavior

Storage Architecture

Traditional Rowstore:

Row 1 → [A, B, C]
Row 2 → [A, B, C]

Columnstore:

A → [A1, A2, A3]
B → [B1, B2, B3]
C → [C1, C2, C3]

Core Internal Components

1. Rowgroups (Fundamental Unit)

  • Each rowgroup ≈ 1,048,576 rows
  • Data is processed and compressed in groups

2. Column Segments

  • Each column inside a rowgroup is stored separately
  • Each segment has:
    • MIN value
    • MAX value

Used for segment elimination

3. Delta Store (Write Optimization Layer)

Critical concept:

  • Inserts go to rowstore B-tree (delta store)
  • Later compressed into columnstore
INSERT → Delta Store → Tuple Mover → Compressed Rowgroup

4. Tuple Mover

  • Background process
  • Converts delta store → compressed columnstore

5. Delete Bitmap

  • Tracks logically deleted rows
  • Avoids immediate rewrite of compressed data

Compression Techniques

Columnstore uses:

  • Dictionary encoding
  • Run-length encoding (RLE)
  • Bit-packing

Result:

  • 5×–15× compression
  • Reduced I/O + memory

Execution Engine (MOST IMPORTANT)

Batch Mode Processing

Instead of:

Row-by-row (slow)

SQL Server uses:

Batch (~900 rows at once)

This is where most performance gain comes from

Segment Elimination

Each segment stores:

  • MIN / MAX metadata

Query:

WHERE order_date = '2025-01-01'

SQL Server skips irrelevant segments

Types of Columnstore

1. Clustered Columnstore Index (CCI)

  • Replaces entire table storage
  • No rowstore remains

Best for:

  • Fact tables
  • Data warehouse

2. Nonclustered Columnstore Index (NCCI)

  • Secondary structure
  • Table remains rowstore

Best for:

  • Hybrid workloads (OLTP + analytics)

Update / Insert Behavior

Inserts:

  • Go to delta store

Updates:

DELETE + INSERT

Leads to:

  • delete bitmap growth
  • fragmentation

Query Optimization Behavior

Optimizer may choose:

  • Batch mode scan
  • Column elimination
  • Aggregate pushdown

Aggregations become extremely fast

When Columnstore is Ideal

Large tables (millions+)
Aggregations (SUM, COUNT, AVG)
Reporting / BI
Star schema

Final Insight

Columnstore performance comes mainly from:

  • Batch execution
  • Compression
  • Reduced I/O

—not just storage format.