Columnstore Indexes (SQL Server)
Columnstore indexes are the standard for storing and querying large data warehousing fact tables.
This index uses column-based data storage and query processing to achieve gains up to 10× query performance and 10× compression over traditional row-oriented storage.
Core Idea (Shortly)
A Columnstore Index is a storage and execution model where:
- Data is stored column-by-column
- Queries are executed in batch (vectorized) mode
- Data is heavily compressed
- Optimized for analytical workloads
It is not just an index — it’s a different storage engine behavior
Storage Architecture
Traditional Rowstore:
Row 1 → [A, B, C]
Row 2 → [A, B, C]
Columnstore:
A → [A1, A2, A3]
B → [B1, B2, B3]
C → [C1, C2, C3]
Core Internal Components
1. Rowgroups (Fundamental Unit)
- Each rowgroup ≈ 1,048,576 rows
- Data is processed and compressed in groups
2. Column Segments
- Each column inside a rowgroup is stored separately
- Each segment has:
Used for segment elimination
3. Delta Store (Write Optimization Layer)
Critical concept:
- Inserts go to rowstore B-tree (delta store)
- Later compressed into columnstore
INSERT → Delta Store → Tuple Mover → Compressed Rowgroup
4. Tuple Mover
- Background process
- Converts delta store → compressed columnstore
5. Delete Bitmap
- Tracks logically deleted rows
- Avoids immediate rewrite of compressed data
Compression Techniques
Columnstore uses:
- Dictionary encoding
- Run-length encoding (RLE)
- Bit-packing
Result:
- 5×–15× compression
- Reduced I/O + memory
Execution Engine (MOST IMPORTANT)
Batch Mode Processing
Instead of:
Row-by-row (slow)
SQL Server uses:
Batch (~900 rows at once)
This is where most performance gain comes from
Segment Elimination
Each segment stores:
Query:
WHERE order_date = '2025-01-01'
SQL Server skips irrelevant segments
Types of Columnstore
1. Clustered Columnstore Index (CCI)
- Replaces entire table storage
- No rowstore remains
Best for:
- Fact tables
- Data warehouse
2. Nonclustered Columnstore Index (NCCI)
- Secondary structure
- Table remains rowstore
Best for:
- Hybrid workloads (OLTP + analytics)
Update / Insert Behavior
Inserts:
Updates:
DELETE + INSERT
Leads to:
- delete bitmap growth
- fragmentation
Query Optimization Behavior
Optimizer may choose:
- Batch mode scan
- Column elimination
- Aggregate pushdown
Aggregations become extremely fast
When Columnstore is Ideal
✅ Large tables (millions+)
✅ Aggregations (SUM, COUNT, AVG)
✅ Reporting / BI
✅ Star schema
Final Insight
Columnstore performance comes mainly from:
- Batch execution
- Compression
- Reduced I/O
—not just storage format.