Understanding Database Indexing for Better Performance

Your database queries are taking seconds when they should take milliseconds. The culprit? Missing or poorly designed indexes. Database indexing is the difference between scanning millions of rows and jumping directly to the data you need—but getting it wrong can slow down writes and bloat your storage.

This article covers database indexing basics, from B-tree and composite indexes to advanced techniques like covering indexes and query plans. You’ll learn how to index modern data types including JSON and text fields, and most importantly, how to avoid the performance trap of over-indexing in modern databases.

Key Takeaways

  • Database indexes trade faster reads for slower writes and increased storage
  • B-tree indexes handle most queries efficiently, while specialized indexes serve JSON and text data
  • Composite index column order determines query support—equality columns should come first
  • Covering indexes enable index-only scans by including all query columns
  • Over-indexing hurts write performance—monitor and remove unused indexes regularly

What Database Indexes Actually Do

Think of querying a users table with 10 million rows:

SELECT * FROM users WHERE email = 'user@example.com';

Without an index, PostgreSQL or MySQL must check every single row—a full table scan. With an index on email, the database jumps directly to the matching row using a B-tree structure, typically reducing lookup time from O(n) to O(log n).

An index is essentially a sorted copy of specific columns with pointers back to the full rows. The trade-off: faster reads at the cost of slower writes (the index must be updated) and additional storage (typically 10-20% of table size per index).
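For the email lookup above, the index itself is a one-line statement (a sketch reusing the users table from the example; making it UNIQUE also enforces that no two rows share an address):

```sql
-- B-tree index on users.email; UNIQUE both speeds lookups
-- and rejects duplicate addresses at insert time
CREATE UNIQUE INDEX idx_users_email ON users (email);
```

Once this index exists, the earlier SELECT becomes an index lookup instead of a full table scan.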

B-tree Indexes: The Workhorse of Databases

B-tree indexes power most database queries. They maintain data in a balanced tree structure where each node contains multiple sorted keys. This design minimizes disk I/O—crucial since disk access is orders of magnitude slower than memory access.

CREATE INDEX idx_users_created_at ON users(created_at);

B-trees excel at:

  • Equality checks (WHERE status = 'active')
  • Range queries (WHERE created_at > '2024-01-01')
  • Sorting (ORDER BY created_at)
  • Prefix searches (WHERE email LIKE 'john%')
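One caveat worth illustrating: B-trees only help when the search prefix is fixed. A sketch, assuming an index on email:

```sql
-- Can use a B-tree index on email: all matching keys
-- share the prefix 'john', so the index can seek to them
SELECT * FROM users WHERE email LIKE 'john%';

-- Cannot use it: a leading wildcard means there is no common
-- prefix to seek to, so the database falls back to scanning
SELECT * FROM users WHERE email LIKE '%@example.com';
```

Note that in PostgreSQL with a non-C locale, a B-tree index only serves LIKE prefix searches if it is created with the text_pattern_ops operator class.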

PostgreSQL and MySQL InnoDB use B+tree variants where all data lives in leaf nodes, making range scans more efficient. MongoDB uses similar B-tree structures for its indexes, though document databases add complexity with nested field indexing.

Composite Indexes: Column Order Matters

Composite indexes cover multiple columns, but column order is critical:

CREATE INDEX idx_events_user_date ON events(user_id, created_at);

This index supports:

  • WHERE user_id = 123
  • WHERE user_id = 123 AND created_at > '2024-01-01'

But NOT:

  • WHERE created_at > '2024-01-01' (without user_id)

The index works left-to-right. Place columns used for equality first, then ranges. For an API endpoint that filters events by user and date range, this composite index eliminates the need for separate indexes on each column.
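The equality-first rule can be seen by comparing two orderings for a query with one equality and one range condition (names follow the events example above):

```sql
-- Query: one equality (user_id) and one range (created_at)
SELECT * FROM events
WHERE user_id = 123 AND created_at > '2024-01-01';

-- Good: equality column first. The index narrows to user 123,
-- then range-scans only that user's rows by date.
CREATE INDEX idx_events_user_date ON events (user_id, created_at);

-- Worse: range column first. Every row after the date matches on
-- the leading column, so user_id can't narrow the scan effectively.
-- CREATE INDEX idx_events_date_user ON events (created_at, user_id);
```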

Covering Indexes and Query Plans

A covering index includes all columns needed by a query, enabling index-only scans:

-- PostgreSQL example
CREATE INDEX idx_orders_covering 
ON orders(user_id, status, created_at) 
INCLUDE (total_amount);

For this query:

SELECT total_amount FROM orders 
WHERE user_id = 123 AND status = 'completed';

The database never touches the table—everything comes from the index. Use EXPLAIN to verify:

EXPLAIN (ANALYZE, BUFFERS) SELECT ...;

Look for “Index Only Scan” in PostgreSQL or “Using index” in MySQL. Query plans reveal whether your indexes are actually being used. High-cost sequential scans indicate missing indexes, while unused indexes waste resources.
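For the covering-index query above, a PostgreSQL plan confirming an index-only scan has roughly this shape (costs and timings are illustrative and will differ per machine):

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT total_amount FROM orders
WHERE user_id = 123 AND status = 'completed';

-- Expected shape of the output:
-- Index Only Scan using idx_orders_covering on orders
--   Index Cond: ((user_id = 123) AND (status = 'completed'))
--   Heap Fetches: 0
```

"Heap Fetches: 0" means every row was answered from the index via the visibility map; a freshly written table may show nonzero heap fetches until VACUUM runs.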

Indexing JSON and Text Fields

Modern applications store unstructured data requiring specialized indexes:

JSON Indexing (PostgreSQL)

CREATE INDEX idx_metadata_gin ON products 
USING GIN (metadata);

-- Supports queries like:
SELECT * FROM products 
WHERE metadata @> '{"category": "electronics"}';

GIN indexes handle JSON containment queries efficiently, though they’re larger and slower to update than B-trees. For high-write tables, indexing only specific JSON paths can be more efficient than indexing entire documents.
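One way to index a single path is an expression index on the extracted value (a sketch; `->>` returns text, so the query must compare against the same text expression):

```sql
-- B-tree index on one JSON key instead of the whole document
CREATE INDEX idx_products_category
ON products ((metadata->>'category'));

-- Served by the expression index above:
SELECT * FROM products WHERE metadata->>'category' = 'electronics';
```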

Full-Text Search Indexing (PostgreSQL)

CREATE INDEX idx_posts_text ON posts 
USING GIN (to_tsvector('english', content));

-- Enables:
SELECT * FROM posts 
WHERE to_tsvector('english', content) @@ 'database & indexing';

Text indexes enable Google-like search capabilities but can double storage requirements for text-heavy tables.
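To avoid recomputing to_tsvector on every query, PostgreSQL 12+ can keep the vector in a stored generated column and index that instead (a sketch using the posts table above):

```sql
-- Stored generated column keeps the tsvector in sync automatically
ALTER TABLE posts
ADD COLUMN content_tsv tsvector
GENERATED ALWAYS AS (to_tsvector('english', content)) STORED;

CREATE INDEX idx_posts_tsv ON posts USING GIN (content_tsv);

-- Queries then match against the column directly:
SELECT * FROM posts WHERE content_tsv @@ to_tsquery('database & indexing');
```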

Avoiding Over-Indexing in Modern Databases

More indexes aren’t always better. Each index:

  • Slows INSERT/UPDATE/DELETE operations (every index must be updated on each write)
  • Increases storage (10-20% per index)
  • Adds maintenance overhead (fragmentation, rebuilds)

Signs of over-indexing:

  • Write-heavy tables with 5+ indexes
  • Duplicate indexes (e.g., (a, b) and (a))
  • Unused indexes (check pg_stat_user_indexes or sys.dm_db_index_usage_stats)
  • Index size exceeding table size

Best practices:

  1. Start with indexes on primary keys and foreign keys
  2. Add indexes based on slow query logs, not assumptions
  3. Consolidate with composite indexes where possible
  4. Drop unused indexes after monitoring for 30+ days
  5. Consider partial indexes for large tables with filtered queries:
CREATE INDEX idx_orders_recent 
ON orders(created_at) 
WHERE created_at > '2024-01-01';
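To find removal candidates in PostgreSQL, query the statistics view mentioned above (idx_scan counts since the last stats reset, so check the reset date before trusting zeros):

```sql
-- Indexes never scanned since statistics were last reset
SELECT schemaname, relname AS table_name,
       indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```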

Index Selectivity and Real-World Performance

Index selectivity—the ratio of unique values to total rows—determines effectiveness. An index on a boolean column (2 values) has low selectivity and rarely helps. An index on email (all unique) has perfect selectivity.
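Selectivity can be measured directly before creating an index (a sketch; values near 1.0 suggest an effective index, values near 0 suggest a poor one):

```sql
-- Ratio of distinct values to total rows for a candidate column
SELECT COUNT(DISTINCT status)::numeric / COUNT(*) AS selectivity
FROM orders;
-- A two-value status column on a large table yields a value near 0;
-- a unique column like email yields exactly 1.0.
```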

Monitor real production metrics:

  • Query execution time before/after indexing
  • Index hit rates
  • Write performance impact
  • Storage growth

Tools like pgAdmin, MySQL Workbench, or MongoDB Compass provide visual query analysis. For production monitoring, consider DataDog or New Relic.

Conclusion

Effective database indexing requires understanding your query patterns, not following rigid rules. Start with B-tree indexes on columns in WHERE, ORDER BY, and JOIN clauses. Use composite indexes strategically, considering column order. Leverage covering indexes for read-heavy queries and specialized indexes for JSON and text data.

Most importantly, measure everything. Use EXPLAIN to understand query plans, monitor index usage statistics, and track the real impact on both read and write performance. The goal isn’t to index everything—it’s to index exactly what your application needs.

FAQs

Should every column be indexed?

No. Index columns based on query frequency and selectivity. Columns with low selectivity like booleans rarely benefit from indexes. Focus on columns used in slow queries and consider composite indexes for queries with multiple conditions.

How do I know whether my indexes are actually being used?

Use EXPLAIN ANALYZE in PostgreSQL or EXPLAIN in MySQL to see query execution plans. Check system tables like pg_stat_user_indexes or sys.dm_db_index_usage_stats to track index usage over time. Indexes still unused after 30 days are candidates for removal.

What is the cost of having too many indexes?

Each index slows write operations since the database must update all relevant indexes. Storage increases by 10-20% per index. Maintenance overhead grows with fragmentation and rebuild requirements. Five or more indexes on write-heavy tables often indicate over-indexing.

When should I use a composite index instead of multiple single-column indexes?

Use composite indexes when queries filter on multiple columns together. The database can use one composite index more efficiently than combining multiple single-column indexes. Order matters: place equality conditions first, then ranges. A composite index on user_id and created_at serves both columns together.
