backend
database
well-grounded

SQL optimization with indexes

Author: Tomasz Wójcik

Problem context

Proper index usage is a key to success when dealing with various database providers. Thankfully there are few easy rules to follow that will greatly improve queries performance.

Keep in mind that some cases can occur in various database providers, while other providers won't have such problems and/or solutions.

Also it is worth noting that indexes in this article are B-Tree (balanced tree)

Solutions

Quick note before we start overusing indexes (aka indexing everything) - Indexing is a double-edged blade, so think twice before creating a new index. You can speed up reads but at the cost of writes and space/memory usage.

Also, always check query with EXPLAIN <query> and in some cases EXPLAIN ANALYSE <query> - the latter should be wrapped in the transaction because it actually runs query and you'll get unexpected effects especially with DELETE and UPDATE queries, however it will provide real benchmark.

Keep proper index ordering

Index ordering impacts query performance, so when querying you might get information about index being used (via EXPLAIN), but the cost of the query might be different with different index ordering.

For given index:

CREATE UNIQUE INDEX first_name_last_name_idx ON employees (first_name, last_name);

On my machine with Postgres database when running the following query:

EXPLAIN SELECT first_name, last_name
FROM employees
WHERE first_name = 'Bo';

I have the following query plan (look at the cost):

Index Only Scan using first_name_last_name_idx on employees  (cost=0.29..4.30 rows=1 width=10)
  Index Cond: (first_name = 'Bo'::text)

However for querying for last_name (which is part of index) instead of first_name I get the query plan:

Index Only Scan using first_name_last_name_idx on employees  (cost=0.29..231.29 rows=1 width=10)
  Index Cond: (last_name = 'Bo'::text)

You might see that there is Index Only Scan used but the query cost is significantly bigger.

Always think about proper column ordering in the index

Use LIKE's wildcard wisely

Keep in mind that only the part before first wildcard (%) serves as an access predicate when using indexes so use it wisely.

Two practices when using LIKE:

Index for equality first

When searching for both ranges and equality on indexed columns ALWAYS index for equality first, then for ranges. Index (B-tree) will have less leaf nodes to traverse - will find equal element first and then search for given ranges, instead of searching all range leaf nodes for given (equal) element.

Index-only queries (scans)

If you have a covering index (index that covers every column in your query - first_name and last_name from example above) you can avoid accessing table and use only data from the index - this will result in much faster query.