Optimizing SQL Queries: 7 Simple Ways to Improve Performance - MySQL 8


Introduction

Writing efficient SQL queries is crucial for database performance, especially when dealing with large datasets. Poorly optimized queries can lead to slow response times, high CPU usage, and excessive memory consumption. Fortunately, by following a few key principles, you can significantly improve query performance.


In this guide, we’ll explore seven essential techniques to optimize your SQL queries, from avoiding SELECT * to optimizing JOINs and subqueries. Each technique includes a detailed explanation, performance issues, and improved approaches to help you write faster and more efficient queries.

1 - Avoid SELECT * and Retrieve Only Necessary Columns

What is this?

Using SELECT * retrieves all columns from a table, even if only a few are needed. This can increase query time, memory usage, and network bandwidth, especially on large datasets.

Why Does SELECT * Cause Performance Issues?

  • Unnecessary Data Retrieval → Fetching unused columns increases I/O and slows performance.

  • Increases Memory Usage → More data stored in RAM and transferred over the network.

  • Breaks Index Optimization → Covering-index optimizations are lost, because an index rarely contains every column, so the database must read full table rows.

  • Schema Changes Can Break Code → If columns are added/removed, queries might fail or return incorrect data.

Examples:

Not optimized:

SELECT * FROM orders WHERE user_id = 123;

Problems:

  • Retrieves all columns, even though only a few (e.g., id, order_date, total_amount) are needed.

  • Unnecessary I/O and memory usage when processing extra columns.

  • Can slow down queries on large tables (e.g., millions of rows).

Better approach: Selecting Only Needed Columns

SELECT id, order_date, total_amount FROM orders WHERE user_id = 123;

Why is this better?

  • Faster query execution by fetching only necessary data.

  • Reduces memory usage and network transfer time.

  • Indexes are used more efficiently, leading to better performance.

  • Future-proof → schema changes won’t affect queries unnecessarily.
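Taking this further, a covering index can serve the narrowed query entirely from the index. This is a sketch; the index name and column choice are assumptions, not part of the original schema:

```sql
-- Covering index: holds every column the query touches, so InnoDB can
-- answer it without reading table rows (EXPLAIN shows "Using index").
-- Note: id is covered implicitly, because InnoDB secondary indexes
-- append the primary key to every entry.
CREATE INDEX idx_orders_user_cover ON orders (user_id, order_date, total_amount);

SELECT id, order_date, total_amount FROM orders WHERE user_id = 123;
```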

2 - Use Keyset Pagination Instead of OFFSET

What is this?

Keyset pagination is a technique that retrieves the next set of records based on a known column value, rather than skipping rows using OFFSET. It is much faster for large datasets because it avoids scanning and discarding unnecessary rows.

Why Does OFFSET Cause Performance Issues?

  • OFFSET requires scanning and skipping rows
    - The database still reads all skipped rows before returning the requested results.
    - This slows down pagination as the OFFSET value increases.

  • Inefficient for large tables
    - With millions of rows, paginating deeper (e.g., page 100,000) becomes extremely slow.

  • Higher memory and CPU usage
    - More skipped rows consume database resources unnecessarily.

Example 1: Using OFFSET for Pagination with ID

❌ Not Optimized:

SELECT id, name, created_at FROM orders ORDER BY id LIMIT 10 OFFSET 100000;

Problems:

  • The database loads and scans 100,010 rows, then discards the first 100,000.

  • As OFFSET increases, the query becomes slower.

  • Inefficient for deep pagination (e.g., users navigating to page 1000).

✅ Better Approach: Use Keyset Pagination with ID

Instead of hardcoding values, store the last retrieved id and use it for the next page:

Step 1: Fetch First Page

SELECT id, name, created_at FROM orders ORDER BY id LIMIT 10;

→ Suppose the last id returned is 500000

Step 2: Fetch Next Page Using Last ID

SELECT id, name, created_at 
FROM orders 
WHERE id > 500000 ORDER BY id LIMIT 10;

Why is this better?

  • No need to scan and skip rows – It fetches records directly from the last known position.

  • No performance degradation as pages increase.

Example 2: Paginating by a Sort Key Such as created_at

❌ Not Optimized: Using OFFSET for Pagination

SELECT id, name, created_at 
FROM orders 
WHERE created_at > '2024-02-01 12:00:00' 
ORDER BY created_at LIMIT 10 OFFSET 100000;

Problems:

  • The database loads and scans 100,010 rows, then discards the first 100,000.

  • As OFFSET increases, the query becomes slower.

  • Poor performance for deep pagination (e.g., users navigating to page 1000).

✅ Better Approach:

SELECT id, name, created_at 
FROM orders 
WHERE created_at > '2024-02-01 12:00:00' 
ORDER BY created_at 
LIMIT 10;

=> This looks better: no rows are scanned and skipped, because the query seeks directly past the last known position.

However, rows can be duplicated or skipped at the boundary, because created_at is not a unique key.

✅ Best Approach: Use a Composite Keyset (created_at, id)

SELECT id, name, created_at 
FROM orders 
WHERE (created_at, id) > ('2024-02-01 12:00:00', 5000) 
ORDER BY created_at, id 
LIMIT 10;

Why use (created_at, id)?

  • Ensures stable sorting, even if multiple rows have the same created_at.

  • Prevents duplicate or missing records when paginating.
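
For the composite keyset to seek rather than scan, an index on the ordering columns is needed. A sketch (the index name is an assumption):

```sql
-- Index matching ORDER BY created_at, id lets the row-value comparison
-- WHERE (created_at, id) > (...) seek straight to the boundary.
CREATE INDEX idx_orders_created_at_id ON orders (created_at, id);
```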

3 - Optimize Queries with EXISTS Instead of IN

What is this?

Using IN with a subquery can cause full table scans on some MySQL versions, while EXISTS lets the optimizer stop at the first match and use indexes. (MySQL 8 often rewrites both forms into the same semijoin plan, so always verify with EXPLAIN.)

Why Does IN Cause Performance Issues?

  • Checks against all values in a subquery result.

  • Index usage is not always efficient for large datasets.

Examples

❌ Not Optimized: Using IN

SELECT * 
FROM customers 
WHERE customer_id IN (SELECT customer_id 
						FROM orders 
						WHERE order_date > '2024-01-01');

Problems:

  • The subquery runs separately, forcing the database to scan all customer_ids.

  • Full table scan may occur if orders.customer_id is not indexed.

✅ Better Approach: Use EXISTS

SELECT * 
FROM customers c 
WHERE EXISTS ( SELECT 1 
				FROM orders o 
				WHERE o.customer_id = c.customer_id 
					AND o.order_date > '2024-01-01' );

Why is this better?

  • Uses an indexed lookup instead of scanning all rows.

  • Faster execution for large datasets.

✅ Alternative Approach: Use a JOIN (in some cases)

In some cases a JOIN can replace IN. Note that DISTINCT is required here, because a customer with several matching orders would otherwise be returned once per order.

SELECT DISTINCT c.* 
FROM customers c 
JOIN orders o ON c.customer_id = o.customer_id 
WHERE o.order_date > '2024-01-01';

Why is this better?

  • Efficient Execution: Databases are typically optimized for JOIN operations, which can be more efficient than subqueries, especially when dealing with large datasets. The database engine can use indexes more effectively to quickly find matching rows.

  • Reduced Overhead: The JOIN operation directly combines rows from both tables, which can be faster than evaluating a subquery for each row in the outer query.

4 - Use Composite Indexes for Multi-Column Filtering

What is this?

A composite index (multi-column index) allows faster filtering and sorting when querying multiple columns together.

Why Do Single-Column Indexes Cause Performance Issues?

  • Queries using multiple conditions may not efficiently use separate single-column indexes.

  • Sorting (ORDER BY) may cause an extra sorting step if not indexed properly.

Example 1: Using Separate Indexes on Each Column

❌ Not Optimized:

CREATE INDEX idx_users_age ON users (age); 
CREATE INDEX idx_users_city ON users (city); 

SELECT * FROM users WHERE age = 30 AND city = 'New York';

Problems:

  • Index merge is inefficient → The query may not fully use both indexes.

  • May cause a full table scan.

✅ Better Approach: Use a Composite Index

CREATE INDEX idx_users_age_city ON users (age, city); 

SELECT * FROM users WHERE age = 30 AND city = 'New York';

Why is this better?

  • Uses one index scan instead of two.

  • Faster execution for multi-column queries.

Example 2 - Matching Leftmost Columns in Composite Index

❌ Not Optimized: Query Doesn’t Follow the Column Order

CREATE INDEX idx_employees_department_age ON employees (department, age); 

SELECT * FROM employees WHERE age = 35;

Problems:

  • The index won’t be used efficiently because department comes first in the index definition.

  • The database may fall back to a full table scan.

✅ Better Approach: Query Includes Leftmost Column (department)

SELECT * 
FROM employees 
WHERE department = 'HR' AND age = 35;

Why is this better?

  • Uses the composite index efficiently since both department and age are used in order.

  • Avoids unnecessary full scans, improving query speed.
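
If filtering on age alone is a frequent pattern, another option is a separate index led by age. A sketch; the right set of indexes depends on the actual workload:

```sql
-- Leading column matches the standalone filter, so WHERE age = 35 can seek
CREATE INDEX idx_employees_age ON employees (age);

SELECT * FROM employees WHERE age = 35;
```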

Example 3 - Optimizing Sorting with WHERE and ORDER BY

❌ Not Optimized: Separate Indexes on Filter and Sort Columns

CREATE INDEX idx_orders_customer_id ON orders (customer_id); 
CREATE INDEX idx_orders_order_date ON orders (order_date); 

SELECT * FROM orders WHERE customer_id = 101 ORDER BY order_date DESC;

Problems:

  • The database uses the index on customer_id for filtering but performs a sort operation separately.

  • Sorting can be slow, especially on large tables.

✅ Better Approach: Use a Composite Index (customer_id, order_date)

CREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date DESC); 

SELECT * FROM orders WHERE customer_id = 101 ORDER BY order_date DESC;

Why is this better?

  • The database can both filter and sort using the index, eliminating the need for an extra sorting step.

  • Significantly improves performance for queries with both filtering and sorting.

5 - Minimize OR Conditions – Use UNION ALL When Possible

What is this?

OR conditions can prevent index usage and force full table scans. Splitting queries into UNION ALL statements allows indexes to be used efficiently.

Why Does OR Cause Performance Issues?

  • Index usage is limited → Queries with OR may not use indexes efficiently.

  • May cause a full scan if one condition is not indexed.

Examples

❌ Not Optimized: Using OR

SELECT * FROM orders WHERE status = 'shipped' OR created_at > '2024-01-01';

Problems:

  • Index is used only for one condition.

  • May trigger a full table scan.

✅ Better Approach: Use UNION ALL

SELECT * 
FROM orders 
WHERE status = 'shipped' 
UNION ALL 
SELECT * FROM orders WHERE created_at > '2024-01-01';

Why is this better?

  • Allows each condition to use its index efficiently.

  • Faster execution, especially for large datasets.
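
One caveat: a row that matches both conditions (status = 'shipped' AND created after 2024-01-01) is returned twice by UNION ALL. To keep the result identical to the original OR query, exclude the overlap in the second branch:

```sql
SELECT * FROM orders WHERE status = 'shipped'
UNION ALL
SELECT * FROM orders
WHERE created_at > '2024-01-01'
  AND status <> 'shipped';  -- excludes rows already returned by the first branch
```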

6 - Use Indexes Efficiently in the WHERE Clause

What is this?

Indexes help speed up query execution by allowing the database to quickly locate rows instead of scanning the entire table. However, poorly written WHERE conditions can prevent indexes from being used effectively, leading to slow performance.

Why Does Improper WHERE Usage Cause Performance Issues?

  • Full Table Scans Instead of Index Scans → If a query cannot use an index, the database scans the entire table, making queries slow.

  • Functions on Indexed Columns Disable Indexing → Using functions like LOWER(column_name) prevents index usage.

  • Inefficient Operators Prevent Index Optimization → Using !=, <>, NOT IN, or LIKE '%value%' forces full scans.

  • Implicit Data Type Conversion → Comparing different data types (VARCHAR vs. INTEGER) prevents index usage.

Example 1: Using Functions on Indexed Columns

❌ Not Optimized:

SELECT * FROM users WHERE LOWER(email) = 'john@example.com';

Problems:

  • The LOWER(email) function forces a full table scan, even if email is indexed.

  • The database cannot use the index because each row’s value must be recalculated.

✅ Better Approach: Store and Query Consistently

SELECT * FROM users WHERE email = 'john@example.com';

Why is this better?

  • Uses the index on email, allowing an index seek instead of a full scan.

  • Faster execution because it avoids unnecessary function calls.

  • Best practice: Store data in consistent case format (e.g., always lowercase).
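
If a case-insensitive lookup is genuinely required, MySQL 8.0.13+ supports functional indexes, which make the LOWER() form indexable. A sketch (the index name is an assumption; note the double parentheses around the expression):

```sql
-- Functional index on the lowercased value (MySQL 8.0.13+)
CREATE INDEX idx_users_email_lower ON users ((LOWER(email)));

SELECT * FROM users WHERE LOWER(email) = 'john@example.com';
```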

Example 2: Using != or <> on Indexed Columns

Not optimized

SELECT * FROM orders WHERE status <> 'Completed';

Problems:

  • The database must check every row, leading to a full table scan.

  • The index on status cannot be used efficiently.

Better Approach: Use IN() Instead of <>

SELECT * FROM orders WHERE status IN ('Pending', 'Processing');

Why is this better?

  • Uses the index efficiently to fetch only the relevant rows.

  • Note: this rewrite is equivalent only when the full set of status values is known and fixed.

Example 3: Use Prefix Search (LIKE 'value%') or Full-Text Indexing

Not optimized:

SELECT * FROM products WHERE name LIKE '%phone';

Problems:

  • Cannot use an index because % at the beginning forces a full table scan.

  • The database must check every row manually.

Better approach:

SELECT * FROM products WHERE name LIKE 'iPhone%';

Why is this better?

  • Uses the index because the search starts from the beginning of the value.

  • If complex search is needed, consider FULLTEXT indexes instead of LIKE '%value%'.
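
For searching words inside the value, a FULLTEXT index is the MySQL alternative to a leading-wildcard LIKE. A sketch, assuming the products table above:

```sql
-- Full-text index enables word matching that LIKE '%phone%' cannot index
CREATE FULLTEXT INDEX idx_products_name ON products (name);

SELECT * FROM products
WHERE MATCH(name) AGAINST('phone' IN NATURAL LANGUAGE MODE);
```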

Example 4: Query with NOT IN with a list

NOT IN is used to filter out specific values in a query. However, when used with an indexed column, it often prevents the database from efficiently utilizing the index, leading to full table scans instead of indexed lookups.

Using NOT IN prevents index usage

SELECT * FROM orders WHERE status NOT IN ('Completed', 'Shipped');

Problems:

  • The query forces a full table scan because indexes are optimized for searching “what exists” rather than “what doesn’t exist.”

  • The database must check every row manually, leading to slow performance on large datasets.

  • If status contains NULL, the query may return incorrect results.

Optimized Alternative: Use IN Instead of NOT IN (When the Value Set Is Known)

SELECT * FROM orders WHERE status IN ('Pending', 'Processing');

Why is this better?

  • Indexes are optimized for IN() queries, allowing fast lookups.

  • Works well if the exclusion list is small.

Example 5: NOT IN with a Subquery

Not Optimized

SELECT * 
FROM customers 
WHERE customer_id NOT IN (SELECT customer_id FROM orders);

Problems:

  • The database must compare every row, leading to a full table scan.

  • If orders.customer_id contains NULL, the query may return incorrect results.

  • Indexes on customer_id are not used efficiently, making the query slow for large datasets.

Better Approach: Use NOT EXISTS Instead of NOT IN

SELECT * 
FROM customers c 
WHERE NOT EXISTS ( SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id );

Why is this better?

  • Uses the index on customer_id efficiently, allowing an indexed lookup instead of a full scan.

  • Avoids NULL-related issues that occur with NOT IN.

  • Faster performance, especially for large datasets, because EXISTS stops checking once a match is found.

Alternative: Use LEFT JOIN … IS NULL Instead of NOT IN

SELECT c.* 
FROM customers c 
LEFT JOIN orders o ON c.customer_id = o.customer_id 
WHERE o.customer_id IS NULL;

Why is this better?

  • Indexes are utilized efficiently, allowing a fast join lookup instead of scanning all rows.

  • More optimized query execution compared to NOT IN.

  • Works well when filtering for records that do not exist in another table.

7 - Optimize JOINs and Subqueries

What is this?

JOINs and subqueries are used to combine data from multiple tables, but they can cause performance issues if not optimized properly. Efficient indexing and query structuring can significantly speed up execution.

Why Do JOINs and Subqueries Cause Performance Issues?

  • JOINs on non-indexed columns → Cause full table scans and slow lookups.

  • Subqueries that execute independently → Run multiple times instead of optimizing with indexes.

  • Unnecessary large result sets → Increase memory and processing time.

Example 1: Using a Correlated Subquery

❌ Not Optimized:

SELECT c.customer_id, c.name, (SELECT COUNT(*) 
								FROM orders o 
								WHERE o.customer_id = c.customer_id) AS order_count 
FROM customers c;

Problems:

  • The subquery runs separately for each customer, leading to N executions (very slow for large tables).

  • Causes multiple full table scans on orders.

✅ Better Approach: Use a JOIN with GROUP BY

SELECT c.customer_id, c.name, COUNT(o.order_id) AS order_count 
FROM customers c 
LEFT JOIN orders o ON c.customer_id = o.customer_id 
GROUP BY c.customer_id, c.name;

Example 2: Avoid JOINs on Unindexed Columns

❌ Not Optimized: Joining on a Non-Indexed Column

SELECT o.order_id, c.name 
FROM orders o 
JOIN customers c ON o.customer_email = c.email;

Problems:

  • If customers.email is not indexed, the database must scan the entire customers table for each row in orders.

  • Slow performance for large tables.

✅ Better Approach: Ensure JOIN Columns Are Indexed

CREATE INDEX idx_customers_email ON customers(email); 

SELECT o.order_id, c.name 
FROM orders o 
JOIN customers c ON o.customer_email = c.email;

Better still, join on the primary/foreign key, which is typically already indexed:

SELECT o.order_id, c.name 
FROM orders o 
JOIN customers c ON o.customer_id = c.customer_id;

Why is this better?

  • The join becomes an index lookup (on email or customer_id) instead of a full scan of customers for every row in orders.

  • Index lookups are far faster than repeated full table scans.

Example 3: Use INNER JOIN Instead of OUTER JOIN When Possible

❌ Not Optimized: Using LEFT JOIN Unnecessarily

SELECT c.customer_id, c.name, o.order_id 
FROM customers c 
LEFT JOIN orders o ON c.customer_id = o.customer_id;

Problems:

  • LEFT JOIN returns all customers, even those without orders, which may be unnecessary.

  • Requires additional work to return NULL values.

✅ Better Approach: Use INNER JOIN If Matching Data Is Required

SELECT c.customer_id, c.name, o.order_id 
FROM customers c 
INNER JOIN orders o ON c.customer_id = o.customer_id;

Why is this better?

  • Reduces unnecessary rows → Only includes customers with orders.

  • Improves query execution time by filtering out unmatched records early.

NOTE

  1. Avoid Over-Indexing
    Too many indexes can slow down INSERT, UPDATE, and DELETE operations.
    => Index only the columns frequently used in WHERE, JOIN, or ORDER BY.
    => Use composite indexes instead of multiple single-column indexes.
    => Periodically analyze index usage (EXPLAIN ANALYZE).

  2. Use EXPLAIN ANALYZE to Check Query Performance
    Before optimizing, inspect the execution plan to see index usage and bottlenecks.
    Example:
    To see whether your query uses an index, run:

EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 123;

  • In MySQL 8, EXPLAIN ANALYZE prints the actual plan tree: "Index lookup on orders using …" means the index is used; "Table scan on orders" means it is not.
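
The over-indexing note above suggests periodically reviewing index usage; MySQL 8's bundled sys schema provides a view for this. A sketch ('mydb' is a placeholder schema name):

```sql
-- Lists indexes with no recorded reads since the server started
SELECT * FROM sys.schema_unused_indexes
WHERE object_schema = 'mydb';
```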

Summary

  • Avoid SELECT * and Retrieve Only Necessary Columns → Reduces data transfer and speeds up queries.

  • Use Keyset Pagination Instead of OFFSET → Prevents performance degradation in large datasets.

  • Optimize Queries with EXISTS Instead of IN → Improves efficiency by reducing unnecessary scans.

  • Use Composite Indexes for Multi-Column Filtering → Enables efficient lookups and sorting.

  • Minimize OR Conditions – Use UNION ALL When Possible → Reduces full table scans and improves index usage.

  • Use Indexes Efficiently in the WHERE Clause → Ensures optimal index utilization for filtering.

  • Optimize JOINs and Subqueries → Avoids unnecessary processing and improves query execution.

For more information, like & follow the MFV sites for blog updates, best practices, and career stories of Forwardians at:

Facebook: https://www.facebook.com/moneyforward.vn 

Linkedin: https://www.linkedin.com/company/money-forward-vietnam/ 

Youtube: https://www.youtube.com/channel/UCtIsKEVyMceskd0YjCcfvPg 
