Partitioning in MySQL 8

Partitioning in MySQL 8

Partitioning is a database optimization technique that allows you to split a large table into smaller, more manageable pieces called partitions. Each partition is stored separately, but still logically represents a single table. This helps improve performance and maintainability, especially for large datasets.
Partitioning in MySQL 8

1. Introduction

1.1. What is partitioning (in MySQL 8)?

Partitioning splits a large table into smaller physical units called partitions. The table is still queried as one logical table, but MySQL can limit work to only the relevant partitions.

Important limitation (read this first): In MySQL 8, partitioned InnoDB tables cannot use foreign keys. For many schemas, this alone makes partitioning a non-starter.

In short: Partitioning is mainly a tool for:

  • Data lifecycle management: dropping old data fast.
  • Query performance via partition pruning: when queries filter by the partition key.

1.2. Who this article is for

This guide is most useful if:

  • Your table is large (often tens or hundreds of millions of rows).
  • Most queries are time-bounded or otherwise filter on a single "natural" key.
  • You need a safe and fast way to remove old data.

1.3. Should you use partitioning? (quick decision)

Good fit when:

  • You have time-series data (logs, events, transactions) and regularly purge old ranges.
  • Your important queries always include the partition key (so pruning can happen).

Usually a bad fit when:

  • You require foreign keys.
  • You rely on full-text or spatial indexes.
  • Your queries cannot filter on the partition key.

1.4. What performance improvements look like (before & after)

Partitioning can improve performance in two main areas:

  1. Data management: removing old data in bulk.
  2. Query performance: scanning fewer partitions through partition pruning.

Example 1: Data management (purging old data)

  • Scenario: event_logs has billions of rows. A nightly job must delete data older than 3 years.
  • Strategy: Range partitioning on log_date, one partition per month.

Before partitioning

DELETE FROM event_logs WHERE log_date < '2022-11-01';
  • Slow row-by-row work.
  • Large undo/redo logs.
  • Locking and operational risk.

After partitioning

ALTER TABLE event_logs DROP PARTITION logs_2022_10;
  • A near-instant metadata operation.
  • Deletes the whole month’s data at once.

Example 2: Query performance (faster reports)

  • Scenario: sales contains 10 years of data. You need a Q1 2024 report.
  • Strategy: Range partitioning on sale_date, one partition per quarter.

Query:

SELECT * FROM sales
WHERE sale_date >= '2024-01-01'
  AND sale_date <  '2024-04-01';

With partitioning, MySQL can often scan only the partitions that overlap the date range.

Key takeaways so far

Partitioning helps when it changes how much data MySQL must touch:

  1. For queries: pruning can reduce scanned partitions.
  2. For purging: DROP PARTITION can replace expensive DELETE operations.

2. Setting Up Partitioning in MySQL 8

2.1. Requirements & Limitations in MySQL 8

Requirements:

  • Every unique key on the table (including the primary key) must contain the partition key.
  • Only InnoDB supports partitioning (MyISAM partitioning was removed in MySQL 8).
  • Max 8192 partitions per table.
  • While you can have thousands of partitions, too many can increase memory overhead and file descriptor usage. Create only as many partitions as you actively need.

Limitations:

  • MySQL partitioning does not support:
    • Foreign keys.
    • Full-text indexes.
    • Spatial indexes.
  • Partition pruning only works when queries include the partition key.
  • Auto-increment columns must be part of the partition key.

Tip: In queries, keep predicates SARGable for the partition expression. Avoid wrapping the partition key in functions when you can.

2.2. Creating Partitioned Tables

The next sections cover the common partitioning methods and when each one is a good fit.

2.2.1 Range Partitioning

Definition:

  • In Range Partitioning, each partition holds rows that fall within a specified range of values.
  • It is commonly used for time-series data (e.g., partitioning by date).

Example Use Case:

  • You have a sales table where data is frequently queried by year.
CREATE TABLE sales (
    id INT NOT NULL,
    sale_date DATE NOT NULL,
    amount DECIMAL(10,2),
    PRIMARY KEY (id, sale_date)
)
PARTITION BY RANGE(YEAR(sale_date)) (
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p2024 VALUES LESS THAN (2025)
);

How it works:

  • The sales table is split by year.
  • A query filtering sale_date BETWEEN '2023-01-01' AND '2023-12-31' will only scan partition p2023.

Best for: Time-series data, logs, financial transactions.

Limitations:

  • You must manually add a new partition as new data ranges appear.
  • Partition pruning only works if queries include the partition key (sale_date).

2.2.2. List Partitioning

Definition:

  • Similar to Range Partitioning, but instead of ranges, data is stored in partitions based on a list of discrete values.

Example Use Case:

  • A company stores customer data per region (e.g., North America, Europe, Asia).

SQL Example:

CREATE TABLE customers (
    id INT NOT NULL,
    name VARCHAR(100),
    country_code CHAR(2) NOT NULL,
    PRIMARY KEY (id, country_code)
)
PARTITION BY LIST COLUMNS(country_code) (
    PARTITION p_usa VALUES IN ('US'),
    PARTITION p_canada VALUES IN ('CA'),
    PARTITION p_europe VALUES IN ('FR', 'DE', 'UK'),
    PARTITION p_asia VALUES IN ('JP', 'CN', 'VN')
);

How it works:

  • Queries like SELECT * FROM customers WHERE country_code = 'US' will only scan p_usa.

Best for: Multi-region datasets, categorizing data by specific values.

Limitations:

  • You must manually manage partitions when new values appear (e.g., if a new country is added).
  • Only works well if values are well-distributed across partitions.

2.2.3. Hash Partitioning

Definition:

  • Hash Partitioning distributes rows across partitions evenly using a hash function.
  • This is useful for balancing data across multiple partitions.

Example Use Case:

  • A large users table where queries use user_id frequently.

SQL Example:

CREATE TABLE users (
    id INT NOT NULL,
    name VARCHAR(100),
    email VARCHAR(255),
    PRIMARY KEY (id)
)
PARTITION BY HASH(id) PARTITIONS 4; -- MySQL distributes rows across 4 partitions

How it works:

  • The partition is determined using id % 4.
  • MySQL automatically distributes data across 4 partitions.

Best for: Evenly distributing data when there’s no natural range (e.g., user IDs, order IDs).

Limitations:

  • Pruning is usually effective for equality predicates on the partitioning expression (for example, WHERE id = 123), but it is often limited for range predicates (for example, BETWEEN) and other patterns.
  • Hard to predict which partition a row will be in.
  • Not a good fit for time-based retention, since partitions do not align with time ranges.

2.2.4. Key Partitioning

Definition:

  • Similar to Hash Partitioning, but MySQL’s internal hash function is used instead of a custom one.
  • It is useful when working with non-numeric partition keys.

Example Use Case:

  • Partitioning by email or UUID.

SQL Example:

CREATE TABLE employees (
    emp_id VARCHAR(36) NOT NULL,  -- UUID
    name VARCHAR(100),
    department_id INT NOT NULL,
    PRIMARY KEY (emp_id)
)
PARTITION BY KEY(emp_id) PARTITIONS 5;

How it works:

  • MySQL automatically determines which partition to store each row in using a hash function.

Best for: When partitioning by non-numeric columns (UUIDs, email addresses, etc.).

Limitations:

  • No manual control over partitioning logic.
  • Cannot use multi-column keys for partitioning (unlike Hash Partitioning).

Comparison of partitioning methods

Partition typeBest forPruning works?Automatic partitioning?
RangeTime-seriesYesNo
ListCategorized data (regions, product types)YesNo
HashEvenly distributing data (user IDs)NoYes
KeyNon-numeric keys (UUIDs, emails)NoYes

Conclusion

  • Range Partitioning is best for time-based data.
  • List Partitioning is ideal for categorized data, like country or department.
  • Hash Partitioning evenly spreads out data, but doesn’t support pruning.
  • Key Partitioning is like Hash Partitioning, but for non-numeric keys.

3. Managing Partitions in MySQL 8

Note: For large tables, validate changes in a staging environment first. Operations like REORGANIZE PARTITION can be expensive because they may copy or rebuild partition data.

3.1. Adding New Partitions

Adding partitions is mainly relevant for Range and List Partitioning because new data might not fit in the existing partitions.

Example: Adding a New Range Partition

  • Suppose the sales table has partitions by year. We need to add a new partition for 2025.
ALTER TABLE sales ADD PARTITION (
    PARTITION p2025 VALUES LESS THAN (2026)
);

How it works:

  • This adds a new partition p2025 to store sales data for the year 2025.
  • If sale_date = '2026-01-01', MySQL rejects the insert because no partition exists for 2026.

Limitations:

  • MySQL does not allow adding partitions to Hash or Key Partitioning.
  • You cannot insert data that falls outside the partition ranges.

3.2. Dropping Partitions

Dropping partitions is a fast way to delete old data instead of using DELETE, which is slow for large datasets.

Example: Dropping an Old Range Partition

  • If data from 2022 is no longer needed, drop partition p2022:
ALTER TABLE sales DROP PARTITION p2022;

How it works:

  • This removes all data from partition p2022 instantly.
  • Unlike DELETE, it does not generate undo logs, making it much faster.

Be careful:

  • Data is permanently deleted and cannot be recovered.
  • Indexes related to the dropped partition are also removed.

3.3. Merging and Splitting Partitions

Sometimes, you might need to merge smaller partitions or split a large partition.

Merging partitions (RANGE / LIST)

MySQL 8 supports merging partitions using REORGANIZE PARTITION.

Example: merge two monthly partitions into a single quarterly partition:

ALTER TABLE event_logs
REORGANIZE PARTITION logs_2024_01, logs_2024_02, logs_2024_03 INTO (
    PARTITION logs_2024_Q1 VALUES LESS THAN ('2024-04-01')
);

Notes:

  • This is not just a metadata change. MySQL must move rows into the new partition(s).
  • Use a partition boundary that matches your partitioning expression.

Splitting a partition (RANGE / LIST)

Example: split a yearly partition into two half-year partitions:

ALTER TABLE sales
REORGANIZE PARTITION p2024 INTO (
    PARTITION p2024_h1 VALUES LESS THAN (DATE('2024-07-01')),
    PARTITION p2024_h2 VALUES LESS THAN (DATE('2025-01-01'))
);

Reducing HASH / KEY partitions

For HASH/KEY partitioning, you cannot add partitions, but you can reduce the partition count using COALESCE PARTITION:

ALTER TABLE users COALESCE PARTITION 4;

Important: COALESCE PARTITION n reduces the total number of partitions by n (it subtracts n from the current partition count) and redistributes rows.

Warning: REORGANIZE and COALESCE can be heavy operations on large datasets. Estimate impact and run them during a maintenance window if needed.

3.4. Modifying partitioned tables

Unlike regular tables, modifying partitioned tables has some limitations.

You cannot:

  • Add a foreign key.
  • Change the primary key if it includes the partition column.
  • Convert a partitioned table to a non-partitioned table directly.

You can:

  • Add new columns:
ALTER TABLE sales ADD COLUMN description VARCHAR(255);
  • Drop columns (if not part of the partition key):
ALTER TABLE sales DROP COLUMN amount;
  • Rebuild partitions with REORGANIZE PARTITION.

3.5. Checking partition usage and statistics

To ensure MySQL is using partitions correctly, check partition metadata and validate partition pruning.

Check partition metadata

SELECT
  table_schema,
  table_name,
  partition_name,
  partition_method,
  partition_expression,
  partition_description,
  table_rows
FROM information_schema.partitions
WHERE table_schema = DATABASE()
  AND table_name = 'sales'
ORDER BY partition_ordinal_position;

What it shows: partition names, boundary or description, and approximate row counts.

Check query pruning (what partitions will be scanned)

EXPLAIN PARTITIONS
SELECT *
FROM sales
WHERE sale_date >= '2023-01-01'
  AND sale_date <  '2024-01-01';

Look at the partitions column:

  • If only the expected partitions appear, pruning is working.
  • If many or all partitions appear, the predicate might not be SARGable (search-argument-able) for the partition expression.

Common pruning pitfalls:

  • Predicates that wrap the partition key in a function (for example, YEAR(sale_date) = 2023).
  • Predicates that do not include the partition key at all (for example, WHERE amount > 1000).

3.6 Summary of partition management

OperationCommandNotes
Add a partitionALTER TABLE ... ADD PARTITIONOnly for Range & List Partitioning
Drop a partitionALTER TABLE ... DROP PARTITIONFaster than DELETE, irreversible
Merge / split partitionsALTER TABLE ... REORGANIZE PARTITIONMoves rows into new partition(s); plan for impact
Check partitionsSELECT * FROM information_schema.partitionsView partition metadata
Check pruningEXPLAIN PARTITIONS SELECT ...Ensure queries scan fewer partitions

4. Conclusion / Wrap-up

Partitioning can be a strong option for very large tables when it matches your workload.

Key takeaway: In MySQL, partitioning is mainly a tool for (1) data lifecycle management (drop old data quickly) and (2) query performance via partition pruning (when queries filter by the partition key).

Before shipping partitioning changes to production:

  • Benchmark with representative queries.
  • Validate changes in staging.
  • Confirm pruning with EXPLAIN PARTITIONS.

More like this

Playwright tips and tricks
Feb 02, 2024

Playwright tips and tricks

Code Quality in Python
Nov 17, 2023

Code Quality in Python