Database Partitions, The Powerful Tool That is Often Overlooked

When we hear the words “database design” as developers, there are a few things which spring to mind. What tables do I need? What are the relationships of those tables? What structure will make the code the easiest to write? Historically, I have been guilty of only caring about how the database will impact the way I code. It’s only when the database tables contain hundreds of thousands of records where this mistake reveals itself.

Poor Organisation is Time Wasting

If I use the following example, it sounds so obvious, and you wonder why it is ever over-looked. The national library of the United Kingdom has over 200 million items. Now imagine instead of categorising these books and storing them in alphabetical order, we instead just place them on a shelf in the order they were added to the library. Now if you want to find a book, you have to search the entire library for it. That’s over 200 million items you need to check, and you would probably just give up.

If the books were organised alphabetically and all the books starting with the letter A are dedicated to one area of the library, you may then only need to search through 8 million books. That’s still not great but it is far better than 200 million. You could then section this further. For example, you might section off authors or time periods.

Incorporating Partitions Into Database Design

The same logic applies to databases as it does libraries. To best understand how to get the most out of partitions, you have to look at how we use the data.

Can your table be categorised like products? Does the system frequently query the data by date?

In MySQL, you can partition with multiple methods. These include “KEY”, “RANGE”, “LIST” and “HASH”. I will not go into the detailed differences between these, but the general idea is that you can section off part of a table based on the value of one or more columns.

In a report system where users will often filter by month and year, it would be a good idea to create partitions based on the dates. Partitions are essentially copies of the same table, but all records must follow the rules of that partition. For example, you can have a table which only contains records from 2025. If a user wants to filter reports to 2025, the system will look in the partition labelled 2025 rather than search everything. Assuming more and more reports are added to the system every year, the time to pull the reports through will never increase as you are always looking through a small number of reports. If you did not partition it, the time to find the reports would increase year on year as the dataset never stops growing.

Re-Thinking How We Design Databases

As powerful as partitions are, you can’t just partition a table randomly and expect a big difference in speed. We need to design our tables and our code with these sections in mind. Following on from the previous example, if the tables are sectioned into years, but your queries never specify dates, the system then has to look in every partition as you have not told the system which partitions to look in. That just defeats the point!

We also have to bear in mind that it is harder to partition data on tables that span across relationships. This does not mean that we should not use relationships; We absolutely should, and the database normalisation standard was created for a reason. This just means that we have to be very deliberate with where we draw the relationships and not overdo it.

Takeaways

The moral of the story is that every decision we make when designing databases should have thought put behind it and we should never settle for “that will do”. It is a delicate balance between making a database that makes coding easier, a database that is fast and a database that is durable. How we put a database together defines the foundation of our apps. A well designed database that implements partitioning effectively will make the difference between a website people click off of because it’s too slow and a website that loads in less than a second. It also makes the difference between code that developers love working with and code that nobody wants to touch because the whole thing is waiting to crumble down like a Jenga tower.