A good Table Partitioning strategy depends on knowing the relationships and access patterns of your data, then working out a partitioning strategy that will spread the data out to get the best maintenance capabilities and performance results.
Your company’s business is booming: new orders are piling up and new deals are closing every day. At the same time, your data is growing so fast that you wish your 401K account would do the same. As a database administrator, you are excited to see your database is finally in terabytes range. However, something unpleasant could also be happening. Maybe you have noticed that “index rebuild” is taking a whole day to complete instead of a few hours as in the past. Data archive is becoming more painful than ever. Worst of all, queries are running slower and your boss is not very happy about it.
So forget about your 401K, it’s time for action.
You consulted your friends and they told you about a new feature in Progress OpenEdge: Table Partitioning. You thought, “Yes, I’ve heard a lot about that. It’s a great feature. Basically it splits data into smaller chunks so I can divide-and-conquer. The best part is that Table Partitioning can also make queries faster! Wait a minute. Did you just say Progess is now supporting this in OpenEdge? That’s excellent and I’ve been ready to use this for a while!”
So you converted your data into partitions, which is pretty easy by the way, and you are happy to see maintenance is much easier and database maintenance down time is much shorter. Everything looks great until you start to receive phone calls complaining that performance is getting even worse!
What is going on? Shouldn’t I get better performance? A whitepaper recently published by Progress explains how this could happen: partitioning strategy is the key factor for success when using Table Partitioning. According to the test a good strategy can easily double the performance for operations like “CREATE” and “DELETE.” On the other hand, a poorly chosen strategy barely improves performance, and sometimes can even make things even slower.
The paper also concludes that an insightful strategy to improve performance depends on how evenly your data is distributed AND accessed. Note that a balanced data distribution alone won’t guarantee you good performance because your data access pattern may tell a different story. Using the “Order” table in the whitepaper as an example, the table can be partitioned by “order-date” to have a balanced distribution of sales data by each year, quarter, or even month. However, the partition with the most current “order-date” value will become a “hot” spot for record creates as all new records may have “order-date” fall into this partition. The solution is to alter this access pattern by adding another column “region” to spread out data even further. Now records with the same “order-date” may end up in different partitions because they come from a different “region.” Notice the trick here? Just add some spice and things get much better!
Using other columns (e.g. “product-code”) as the partitioning key may give you even better data distribution. But you may argue that partitioning by “order-date” suits your business model the best, because most reports or maintenance are based on this value instead of “product-code.” Here is another question: “How do I design Table Partitioning so that I can enjoy all the benefits people have mentioned?” For example, I want the best performance and the easiest maintenance. Well, there is no simple answer to this one. In some cases, you may have to trade off the benefit of data manageability or availability if your primary interest is in performance. But in many other cases, sub-partitioning can be a good choice to meet all the different needs. I suggest not having too many sub-partitions on the same table though, because you may end up with too many partitions to manage.
My advice? Get to know your data’s relationships and access patterns, then work out a partitioning strategy that will spread the data out to get the best maintenance capabilities and performance results.
Read the Whitepaper: Table Partitioning: Improve Performance through Increased Concurrency
For more information leave a message in the comment box below.
Dapeng Wu, with over 16 years of experience in data storage technologies, is a software architect in the OpenEdge database group. Prior to Progress, Dapeng also worked at Cisco Systems and IBM.
Copyright © 2018 Progress Software Corporation and/or its subsidiaries or affiliates.
All Rights Reserved.
Progress, Telerik, and certain product names used herein are trademarks or registered trademarks of Progress Software Corporation and/or one of its subsidiaries or affiliates in the U.S. and/or other countries. See Trademarks for appropriate markings.