MinIO posted a blog entry a few days ago where they bragged about adding capacity without the need to rebalance.
First, they went into full marketoid mode, whipping up the fear:
Rebalancing a massive distributed storage system can be a nightmare. There’s nothing worse than adding a storage node and watching helplessly as user response time increases while the system taxes its own resources rebalancing to include the new node.
Seems like MinIO folks assume that operators of distributed storage such as Swift and Ceph have no tools to regulate the resource consumption of rebalancing. So they have no choice but to "wait helplessly". Very funny.
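For the record, both Ceph and Swift ship knobs to throttle rebalancing traffic. A couple of examples from memory of their sample configs (the option names and values below are a sketch; check the docs for your release):

    # Ceph: limit concurrent backfills and recovery work per OSD
    ceph config set osd osd_max_backfills 1
    ceph config set osd osd_recovery_max_active 1

    # Swift: object-server.conf, [object-replicator] section
    concurrency = 1       # single replication worker
    rsync_bwlimit = 5000  # passed to rsync --bwlimit (kbytes/sec); 0 = unlimited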
But it gets worse when obviously senseless statements are made:
Rebalancing doesn’t just affect performance - moving many objects between many nodes across a network can be risky. Devices and components fail and that often leads to data loss or corruption.
Often, man! Also, a commit protocol? Never heard of her!
Then, the post moves on to some unrelated matters:
A group of drives is an erasure set and MinIO uses a Reed-Solomon algorithm to split objects into data and parity blocks based on the size of the erasure set and then uniformly distributes them across all of the drives in the erasure set such that each drive in the set contains no more than one block per object.
Understood: your erasure set is what we call a "partition" in Swift, or a placement group in Ceph.
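As an aside, the "no more than one block per drive" invariant is easy to picture in code. Here is a toy sketch in Python, using a single XOR parity fragment as a stand-in for real Reed-Solomon coding (which would permit more than one parity fragment); the function and variable names are mine, not MinIO's:

    # Toy sketch: split an object into k data fragments plus one XOR
    # parity fragment (a stand-in for Reed-Solomon), then map exactly
    # one fragment onto each drive of the erasure set.

    def xor_bytes(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    def encode(obj: bytes, k: int) -> list[bytes]:
        frag_len = -(-len(obj) // k)             # ceiling division
        padded = obj.ljust(k * frag_len, b"\0")
        data = [padded[i*frag_len:(i+1)*frag_len] for i in range(k)]
        parity = data[0]
        for frag in data[1:]:
            parity = xor_bytes(parity, frag)
        return data + [parity]                   # k data + 1 parity

    def place(obj: bytes, erasure_set: list[str]) -> dict[str, bytes]:
        frags = encode(obj, k=len(erasure_set) - 1)
        # One fragment per drive: no drive holds two blocks of one object.
        return dict(zip(erasure_set, frags))

    drives = ["d1", "d2", "d3", "d4"]            # a 3+1 erasure set
    for drive, frag in place(b"hello erasure world", drives).items():
        print(drive, frag)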
Finally, we get to the matter at hand:
To enable rapid growth, MinIO scales by adding server pools and erasure sets. If we had built MinIO to allow you to add a drive or even a single hardware node to an existing server pool, then you would have to suffer through rebalancing.
MinIO scales up quickly by adding server pools, each an independent set of compute, network and storage resources.
Add hardware, run MinIO server to create and name server processes, then update MinIO with the name of the new server pool. MinIO leaves existing data in their original server pools while exposing the new server pools to incoming data.
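Mechanically, "no rebalance" expansion is just a routing decision: keep a list of pools, send each new write to whichever pool has the most free capacity, and never touch data at rest. A minimal sketch of that placement logic, with naming invented by me rather than lifted from MinIO's code:

    # Minimal sketch of pool-level placement: new objects go to the
    # pool with the most free space; existing objects stay put.
    # Reads consult a per-object pool index.

    from dataclasses import dataclass, field

    @dataclass
    class ServerPool:
        name: str
        capacity: int            # bytes
        used: int = 0

        @property
        def free(self) -> int:
            return self.capacity - self.used

    @dataclass
    class Cluster:
        pools: list[ServerPool] = field(default_factory=list)
        index: dict[str, str] = field(default_factory=dict)  # object -> pool

        def add_pool(self, pool: ServerPool) -> None:
            self.pools.append(pool)    # no data moves; old pools untouched

        def put(self, obj: str, size: int) -> str:
            pool = max(self.pools, key=lambda p: p.free)
            pool.used += size
            self.index[obj] = pool.name
            return pool.name

    # Example: after adding pool-2, new writes flow to it.
    cluster = Cluster([ServerPool("pool-1", capacity=100, used=70)])
    cluster.add_pool(ServerPool("pool-2", capacity=100))
    print(cluster.put("new-object", size=10))    # -> pool-2

Note the flip side, which is where my hot take below comes in: a freshly added pool has by far the most free space, so it soaks up nearly all incoming writes until utilization evens out.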
My hot take on social media was: "Placing new sets on new storage impacts utilization and risks hotspotting because of time affinity. There's no free lunch." Even on second thought, I think that is about right. But let us not ignore the cost of the data movement associated with rebalancing, either. What if an operator wants to implement in Swift what the MinIO blog post talks about?
It is possible to emulate MinIO, to an extent. Some operators add a new storage policy when they expand the cluster, configure all the new nodes and/or volumes in its ring, then make it the default, so that newly created objects land on the new hardware. This accomplishes the same goals that MinIO outlines above, but it's a kludge. Swift was not designed for this originally, and it shows. In particular, storage policies were intended for a small number of storage classes, such as rotating media versus SSD, or Silver/Gold/Platinum. Once you create a new policy for every forklift visit, you risk running into scalability issues. Granted, most clusters only expand a few times over their lifetime, but it's a potential problem. Also, storage policies are customer-visible; they are intended to be.
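For the record, the mechanics of the kludge look roughly like this (a sketch from memory of swift.conf and swift-ring-builder syntax; the policy name, device address, and weight are made up):

    # swift.conf: add a policy for the new hardware and make it the default
    [storage-policy:1]
    name = expansion
    default = yes

    # Build a ring for policy 1 that contains only the new devices:
    # part power 10, 3 replicas, min_part_hours 1
    swift-ring-builder object-1.builder create 10 3 1
    swift-ring-builder object-1.builder add r1z1-10.0.0.21:6200/sdb 100.0
    swift-ring-builder object-1.builder rebalance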
In the end, I still think that a balanced cluster is the way to go. Just think about it rationally.
Interestingly, the reverse emulation appears not to be possible with MinIO: if you wanted to rebalance your storage, you could not. Or at least that is what the blog post above suggests: "If we had built MinIO to allow you to add a drive or ... a node to an existing server pool". I take it to mean that they don't allow it, and the blog post is very much a case of sour grapes, then.