Or so they say, at least for certain workloads.
In January of 2015 I led a project to evaluate and select a next-generation storage platform that would serve as the central storage (sometimes referred to as an active archive or tier 2) for all workflows. We identified the following features as being key to the success of the platform:
- Utilization of erasure coding for data/failure protection (no RAID!)
- Open hardware and the ability to mix and match hardware (a.k.a. support heterogeneous environments)
- Open source core (preferred, but not required)
- Self-healing in response to failures (no manual processes required, like replacing a drive)
- Expandable online to exabyte-scale (no downtime for expansions or upgrades)
- High availability / fault tolerance (no single point of failure)
- Enterprise-grade support (24/7/365)
- Visibility (dashboards to depict load, errors, etc.)
- RESTful API access (S3/Swift)
- SMB/NFS access to the same data (preferred, but not required)
In hindsight, I wish we had included two additional requirements:
- Transparently tier and migrate data to and from public cloud storage
- Span multiple geographic regions while maintaining a single global namespace
We spent the next ~1.5 years evaluating the following systems:
- Ceph (InkTank/RedHat/Canonical)
- Dell/EMC ECS
- Cleversafe / IBM COS
- HGST/WD ActiveScale
- NetApp StorageGRID
- Quantum Lattus
- QFS (Quantcast File System)
- AWS S3
- Sohonet FileStore
SwiftStack was the only solution that checked every box on our list of desired features, but that was not the only reason we selected it over the competition.
The top three reasons behind our selection of SwiftStack were as follows:
- Speed – SwiftStack was by far the highest-performing object storage platform we tested, capable of line speed and 2-4x the throughput of its competitors. The ability to move assets between our "tier 1 NAS" and "tier 2 object" tiers with extremely high throughput was paramount to the success of the architecture.
Note: While SwiftStack 1space was not a part of the SwiftStack platform at the time of our evaluation and purchase, it would have been an additional deciding factor in favor of SwiftStack if it had been.
Interesting. It should be noted that Swift's performance is a great match for some workloads, but not for others. In particular, Swift is weak on small-file workloads, such as Gnocchi, which writes a ton of 16-byte objects over and over. The overhead is a killer there, and not just on the wire: Swift has to update its accounting databases on each and every write, so that "swift stat" can show things like quotas. Swift is also not particularly good at HPC-style workloads, which benefit from high bisection bandwidth, because all user data passes through so-called "proxy" servers. Unlike Ceph, for example, Swift keeps the cluster topology hidden from the client, whereas a Ceph client actively tracks ring changes, placement groups and their leaders, and so on. But as we can see, once object sizes start climbing and the number of clients increases, Swift rapidly approaches wire speed.
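To make the "hidden topology" point concrete, here is a minimal sketch of how a Swift proxy maps an object to a ring partition: it hashes the object path with MD5 and keeps only the top bits. The partition power and hash suffix below are made-up illustrative values, and the real ring also folds in a path prefix and replica placement; this is a simplification, not Swift's actual implementation.

```python
import hashlib
import struct

PART_POWER = 10            # illustrative: 2**10 = 1024 partitions
HASH_SUFFIX = b"changeme"  # illustrative stand-in for swift_hash_path_suffix

def object_partition(account, container, obj):
    """Return a ring partition for an object, Swift-style.

    The proxy hashes the object path (plus a cluster-secret suffix)
    with MD5 and keeps the top PART_POWER bits. The client never sees
    any of this, which is why Swift's topology stays hidden behind
    the proxy, while a Ceph client computes placement itself.
    """
    path = "/{}/{}/{}".format(account, container, obj).encode()
    digest = hashlib.md5(path + HASH_SUFFIX).digest()
    # Top PART_POWER bits of the first 4 bytes select the partition.
    return struct.unpack(">I", digest[:4])[0] >> (32 - PART_POWER)

part = object_partition("AUTH_test", "photos", "cat.jpg")
```

Because only the proxy holds the ring, a rebalance changes nothing for clients; with Ceph, by contrast, clients must track map updates themselves.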
I cannot help noticing that the architecture in question has a front-facing cache pool (tier 1), which is what the ultimate clients see instead of Swift. Most of the time, Swift is selected for its ability to serve tens of thousands of clients simultaneously, but not in this case. Apparently, the end user invented ProxyFS independently.
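The tier-1-in-front-of-tier-2 arrangement described above is essentially a read-through cache. The toy class below sketches that idea under stated assumptions: the backend dict stands in for the tier-2 object store and the `TierOneCache` name is hypothetical, not anything from the post or from Swift.

```python
class TierOneCache:
    """Toy read-through cache: clients talk to the tier-1 layer,
    which fetches from the tier-2 object store only on a miss."""

    def __init__(self, backend):
        self.backend = backend  # tier-2 object store (a dict here)
        self.cache = {}         # tier-1 contents, promoted on demand

    def get(self, key):
        if key not in self.cache:
            # Miss: pull the asset up from tier 2 and keep a copy.
            self.cache[key] = self.backend[key]
        return self.cache[key]

tier2 = {"asset-001": b"raw footage bytes"}
tier1 = TierOneCache(tier2)
data = tier1.get("asset-001")   # first read hits tier 2
data = tier1.get("asset-001")   # second read is served from tier 1
```

With clients hitting tier 1 rather than Swift directly, Swift's strength at high client concurrency goes mostly unused, which is the point being made.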
There's no mention of Red Hat selling Swift in the post. Either it was not part of the evaluation at all, or the author forgot about it with the passage of time. He did list a bunch of rather weird and obscure storage solutions, though.