A small billion-object Swift cluster

In the latest of Swift numbers: I talked to someone today who mentioned that they have 1,025,311,000 objects, or almost exactly a billion. They are spread over only 480 disks. That is, if my arithmetic is correct, 2,000 times smaller than Amazon S3 was in 2013. But hey, not everyone is S3. And they aren't having any particular problems; things just work.
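For the record, here is the arithmetic (a quick Python check; the comparison point is the roughly 2 trillion objects Amazon reported for S3 in April 2013):

    # Back-of-the-envelope check of the figures above.
    objects = 1_025_311_000
    disks = 480
    print(objects // disks)            # ~2.14 million objects per disk

    # Amazon reported roughly 2 trillion S3 objects in April 2013.
    s3_2013 = 2_000_000_000_000
    print(round(s3_2013 / objects))    # ~1951, close enough to 2,000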

~avg on NoSQL

Just saving it from LinkedIn:

The real difference between SQL-based (and other relational) databases and NoSQL glorified KV stores is the presence of algebraic structure (i.e. Codd's relational algebra). Algebra is basically all about transformations between equivalent expressions to arrive at a desirable form (simplified, factorized, or whatever the goal is). These transformations have another name: optimizations.

Basically, when you have a real SQL database, you have the ability to optimize execution plans, which can easily yield orders of magnitude of improvement in performance.

(And, yes, modern relational databases (e.g. Snowflake) do internally convert semi-structured data into tabular form, so that the optimizations are applicable to it as well.)
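He has a point about the planner. A toy illustration of such a transformation (a sketch in Python rather than SQL, with made-up relations): "join, then filter" can be rewritten into the algebraically equivalent "filter, then join", which touches far fewer rows.

    # Two tiny "relations"; names and contents are made up.
    users = [(1, "alice"), (2, "bob"), (3, "carol")]
    orders = [(10, 1), (11, 1), (12, 3)]   # (order_id, user_id)

    # Naive plan: join everything, then filter.
    naive = [(u, o) for u in users for o in orders
             if u[0] == o[1] and u[1] == "alice"]

    # Optimized plan: push the predicate down, then join.
    alice = [u for u in users if u[1] == "alice"]
    pushed = [(u, o) for u in alice for o in orders if u[0] == o[1]]

    assert naive == pushed   # equivalent expressions, very different cost

Both expressions denote the same relation, so the planner is free to pick the cheap one.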

If I had something to say about this, it would be something about stable, dependable performance having a value of its own. That is why TokyoCabinet was such a revelation and prompted the NoSQL revolution, which, like any revolution, later ended with Mongo and a reaction. But this is not my field, so let's just save it for future reference.

Google outage

It's very funny to hear about people who were unable to turn on their lights because their houses were "smart". Not a good look for Google Nest! But I had a real problem:

Google outage crashed my Thunderbird so good that the only fix is to delete the ~/.thunderbird and re-add all accounts.

Yes, really.

Cries of the vanquished

The post at roguelazer's is so juicy from every side that I'd need to quote it whole to do it justice (h/t ~avg). But its ostensible meat is etcd.[1] In it, he builds a narrative of the package being elegant at first and bloating later.

This tool was originally written in 2013 for a ... project called CoreOS. ... etcd was greater than its original use-case. Etcd provided a convenient and simple set of primitives (set a key, get a key, set-only-if-unchanged, watch-for-changes) with a drop-dead simple HTTP API on top of them.

Kubernetes was quickly changed to use etcd as its state store. Thus began the rapid decline of etcd.

... a large number of Xooglers who decided to infect etcd with Google technologies .... Etcd's simple HTTP API was replaced by a "gRPC" version; the simple internal data model was replaced by a dense and non-orthogonal data model with different types for leases, locks, transactions, and plain-old-keys.
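For reference, the mourned "drop-dead simple HTTP API" looked roughly like this (a sketch of the old v2-era /v2/keys interface from memory, using Python's requests; the key name is made up and the client port varied over the years):

    import requests

    BASE = "http://127.0.0.1:2379/v2/keys"   # older builds used port 4001

    # Set a key.
    requests.put(BASE + "/config/limit", data={"value": "10"})

    # Get a key.
    print(requests.get(BASE + "/config/limit").json())

    # Set-only-if-unchanged (compare-and-swap).
    requests.put(BASE + "/config/limit",
                 params={"prevValue": "10"}, data={"value": "20"})

    # Watch for changes (long-polls until the key next changes).
    requests.get(BASE + "/config/limit", params={"wait": "true"})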

Completely omitted from this tale is that etcd was created as a clone of Google's Chubby, which did not use HTTP. The HTTP interface was implemented in etcd for expediency. So, the nostalgic image of early etcd that he projects is in fact an image of a primitive early draft.

It's interesting that he only mentions leases and locks in passing, painting them as a late addition, whereas the concept of coarse locking was more important for Chubby than the key registry was.

[1] Other matters are taken up in the footnotes, at length. You'd think that it would be a simple matter to create a separate post to decry the evils of HTTP/2, but not for this guy! I may write another entry later on the evils of bloat and how sympathetic I am to his cause.

Recruiter spam

Recruitment spam, like conference spam, is a boring part of life. However, it raises an eyebrow sometimes.

A few days ago, a Facebook recruiter, John-Paul "JP" Fenn, sent me a form e-mail to an address that I do not give to anyone. It is only visible as a contact for one of my domains, because the registrar does not believe in privacy. I pondered whether I should offer him a consideration in exchange for an explanation of just where he obtained the address. Purely out of curiosity.

Today, an Amazon recruiter, Jonte, sent a message to an appropriate address. But he did it with the addresses in the To: header, not just the envelope. He used a hosted Exchange, of all things, and there were 294 addresses in total. That should give you an idea of just how hard these people work to spam, and just how disposable I am in their eyes.

It really is pure spam. Most likely, JP bought a spam database; he didn't write a Python script that scraped whois information.

I remember a viral story from a few years ago about how one guy got a message from a Google recruiter that combined his LinkedIn interests in amusing ways. It went like this: "We seek people whose strength is Talking Like A Pirate. As for Telling Strangers On The Internet They Were Wrong, that's one of my favorite pastimes as well." You know you've made it when you receive that kind of attention. Maybe one day!

UPDATE 2020-03-02: JP let me go, but 3 more Facebook recruiters attacked me: Faith, Sara, Sandy. I really didn't want to validate their spam database, but I asked their system to unsubscribe. They didn't make it simple, though. First, I had to follow a link in the recruiter's spam. That link didn't carry a cookie, but pointed to a stock URL with a form to enter the e-mail address. Then, Facebook sent a message to the given address (thus fully validating the address they harvested). That e-mail contained a link with the desired cookie. When I followed it, I was asked to confirm the unsubscription. Only then did they promise to unsubscribe me.

Seagate and SMR in 2020

Back in 2015, I wrote about Seagate Kinetic and its relation to shingles in Seagate products. Unfortunately, even if Kinetic were a success, it would only support a fraction of workloads. But the rest of Seagate's customers demanded density increases. So, to nobody's surprise, Seagate started including shingles in their general-purpose disk drives, perhaps only for a part of the surface, or coupled with a flash cache. The company was an enthusiastic early adopter of hybrid drives, as a vendor. Journalists are trying to make a story out of it, because caches are only caches, and once you start spilling, the drive slows down to the shingle speed. But naturally, Seagate neglected to mention in their documentation just how exactly their drives work. Sacré bleu!
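The cliff that makes the story is easy to model (a sketch with made-up numbers, not Seagate's actual specifications):

    # Toy model of a drive-managed SMR disk: writes land in a fast cache
    # (CMR zone or flash) until it fills, then drop to the shingled rate.
    # All numbers below are made up for illustration.
    cache_gb = 20
    fast_mb_s = 180   # write speed while the cache absorbs data
    slow_mb_s = 30    # steady-state shingled write speed

    def write_seconds(gb):
        cached = min(gb, cache_gb)
        spilled = gb - cached
        return cached * 1024 / fast_mb_s + spilled * 1024 / slow_mb_s

    for gb in (5, 20, 100):
        print(gb, "GB burst:", round(write_seconds(gb)), "s")
    # 5 GB: 28 s, 20 GB: 114 s, 100 GB: 2844 s -- five times the 569 s
    # that the cache speed alone would predict.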

Another perspective on Swift versus Ceph today

Seen in e-mail today:

From: Mark Kirkwood

There are a number of considerations (disclaimer: we run Ceph block and Swift object storage):

Purely on a level of simplicity, Swift is easier to set up.

However, if you are already using Ceph for block storage then it makes sense to keep using it for object too (since you are likely to be expert at Ceph at this point).

On the other hand, if you have multiple Ceph clusters and want a geo-replicated object storage solution, then doing this with Swift is much easier than with Ceph (geo-replicated RGW still looks to be real complex to set up - a long page of arcane commands).

Finally (this is my 'big deal point'). I'd like my block and object storage to be completely independent - suppose a situation nukes my block storage (Ceph) - if my object storage is Swift then people's backups etc are still viable, and when the Ceph cluster is rebuilt we can restore and continue. On the other hand, if your object storage is Ceph too then....

regards

Mark

Mark's perspective is largely founded in fault tolerance and administrative overhead. However, let's take a look at "keep using [Ceph] for object too".

Indeed, the integration of block, POSIX, and object storage is Ceph's strength, although I should note for the record that Ceph has a large gap: all 3 APIs live in separate namespaces. So, do not expect to be able to copy a disk snapshot through CephFS or RGW. Objects in each namespace are completely invisible to the two others, and the only uniform access layer is RADOS. This is why, for instance, RGW-over-NFS exists. That's right, not CephFS, but NFS. You can mount RGW.
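The split is easy to observe: list a pool at the RADOS layer and you will see, say, RBD's backing objects, which no CephFS or RGW listing will ever return (a sketch with the python-rados binding; the pool name is an assumption):

    import rados

    # RADOS is the only layer where all the namespaces are visible.
    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx("rbd")   # pool name assumed
        # RBD images decompose into rbd_data.* objects down here,
        # invisible to any CephFS or RGW listing.
        for obj in ioctx.list_objects():
            print(obj.key)
        ioctx.close()
    finally:
        cluster.shutdown()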

All attempts at this sort of integration that I know of in Swift start with uniform access first. It is the opposite of Ceph, in a way. Because of that, these integrations typically work from the edge inward, like making a pool that a daemon fills/spills with Swift, and mounting that. SwiftStack's ProxyFS is a little more native to Swift, but it starts off with a shared namespace too.

Previously: Swift is faster than any competitor, says an employee of SwiftStack.

Nvidia acquires SwiftStack

In the words of Joe Arnold:

Last year, when we announced SwiftStack 7, we unveiled our focus on the SwiftStack Data Platform for AI, HPC, and accelerated computing. This included SwiftStack 1space as a valuable piece of the puzzle, enabling data acceleration in the core, at the edge, and in the cloud.

To our existing customers — we will continue to maintain, enhance, and support 1space, ProxyFS, Swift, and the Controller. SwiftStack’s technology is already a key part of NVIDIA’s GPU-powered AI infrastructure, and this acquisition will strengthen what we do for you.

Building AI supercomputers is exciting to the entire SwiftStack team. We couldn’t be more thrilled [...]

Highlighting 1space as the centerpiece of the acquisition seems strange. All I knew of it was a cloud-to-cloud data-pumping service. Hardly any HPC stuff. I could see how Nvidia might want ProxyFS to replace Hadoop, but not this.

The core Swift continues unchanged for now.

Too Real

From CKS:

The first useful property Python has is that you can't misplace the source code for your deployed Python programs.