Log in

Pete Zaitcev's Journal [entries|friends|calendar]
Pete Zaitcev

[ userinfo | livejournal userinfo ]
[ calendar | livejournal calendar ]

Go go go [15 Jun 2016|08:57am]

Can you program in Go without knowing a thing about it? Why, yes. A barrier to entry, where are you?

[link] post comment

You are in a maze of twisted little directories, all alike [07 Jun 2016|03:21pm]

[root@rhev-a24c-01 go]# make get
go get -t ./...
go install: no install location for directory /root/hail/swift-go/go/bench outside GOPATH
For more details see: go help gopath
[root@rhev-a24c-01 go]# pwd
[root@rhev-a24c-01 go]# ls -l /root/go/src/github.com/openstack/swift
lrwxrwxrwx. 1 root root 25 Jun 6 21:50 /root/go/src/github.com/openstack/swift -> ../../../../hail/swift-go
[root@rhev-a24c-01 go]# cd /root/go/src/github.com/openstack/swift/go
[root@rhev-a24c-01 go]# pwd
[root@rhev-a24c-01 go]# make get
go get -t ./...
[root@rhev-a24c-01 go]#
[link] post comment

Encrypt everything? Please reconsider. [29 May 2016|11:16am]

Somehow it became fashionable among site admins to set it up so that accessing over HTTP was immediately redirected to https://. But doing that adds new ways to fail, such as certificate expired:

Notice that Firefox provides no way to ignore the problem and access the website (which was supposed to be accessible over HTTP to begin with). Solution? Use Chrome, which does:

Or, disable NTP and change your PC's clock two days back (be careful with running make while doing so).

This was discussed by CKS previously (of course), and he seems to think that the benefits outweigh the downsides of an occasional fuck-up, like the only website in the world that has the information that I want right now is suddenly unavailable with no recourse.

UPDATE: Chris dicussed the problem some more and brought up other examples, such as outdated KVM appliances that use obsolete ciphers.

One thing I'm wondering about is if a redirect from http:// to https:// makes a lot of sense. If you do not support access by plain HTTP, why not return ECONNREFUSED? I'm sure it's an extremely naiive idea.

[link] 1 comment|post comment

Russian Joke [25 May 2016|01:15pm]

In a quik translation from Bash:

XXX: Still writing that profiler?
YYY: Naah, reading books now
XXX: Like what books?
YYY: TCP Illustrated, Understanding the Linux kernel, Linux kernel development
XXX: And I read "The Never-ending Path Of Hatred".
YYY: That's about Node.js, right?

[link] post comment

Dell, why u no VPNC [21 May 2016|09:46pm]

Yo, we heard you liked remote desktops, so we put remote desktop into a remote desktop, now you can remote desktop while remote desktop.

I remember how IBM simply put VPNC interface in their Bladecenter. It was so nice. Unfortunately, vendors never want to be too nice to users, so their next release switched to a Java applet. Dell copied their approach for DRAC5. In theory, this should be future-proof, all hail WORA. In practice, it only worked with a specific version of Java, which was current when Dell shipped R905 ten years ago. You know, back then Windows XP was new and hot.

Fortunately, by a magic of KVM, libvirt, and Qemu, it's possible to create a virtual machine, install Fedora 10 on it, and then run Firefox with the stupid Java applet on it. Also, the Firefox and Java have to run in 32-bit mode.

When I did it for the first time, I ran Firefox through X11 redirection. That was quite inconvenient: I had to stop the Firefox running on the host desktop, because one cannot run 2 firefoxes painting to the same $DISPLAY. The reason that happens is, well, Mozilla Foundation is evil, basically. The remote Firefox finds the running Firefox through X11 properties and then some crapmagic happens and everything crashes and burns. So, it's much easier just to hook to the VM with Vinagre and run Firefox with DISPLAY=:0 in there.

Those old Fedoras were so nice, BTW. Funnily enough, that VM with 1 CPU and 1.5 GB starts quicker than the host laptop with the benefit of SystemD and its ability to run tasks in parallel. Of course, the handing of WiFi in Fedora 20+ is light years ahead of nm-applet in Fedora 10. There was some less noticeable progress elsewhere as well. But in the same time, the bloat was phenomenal.

UPDATE: Java does not work. Running the JNLP simply fails after downloading the applets, without any error messages. To set the plugin type of "native", ssh to DRAC, then "racadm config -g cfgRacTuning -o cfgRacTunePluginType 0". No kidding.

[link] 2 comments|post comment

Dropbox lifts the kimono [06 May 2016|02:38pm]

Dropbox posted somewhat of a whitepaper about their exabyte storage system, which exceeds the largest Swift cluster by about 2 orders of magnitude. Here's a couple of fun quotes:

The Block Index is a giant sharded MySQL cluster, fronted by an RPC service layer, plus a lot of tooling for database operations and reliability. We’d originally planned on building a dedicated key-value store for this purpose but MySQL turned out to be more than capable.

Kinda like SQLite in Swift.

Cells are self-contained logical storage clusters that store around 50PB of raw data.

And they have dozens of those. Their cell has a master node BTW. Kinda like Ceph's PG, but unlike Swift.


[link] post comment

OpenStack Swift Proxy-FS by SwiftStack [28 Apr 2016|10:42am]

SwiftStack's Joe Arnold and John Dickinson chose the Austin Summit and a low-key, #vBrownBag venue, to come out of closet with PROXY-FS (also spelled as ProxyFS), a tightly integrated addition to OpenStack Swift, which provides a POSIX-ish filesystem access to a Swift cluster.

Proxy-FS is basically a peer to a less known feature of Ceph Rados Gateway that permits accessing it over NFS. Both of them are fundamentally different from e.g. Swift-on-file in that the data is kept in Swift or Ceph, instead of a general filesystem.

The object layout is natural in that it takes advantage of SLO by creating a log-structured, manifested object. This way the in-place updates are handled, including appends. Yes, you can create a manifest with a billion of 1-byte objects just by invoking write(2). So, don't do that.

In response to my question, Joe promised to open the source, although we don't know when.

Another question dealt with the performance expectations. The small I/O performance of Proxy-FS is not going to be great in comparison to a traditional NFS filer. One of its key features is relative transparency: there is no cache involved and every application request goes straight to Swift. This helps to adhere to the principle of the least surprise, as well as achieve scalability for which Swift is famous. There is no need for any upcalls/cross-calls from the Swift Proxy into Proxy-FS that invalidate the cache, because there's no cache. But it has to be understood that Proxy-FS, as well as NFS mode in RGW, are not intended to compete with Netapp.

Not directly, anyway. But what they could do is to disrupt, in Christensen's sense. His disruption examples were defined as technologies that are markedly inferior to incumbents, as well as dramatically cheaper. Swift and Ceph are both: the filesystem performance sucks balls and the price per terabyte is 1/10th of NetApp (this statement was not evaluated by the Food and Drug Administration). If new applications come about that make use of these properties... You know the script.

[link] post comment

Amateur contributors to OpenStack [06 Apr 2016|10:44am]

John was venting about our complicated contribution process in OpenStack and threw this off-hand remark:

I'm sure the fact that nearly 100% of @openstack contributors are paid to be so is completely unrelated. #eyeroll

While I share his frustration, one thing he may be missing is that OpenStack is generally useless to anyone who does not have thousands of computers dedicated to it. This is a significant barrier of entry for hobbyists, baked straight into the nature of OpenStack.

Exceptions that we see are basically people building little pseudo-clusters out of a dozen of VMs. They do it with an aim of advancing their careers.

[link] 1 comment|post comment

SwiftStack versus Swift [30 Mar 2016|01:35pm]

Erik Pounds posted an article on SwiftStack official blog where it presented a somewhat corporatist view of Swift and Ceph that comes down to this:

They are both productized by commercial companies so all enterprises can utilize them... Ceph via RedHat and Swift via SwiftStack.

This view is extremely reductionist along a couple of avenues.

First, it tries to sweep all of Swift under SwiftStack umbrella, whereas in reality it derives a lot of strength from not being controlled by SwiftStack. But way to assuage the fears of monoentity control by employing the PTL, guys. Fortunately, in the real and more complex world, Red Hat pays me to work on Swift, as well as offer Swift as a product, and I do not find that my needs are in any way sabotaged by PTL. Certainly our product focus differs; Red Hat's lifetime management offering, OSPd, manages OpenStack first and Swift as much as it's a part of OpenStack, whereas SwiftStack offer a Swift-specific product. Still it's not like Swift equals SwiftStack. I think RackSpace continue to operate the largest Swift cluster in the world.

Second, Erik somehow neglects to notice that Ceph provides a Swift compatibility through a component known as Rados Gateway. It is an option, you know, although obviously it can never be a better Swift than Swift itself, or better Amazon S3 than S3 itself.

[link] post comment

The "fast-post" is merged into Swift [07 Mar 2016|12:51pm]

I'm just back from a hackathon at HPE site in Bristol, England, where we made a final look to so-called "fast-post" patch and merged it in. It was developed by Alistair Coles and basically made POST work as everyone expected it to work, at last.

In the original Swift, it was found that when you do a POST to an object, in the presence of failures it was possible to end with some nodes having old data but new (posted) attributes. The bad part was that the replication mechanism could not do anything to recoincile the inconsistency, and then your GET returns varying data forever depending on what node you hit. It occured when new timestamp from POST attached itself to old data (and other equivalent scenarios).

This is some of a fundamental issue with using a timestamp based replication in Swift. Greg and Chuck knew about it all along, and their solution was known as "POST to PUT". They made Swift Proxy to fetch the object, then update its attributes for the POST, then do essentially a PUT. This way timestamps, data, and attributes are always consistent, as they are in the initial PUT. If this POST-to-PUT thing occurs across a failure, replication uses timestamps to restore consistently correctly.

The problem with that, POST-to-PUT is slow, as well as deceptive. Users think they issue a lightweight POST, but actually they prompt a massive data move inside the cluster if the object is big.

Alasdair's insight was that the root of the problem was not that timestamps were no good as a basic mechanism, but that the "fast" POST broke them by assigning new timestamps to old data (or old attributes, metadata, Content-Type). As long as each indepentently settable thing had its own timestamp, there was no problem. In Swift, we have 3 of those: object data, object metadata, and Content-Type (don't ask). So, store 3 timestamps with each object and presto!

The actual patch employs an additional trick by not changing the container DB schema. Instead, it encodes 3 timestamps into 1 field where the timestamp used to live. This way a smooth migration is possible in a cluster where old async pendings still float, for example. It looks a little kludgy at first, but I convinced myself that it made sense under the circumstances.

P.S. The fast-post is not a default, even now. Needs Container Sync updated to be compatible. I think Eran was going to look into that.

[link] post comment

Ceph needs a build revolution [26 Feb 2016|04:05pm]

I've been poking at this Ceph thing since June, or 9 months now, and I feel like I'm overdue for a good rant. Today's topic is, Ceph's build system is absolutely insufferable. It's unbelievably fragile, and people break it every week at the most. Then it takes a week to fix, or even worse: it only breaks for me, but not for others, and then it stays broken, because I cannot possibly wade into the swamp this deep to fix it.

Here's today's post-10.0.3 trunk:

[zaitcev@lembas ceph-tip]$ sh autogen.sh 
[zaitcev@lembas ceph-tip]$ ./configure --prefix=$(pwd)/build --with-radosgw
[zaitcev@lembas ceph-tip]$ make -j3
  CXX      common/PluginRegistry.lo
  CXXLD    libcommon_crc.la
ar: `u' modifier ignored since `D' is the default (see `U')
ar: common/.libs/libcommon_crc_la-crc32c_intel_fast_asm.o: No such file or direc
Makefile:13137: recipe for target 'libcommon_crc.la' failed
[zaitcev@lembas ceph-tip]$ find . -name '*crc32c_intel*'
[zaitcev@lembas ceph-tip]$ 

Sometimes these things fix themselves after a fresh clone/autogen.sh/configure/make. But doing so all the time is prohibited by how long Ceph builds. Literally it takes many hours (depending if you use autotools or Cmake, and how parallel your build is). I bought a 4-core laptop with 16 GB and SSD just for that. A $1,200 later, I only have to wait 4 hours. Yay, I can build Ceph 2 times in 1 day.

The situation is completely insane, and it remained so for the months I spent working on this. The worst is that I don't understand how people even deal with this without killing themselves. If you look at the pull requests, obviously a large number of developers manage to build this thing somehow... unless all of them post untested patches all the time.

UPDATE: Waiting a bit and a fresh clone made the build to complete, but then:

make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/q/zaitcev/ceph/ceph-tip/selinux'
[zaitcev@lembas ceph-tip]$ echo $?
[zaitcev@lembas ceph-tip]$ ./src/vstart.sh -n -d -r -i
ls: cannot access compressor/*/: No such file or directory
** going verbose **
./src/vstart.sh: line 374: ./init-ceph: No such file or directory
[zaitcev@lembas ceph-tip]$

We are about to freeze Jewel with this codebase.

[link] post comment

git rebase - proceed with caution [25 Feb 2016|01:09pm]

In the age of Github, we're not supposed to do the "git merge" anymore, but a "git rebase" instead. Everyone knows that. However, the rebase has its quirks. One stepped on goes like this:

  1. Have 2 patches: A and B, on top of a tree T. Submit A upstream.
  2. Upstream merges A.
  3. Do "git rebase" in T. You'd think A would disappear and you get to keep B on top of T (or T'=T+A).
  4. But instead, git finds some conflicts and throws you a 3-way merge node, which contains your A, upstream A, and some small, unrelated conflict. It has an empty commit message, obviously.
  5. Resolve the merge, "git commit -a", "git rebase --continue", and you end with next commit being empty.
  6. If at this point you think that it's the former A and do "git rebase --skip", then — hold onto your chair — B is thrown into a conflict as well, and its commit message is attached to a follow-on empty commit too, just like A was. If you skip that one, you lose the commit message forever. There's no reset you could do at that point.

Well, if you pushed on a branch, you could go back to Github and salvage the message from there.

Anyhow, rebase takes a certain care. You can't assume that it always works, or assume that you could return to the previous state of repository. In this sense it's fundamentally different from the merge, where you can always do "git reset", no matter how much you screwed up.

[link] 3 comments|post comment

Ha [22 Feb 2016|05:51pm]

Seen today (re. Linux Mint intrustion):

First, I believe that Linux Mint will come out of this stronger than ever. Second, this will force others to take ISO security more seriously. This also provides end users with a stronger reason to pay closer attention to what they’re doing.

Really, that's what he said.

[link] post comment

A Short History Of Removable Media Behind The Iron Curtain [09 Feb 2016|12:47pm]

CKS blogged something to today that caught my eye, in the context of filesystems defined in host byte order:

In the beginning, storage was close to 100% system specific. Not only did you not think of moving a disk from a Vax to a Sun, you probably couldn't; the entire peripheral interconnect system was almost always different, from the disk to host cabling to the kind of backplane that the controller boards plugged into.

Although the above could be the case in the West, in USSR it was quite common to move removable media between systems, even back in the 1970s. The cartridge of choice was the disk pack for IBM 2311, a 7.25MB, 11" hard disk stack of 4 or 5 platters. It could be easily transported between BESM-6, IBM/370 clones (ES-1022 and up), PDP-11 clones (SM-3, SM-4, SM-1420, etc.), HP 2000 clones (SM-2), and Mitra-15 (ES-1010). It only went out of service by about 1986, with the adoption of 29MB disk packs.

Granted, UNIX only worked on PDP-11. It reached ES series too late for the 7.25MB packs, in the ES-1045 generation. However, SM-4 brought an "RP-5" cartridge with a single platter, which for a while was a gold standard for minicomputers. In Russian practice, it used a half-density recording for 2.5MB instead of 5MB in western cartridges. The "RP" ("rk" in UNIX) drive was hooked to a wide variety of mini- and micro-computers during their brief popularity before they were supplanted by PCs. Aside from the original SM and smaller LSI-11 compatibles (DVK), it was connected to Iskra-226 (Wang micro), Videoton's Z80-based TRS-80, Mitra-225, and basically any and every microcomputer, mostly based on the 8080 clone KR580IK80. The most popular format, I suspect, was a trivial filesystem used by DEC's RT-11 OS family.

When PCs came about, it was rather common to move around their hard drives with MFM interface, although Russian domestic winchesters required extreme care. Unplugging one with unparked heads could easily scratch the surface. Soon, the imports flooded the market and the 20MB Seagate ST-225 became the gold standard. It lasted until the single-cable IDE replaced it. Interestingly enough, the last hold-overs of LSI-11 line used various trivial IDE controllers with CPU-controlled access. You could attach something like ST-157 to them.

Chris was building an argument that the lack of portable disk media was the reason why everyone made their Berkeley filesystem in host byte order. He's not necessarily incorrect about that. Even if Russians used cartridge and winchester drives pervasively, nobody cared about them and they were not setting the software standards.

BTW, since we're on topic, in case of Linux, the order independence was not easily won. The original port to Amiga (by Geert, IIRC), used a big-endian ext or ext2, if not minix even. When SPARC port came about, DaveM was not sure if to follow Geert at first (again, IIRC). Someone made an argument that byte-swapping would waste CPU, and it carried some weight. Also, adding macros into all the correct positions were challenging. I remember arguing for order independence, in part because I did have an easy access to PCs with SCSI HBAs. Not that anyone listened to my opinion, but eventually DaveM and Tytso went with it as well, and the rest is history.

[link] post comment

HP Reconfigurable [15 Dec 2015|11:36am]

I learned by way of Mirantis today that an entity known as "HP Enterprise" or "HPE" introduced something described thus:

It’s an architecture in which a large server acts as a “pool” of compute, storage, and networking resources, the same way a cloud might. When an application needs resources, they’re allocated from that hardware pool, and when the application goes away, they’re returned from the pool. All of this happens via the composable architecture.

That may explain the mysterious Intel computer that I saw in Tokyo. So it's not quite NUMA taken to extremes, it's also hardware domains taken to extremes.

[link] post comment

Cool hardware in Tokyo [04 Nov 2015|07:46pm]

At the Mitaka Summit, we finally got some interesting kit exhibited, after the relatively lean summits in Atlanta and Vancouver. Unfortunately, the lightning in the Marketplace was very weird and pictures came out poorly.

My personal favourite is probably the flash array by SanDisk. It's nothing but JBOF, the host connection is SAS. You'd think any idiot could slap a few flash chips on cards and plug them into backplane... But just look how elegant it is. The capacity of the 2U box is 512 TB, but the whole thing only consumes 700 W maximum. It's brilliant, really.

Unfortunately, I don't have a good picture, but the second best was Ericksson's passive optical backplane. It promises to make your cables last forever: just swap out optronics when new bit rates come along. Even a terabit! Now it may actually be a misguided product. If they cannot get 3rd party vendors to build modules for it, the whole things comes crashing to the ground. Ditto if they build, but overprice. But the audacity of making something that's different is to be acknowledged. And frankly I'm not a fan of re-cabling when new servers come about.

Intel wins a consolation prize for preservance. They quietly presented some kind of next-generation multiblock computer, with pieces connected by serial cables. Finally, the future dreamed by the creators of Infiniband is here - only 15 years late, and still we don't know if it is viable.

There was also a bunch of fairly mundane boxes. Various also-run flash vendors were present, of course. Interestingly, SolidFire had a booth, but without anything eye-catching. Resting on the laurels? IBM brought their newest PowerPC, which was mostly remarkable for still existing. That sort of thing.

[link] post comment

Darcy on the future of storage [27 Oct 2015|07:41pm]

Quick comment on the following:

Good morning, madam. What kind of storage system would you like me to build for you today?

Scary thought. That means that selling storage products is going to be hard for all of us. We'll be selling components, both hardware and software, or we'll be selling integration and support services. Somebody will always pay to have somebody else assemble the parts, maybe add some light customization, and support the result. There's a nice living to be made there... but no empires.

Why is it a problem that no empires are to be built? It's only a problem for an empire-builder like I dunno... Sam Altman or something. Darcy is an old engineer, not a startup founder. A good one, too. His kids aren't going to go to bed hungry.

We've been at this dance before with Linux. People have been asking if Red Hat was going to be like Microsoft, and I told everyone: nope. We're transfering the wealth that the proprietary lock-in vendors were collecting back to the users. That was the whole idea. In the process, we're collecting less - a more reasonable amount, necessary to put stuff together and make it run. Therefore, we're not going to be as wealthy off users' backs. But the society as a whole benefits.

So cry me a river. Not scary at all. But RTWT, I think he's drawing a truthful outline overall.

P.S. Another thing, what's magical about storage? Why, I can go build spacecraft when storage goes bust. Or whatever. Of course it's a pity for all the storage-specific techniques and skills that I accumulated, but eh. As long as we leave behind the good code (and docs), it's all good.

[link] 1 comment|post comment

Pics Up [05 Oct 2015|01:31pm]

Чёт я под настроение выложил картинки с этой недели на форумы Авиабазы. Anglophones are welcome to pictures at least.

[link] 1 comment|post comment

TLS Security In Firefox 40 [11 Sep 2015|12:33pm]

What do people at Mozilla think is going to happen when I need to access a website and Firefox says that TLS parameters are insecure and thus I cannot? I'm going to use Chrome, that's what. Or maybe even a hacked Midori, where I can adjust build-time parameters of gcr.

That company went way downhill when they kicked Eich out.

[link] 6 comments|post comment

Tablet Uber Alles Or Is It [14 Aug 2015|03:38pm]

Given the trouble with modern laptops, I'm seriously thinking if I should make a jump to a gigantic tablet with a keyboard. You run "make" on VM. Not enough RAM? Order in the cloud! The idea was planted in my mind by that jerk Atwood, who penned an article claiming a death of PC. And a month ago I saw someone at Python meetup using Canopy. It kinda worked, actually. I expect Github Atom to be even better.

Unfortunately, there are problems in 3 broad categories still.

First, the hotspot Internet connectivity sucks. It is plain unreliable. VPN, ssh, and IRC are often blocked; it's necessary to remember "Connectivity Through Anything" lessons and tehcniques. When it works, it's often slow. These problems extend to venues such as Intel's Executive Briefing Center. If "executives" eating their awesome snacks cannot obtain a decent WiFi, what hope do I have? I do not have cellphone data, but I hear bitching about it.

Second, the usual questions about privacy and security apply. Non-proprietary tablets suck immensely, from what I heard.

Third, tablets top out at 10..11 inch. Sorry, but that is not enough to kill laptops while laptops continue to be made. Certainly, Atwood made an argument that as tablets absorb users, PC makers will stop. The day the last one quits, we'll have to use the least shitty tablet regardless of size. But today is not that day.

UPDATE: 3 weeks after this post, Apple unveiled a 12.9" (2732 x 2048) iPad Pro, with a keyboard as a factory option.

[link] post comment

[ viewing | most recent entries ]
[ go | earlier ]