Pete Zaitcev's Journal

git-codereview [12 Jan 2017|12:55pm]

Not content with the legacy git-review, Google developed another Gerrit front-end, git-codereview. They use it for contributions to Go. I have to admit, that was a bit less of a special move than Facebook's git-review, which uses the same name but does something entirely different.

P.S. There used to be a post about creating a truly distributed github, which used blockchain in order to vote on globally unique names. Can't find a link though.

[link] post comment

Mirantis and the business of OpenStack [08 Jan 2017|11:07am]

It seems like it was only in November that we heard about massive layoffs at Mirantis, "The #1 Pure Play OpenStack Company" (per <title>). Now they are teaching us thus:

And what about companies like Mirantis adding Kubernetes and other container technologies to their slate? Is that a sign of the OpenStack Apocalypse?

In a word, “no”.

Gee, thanks. I'm sure they know what it's like.

[link] post comment

The idea of ARM has gone mainstream [27 Dec 2016|11:51am]

We still don't have any usable ARM servers on which I could install Fedora and have it supported for more than 3 releases, but gamers already debate the merits of ARM. The idea of SPEC-per-Watt has completely gone mainstream, like Marxism.

<sage> http://www.bitsandchips.it/english/52-english-news/7854-rumor-even-intel-is-studying-a-new-x86-uarch new uarch? it's about time
<sage> they just can't make x86 as power efficient as arm
<JTFish> What is the point
<JTFish> it's not like ARM will replace x86 in real servers any time soon
<sage> what is "real" servers?
<JTFish> anything that does REAL WORLD shit
<sage> what is "real world"?
<JTFish> serving internet content etc
<JTFish> database servers
<JTFish> I dunno
<JTFish> mass encoding of files
<sage> lots of startups and established companies are already betting on ARM for their cloud server offerings
<sage> database and mass encoding, ok
<sage> what else
<JTFish> are you saying
<JTFish> i'm 2 to 1
<JTFish> for x86
<JTFish> also I should just go full retard and say minecraft servers
<sage> the power savings are big, if they can run part of their operation on ARM and make it financially viable, they will do it

QUICK UPDATE: In the linked article:

The next Intel uArch will be very similar to the approach used by AMD with Zen – a perfect balance of power consumption/performance/price – but with one huge piece of news: in order to save physical space (smaller die) and to improve the power consumption/performance ratio, Intel will throw away some old SIMD and other old hardware remainders.

100% backward x86 hardware compatibility will not be guaranteed anymore, but that may not be a handicap (some SIMD instructions today are useless, and we can also use emulators or cloud systems). Nowadays a lot of software houses have to develop code both for ARM and for x86, but ARM is lacking useful SIMD. So, frequently, this software is a watered-down compromise.

Intel will be able to develop a thin and fast x86 uArch, and ICC will be able to optimize code for both ARM and x86.

This new uArch will be ready in 2019-2020.

Curious. Well, as long as they don't go full Transmeta on us, it may be fine.

[link] post comment

FAA proposes to ban NavWorx [22 Oct 2016|10:26pm]

Seen a curious piece of news today. As a short preamble: an aircraft in the U.S. may receive useful information from a ground station (TIS-B and FIS-B), but it has to transmit a certain ADS-B packet for that to happen. And all ADS-B packets include a field carrying the system's claim that it operates at a certain level of precision and integrity. The idea is, roughly, that if you detect that e.g. one of your redundant GPS receivers is off-line, you should broadcast that you're downgraded. The protocol field is called SIL. The maximum level you can claim is determined by how crazily redundant and paranoid your design is. We are talking something on the order of $20,000 in cost, most of which is amortization of FAA certification paperwork, and for that you are entitled to claim a SIL of 2. I lied about this explanation being short, BTW.
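To make the SIL mechanics concrete, here is a minimal sketch of the downgrade logic. This is entirely my own illustration in Python, with made-up names and simplified levels, not anything taken from an actual avionics spec:

# Toy model: claim the full certified SIL only while every redundant
# position source is healthy; broadcast a downgrade otherwise.
# Levels and the downgrade rule are simplified for illustration.
from dataclasses import dataclass

@dataclass
class PositionSource:
    name: str
    healthy: bool

def advertised_sil(sources, certified_sil=2):
    if all(s.healthy for s in sources):
        return certified_sil   # full claim, all redundancy present
    if any(s.healthy for s in sources):
        return 1               # degraded: running on fewer sources
    return 0                   # no usable position source at all

gps = [PositionSource("GPS-1", True), PositionSource("GPS-2", False)]
print(advertised_sil(gps))     # prints 1: one receiver is off-line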

So, apparently, NavWorx shipped cheap ADS-B boxes, which were made with a Raspberry Pi and a cellphone GPS chip (or some such). They honestly transmitted a SIL of 0. Who cares, right? Well, the FAA decided that TIS-B should stop replying to airplanes flying around with SIL 0 ADS-B boxes, because fuck the citizens, they should pay their $20k. Pilots called NavWorx and complained that their iPads hooked to the ADS600-B did not display the weather reliably anymore. NavWorx issued a software update that programmed their boxes to transmit a SIL of 2. No other change: the actual transmitted positions remained exactly as before, only the claimed reliability was faked. When the FAA got wind of this happening, they went nuclear on NavWorx users' asses. The proposed emergency directive orders owners to remove the offending equipment from their aircraft. They are grounded until compliance.

Now, the good thing is, the ADS-B mandate comes in 2020. They still have 3 years to find a more compliant (and expensive) supplier before they are prohibited from the vicinity of any major city. So it's only money.

I don't have a dog in this fight personally, so I can sympathize both with the bureaucrats who saw cheaters and threw the book at them, and with the company that employed a workaround against a meaningless and capricious rule. However, here's a couple of observations.

First, note how the FAA maintains a database of individual (not aggregate) protocol compliance for each ADS-B ID. They will even helpfully send you a report about what they know about you (it's intended so you can test the performance of your ADS-B equipment). Imagine if the government saved every query that your browser made, and could tell if your Chrome were not compliant with a certain RFC. This detailed tracking of everything is actually very necessary, because the protocol has no encryption whatsoever and is trivially spoofed. Nothing stops a bad actor from using your ID in ADS-B. The only recourse is for the government to investigate reported issues and find the culprit. And they need the absolute tracking for it.

Second, about the 2020 mandate. The airspace prohibition amounts to not letting someone into a city if the battery is flat in their E-ZPass transponder. Only in this case, the government sent you a letter saying that your transponder is banned, and you must buy a new one before you can get to work. In theory, your freedom of travel is not limited - you can take a bus. In practice though, not everyone has $20k, and the waiting list for the installer is 6 months.

UPDATE 2016/12/19: NavWorx posted the following explanation on their website (no permalink, idiots):

Our version 4.0.6 made our 12/13 products transmit SIL 3, which the FAA ground stations would recognize as sufficient to resume sending TIS-B traffic to our customers.

Fortunately from product inception our internal GPS met SIL 3 performance. The FAA approved our internal GPS as SIL 3. During the TSO certification process, the FAA accepted our “compliance matrix” – which is the FAA’s primary means of compliance - showing our internal GPS integrity was 1x10^-7, which translates to a SIL of 3. However, FAA policy at that time was that ADS-B GPS must have its own separate TSO – our internal GPS was certified under TSO-C154c, the same as the UAT OUT/IN transceiver. It’s important to note that the FAA authorized us to certify our internal GPS in this manner, and that they know that our internal GPS is safe – applicants for TSO certification must present a project plan and the FAA reviews and approves this project plan before the FAA ever allows an applicant to proceed with TSO certification of any product. Although they approved our internal GPS to be SIL of 3 (integrity of 1x10^-7), based on FAA policy at the time they made us transmit SIL 0, with the explanation that “uncertified GPS must transmit SIL 0”. This really is a misnomer, as our GPS is “certified” (under TSO-C154c), but the FAA refers to it as “uncertified”. The FAA AD states that “uncertified” GPS must transmit SIL of 0.

So, basically, they never bothered to certify their GPS properly and used a fig leaf of TSO-C154c.

The letter then goes on about how unfair it is that all the shitty experimentals are allowed to signal SIL 3 if only they use a proper GPS.

UPDATE 2016/12/20: AOPA weighs in with a comment on the NPRM:

Specifically, AOPA recommends the FAA address the confusion over whether the internal position source meets the applicable performance requirements, the existence of an unsafe condition, and why the proposed AD applies to NavWorx’s experimental UAT model.

The FAA requires a position source to meet the performance requirements in appendix B to AC 20-165B for the position source to be included in the ADS-B Out system and for an aircraft to meet the § 91.227(c) performance requirements (e.g., SIL = 3). The FAA does not require the position source be compliant with a specific TSO. Any person may demonstrate to the FAA that its new (uncertified) position source meets the requirements of appendix B to AC 20-165B, thereby qualifying that position source to be used in an ADS-B Out system. However, integrating a TSO-certified position source into a UAT means that a person will have fewer requirements to satisfy in AC 20-165B appendix B during the STC process for the ADS-B Out system.

Around May 2014, the FAA issued NavWorx an STC for its ADS600-B UAT with part numbers 200-0012 and 200-0013 (Certified UATs). The STC allowed for the installation of those UATs into any type-certificated aircraft identified in the approved model list. The Certified UATs were compliant with TSO-C154c, but had internal, non-compliant GPS receivers. (ADS600-B Installation Manual 240-0008-00-36 (IM -36), at 17, 21, 28.) Specifically, section 2.3 of NavWorx’s March 2015 installation manual states:

“For ADS600-B part numbers 200-0012 and 200-0013, the internal GPS WAAS receiver does not meet 14 CFR 91 FAA-2007-29305 for GPS position source. If the ADS600-B is configured to use the internal GPS as the position source the ADS-B messages transmitted by the unit reports: A Source Integrity Limit (SIL) of 0 indicating that the GPS position source does not meet the 14 CFR 91 FAA-2007-29305 rule.” (IM -36, at 19.)

Hoo, boy. Per the above quote by AOPA, NavWorx previously admitted in writing that their internal GPS is not good enough, but they are trying to walk that back with the talk about "GPS integrity of 1x10^-7".

Later in the same comment letter, Justin T. Barkowski recommends minimizing the economic impact of the rulemaking and not forcing owners to pull NavWorx boxes out of their aircraft immediately.

[link] post comment

Russian Joke [01 Sep 2016|10:12pm]

Supposedly from Habrahabr.ru, via bash.org.ru:

Author's Bio: Andrey Pan'gin [ref — zaitcev]. A programmer at the Odnoklassniki company, specializing in highly loaded back-ends. Knows the JVM like the back of his hand, since he developed the HotSpot VM at Sun Microsystems and Oracle for several years. Loves assembly and systems programming.
A comment: Fallen angel.

[link] post comment

Curse you, Jon Masters! Why do you always have to be right! [24 Aug 2016|12:56pm]

My friend and colleague Jon is proud of his disdain for Linux on the desktop (or tablet, for that matter), and goes around telling people how OSX always works on a Mac, because Apple performs integrated testing, etc. etc. The latest episode involved him buying a lemon Dell XPS 13. Pretty much nothing worked right on that pitiful excuse for a computer, and I am sorry to admit, I felt a little smug telling Jon on Facebook how well my ASUS UX303LB worked under Fedora. I've not had a failure to resume even once in the years I've had it (yeah, my standards are this low).

Long story short, Fedora 24 came out and I'm given a taste of the same medicine: the video on the ASUS is completely busted. I was able to limp along for now by using the old kernel 4.4.6-301.fc23, but come on, this is clearly a massive regression. Think anyone is there to bisect it and find the culprit? Of course not. I have to do it myself.

So, how did F24 ship like this? Well... I didn't test the betas, so I don't have much ground to complain.

UPDATE: While upstream is working on a fix for the next release, I'm using i915.enable_psr=0.

[link] post comment

The sound ID of telemarketers [24 Aug 2016|11:16am]

I noticed one strange thing recently. Every telemarketing call starts with a particular sound that resembles a modulated data block. It's very short, about 250 ms, but quite audible. I'm a little curious what it is. Is it possible to capture and decode?

The regular calls are not preceded by this block, so I'm certain that it's something that telemarketers mix in. But to what purpose?

UPDATE: Someone shared this post on Hacker News. I'm happy to report they didn't conclude that I'm making shit up, but the best the hive-mind came up with was that the Caller-ID is getting sent even after the receiver is picked up. I really should record this somehow.
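If I do record it, taking a first look should be easy enough. Here is a rough sketch with numpy/scipy, assuming a WAV capture of the call (the filename is made up). Caller-ID in the US is Bell 202 FSK at 1200 bps, so peaks around the 1200/2200 Hz tones would support the Hacker News theory:

# Rough sketch: eyeball the spectrum of the first ~250 ms of a recorded
# call. "call.wav" is a hypothetical capture, e.g. from a line-in tap.
import numpy as np
from scipy.io import wavfile

rate, samples = wavfile.read("call.wav")
if samples.ndim > 1:
    samples = samples[:, 0]                # take one channel

burst = samples[: int(0.25 * rate)]        # the ~250 ms mystery block
windowed = burst * np.hanning(len(burst))
spectrum = np.abs(np.fft.rfft(windowed))
freqs = np.fft.rfftfreq(len(burst), d=1.0 / rate)

# Bell 202 FSK would put most of the energy near 1200 and 2200 Hz.
for i in np.argsort(spectrum)[-5:]:
    print("peak near %.0f Hz" % freqs[i])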

[link] post comment

Fedora, Swift, and xattr>=0.4 [18 Aug 2016|06:23pm]

If one tries to run the Swift tests with "PYTHONPATH=$(pwd) ./.unittests" on a stock Fedora, a bunch of them fail with "DistributionNotFound: xattr>=0.4". This is easily fixed with the following patch:

diff -urp pyxattr-0.5.1-p3/setup.py pyxattr-0.5.1/setup.py
--- pyxattr-0.5.1-p3/setup.py	2012-05-15 16:58:20.000000000 -0600
+++ pyxattr-0.5.1/setup.py	2014-05-29 14:21:54.223317477 -0600
@@ -29,3 +29,11 @@ setup(name = "pyxattr",
       test_suite = "test",
       platforms = ["Linux"],
       )
+# Add a dummy egg so "xattr>=0.4" works in requirements.txt for paste-deploy.
+# This primarily helps with running unit tests of Swift et.al., because for
+# packaging we already disable all this.
+setup(name="xattr",
+      version = version,
+      description = "Alias to pyxattr",
+      ext_modules = [Extension("xattr", [])]
+     )
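A quick way to check whether the requirement resolves after installing the patched package (pkg_resources is the same machinery that produces the DistributionNotFound error above):

# Verify that the dummy egg satisfies the requirement from requirements.txt.
import pkg_resources
try:
    pkg_resources.require("xattr>=0.4")
    print("xattr>=0.4 resolves fine")
except pkg_resources.DistributionNotFound as e:
    print("still broken:", e)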

IIRC I proposed this as a fix, but the maintainer of pyxattr in Fedora was not glad to see it, so I threw together a spec and an RPM for the patched pyxattr, kept on my people.redhat.com page.

This has been going on for 3 years or more. Rebuilding the patched pyxattr yet again for Fedora 24, I started wondering idly why nobody else ever ran into this problem. I suspect the answer is that I am the only human in the world who tests OpenStack Swift on Fedora. Everyone else uses Ubuntu (or pip).

[link] post comment

Go go go [15 Jun 2016|08:57am]

Can you program in Go without knowing a thing about it? Why, yes. A barrier to entry, where are you?

[link] post comment

You are in a maze of twisted little directories, all alike [07 Jun 2016|03:21pm]

[root@rhev-a24c-01 go]# make get
go get -t ./...
go install: no install location for directory /root/hail/swift-go/go/bench outside GOPATH
For more details see: go help gopath
[root@rhev-a24c-01 go]# pwd
/root/hail/swift-go/go
[root@rhev-a24c-01 go]# ls -l /root/go/src/github.com/openstack/swift
lrwxrwxrwx. 1 root root 25 Jun 6 21:50 /root/go/src/github.com/openstack/swift -> ../../../../hail/swift-go
[root@rhev-a24c-01 go]# cd /root/go/src/github.com/openstack/swift/go
[root@rhev-a24c-01 go]# pwd
/root/go/src/github.com/openstack/swift/go
[root@rhev-a24c-01 go]# make get
go get -t ./...
[root@rhev-a24c-01 go]#
[link] post comment

Encrypt everything? Please reconsider. [29 May 2016|11:16am]

Somehow it became fashionable among site admins to set things up so that access over HTTP is immediately redirected to https://. But doing that adds new ways to fail, such as an expired certificate:

Notice that Firefox provides no way to ignore the problem and access the website (which was supposed to be accessible over HTTP to begin with). The solution? Use Chrome, which does:

Or, disable NTP and change your PC's clock two days back (be careful with running make while doing so).

This was discussed by CKS previously (of course), and he seems to think that the benefits outweigh the downsides of an occasional fuck-up, such as the only website in the world that has the information I want right now suddenly becoming unavailable with no recourse.

UPDATE: Chris discussed the problem some more and brought up other examples, such as outdated KVM appliances that use obsolete ciphers.

One thing I'm wondering about is whether the redirect from http:// to https:// makes a lot of sense in the first place. If you do not support access over plain HTTP, why not return ECONNREFUSED? I'm sure it's an extremely naive idea.
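For what it's worth, the two failure modes look quite different from the client's side. A sketch with the requests library (the URL is a placeholder):

# Sketch: with the redirect, an expired certificate fails late, after TLS
# negotiation; a refused port would fail fast and unambiguously.
import requests

try:
    requests.get("http://www.example.com/doc", timeout=5)
except requests.exceptions.SSLError as e:
    print("redirected to https://, then the certificate failed:", e)
except requests.exceptions.ConnectionError as e:
    print("connection refused, no TLS involved:", e)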

[link] 1 comment|post comment

Russian Joke [25 May 2016|01:15pm]

In a quick translation from Bash:

XXX: Still writing that profiler?
YYY: Naah, reading books now
XXX: Like what books?
YYY: TCP/IP Illustrated, Understanding the Linux Kernel, Linux Kernel Development
XXX: And I read "The Never-ending Path Of Hatred".
YYY: That's about Node.js, right?

[link] post comment

Dell, why u no VPNC [21 May 2016|09:46pm]

Yo, we heard you liked remote desktops, so we put remote desktop into a remote desktop, now you can remote desktop while remote desktop.

I remember how IBM simply put a VPNC interface in their BladeCenter. It was so nice. Unfortunately, vendors never want to be too nice to users, so their next release switched to a Java applet. Dell copied that approach for the DRAC5. In theory, this should be future-proof, all hail WORA. In practice, it only works with the specific version of Java that was current when Dell shipped the R905 ten years ago. You know, back when Windows XP was new and hot.

Fortunately, by the magic of KVM, libvirt, and QEMU, it's possible to create a virtual machine, install Fedora 10 on it, and then run Firefox with the stupid Java applet inside. Also, Firefox and Java have to run in 32-bit mode.

When I did it for the first time, I ran Firefox through X11 redirection. That was quite inconvenient: I had to stop the Firefox running on the host desktop, because one cannot run 2 firefoxes painting to the same $DISPLAY. The reason that happens is, well, Mozilla Foundation is evil, basically. The remote Firefox finds the running Firefox through X11 properties and then some crapmagic happens and everything crashes and burns. So, it's much easier just to hook to the VM with Vinagre and run Firefox with DISPLAY=:0 in there.

Those old Fedoras were so nice, BTW. Funnily enough, that VM with 1 CPU and 1.5 GB starts quicker than the host laptop, which has the benefit of systemd and its ability to run tasks in parallel. Of course, the handling of WiFi in Fedora 20+ is light years ahead of nm-applet in Fedora 10. There was some less noticeable progress elsewhere as well. But at the same time, the bloat was phenomenal.

UPDATE: Java does not work. Running the JNLP simply fails after downloading the applets, without any error messages. To set the plugin type to "native", ssh to the DRAC, then "racadm config -g cfgRacTuning -o cfgRacTunePluginType 0". No kidding.

[link] 2 comments|post comment

Dropbox lifts the kimono [06 May 2016|02:38pm]

Dropbox posted somewhat of a whitepaper about their exabyte storage system, which exceeds the largest Swift cluster by about 2 orders of magnitude. Here's a couple of fun quotes:

The Block Index is a giant sharded MySQL cluster, fronted by an RPC service layer, plus a lot of tooling for database operations and reliability. We’d originally planned on building a dedicated key-value store for this purpose but MySQL turned out to be more than capable.

Kinda like SQLite in Swift.

Cells are self-contained logical storage clusters that store around 50PB of raw data.

And they have dozens of those. Their cell has a master node BTW. Kinda like Ceph's PG, but unlike Swift.

RTWT

[link] post comment

OpenStack Swift Proxy-FS by SwiftStack [28 Apr 2016|10:42am]

SwiftStack's Joe Arnold and John Dickinson chose the Austin Summit and a low-key #vBrownBag venue to come out of the closet with PROXY-FS (also spelled ProxyFS), a tightly integrated addition to OpenStack Swift that provides POSIX-ish filesystem access to a Swift cluster.

Proxy-FS is basically a peer to a lesser-known feature of Ceph RADOS Gateway that permits accessing it over NFS. Both of them are fundamentally different from e.g. Swift-on-File in that the data is kept in Swift or Ceph, instead of in a general filesystem.

The object layout is natural in that it takes advantage of SLO by creating a log-structured, manifested object. This way in-place updates are handled, including appends. Yes, you can create a manifest with a billion 1-byte objects just by invoking write(2). So, don't do that.
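For reference, an SLO manifest is just a JSON list of segment references, uploaded with the multipart-manifest=put query parameter; under a log-structured layout, every append becomes one more entry. A sketch (the account, container, and token below are hypothetical):

# Sketch of uploading a static large object manifest, the mechanism that
# Proxy-FS builds its log-structured objects on. All names are made up.
import json
import requests

segments = [
    {"path": "/obj_segments/0000001", "size_bytes": 1048576},
    {"path": "/obj_segments/0000002", "size_bytes": 65536},  # an "append"
]
resp = requests.put(
    "https://swift.example.com/v1/AUTH_test/cont/obj",
    params={"multipart-manifest": "put"},
    headers={"X-Auth-Token": "tk_hypothetical"},
    data=json.dumps(segments),
)
print(resp.status_code)   # 201 on success

Note that some Swift releases also want an etag per segment entry; ProxyFS manages its manifests internally, of course.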

In response to my question, Joe promised to open the source, although we don't know when.

Another question dealt with performance expectations. The small-I/O performance of Proxy-FS is not going to be great in comparison with a traditional NFS filer. One of its key features is relative transparency: there is no cache involved, and every application request goes straight to Swift. This helps it adhere to the principle of least surprise, as well as achieve the scalability for which Swift is famous. There is no need for any upcalls/cross-calls from the Swift proxy into Proxy-FS to invalidate the cache, because there's no cache. But it has to be understood that Proxy-FS, as well as the NFS mode in RGW, is not intended to compete with NetApp.

Not directly, anyway. But what they could do is disrupt, in Christensen's sense. His examples of disruption were technologies that are markedly inferior to the incumbents, as well as dramatically cheaper. Swift and Ceph are both: the filesystem performance sucks balls and the price per terabyte is 1/10th of NetApp's (this statement was not evaluated by the Food and Drug Administration). If new applications come about that make use of these properties... you know the script.

[link] post comment

Amateur contributors to OpenStack [06 Apr 2016|10:44am]

John was venting about our complicated contribution process in OpenStack and threw out this off-hand remark:

I'm sure the fact that nearly 100% of @openstack contributors are paid to be so is completely unrelated. #eyeroll

While I share his frustration, one thing he may be missing is that OpenStack is generally useless to anyone who does not have thousands of computers dedicated to it. This is a significant barrier to entry for hobbyists, baked straight into the nature of OpenStack.

The exceptions that we see are basically people building little pseudo-clusters out of a dozen VMs. They do it with the aim of advancing their careers.

[link] 1 comment|post comment

SwiftStack versus Swift [30 Mar 2016|01:35pm]

Erik Pounds posted an article on the official SwiftStack blog that presents a somewhat corporatist view of Swift and Ceph, which comes down to this:

They are both productized by commercial companies so all enterprises can utilize them... Ceph via RedHat and Swift via SwiftStack.

This view is extremely reductionist along a couple of avenues.

First, it tries to sweep all of Swift under the SwiftStack umbrella, whereas in reality Swift derives a lot of strength from not being controlled by SwiftStack. But way to assuage the fears of mono-entity control by employing the PTL, guys. Fortunately, in the real and more complex world, Red Hat pays me to work on Swift, as well as to offer Swift as a product, and I do not find that my needs are in any way sabotaged by the PTL. Certainly our product focus differs: Red Hat's lifecycle management offering, OSPd, manages OpenStack first and Swift only inasmuch as it's a part of OpenStack, whereas SwiftStack offers a Swift-specific product. Still, it's not like Swift equals SwiftStack. I think Rackspace continues to operate the largest Swift cluster in the world.

Second, Erik somehow neglects to notice that Ceph provides Swift compatibility through a component known as RADOS Gateway. It is an option, you know, although obviously it can never be a better Swift than Swift itself, or a better Amazon S3 than S3 itself.

[link] post comment

The "fast-post" is merged into Swift [07 Mar 2016|12:51pm]

I'm just back from a hackathon at the HPE site in Bristol, England, where we took a final look at the so-called "fast-post" patch and merged it in. It was developed by Alistair Coles and basically made POST work the way everyone expected it to work, at last.

In the original Swift, it was found that when you do a POST to an object, in the presence of failures it was possible to end up with some nodes having old data but new (posted) attributes. The bad part was that the replication mechanism could not do anything to reconcile the inconsistency, and then your GET returned varying data forever, depending on what node you hit. It occurred when the new timestamp from a POST attached itself to old data (and in other equivalent scenarios).

This is somewhat of a fundamental issue with using timestamp-based replication in Swift. Greg and Chuck knew about it all along, and their solution was known as "POST to PUT". They made the Swift proxy fetch the object, update its attributes from the POST, then do essentially a PUT. This way timestamps, data, and attributes are always consistent, as they are after the initial PUT. If this POST-to-PUT thing occurs across a failure, replication uses the timestamps to restore consistency correctly.

The problem with that is that POST-to-PUT is slow, as well as deceptive. Users think they are issuing a lightweight POST, but actually they prompt a massive data move inside the cluster if the object is big.

Alistair's insight was that the root of the problem was not that timestamps were no good as a basic mechanism, but that the "fast" POST broke them by assigning new timestamps to old data (or old attributes: metadata, Content-Type). As long as each independently settable thing had its own timestamp, there was no problem. In Swift, we have 3 of those: object data, object metadata, and Content-Type (don't ask). So, store 3 timestamps with each object and presto!

The actual patch employs an additional trick: it does not change the container DB schema. Instead, it encodes the 3 timestamps into the 1 field where the single timestamp used to live. This way a smooth migration is possible in a cluster where old async pendings still float around, for example. It looks a little kludgy at first, but I convinced myself that it made sense under the circumstances.
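The principle is easy to show in miniature. Below is my own toy encoding, not Swift's actual on-disk format, but it demonstrates both the packing and why an old-style single timestamp still decodes cleanly:

# Toy encoding of the 3 timestamps (data, Content-Type, metadata) into
# the one field that used to hold a single timestamp. Not Swift's real
# format; the later two are stored as hex offsets from the data timestamp.
def encode_timestamps(t_data, t_ctype, t_meta):
    return "%016.5f+%x+%x" % (
        t_data,
        int(round((t_ctype - t_data) * 100000)),
        int(round((t_meta - t_data) * 100000)),
    )

def decode_timestamps(value):
    parts = value.split("+")
    t_data = float(parts[0])
    if len(parts) == 1:        # an old-style value: all three coincide
        return (t_data, t_data, t_data)
    return (t_data,
            t_data + int(parts[1], 16) / 100000.0,
            t_data + int(parts[2], 16) / 100000.0)

enc = encode_timestamps(1457000000.0, 1457000000.0, 1457000123.45678)
print(enc)                     # one field, still sorts by the data timestamp
print(decode_timestamps(enc))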

P.S. Fast-post is still not the default, even now. It needs Container Sync to be updated for compatibility. I think Eran was going to look into that.

[link] post comment

Ceph needs a build revolution [26 Feb 2016|04:05pm]

I've been poking at this Ceph thing since June, or for 9 months now, and I feel like I'm overdue for a good rant. Today's topic is Ceph's build system, which is absolutely insufferable. It's unbelievably fragile, and people break it seemingly every week. Then it takes a week to fix, or even worse: it breaks only for me, but not for others, and then it stays broken, because I cannot possibly wade into a swamp this deep to fix it.

Here's today's post-10.0.3 trunk:

[zaitcev@lembas ceph-tip]$ sh autogen.sh 
..................
[zaitcev@lembas ceph-tip]$ ./configure --prefix=$(pwd)/build --with-radosgw
..................
[zaitcev@lembas ceph-tip]$ make -j3
..................
  CXX      common/PluginRegistry.lo
  CXXLD    libcommon_crc.la
ar: `u' modifier ignored since `D' is the default (see `U')
ar: common/.libs/libcommon_crc_la-crc32c_intel_fast_asm.o: No such file or directory
Makefile:13137: recipe for target 'libcommon_crc.la' failed
[zaitcev@lembas ceph-tip]$ find . -name '*crc32c_intel*'
./src/common/.deps/libcommon_crc_la-crc32c_intel_fast_asm.Plo
./src/common/.deps/libcommon_crc_la-crc32c_intel_fast_zero_asm.Plo
./src/common/.deps/libcommon_crc_la-crc32c_intel_baseline.Plo
./src/common/.deps/libcommon_crc_la-crc32c_intel_fast.Plo
./src/common/.libs/libcommon_crc_la-crc32c_intel_baseline.o
./src/common/.libs/libcommon_crc_la-crc32c_intel_fast.o
./src/common/crc32c_intel_baseline.c
./src/common/crc32c_intel_baseline.h
./src/common/crc32c_intel_fast.c
./src/common/crc32c_intel_fast.h
./src/common/crc32c_intel_fast_asm.S
./src/common/crc32c_intel_fast_zero_asm.S
./src/common/libcommon_crc_la-crc32c_intel_fast.lo
./src/common/libcommon_crc_la-crc32c_intel_fast_asm.lo
./src/common/libcommon_crc_la-crc32c_intel_baseline.o
./src/common/libcommon_crc_la-crc32c_intel_fast.o
./src/common/libcommon_crc_la-crc32c_intel_fast_zero_asm.lo
./src/common/libcommon_crc_la-crc32c_intel_baseline.lo
[zaitcev@lembas ceph-tip]$ 

Sometimes these things fix themselves after a fresh clone/autogen.sh/configure/make. But doing that all the time is prohibitive because of how long Ceph takes to build. It literally takes many hours (depending on whether you use autotools or CMake, and how parallel your build is). I bought a 4-core laptop with 16 GB and an SSD just for that. $1,200 later, I only have to wait 4 hours. Yay, I can build Ceph 2 times in 1 day.

The situation is completely insane, and it has remained so for the months I've spent working on this. The worst part is that I don't understand how people even deal with this without killing themselves. If you look at the pull requests, obviously a large number of developers manage to build this thing somehow... unless all of them post untested patches all the time.

UPDATE: Waiting a bit and making a fresh clone allowed the build to complete, but then:

..................
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/q/zaitcev/ceph/ceph-tip/selinux'
[zaitcev@lembas ceph-tip]$ echo $?
0
[zaitcev@lembas ceph-tip]$ ./src/vstart.sh -n -d -r -i 192.168.132.2
ls: cannot access compressor/*/: No such file or directory
** going verbose **
./src/vstart.sh: line 374: ./init-ceph: No such file or directory
[zaitcev@lembas ceph-tip]$

We are about to freeze Jewel with this codebase.

[link] post comment

git rebase - proceed with caution [25 Feb 2016|01:09pm]

In the age of GitHub, we're not supposed to do "git merge" anymore, but "git rebase" instead. Everyone knows that. However, rebase has its quirks. One I stepped on goes like this:

  1. Have 2 patches: A and B, on top of a tree T. Submit A upstream.
  2. Upstream merges A.
  3. Do "git rebase" in T. You'd think A would disappear and you'd get to keep B on top of the new tree (call it T' = T + A).
  4. But instead, git finds some conflicts and throws you a 3-way merge node, which contains your A, upstream A, and some small, unrelated conflict. It has an empty commit message, obviously.
  5. Resolve the merge, "git commit -a", "git rebase --continue", and you end with next commit being empty. {Wrong move. See update below.}
  6. If at this point you think that it's the former A and do "git rebase --skip", then — hold onto your chair — B is thrown into a conflict as well, and its commit message is attached to a follow-on empty commit too, just like A was. If you skip that one, you lose the commit message forever. There's no reset you could do at that point.

Well, if you had pushed the branch, you could go back to GitHub and salvage the message from there.

Anyhow, rebase takes a certain amount of care. You can't assume that it always works, or that you can always return to the previous state of the repository. In this sense it's fundamentally different from merge, where you can always do "git reset", no matter how much you screwed up.

UPDATE 2016-08-22: I know where I went wrong. When you resolve a merge, you resolve it with a "git commit" (feel free to use -a, etc.). But when you're prodded by rebase, don't do that. Instead, when you resolve a conflict, add the resolved files manually with "git add", but do not try to commit. Just do "git rebase --continue".

[link] 4 comments|post comment
