Pete Zaitcev's Journal
Pete Zaitcev


ARM servers apparently exist at last [16 Feb 2018|12:42am]

Check out what I found at Pogo Linux (h/t Bryan Lunduke):

ARM R150-T62
2 x Cavium® ThunderX™ 48 Core ARM processors
16 x DDR4 DIMM slots
3 x 40GbE QSFP+ LAN ports
4 x 10GbE SFP+ LAN ports
4 x 3.5” hot-swappable HDD/SSD bays
650W 80 PLUS Platinum redundant PSU
$5,638.82

The prices are ridiculous, but at least it's a server with CentOS.


More system administration in the age of SystemD [14 Feb 2018|05:23pm]

I'm tinkering with OpenStack TripleO in a simulated environment. It uses a dedicated non-privileged user, "stack", which can do things such as list VMs with "virsh list". So, yesterday I stopped the undercloud VM, and went to sleep. Today, I want to restart it... but virsh says:

error: failed to connect to the hypervisor
error: Cannot create user runtime directory '/run/user/1000/libvirt': Permission denied

What seems to happen is that when one logs into the stack@ user over ssh, systemd-logind mounts that /run/user/UID thing, but if I log in as zaitcev@ and then do "su - stack", this fails to occur.

I have no idea what to do about this. It's probably trivial for someone more knowledgeable to throw the right pam_systemd line into /etc/pam.d/su. But su-l includes system-auth, which invokes pam_systemd.so, and yet... Oh well.
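For what it's worth, the failing precondition can be checked directly. Below is a minimal sketch, assuming the usual systemd-logind layout of /run/user/UID; the function name runtime_dir_status is made up for illustration and is not part of libvirt or systemd.

```python
import os


def runtime_dir_status(uid, base="/run/user"):
    """Classify the per-user runtime directory that virsh wants to use.

    Returns "missing", "wrong-owner", or "ok". The /run/user/<UID> layout is
    the systemd-logind convention; this helper itself is hypothetical.
    """
    path = os.path.join(base, str(uid))
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return "missing"
    if st.st_uid != uid:
        return "wrong-owner"
    return "ok"


if __name__ == "__main__":
    # Run this as the "stack" user right after "su - stack".
    print(runtime_dir_status(os.getuid()))
```

If this reports "missing" after "su - stack", one thing that may be worth trying is "loginctl enable-linger stack" as root, which asks logind to keep that user's manager around independently of login sessions. Whether it cures this particular setup is untested.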


Farewell Nexus 7, Hello Huawei M3 [03 Feb 2018|11:17pm]

Flying a photoshoot of the Carlson, I stuffed my Nexus 7 under my thighs and cracked the screen. In my defense, I had done it several times before, because I hate leaving it on the cockpit floor. I had to fly uncoordinated for the photoshoot, which causes anything that's not fixed in place to slide around, and I'm paranoid about interference with the controls. Anyway, the cracked screen caused a significant dead zone where touch didn't register anymore, and that made the tablet useless. I had to replace it.

In the years since I got the Nexus (apparently in 2014), the industry stopped making good 7-inch tablets. Well, you can still buy $100 tablets in that size. But because the Garmin Pilot has been getting spec-hungry recently, I had no choice but to step up. Sad, really. Naturally, I'm having trouble fitting the M3 into pockets where the Nexus lived comfortably before. {It's a full-size iPad in the picture, not a Mini.}

The most annoying problem that I encountered was Chrome not liking the SSL certificate of www.zaitcev.us. It bails with ERR_SSL_SERVER_CERT_BAD_FORMAT. I have my own fake CA, so I install my CA certificate on clients and I sign my hosts. I accept the consequences and the inconvenience. The annoyance arises because Chrome does not say what it does not like about the certificate. Firefox works fine with it, as do other applications (like IMAP clients). Chrome on the Nexus worked fine. A cursory web search suggests that Chrome may want alternative names keyed with "DNS.1" instead of "DNS". Dunno what it means or whether it is true.
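On the "DNS.1" question: in an OpenSSL configuration file, subject alternative names are commonly listed in their own section, referenced with "subjectAltName = @alt_names", where each entry gets a numbered key (DNS.1, DNS.2, and so on) so that the keys do not collide. Here is a small sketch that renders such a section. The section name alt_names is only a convention, the second host name is an assumption for illustration, and whether this is what Chrome objects to here remains unconfirmed.

```python
def alt_names_section(dns_names):
    """Render an OpenSSL config [alt_names] section, one DNS.<n> key per name."""
    if not dns_names:
        raise ValueError("at least one DNS name is required")
    lines = ["[alt_names]"]
    for index, name in enumerate(dns_names, start=1):
        lines.append("DNS.%d = %s" % (index, name))
    return "\n".join(lines)


if __name__ == "__main__":
    # "zaitcev.us" as a second name is a made-up example, not from the post.
    print(alt_names_section(["www.zaitcev.us", "zaitcev.us"]))
```

The output is meant to be pasted into the extensions part of the config used when signing the host certificate.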

UPDATE: "Top FBI, CIA, and NSA officials all agree: Stay away from Huawei phones"


400 gigabits, every second [23 Jan 2018|01:43pm]

For many years, I have kept waiting for RJ-45 to fail to keep pace with the gigabits. And it always catches up. But maybe not anymore. Here's what the connector looks like for QSFP-DD, a standard module connector for 400GbE:

Two rows, baby, same as on USB3.

These speeds are mostly used between leaf and spine switches, but I'm sure we'll see them in the upstream routers, too.


NUC versus laptop [20 Jan 2018|11:03am]

When I split off the router, I received a bit of a breather from Fedora killing i686, because I do not have to upgrade the non-routing server as faithfully as an Internet-facing firewall. Still, eventually I must switch from the ASUS EEEPC to something viable.

So, I considered a NUC, just like the one that Richard W.M. Jones bought. It beats an old laptop in every way. In particular, it's increasingly difficult to disassemble laptops nowadays, and the candidate I have now has its hard drive buried in a particularly vexing way: the whole thing must be taken apart, with a dozen tiny connectors carefully pried off, before the disk can be extracted. Still, a laptop offers a couple of features. #1: it always has a monitor and keyboard, and #2: it comes with its own uninterruptible power supply. And the cost is already amortized.

Long term, I am inclined to believe that Atwood is right and all user-facing computers will morph into tablets. When that happens, the supply of useful laptops will dry up and I will have to resort to whatever microserver box is available. But today is not that day.


New toy [15 Jan 2018|03:38pm]

Guess what.

A Russian pillowcase is much wider (or squar-er) than tubular American ones, so it works perfectly as a cover.


Old news [12 Jan 2018|10:07am]

Per U.S. News:

Alphabet Inc's (GOOG, GOOGL) Google said in 2016 that it was designing a server based on International Business Machines Corp's (IBM) Power9 processor.

Have they put anything into production since then? If not, why bring this up?

UPDATE: R. Hubbell writes by e-mail:

So yes I think the move to the IBM is due to their encounter of the exploits.

A lot of lip service is given to the hazards of the monoculture. But why PPC of all things? Is Google becoming incapable of dealing with any supplier that is not a megacorp?


A split home network [10 Jan 2018|10:52pm]

Real quick, why a 4-port router was needed.

  1. Red: Upstream link to ISP
  2. Grey: WiFi
  3. Blue: Entertainment stack
  4. Green: General Ethernet

The only reason to split the blue network is to prevent TiVo from attacking other boxes, such as desktops and printers. Yes, this is clearly not paranoid enough for a guy who insists on a dumb TV.


Buying a dumb TV in 2018 America [10 Jan 2018|12:29pm]

I wanted to buy a TV a month ago and found that almost all of them are "Smart" nowadays. When I asked for a conventional TV, people ranging from a floor worker at Best Buy to Nikita Danilov at Facebook implied that I was an idiot. Still, I succeeded.

At first, I started looking at what is positioned as a "conference room monitor". The NEC E506 is far and away the leader, but it's expensive at $800 or so.

Then, I went to Fry's, who advertise quasi-brands like SILO. They had TVs on display, but were out of stock. I was even desperate enough to be upsold to Athyme for $450, but they fortunately were out of that one too.

At that point, I headed to Best Buy, who have an exclusive agreement with Toshiba (h/t Matt Kern on Facebook). I was not happy to support this kind of distasteful arrangement, but very few options remained. There, it was either waiting for delivery, or driving 3 hours to a warehouse store. Considering how much my Jeep burns per mile, I declined.

Finally, I headed to a local Wal-Mart and bought a VISIO for $400 out the door. No fuss, no problem, easy peasy. Should've done that from the start.

P.S. Some people suggested buying a Smart TV and then not plugging it into the network. That includes not giving it the password for the house WiFi. Unfortunately, it is still problematic, as some of these TVs will associate with any open wireless network by default. An attacker drives by with a passwordless AP, and roots all TVs on the block. Unfortunately, I live in a high-tech area where stuff like that happens all the time. When I mentioned it to Nikita, he thought that I was an idiot for sure. It's like a Russian joke about "dropping everything and moving to Uryupinsk."


Caches are like the government [08 Jan 2018|04:51pm]

From an anonymous author, a follow-up to the discussion about the cache etc.:

counterpoint 1: Itanium, which was EPIC like Elbrus, failed even with Intel behind it. And it added prefetching before the end. Source: https://en.wikipedia.org/wiki/Itanium#Itanium_9500_(Poulson):_2012

counterpoint 2: To get fast, Elbrus has also added at least one kind of prefetch (APB, "Array Prefetch Buffer") and has the multimegabyte cache that Zaitcev decries. Source: [kozhin2016, 10.1109/EnT.2016.027]

counterpoint 3: "According to Keith Diefendorff, in 1978 almost 15 years ahead of Western superscalar processors, Elbrus implemented a two-issue out-of-order processor with register renaming and speculative execution" https://www.theregister.co.uk/1999/06/07/intel_uses_russia_military_technologies/

1. Itanium, as I recall, suffered too much from its poor initial implementation. Remember that the 1st implementation was designed at Intel, while the 2nd implementation was designed at HP. Intel's chip stunk on ice. By the time HP came along, AMD64 became a thing, and then it was over.

Would Itanium have won over AMD64 if it had been better established, burned less power, and been faster, sooner? There's no telling. Compatibility is an important consideration, and binary translation was very shaky back then, unless you count Crusoe.

2. It's quite true that modern Elbrus runs with a large cache. That is because cache is obviously beneficial. All this is about considering once again whether better software control of caches, and their better architecture in general, would disrupt side-channel signalling and bring performance advantages.

By the way, people might not remember it now, but a large chunk of Opteron's performance derived from its excellent memory controller. It's a component of the CPU that tended not to get noticed, but it's essential. Fortunately, the Rowhammer vulnerability drew some much-needed attention to it, as well as a possible role for software control there.

3. Well, Prof. Babayan's own outlook on Elbrus-2 and its superscalar, out-of-order core was, "As you can see, I tried this first, and found that VLIW was better", which is why Elbrus-3 dispensed with all that stuff. Naturally, all that stuff came back when we started to find the limits of EPIC (née VLIW), just like the cache did.


Police action in the drone-to-helicopter collision [05 Jan 2018|05:06pm]

The year 2017 was the first year when a civilian multicopter drone collided with a manned aircraft. It was expected for a while and there were several false starts. One thing is curious though: how did they find the operator of the drone? I presume it wasn't something simple like a post on Facebook with a video of the collision. They must've polled witnesses in the area, then looked at surveillance cameras or whatnot, to get it narrowed to vehicles.

UPDATE: Readers mkevac and veelmore inform that a serialized part of the drone was recovered, and the investigators worked through seller records to identify the buyer.


Prof. Babayan's Revenge [05 Jan 2018|10:56am]

Someone at GNUsocial posted:

I suspect people trying to find alternate CPU architectures that don't suffer from #Spectre-like bugs have misunderstood how fundamental the problem is. Your CPU will not go fast without caches. Your CPU will not go fast without speculative execution. Solving the problem will require more silicon, not less. I don't think the market will accept the performance hit implied by simpler architectures. OS, compiler and VM (including the browser) workarounds are the way this will get mitigated.

CPUs will not go fast without caches and speculative execution, you say? Prof. Babayan may have something to say about that. Back when I worked under him in the 1990s, he considered caches a primitive workaround.

The work on Narch was informed by the observation that the submicron feature size provided designers with more silicon than they knew what to do with. So, the task of a CPU designer was to identify ways to use massive amounts of gates productively. But instead, mediocre designers simply added more cache, even multi-level cache.

Talking about it was not enough, so he set out to design and implement his CPU, called "Narch" (later commercialized as "Elbrus-2000"). And he did. The performance was generally on par with its contemporaries, such as Pentium III and UltraSparc. It had a cache, but measured in kilobytes, not megabytes. But there were problems beyond the cache.

The second part of the Bee Yarn Knee's objection deals with the speculative execution. Knocking that out required software known as a binary translator, which did basically the same thing, only in software[*]. Frankly, at this point I cannot guarantee that it wasn't possible to abuse that mechanism for unintentional signaling in the same way Meltdown works. You don't have a cache for timing signals in Narch, but you do have the translator, which can be timed if it runs at run time like in Transmeta Crusoe. In Narch's case it only ran ahead of time, so it was not exploitable, but the result turned out to be not fast enough for workloads that make good use of speculative execution today (such as LISP and gcc).

Still, I think that the blanket objection that a CPU cannot run fast with no cache and no speculative execution is informed by ignorance of alternatives. I cannot guarantee that E2k would solve the problem for good; after all, its later models sit on top of a cache. But at least we have a hint.

[*] The translator grew from a language toolchain and could be used in creative ways to translate source. It would not be binary in such case. I omit a lot of detail here.

UPDATE: Oh, boy:

But the speedup from speculative execution IS from parallelism. We're just asking the CPU to find it instead of the compiler. So couldn't you move the smarts into the compiler?

Sean, this is literally what they said 30 years ago.


More bugs [04 Jan 2018|02:27pm]

Speaking of stupid bugs that make no sense to report, Anaconda fails immediately in F27 if one of the disks has an exported volume group on it, in case you thought it was a clever way to protect some data from being overwritten accidentally by the installation. The workaround was to unplug the drives that contained the PVs in the problematic VG. Now, why not report this? It's 100% reproducible. But reporting presumes a responsibility to re-test, and I'm not going to install a fresh Fedora again for a long time, hopefully, so I'm not in a position to discharge my bug reporter's responsibilities.


The gdm spamming logs in F27, RHbz#1322588 [02 Jan 2018|05:36pm]

Speaking of the futility of reporting bugs, check out bug 1322588. Basically, gdm tries to adjust the screen brightness when a user is already logged in on that screen (fortunately, it fails). Fedora users report the bug, the maintainer asks them to report it upstream. They report it upstream. The upstream tinkers with something tangentially related, closes the bug. Maintainer closes the bug in Fedora. The issue is not fixed, users re-open the bug and the process continues. It has been going on for close to 2 years now. I don't know why the GNOME upstream cannot program gdm not to screw with the screen after the very same gdm has logged a user in. It's beyond stupid, and I don't know what can be done. I can buy a Mac, I suppose.

UPDATE:

-- Comment #71 from Cédric Bellegarde
Simple workaround:
- Disable auto brightness in your gnome session
- Logout and stop gdm
- Copy ~/.config/dconf/user to /var/lib/gdm/.config/dconf/user

UPDATE 2018-01-10:

-- Comment #75 from Bastien Nocera
Maybe you can do something there, instead of posting passive aggressive blog entries.

Back when Bastien maintained xine, we enjoyed a cordial working relationship, but I guess that does not count for anything.


No more VLAN in the home network [02 Jan 2018|05:08pm]

Thanks to Fedora dropping the 32-bit x86 (i686) in F27, I had no choice but to upgrade the home router. I used this opportunity to get rid of VLANs and return to a conventional setup with 4 Ethernet ports. The main reason is, VLANs were not entirely stable in Fedora. Yes, they mostly worked, but I could never be sure that they would continue to work. Also, "mostly" in this context means, for example, that some time around F24 the boot-up process started hanging on the "Starting the LSB Networking" job for about a minute. It never was worth the trouble raising any bugs or tickets with upstreams; I never was able to resolve a single one of them. Not in Zebra, not in radvd, not in NetworkManager. Besides, if something is broken, I need a solution right now, not when developers get around to it. I suppose VLANs could be all right if I stuck to initscripts, but I needed NetworkManager to interact properly with the upstream ISP at some point. So, whatever. Fedora cost me $150 for the router and killed my VLAN setup.

I looked at ARM routers, but there was nothing. Or, nothing affordable that was SBSA and RHEL compatible. Sorry, ARM, you're still immature. Give me a call when you grow up.

Buying from a Chinese vendor was a mostly typical experience. They try to do good, but... Look at the questions about the console pinout at Amazon. The official answer is, "Hello,the pinouts is 232." Yes, really. When I tried to contact them by e-mail, they sent me a bunch of pictures that included a pinout for Ethernet RJ-45, a pinout for the motherboard header, and a photograph of a Cisco console cable. No, they don't use the Cisco pinout. Instead, they use DB9 pin numbers on RJ-45 (obviously, pin 9 is not connected). It was easy to figure out using a multimeter, but I thought I'd ask properly first. The result was very stereotypical.
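For reference, "DB9 pin numbers on RJ-45" can be written down as a lookup table. This is only a sketch: the signal names are the standard RS-232 assignments for a 9-pin DTE connector, and the assumption that the box wires every one of the eight pins to the same-numbered signal goes beyond what the multimeter session described above establishes.

```python
# Standard RS-232 signal names on a 9-pin (DE-9, a.k.a. DB9) DTE connector.
DB9_SIGNALS = {
    1: "DCD", 2: "RXD", 3: "TXD", 4: "DTR", 5: "GND",
    6: "DSR", 7: "RTS", 8: "CTS", 9: "RI",
}


def rj45_console_pinout():
    """Map RJ-45 pins 1..8 to the DB9 signal with the same pin number.

    Pin 9 (RI) has no counterpart on an 8-pin RJ-45, so it is left out.
    """
    return {pin: DB9_SIGNALS[pin] for pin in range(1, 9)}


if __name__ == "__main__":
    for pin, signal in sorted(rj45_console_pinout().items()):
        print("RJ-45 pin %d: %s" % (pin, signal))
```

In practice only RXD, TXD and GND matter for a console, so those three are the ones worth verifying with a meter before making a cable.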

P.S. The bright green light is blink(1), a Christmas present from my daughter. I'm not yet using it to its full potential. The problem is, if it only shows a static light, it cannot indicate if the router hangs or fails to boot. It needs some kind of daemon job that constantly changes it.
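A heartbeat job of that kind could look roughly like the sketch below. It assumes the blink1-tool command-line utility is installed and accepts "--rgb R,G,B"; the function names, the colors, and the period are all made up for illustration.

```python
import itertools
import subprocess
import time


def blink1_set(rgb):
    """Set the blink(1) color by shelling out to blink1-tool (assumed installed)."""
    subprocess.run(["blink1-tool", "--rgb", "%d,%d,%d" % rgb], check=True)


def heartbeat(set_color, colors, beats, period=1.0, sleep=time.sleep):
    """Step through colors in a cycle, one change per period, for a number of beats.

    The light keeps changing only while this code keeps running, so a light
    that stops changing suggests that the box hung or never finished booting.
    """
    for rgb in itertools.islice(itertools.cycle(colors), beats):
        set_color(rgb)
        sleep(period)
```

Calling heartbeat(blink1_set, [(0, 64, 0), (0, 0, 0)], beats=60) in an endless loop from a service started late in boot would alternate dim green and off once a second. The set_color and sleep parameters are injectable so the loop can be tried out without the hardware.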

P.P.S. The SG200 is probably going into the On-Q closet, where it may actually come in useful.

P.P.P.S. There's a PoE injector under the white cable loop somewhere. It powers a standalone Cisco AP, a 1040 model.


Marcan: Debugging an evil Go runtime bug [05 Dec 2017|02:20pm]

Fascinating, and a few reactions spring to mind.

First, I have to admit, the resolution simultaneously blew me away and was very nostalgic. Forgetting that some instructions are not atomic is just the thing that I saw people commit in architecture support in the kernel (I don't remember if I ever used an opportunity to do it myself; it's quite possible, even on sun4c).

Also, my (former) colleague DaveJ (who's now consumed by Facebook -- I remember complaints about useful people "gone to Google and never heard from again", but Facebook is the same hole nowadays) once said, approximately: "Everyone loves to crap on Gentoo hackers for silly optimizations and being otherwise unprofessional, but when it's something interesting it's always (or often) them." Gentoo crew is underrated, including their userbase.

And finally:

Go also happens to have a (rather insane, in my opinion) policy of reinventing its own standard library, so it does not use any of the standard Linux glibc code to call vDSO, but rather rolls its own calls (and syscalls too).

Usually you hear about this when their DNS resolver blows up, but it can be elsewhere, as in this case.

(h/t to a chatter in #animeblogger)

UPDATE: CKS adds that some UNIXen officially require applications to use libc.


ProxyFS opened, I think [06 Nov 2017|08:25pm]

Not exactly sure if that thing is complete, and I didn't attend the announcement (at OpenStack Summit in Sydney, presumably), but it appears that SwiftStack open-sourced ProxyFS. The project was announced to the world a year and a half ago.

UPDATE: The SwiftStack product is called "File Access", but AFAIK the project is still "ProxyFS".


Polite like Sphinx [26 Oct 2017|03:26pm]

Exception occurred:
   File "/usr/lib/python2.7/site-packages/sphinx/util/logging.py", line 363, in filter
     raise SphinxWarning(message % record.args)
TypeError: not all arguments converted during string formatting
The full traceback has been saved in /tmp/sphinx-err-SD2Ra4.log, if you want to report the issue to the developers.

Love how modest this package is.
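The TypeError at the bottom of that traceback is what Python's %-formatting raises when it is handed more arguments than the message has placeholders. Here is a tiny reproduction of the failure mode. The helper names are invented, and the guess that the warning message in question simply lacked a placeholder for its argument is an assumption, not something the traceback proves.

```python
def format_like_logging(message, args):
    """Apply %-formatting the same way the failing line does: message % args."""
    return message % args


def try_format(message, args):
    """Return the formatted text, or the TypeError text if formatting fails."""
    try:
        return format_like_logging(message, args)
    except TypeError as exc:
        return "TypeError: %s" % exc


if __name__ == "__main__":
    print(try_format("document %s has a problem", ("index",)))
    print(try_format("document has a problem", ("index",)))
```

The first call formats normally; the second one has an argument with nowhere to go, which is the situation that turns a mere warning into a crash.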


Oh not again [24 Aug 2017|01:21pm]

Fedora is mulling dropping the 32-bit x86 again, after the F26, which means I need to buy a new router. It's not like I cannot afford one... But it's such a hassle to migrate. I'm thinking about installing one in the background and then re-numbering it, in order to minimize issues. Even then, I cannot test, for instance, that VLANs work right, until I actually phase the box into production. It's much easier to keep a compatible 32-bit box mirrored and ready on stand-by.

In a sense, the amazing ease of upgrades in modern Fedora lulled me into this. Before, I re-installed anyway, and so could roll 64-bit just as easily.

P.S. According to records at the hoster, my primary public VM was installed as Fedora 15 and continuously upgraded since then.


Community Meeting [24 May 2017|03:52pm]
<notmyname> first, the idea of having a regular meeting in addition to this one for people in different timezones
<cschwede_> +2!
<notmyname> specifically, mahatic and pavel/onovy/seznam. but of course we've all seen various chinese contributors too
<notmyname> but the point is that it's a place to bring up stuff that those in the other time zones are working on
<mattoliverau> Cool
<notmyname> I think it's a terrific idea
<tdasilva> i bet the guys working on tape would like that too
<notmyname> my goal is to find a time for it that is so horrible for US timezones that it will be obvious that not everyone needs to be there
<zaitcev> Yeah, if only there was a way to send a message... like a mail... to a list of people. And then it could be stored on a computer somewhere, ready to be read in any timezone recepient is in.
<notmyname> zaitcev: crazytown!
<mattoliverau> zaitcev: now your just talkin crazy
