Pete Zaitcev (zaitcev) wrote,

Prof. Babayan's Revenge

Someone at GNUsocial posted:

I suspect people trying to find alternate CPU architectures that don't suffer from #Spectre-like bugs have misunderstood how fundamental the problem is. Your CPU will not go fast without caches. Your CPU will not go fast without speculative execution. Solving the problem will require more silicon, not less. I don't think the market will accept the performance hit implied by simpler architectures. OS, compiler and VM (including the browser) workarounds are the way this will get mitigated.

CPUs will not go fast without caches and speculative execution, you say? Prof. Babayan may have something to say about that. Back when I worked under him in the 1990s, he considered caches a primitive workaround.

The work on Narch was informed by the observation that the submicron feature size provided designers with more silicon than they knew what to do with. So, the task of a CPU designer was to identify ways to use massive amounts of gates productively. But instead, mediocre designers simply added more cache, even multi-level cache.

Talking about it was not enough, so he set out to design and implement his CPU, called "Narch" (later commercialized as "Elbrus-2000"). And he did. The performance was generally on par with its contemporaries, such as Pentium III and UltraSparc. It had a cache, but measured in kilobytes, not megabytes. But there were problems beyond the cache.

The second part of Bee Yarn Knee's objection deals with speculative execution. Knocking that out required software known as a binary translator, which did basically the same thing, only in software[*]. Frankly, at this point I cannot guarantee that it wasn't possible to abuse that mechanism for unintentional signaling in the same way Meltdown works. You don't have a cache for timing signals in Narch, but you do have the translator, which can be timed if it runs at run time, as in Transmeta Crusoe. In Narch's case it only ran ahead of time, so it was not exploitable, but the result turned out to be not fast enough for workloads that make good use of speculative execution today (such as LISP and gcc).
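To make "the same thing, only in software" a bit more concrete, here is a toy sketch of ahead-of-time scheduling: a pass that looks at data dependencies and packs independent operations into wide instruction bundles before the program ever runs. This is not how the Narch translator actually worked. The operation format, the `schedule` function, and the bundle width are all made up for illustration, and the sketch assumes unit latency and that every register is written only once.

```python
# Toy ahead-of-time scheduler (hypothetical, for illustration only).
# Assumptions: every op takes one cycle, each destination register is
# written exactly once, and any source not produced by an earlier op
# is a program input that is available at cycle 0.

def schedule(ops, width=2):
    """Pack ops into bundles of at most `width` operations.

    ops: list of (dest, [sources]) tuples in program order.
    Returns a list of bundles; bundle i holds the destinations of the
    operations that issue together in cycle i.
    """
    ready_at = {}   # register name -> first cycle its value can be read
    bundles = []
    for dest, sources in ops:
        # Earliest cycle at which all inputs have been computed.
        cycle = max((ready_at.get(s, 0) for s in sources), default=0)
        # Skip over bundles that are already full.
        while cycle < len(bundles) and len(bundles[cycle]) >= width:
            cycle += 1
        # Grow the schedule if this op lands past its current end.
        while len(bundles) <= cycle:
            bundles.append([])
        bundles[cycle].append(dest)
        ready_at[dest] = cycle + 1
    return bundles


if __name__ == "__main__":
    program = [
        ("a", ["x", "y"]),
        ("b", ["x", "z"]),
        ("c", ["a", "b"]),
        ("d", ["y", "z"]),
        ("e", ["c", "d"]),
    ]
    for i, bundle in enumerate(schedule(program, width=2)):
        print(i, bundle)
```

With a width of 2, the five operations above fit into three bundles, because `d` does not depend on `a`, `b`, or `c` and can fill the free slot next to `c`. The hardware then only has to issue what the bundles say, with no run-time search for parallelism.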

Still, I think that a blanket objection that a CPU cannot run fast with no cache and no speculative execution is informed by ignorance of alternatives. I cannot guarantee that E2k would solve the problem for good; after all, its later models sit on top of a cache. But at least we have a hint.

[*] The translator grew from a language toolchain and could be used in creative ways to translate source. It would not be binary in such a case. I omit a lot of detail here.

UPDATE: Oh, boy:

But the speedup from speculative execution IS from parallelism. We're just asking the CPU to find it instead of the compiler. So couldn't you move the smarts into the compiler?

Sean, this is literally what they said 30 years ago.

Tags: #spectre

You worked at INEUM? :)
How about that, I visit there these days: we are porting ALT to Elbrus.

Thanks for the article. Offhand I have exactly one addition: rtc (the binary translator) can be patched as software, rather than as microcode at best. And in any case, it only matters when you need to run x86 code.

By the way, the current lcc releases are quite pleasing: going from 1.20 to 1.21, code performance on the Elbrus-401 looked, by eye, like an upgrade from a high-end PIII to a mid-range C2D (and it also became faster at compiling, at least under a kernel that it built itself); we are now starting to try out 1.23.


January 6 2018, 17:11:53 UTC Edited: January 6 2018, 19:26:45 UTC

That was long before the merger, so I worked at ITMiVT, formally in the 1st department, in Rachinsky's group. That is just how it worked out. And later at MCST, by then formally under A. K. Kim.
Got it, thanks. Do you visit our parts these days?
Thanks. But let me quote from a closed discussion of this post; I am curious how much you (dis)agree with these arguments:

counterpoint 1: Itanium, which was EPIC like Elbrus, failed even with Intel behind it. And it added prefetching before the end. Source:

counterpoint 2: To get fast, Elbrus has also added at least one kind of prefetch (APB, "Array Prefetch Buffer") and has the multimegabyte cache that Zaitcev decries. Source: [kozhin2016, 10.1109/EnT.2016.027]

counterpoint 3: "According to Keith Diefendorff, in 1978 almost 15 years ahead of Western superscalar processors, Elbrus implemented a two-issue out-of-order processor with register renaming and *speculative execution*" [the register, so apply extra iodized or kosher grains as needed; emphasis mine]


January 8 2018, 22:52:41 UTC Edited: January 8 2018, 22:54:01 UTC

I decided to throw this into its own post -
(There is nothing to disagree with here; the author lists a set of indisputable facts.)