Pete Zaitcev's Journal [entries|friends|calendar]
Pete Zaitcev

[ userinfo | livejournal userinfo ]
[ calendar | livejournal calendar ]

virtio_pci: do not wait forever at a reset [30 Oct 2024|12:58pm]

We all know how it's possible for a guest VM to access various host functions by accessing a PCI device, right? When KVM traps an access to this fake PCI device, QEMU emulates the device, which allows packets to be sent, the console to be updated, or whatever. This is called "virtio".

NVIDIA took it a step further: they have a real PCI device that emulates QEMU. No joke. And they have a firmware bug! The following patch works around it:

diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
index 9193c30d640a..6bbb34f9b088 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -438,6 +438,7 @@ static void vp_reset(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 	struct virtio_pci_modern_device *mdev = &vp_dev->mdev;
+	int i;
 
 	/* 0 status means a reset. */
 	vp_modern_set_status(mdev, 0);
@@ -446,8 +447,16 @@ static void vp_reset(struct virtio_device *vdev)
 	 * This will flush out the status write, and flush in device writes,
 	 * including MSI-X interrupts, if any.
 	 */
-	while (vp_modern_get_status(mdev))
+	i = 0;
+	while (vp_modern_get_status(mdev)) {
+		if (++i >= 10000) {
+			printk(KERN_INFO
+			       "virtio reset ignoring status 0x%02x\n",
+			       vp_modern_get_status(mdev));
+			break;
+		}
 		msleep(1);
+	}
 
 	vp_modern_avq_cleanup(vdev);
 

I'm not dumping on NVIDIA here at all; I think it's awesome for this devious hardware to exist. And bugs are just a way of life.

[link] post comment

LinkedIn Asked You To Train Their AI [30 Oct 2024|12:17pm]
They pushed the "You're one of a few experts invited to answer" notifications for a long time - maybe a year, I don't remember. When I'd had enough and started to capture them with the intent of mockery, they stopped. So sad. Here's what I got:

"You're facing pushback from vendors on cloud integration. How can you convince them to collaborate?"

"You're focused on cutting costs in cloud computing. How do you ensure security protocols aren't compromised?"

"You're overseeing a code review process. How do you ensure feedback boosts developer morale?"

What a dystopia. LinkedIn is owned by Microsoft, so I'm not surprised someone in a giant corporation thought this sort of nonsense was a good idea. But still, the future is stupid, and all that.

P.S. The notification inserts were non-persistent — inserted on the fly. That was just fraud w.r.t. the idea of a notification ticker.

P.P.S. Does anyone else think that this sort of thing would cause self-selection? They made their AI trained by the most vain and also least bright members of their user population. I'm not an expert in any of these fields.

UPDATE 2024-10-31: Spoke too soon! They hit me with the notification insert: "Here's how you can craft a personalized learning plan for advancing in Cloud Computing." That is not even a formed question. Getting lazy, are we?

UPDATE 2024-11-02: "You're facing budget disputes over cloud solutions. How can you align IT and non-technical teams effectively?" They are not stopping.

Meanwhile, how about another perspective: I saw an update that Hubbert Smith contributed an answer to: "You're facing a ransomware attack crisis. How do you convey the severity to a non-technical executive?" Instead of answering what LinkedIn AI asked, he answered a question of how to deal with ransomware ("Ransomware is fixable with snapshots of sensitive data."). Unless he is an AI himself, he may be thinking that he's dealing with a LinkedIn equivalent of Quora.

I'm trying to ask him what happened.
[link] post comment

Adventures in proprietary software, Solidworks edition [06 Oct 2024|11:39am]

Because FreeCAD was such a disaster for me, I started looking at crazy solutions, like exporting STEP from OpenSCAD. I even stooped to looking at proprietary alternatives. First on the runway was SolidWorks. If it's good for Mark Serbu, surely it's good for me, right?

The first thing I found: you cannot tap your card and download. You have to contact a partner representative — never a good sign. The representative quoted me untold thousands. I'm not going to post the amount; I'm sure they vary it every time, like small shop owners who vary prices according to the race of the shopper.

In addition, they spam like you would not believe. First you have to unsubscribe from the partner, next from community.3ds.com, next from draftsight.3ds.com, and so on. Eventually, you get absolutely random spam; you try to unsubscribe, and they just keep spamming. Fortunately, I used a one-time address, and I killed it. Phew.

[link] 2 comments|post comment

Fedora Panic Canceled [30 Jul 2024|02:06pm]

The other day I was watching a video by Rich Jones about Fedora on RISC-V. In it, he mentions off-hand that CentOS Stream 10 inherits from Fedora 40.

I don't know anymore how that happened, but previously someone made me think that 1. there would be no more numbered releases of CentOS, which is why it is called "CentOS Stream" now, and 2. CentOS Stream was now the upstream of RHEL, replacing Fedora. I really was concerned for the future of Fedora, which was superfluous in that arrangement, you know!

But apparently, Fedora is still the trunk upstream, and CentOS Stream only carries the name. Nothing changes, except CentOS is no longer a clone of RHEL; instead, RHEL is a clone of CentOS. What was all the panic for?

I made a VM at Kamatera a few months ago, and they didn't even have Fedora among the images. I ended up using Rocky.

[link] 1 comment|post comment

Export to STEP in OpenSCAD [29 Apr 2024|12:37am]
The only way to obtain STEP from OpenSCAD that I know of is an external tool that someone made. It's pretty crazy, actually: it parses OpenSCAD's native export, CSG, and issues commands to OpenCASCADE's CLI, OCC-CSG. The biggest issue for me is that this approach cannot handle transformations that the CLI does not support. I use hull all over the place, and a tool that does not support hull is of no use to me.

So I came up with a mad lad idea: just add a native export of STEP to OpenSCAD. The language itself is constructive, and an export to CSG exists. I just need to duplicate whatever it does, and then at each node, transform it into something that can be expressed in STEP.

As it turned out, STEP does not have any operations. It only has manifolds assembled from faces, which are assembled from planes and lines, which are assembled from cartesian points and vectors. Thus, I need to walk the CSG, compile it into a STEP representation, and only then write it out. Operations like union, difference, or hull have to be computed by my code. The plan is to borrow from OpenSCAD's compiler that builds the mesh, only build with larger pieces - possibly square or round.
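
The compile-then-write plan can be sketched as a bottom-up walk of the CSG tree. Here's a toy sketch of my own (not OpenSCAD code; solids are modeled as sets of voxel coordinates where the real thing would build B-rep faces for STEP):

```python
# Toy model of the plan: compile a CSG tree bottom-up, computing the
# boolean operations ourselves, then hand the explicit result to a writer.
def compile_csg(node):
    op, args = node[0], node[1:]
    if op == "voxels":                 # primitive: an explicit point set
        return set(args[0])
    if op == "union":
        out = set()
        for child in args:
            out |= compile_csg(child)
        return out
    if op == "difference":             # first child minus the rest
        out = compile_csg(args[0])
        for child in args[1:]:
            out -= compile_csg(child)
        return out
    raise NotImplementedError(op)      # hull and friends would go here

tree = ("difference",
        ("union", ("voxels", [(0, 0), (1, 0)]), ("voxels", [(2, 0)])),
        ("voxels", [(1, 0)]))
print(sorted(compile_csg(tree)))   # [(0, 0), (2, 0)]
```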

Not sure if this is sane and can be made to work, but it's pretty fun at least.
[link] post comment

sup Python you okay bro [17 Apr 2024|09:44pm]
What do you think this does:

class A(object):
    def aa(self):
        return 'A1'

class A(object):
    def aa(self):
        return 'A2'

a = A()
print("%s" % a.aa())

It prints "A2".

But before you think "what's the big deal, the __dict__ of A is getting updated", how about this:

class A(object):
    def aa(self):
        return 'A1'

class A(object):
    def bb(self):
        return 'A2'

a = A()
print("%s" % a.aa())

This fails with "AttributeError: 'A' object has no attribute 'aa'".

Apparently, the latter definition replaces the former completely. This is darkly amusing.
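
It clicks once you remember that each class statement creates a brand-new class object and rebinds the name. A quick demonstration of my own:

```python
class A(object):
    def aa(self):
        return 'A1'

old_A = A          # keep a reference to the first class object

class A(object):
    def bb(self):
        return 'A2'

# The second "class A" built a fresh class and rebound the name A;
# the first class was not updated, it simply lost its name.
print(A is old_A)        # False
print(old_A().aa())      # A1
```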

Python 3.12.2
[link] 1 comment|post comment

Trailing whitespace in vim [16 Apr 2024|03:26pm]
Problem:
When copying from tmux in gnome-terminal, the text is full of trailing whitespace. How do I delete it in gvim?

Solution:
/ \+$

Obviously.
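
The search only finds the whitespace, though. To actually delete it, the usual substitute idiom works (standard vim, not from the original post):

```vim
" strip trailing whitespace from every line;
" the e flag suppresses the error when nothing matches
:%s/\s\+$//e
```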

This is an area where tmux is a big regression from screen. Too bad.
[link] 3 comments|post comment

Boot management magic in Fedora 39 [15 Apr 2024|12:51pm]
Problem: After an update to F39, a system continues to boot F38 kernels.

The /bin/kernel-install generates entries in /boot/efi/loader/entries instead of /boot/loader/entries. Also, they are in BLS Type 1 format, not the legacy GRUB format, so I cannot copy them over.

Solution:
[root@chihiro zaitcev]# dnf install ostree
[root@chihiro zaitcev]# rm -rf /boot/efi/$(cat /etc/machine-id) /boot/efi/loader/

I've read a bunch of docs and the man page for kernel-install(8), but they are incomprehensible. Still, the key insight was that all that Systemd stuff loves to autodetect by finding this directory or that.

The way to test is:
[root@chihiro zaitcev]# /bin/kernel-install -v add 6.8.5-201.fc39.x86_64 /lib/modules/6.8.5-201.fc39.x86_64/vmlinuz
[link] 1 comment|post comment

Running OpenDKIM on Fedora 39 [01 Mar 2024|03:57pm]
postfix-3.8.1-5.fc39.x86_64
opendkim-2.11.0-0.35.fc39.x86_64

Following generic guides (e.g. at Akamai Linode) almost got it all working with ease. There were a few minor problems with permissions.

Problem:
Feb 28 11:45:17 takane postfix/smtpd[1137214]: warning: connect to Milter service local:/run/opendkim/opendkim.sock: Permission denied
Solution:
add postfix to opendkim group; no change to Umask etc.

Problem:
Feb 28 13:36:39 takane opendkim[1136756]: 5F1F4DB81: no signing table match for 'zaitcev@kotori.zaitcev.us'
Solution:
change SigningTable from refile: to a normal file

Problem:
Feb 28 13:52:05 takane opendkim[1138782]: can't load key from /etc/opendkim/keys/dkim001.private: Permission denied
Feb 28 13:52:05 takane opendkim[1138782]: 93FE7D0E3: error loading key 'dkim001._domainkey.kotori.zaitcev.us'
Solution:
[root@takane postfix]# chmod 400 /etc/opendkim/keys/dkim001.private
[root@takane postfix]# chown opendkim /etc/opendkim/keys/dkim001.private
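
For reference, a plain-file SigningTable maps an exact sender address to a KeyTable entry, one per line. A hypothetical example using the key name from the log above:

```
zaitcev@kotori.zaitcev.us    dkim001._domainkey.kotori.zaitcev.us
```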
[link] post comment

Strongly consistent S3 [19 Feb 2024|11:57am]

Speaking of S3 becoming strongly consistent, Swift was strongly consistent for objects in practice from the very beginning. All the "in practice" assumes a reasonably healthy cluster. It's very easy, really. When your client puts an object into a cluster and receives a 201, it means that a quorum of back-end nodes stored that object. Therefore, for you to get a stale version, you need to find a proxy that is willing to fetch you an old copy. That can only happen if the proxy has no access to any of the back-end nodes with the new object.

Somewhat unfortunately for the lovers of consistency, some 10 years ago we made a decision that makes eventual consistency easier to observe. We allowed half of an even replication factor to satisfy quorum. So, if you have a distributed cluster with a factor of 4, a client can write an object to the 2 nearest nodes and receive a success. That opens a window for another client to read an old object from the other nodes.
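
The quorum arithmetic can be sketched like this (my own sketch of the behavior described, not Swift's actual code):

```python
def quorum_size(replicas):
    # An odd factor needs a strict majority; after the change described
    # above, half of an even factor is accepted as a quorum.
    if replicas % 2 == 0:
        return replicas // 2
    return replicas // 2 + 1

print(quorum_size(3))   # 2: a strict majority, no stale-read window
print(quorum_size(4))   # 2: half the nodes, so the other half may be stale
```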

Oh, well. Original Swift defaulted to odd replication factors, such as 3 and 5. They provided a bit of resistance to intercontinental scenarios, at the cost of the client knowing immediately if a partition is occurring. But a number of operators insisted that they preferred the observable eventual consistency. Remember that the replication factor is configurable, so there's no harm, right?

Alas, being flexible that way helps private clusters only. Because Swift generally hides the architecture of the cluster from clients, they cannot know if they can rely on the consistency of a random public cluster.

Either way, S3 does something that Swift cannot do here: the consistency of bucket listings. These things start lagging in Swift at the drop of a hat. If the container servers fail to reply in milliseconds, storage nodes push the job onto updaters and proceed. The delay in listings is often observable in Swift, which is one reason why people doing POSIX overlays often have their own manifests.

I'm quite impressed by S3 doing this, given their scale especially. Curious, too. I wish I could hack on their system a little bit. But it's all proprietary, alas.

[link] post comment

git cp orig copy [09 Dec 2023|08:21pm]

Problem: I want to copy a file in git and preserve its history.

Solution: Holy guacamole!:

git checkout -b dup                               # work on a throwaway branch
git mv orig copy                                  # the rename is what carries the history
git commit --author="Greg " -m "cp orig copy"
git checkout HEAD~ orig                           # resurrect the original file
git commit --author="Greg " -m "restore orig"
git checkout -                                    # back to the starting branch
git merge --no-ff dup                             # the merge preserves both parents

[link] post comment

Pleroma and restricted timelines [08 Dec 2023|02:32pm]

Problem: you run a Pleroma instance, and you want to restrict the composite timelines to logged-in users only, but without going to the full "public: false". This is a common ask, and the option restrict_unauthenticated is documented, but the syntax is far from obvious!

Solution:

config :pleroma, :restrict_unauthenticated,
  timelines: %{local: true, federated: true}
[link] post comment

RHEL 9 on libvirt and KVM [06 Dec 2023|11:54pm]

Problem: you create a VM like you always did, but RHEL 9 bombs with:

Fatal glibc error: CPU does not support x86-64-v2

Solution: as Dan Berrange explains in bug #2060839, a traditional default CPU model qemu64 is no longer sufficient. Unfortunately, there's no "qemu64-v2". Instead, you must select one of the real CPUs.

<cpu mode='host-model' match='exact' check='none'>
  <model fallback='forbid'>Broadwell-v4</model>
</cpu>
[link] post comment

Suvorov vs Zubrin [25 Nov 2023|09:27pm]

Bro, u mad?

Zubrin lies like he breathes, not even bothering to calculate in Excel, let alone take integrals!

But they continue to believe him — because it's Zubrin!

When they discussed his "engine on uranium salts" [NSWR — zaitcev], I wrote that the engine will not work in principle, because a reactor can only work with effective moderation of neutrons, but already at a moderator temperature of 3000 degrees (like in KIWI), the fission cross-section decreases 10 times, and the critical mass increases proportionally. But nobody paid attention — who am I, and who is Zubrin!

The core has to be hot, and the moderator has to be cool, this is essential.

But they continued to fantasize, is it going to be 100,000 degrees in there, or only 10,000?

It did not matter how much I pointed out the fundamental contradiction — here is the cold sub-critical solution, and there is the super-critical plasma only a meter or two away, so neutrons from that plasma fly into the solution, which will inevitably capture them, slow them down, and react — and therefore the whole concept goes down the toilet.

But they have discussed this salt engine for decades, without trying to check Zubrin's claims.

All "normal" nuclear reactors work only in the region between the "first" criticality (including the delayed neutrons) and the "second" (with fast neutrons). Only in this region is control of the reactor possible. By the way, the difference in multiplication factors is only 1.000 versus 1.007 for slow neutrons, and 1.002 for the fast ones (in the case of Plutonium, that much is true even for slow neutrons).

And by the way, the average delay of delayed neutrons is 0.1 seconds! The solution has to remain in the core for 100 milliseconds in order to capture the delayed neutrons! Not even the solid-phase RD-0410 reached that much.

Therefore, Zubrin's engine must be critical on prompt neutrons. And because the hot moderator underperforms, prompt neutrons become indistinguishable from fission neutrons, and therefore the density of the plasma has to be the same as the density of metal in order to achieve criticality — that is to say, almost 20 g/cm^3 for Uranium.

But this persuades nobody, because Zubrin is Zubrin, and who are you?

It all began as a discussion of Mars Direct among geeks, but escalated quickly.

[link] 1 comment|post comment

Python subprocess and stderr [28 Oct 2023|09:02pm]

Suppose you want to create a pipeline with the subprocess and you want to capture the stderr. A colleague of mine upstream wrote this:


    p1 = subprocess.Popen(cmd1,
      stdout=subprocess.PIPE, stderr=subprocess.PIPE, close_fds=True)
    p2 = subprocess.Popen(cmd2, stdin=p1.stdout,
      stdout=subprocess.PIPE, stderr=subprocess.PIPE, close_fds=True)
    p1.stdout.close()
    p1_stderr = p1.communicate()
    p2_stderr = p2.communicate()
    return p1.returncode or p2.returncode, p1_stderr, p2_stderr

Unfortunately, the above saves the p1.stdout in memory, which may come back to bite the user once the amount piped becomes large enough.

I think the right answer is this:


    with tempfile.TemporaryFile() as errfile:
        # Both children write stderr into the same unlinked temp file,
        # so it spills to disk instead of accumulating in process memory.
        p1 = subprocess.Popen(cmd1,
          stdout=subprocess.PIPE, stderr=errfile, close_fds=True)
        p2 = subprocess.Popen(cmd2, stdin=p1.stdout,
          stdout=subprocess.PIPE, stderr=errfile, close_fds=True)
        p1.stdout.close()       # drop our copy so p1 sees EPIPE if p2 exits
        p2.communicate()
        p1.wait()
        errfile.seek(0)         # rewind to read what both children wrote
        px_stderr = errfile.read()
    return p1.returncode or p2.returncode, px_stderr

Stackoverflow is overflowing with noise on this topic. Just ignore it.

[link] post comment

MinIO in the news [04 Apr 2023|07:15pm]

The company we mentioned previously is in the news. Seen at Blocks And Files:

If MinIO want to discourage software developers from including its code in their products it is managing this very well.

[link] post comment

Cura on Fedora is dead, use Slic3r [02 Feb 2022|06:55pm]
Was enjoying my Prusa i3S for a few months, but had to use my Lulzbot Mini today, and it was something else.

In the past, I used the cura-lulzbot package. It went through difficult times, with a Russian take-over and Qtfication. But I persisted in suffering, because, well, it was turnkey and I was a complete novice.

So, I went to install Cura on Fedora 35, and found that package cura-lulzbot is gone. Probably failed to build, and with spot no longer at Red Hat, nobody was motivated enough to keep it going.

The "cura" package is the Ultimaker Cura. It's an absolute dumpster fire of pretend open source. Tons of plug-ins, but basic materials are missing. I print in BASF Ultrafuse ABS, but the nearest available material is the PC/ABS mix.

The material problem is fixable with configuration, but a more serious problem is that the UI is absolutely bonkers, with crazy flashing - and it does not work. They have menus that cannot be reached: as I move the cursor into a submenu, it disappears. Something is seriously broken in Qt on F35.

BTW, OpenSCAD suffers from incorrect refresh too on F35. It's super annoying, but at least it works, mostly.

Fortunately, "dnf remove cura" also removes 743 trash packages that it pulls in.

Then, I installed Slic3r, and that turned out to be pretty dope. It's a well put together package, and it has a graphical UI nowadays, operation of which is mostly bug-free and makes sense.

However, my first print popped off. As it turned out, Lulzbot requires the initial sequence that auto-levels it, and I missed that. I could extract it from my old dot files, but in the end I downloaded a settings package from Lulzbot website.
[link] post comment

PyPI is not trustworthy [04 Jan 2022|02:34pm]

I was dealing with a codebase S at work that uses a certain Python package N (I'll name it in the end, because its identity is so odious that it will distract from the topic at hand). Anyhow, S failed tests because N didn't work on my Fedora 35. That happened because S installed N with pip(1), which pulls from PyPI, and the archive at PyPI contained broken code.

The code for N in its source repository was fine, only PyPI was bad.

When I tried to find out what happened, it turned out that there is no audit trail for the code in PyPI. In addition, it is not possible to contact listed maintainers of N in PyPI, and there is no way to report the problem: the problem tracking system of PyPI is all plastered with warnings not to use it for problems with packages, but only with PyPI software itself.

By fuzzy-matching provided personal details with git logs, I was able to contact the maintainers. To my great surprise, two out of three even responded, but they disclaimed any knowledge of what went on.

So, an unknown entity was able to insert code into a package at PyPI, and pip(1) was downloading it for years. This only came to light because the inserted code failed on my Fedora test box.

At this point I can only conclude that PyPI is not trustworthy.

Oh, yeah. The package N is actually nose. I am aware that it was dead and unmaintained, and nobody should be using it anymore, least of all S. I'm working on it.

[link] post comment

Adventures in tech support [23 Dec 2021|02:17pm]

OVH was pestering me about migrating my VPS from its previous range to the new (and more expensive) one. I finally agreed to that. I migrated the VM to the new host, and it launched with no networking. Not entirely unexpected, but it gets better.

The root cause is the DHCP server at OVH returning a lease with netmask /32. In that situation, it's not possible to add a default route, because the next hop is outside of the netmask.
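
The standard workaround for a /32 lease is to install a host route to the gateway first, so the default route has a reachable next hop (my own sketch with a made-up gateway address, not what OVH suggested):

```shell
# the gateway is outside the /32, so first tell the kernel it is
# directly reachable on eth0, then route default traffic through it
ip route add 203.0.113.1/32 dev eth0
ip route add default via 203.0.113.1 dev eth0
```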

Seems like a simple enough problem, so I filed a ticket in OVH support, basically saying "your DHCP server supplies incorrect netmask, please fix it." Their ultimate answer was, and I quote:

Please note that the VPS IP and our failover IPs are a static IPs, configuring the DHCP server may cause a network issues.

I suppose I knew what I was buying when I saw the price for these OVH VMs. It was too cheap to be true. Still, disappointing.

[link] 3 comments|post comment

Scalability of a varying degree [14 Sep 2021|03:27pm]

Seen at the official site of Qumulo:

Scale

Platforms must be able to serve petabytes of data, billions of files, millions of operations, and thousands of users.

Thousands of users...? Isn't that a little low? Typical Swift clusters in telcos have tens of millions of users, of which tens or hundreds of thousands are active simultaneously.

Google's Chubby paper has a little section on the scalability problem of talking to a cluster over TCP/IP. Basically, at low tens of thousands of clients you start to have serious issues with kernel sockets and TIME_WAIT. So maybe that.
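
Back-of-the-envelope arithmetic for that TIME_WAIT ceiling (my own numbers, assuming Linux defaults):

```python
# With the default ephemeral port range and a 60-second TIME_WAIT, one
# client address talking to one server ip:port can only churn through
# short-lived connections at a limited rate before ports run out.
ephemeral_ports = 60999 - 32768 + 1   # default net.ipv4.ip_local_port_range
time_wait_secs = 60                   # TCP_TIMEWAIT_LEN on Linux
rate = ephemeral_ports / time_wait_secs
print(int(rate))   # ~470 connections/second per client-server pair
```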

[link] post comment

navigation
[ viewing | most recent entries ]
[ go | earlier ]