Wednesday, July 9th, 2008

Follow focus nazi

I have a laptop with a Synaptics touchpad, and one of the biggest annoyances in its implementation is how the "wheel" strips act upon the GUI element where the mouse is pointing, regardless of the current focus. It took years to beat the focus-follows-mouse people into marginal submission, yet they still find every crack to seep through, like kerosene. Naturally, this behaviour is not tunable.

Another absurdity in this area is the behaviour of the panel. Some genius came up with the awesome idea of flipping windows when the panel receives a wheel click. The problem is that it is difficult to make a strip on the laptop deliver just one click, so the function is useless to me. Also, it is too easy to hit the strip accidentally (which would not be a big deal if not for the enforced focus-follow). So, for one reason or another, I would like to disable this behaviour. I don't even want to seek excuses for this wish. Just do it. But it's impossible.

So far the only group that displayed a clue in this was, strangely enough, Mozilla. They fixed bugs I filed and in general had an understanding of various input and display technologies, despite only being browser people. You'd think desktop developers would know how a touchpad differs from a wheel mouse.

UPDATE: Chris pointed out in comments that a fundamental reason exists why the wheel behaves this way: the extra button events are mouse buttons, and those go where the pointer points. I knew it, but it didn't click. I still think that applications should be able to work around this, although perhaps at the cost of some extra events being delivered.

Saturday, June 21st, 2008

Unforgivable

Here's what the new pulseaudio does after today's update in Rawhide:

VLC media player 0.8.6h Janus
E: core-util.c: Failed to state home directory /q/zaitcev/.pulse: No such file or directory
E: core-util.c: Assertion 'fn' failed at pulsecore/core-util.c:1086, function pa_lock_lockfile(). Aborting.
Aborted

Talkative library functions are bad enough, but aborting and taking down the whole application?! How amateurish.

Friday, June 20th, 2008

Jim's strategy

Jeremy posted a shorthand of a meeting with Jim, our CEO. I think it's pretty interesting, although it's not very new for me, because it's consistent with his internal message and I've met with him before.

Jim talks about the need to involve companies (and their member individuals) in Open Source in general. I quite agree, although in my line of work I see it in a very narrow way. I interact with all kinds of customers. Some are used to the old, "black box" way. If a test round is needed, I send them a kernel, they run it, collect the results, I think about it, change something, send it again... et cetera. Others (for example, Stratus, Fujitsu) chose the "open box" approach: they look at my patches and give feedback on them.

Even though I never play favourites with customer problems, "open box" people tend to come to solutions much quicker.

I used to think that there must be some downside to "open box", because they have to have some expertise in-house to deal with source, and expertise costs money. But it is more and more apparent that basic reading of the source is not black magic. Customers always have engineers who can read it. Sure, they may not have intimate understanding of it, but that's what they pay Red Hat for. The basic advantage is essentially free for them.

Another thing Jim talks about is taking the high road relative to Ubuntu. From an ethical standpoint, he is right, of course. But I keep thinking... Ubuntu is popular. Not as popular as Windows, I guess, but it is a success, and you never argue with success. Fat lot of good it will do to Free Software if everyone moves to Ubuntu. Fortunately, Fedora is a success too, for now at least. But it looks like Jim just believes that truth will always prevail... I am not so sure. It is not how the world works.

Sunday, June 1st, 2008

Attack of Sqlite

Recently it became fashionable to link Sqlite everywhere (for example, yum uses it). The awsum workout that Firefox gives to your kjournald is also Sqlite. But now I see a little problem. What if anything happens to the database?

XML was bad enough, but it is repairable, if with more difficulty than plain text. When GNOME eliminated battstat-applet and started to throw a funny dialog, I was able to fix it by removing a few files and directories in my ~/.gconf (Thank you, Federico).

Now my Liferea has developed a problem. One of the feeds has a phantom item in it: it shows one unread even if there aren't any unread items. If I click "Mark All Read", Liferea crashes. How am I supposed to repair this? I suppose there may be some command line tools coming with Sqlite which allow issuing SQL statements, but without knowing how the database is laid out, I cannot formulate such statements.
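
A minimal sketch of how one might start poking at such a database, assuming the sqlite3 command line shell is installed and the application is not running. The function name show_layout is made up for illustration, and nothing in it is specific to Liferea; the database path is whatever the caller passes in.

```shell
#!/bin/sh
# Sketch: look at the layout of an SQLite database before trying to repair it.
# show_layout is a hypothetical helper; the database path comes from the caller.

show_layout() {
    db=$1

    # Work on a copy, so a botched experiment cannot make things worse.
    copy=$(mktemp) || return 1
    cp "$db" "$copy" || return 1

    # Every table, index and trigger is described in the built-in
    # sqlite_master table, including the CREATE statement that made it.
    sqlite3 "$copy" "SELECT type, name, sql FROM sqlite_master ORDER BY name;"

    # Ask SQLite whether the file itself is structurally damaged.
    sqlite3 "$copy" "PRAGMA integrity_check;"

    rm -f "$copy"
}
```

With the CREATE statements in hand, it at least becomes possible to guess which table holds the items and their read flags. Whether that is enough to fix a phantom unread item depends on the application.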

I expect this kind of thing to become more common as more people jump on the bandwagon.

Wednesday, May 21st, 2008

RSPoD on Rawhide

The Penny-Arcade game a.k.a. RSPoD is out today and it bombs on me with: "Could not find a compatible display device // Make sure your display device supports OpenGL 1.2 and the following extensions: ARB_multitexture ... ARB_texture_compression EXT_texture_compression_s3tc". On my system, OpenGL is provided by Mesa 7.1-0.29.fc9, and according to glxinfo, some required extensions are missing. That wouldn't be so bad, but I have no idea how this may be corrected. Hardware OpenGL in the video card, I guess. I have "Radeon Xpress 1100 IGP", which is a low-end mobile card.

UPDATE: Received the following e-mail:

Hi I had the same problem on Vista, because on my laptop the latest installed drivers were the ones I got from windows update. I then installed the ones that came from the manufacturers site and voilà..

. . .

UPDATE: See Anholt's entry.

Wednesday, May 14th, 2008

Rio and Upstart

Seen at Rio's place:

Fedora9のupstart、すごいんですけど...。さすがに組み込みみたいな速さでは無いけれど、これならサスペンドしなくても良いんじゃ...。

Which, in my very approximate translation means:

Upstart of Fedora 9 is great, mostly. As expected it includes no visible speed, so not using suspend is not good.

So, I guess that Rio expected improvements which would allow him to stop suspending, and they did not materialize... Which makes sense, but why the superlatives then? The title of the post was "upstartすげい!" with the exclamation mark. I would understand if he wrote that Upstart allowed him to end suspends, but no, "速さでない" is simple enough even for me to understand. Oh well, perils of international blogging.

Once I figured out that the control file syntax is documented in events(5) of all places, Upstart became rather tolerable, even welcome. I think that our famously poor bootstrap times (which are not that bad in Fedora when compared to other distros — I've seen real hard benchmarks — but are just bad for me as a user) have more to do with trying to execute too much crap. Upstart allows us to do it more efficiently, but it's a palliative.

UPDATE: piyokun comments that the right translation is more like "Of course it's not as fast as embedded (linux), but with (upstart) you can get by without suspending." So, the "shinakute" is like "doing", "mo" is change of state (he suspended before, but not anymore), "n" is explanation tag, and "ja" is uncertainty. Casual, of course. Oh, and "kumikomu" is a verb meaning "to incorporate". I had no idea that they had a native word for "embedded", instead of a katakanized borrowed word.

Monday, May 12th, 2008

John Carmack and Linux VT

Says John:

Our flight computer now has a display screen to show the current status to a pilot. My first inclination was just to mmap the framebuffer and pretend I was back in the days of DOS, but I decided to try and be a good linux programmer and use ncurses. It took me longer than I expected to get it working properly for displaying on the VGA for an application launched from a telnet session, and the performance was very bad. I wound up writing directly to the terminal device myself, spitting out all the escape sequences manually, but it was still quite appallingly slow. I have it working acceptably by only updating the various display items in a scanning fashion to avoid slowing it down on any individual frame, but I should have just followed my first thought and gone with a direct memory mapping.

I'm a little disturbed by the above, because I consider his application essentially equivalent to what Hercules does, and I never saw any performance issues with it. We all know that ncurses is a pig, and of course he should be using Slang instead of ncurses, but since he says that the result was slow even for the raw sequences, certainly this is not the issue. Weird.
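
For readers who have not done it by hand: "raw sequences" means writing cursor-positioning escape codes straight to the tty instead of going through ncurses. Below is a rough sketch in shell rather than John's C, which he has not posted. The helper name put_at, the sample field and the tty in the usage comment are all made up for illustration.

```shell
#!/bin/sh
# put_at ROW COL TEXT: move the cursor to ROW,COL and print TEXT there.
# Hypothetical helper, only meant to show what a hand-written update looks like.
put_at() {
    # ESC [ row ; col H is the ANSI/VT100 "cursor position" sequence,
    # which the Linux console understands.
    printf '\033[%d;%dH%s' "$1" "$2" "$3"
}

# Example (hypothetical field and tty): redraw one item without
# clearing the screen.
# put_at 2 10 "ALT 1200" > /dev/tty1
```

Each update costs only a handful of bytes, which is why a slow result looks surprising at first glance.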

It would be awesome if he posted his code somewhere.

UPDATE: John replies in comments:

The flight computer is only a 486-100, so it doesn't take much to bog it down, even with just text writes. I am doing straightforward fwrites and fprintfs to the console tty for everything.

It is at an acceptable rate now, so I probably won't make any other changes, but if RRL decides that they want anything fancy, like scrolling bar graphs, I will go straight to the framebuffer.

Saturday, April 19th, 2008

Random dmesg errors

I was always against the kernel spewing user-generated errors into dmesg, like this:

npviewer.bin[4393]: segfault at f6712030 ip 67e7a0 sp ff9c39ec error 4 in libpthread-2.8.so[677000+15000]

Not helpful, not interesting.

However, the other day my desktop keeled over in a strange way... The /var/log/messages contained this (followed by a stack trace):

Apr 13 18:19:14 niphredil kernel: Xorg: page allocation failure. order:3, mode:0x4020

It looks like a bug in SLUB (though it does not seem to be registering with anyone who has the power to track it down). But my point is, without the printout I would need to find what was happening by other means, and that would probably take forever.

Hmm... My world is shaken.

P.S. kgdb was merged into 2.6.26. The sky is falling.

Thursday, April 17th, 2008

Jon Corbet on Red Hat and Desktop

Seen at LWN today (no permalink — what the heck?):

Red Hat's desktop team has posted an item saying that the company has no plans to offer a "traditional desktop product" anytime soon.

Say what? The referenced item says:

[W]e have no plans to create a traditional desktop product for the consumer market in the foreseeable future.

Umm... RHEL desktop is doing quite well; all we're saying is that we're not committed to selling it at Best Buy. Not sure how this debacle has happened. Jon was probably short on coffee.

Tuesday, April 15th, 2008

Fallback-induced thoughts

I saw two or three bug filings in the last couple of months which deal with a USB device not working until ehci_hcd is unloaded. Thinking sensibly, it's rather normal: a poorly-made or poorly-cabled device may choose to report High (480) speed yet be unable to communicate at that speed. And a couple of devices failing across half a million users is rare. However, the thing is, such cases were extremely rare before; I don't even remember the last time this happened. So, I'm starting to worry that EHCI hardware or software may have a subtle bug somewhere (perhaps specific silicon percolated to the field).

If only there was a way to tap into Novell's bugzilla and watch their kernel bugs, to collate with ours. Ditto Bligh's Bugme and Ubuntu's whatever (Launchpad?).

For readily identifiable bugs, we just report them to linux-usb or whatever and then patterns just come together, but the problem of fallback-wannabe devices is too flimsy and vague.

P.S. By "fallback" I mean the new code which switches a port over to Full (12) speed if enumeration fails. It's a practical solution, but it seems like sweeping the problem under the carpet to me. Also, it won't work for anything that's plugged into a hub.

UPDATE: Amit from Ubuntu pointed to their bug 88746. V.interesting.

Monday, April 7th, 2008

-e for Elimination

After noticing that the annoying and useless (for me) orange star has no setting "go away forever", I concluded it was time to use "rpm -e". However, we have another case of House That Jack Built: system-config-printer needs /usr/bin/system-install-packages (because, you know, it wants to pull printer drivers for you automagically). But the system-install-packages is a part of the gnome front-end, not PackageKit itself (why? a mystery), and that includes the orange star (it's called pk-update-icon). Godly.

At least it's not a throbbing red eye, and not restarted when killed. Also, Yet Another Sneaky Daemon They Sprung On My System While I Looked The Other Way (packagekitd) quietly disappears after a while, releasing my precious memory. I sense some good intent here, but it's not good enough.

P.S. I'm testing if "Check for Updates: Never" and then killing means "go away".

P.P.S. Nope, it still restarts on the next login, and checks for updates. What part of "never" is unclear here?

Sunday, April 6th, 2008

Timeouts

I didn't try to burn a CD with ub in a while, because my new laptop comes with a built-in burner. After all the hustling with __blk_end_request, I thought the situation called for a test. This looked worrisome:

Track 01: Total bytes read/written: 548321280/548321280 (267735 sectors).
Errno: 5 (Input/output error), close track/session scsi sendcmd: cmd timeout after 5.000 (480) s
CDB:  5B 00 02 00 00 00 00 00 00 00
cmd finished after 5.000s timeout 480s
cmd finished after 5.000s timeout 480s
wodim: Cannot fixate disk.

The resulting CD was not a coaster though. A welcome surprise, but clearly I did something wrong regarding timeouts, and it needs fixing (although I'm quite sure that there's no other person on Earth who would want to burn CDs with ub).

BTW, the new cdrecord looks nice indeed. Before, I only used the one maintained by that self-centered dude with attitude... No idea who maintains this one, but it seems to be working OK.

Friday, March 21st, 2008

What Would Rusty Say?

One of the many great things Rusty has done was introducing the Misuse Levels of APIs (in OLS 03 keynote, slide 30 and beyond). I had a run-in with something of that nature last week.

Here's an interface:

/**
 * blk_end_request - Helper function for drivers to complete the request.
 * @rq:       the request being processed
 * @error:    0 for success, < 0 for error
 * @nr_bytes: number of bytes to complete
 *
 * Description:
 *     Ends I/O on a number of bytes attached to @rq.
 *     If @rq has leftover, sets it up for the next range of segments.
 *
 * Return:
 *     0 - we are done with this request
 *     1 - still buffers pending for this request
 **/
int blk_end_request(struct request *rq, int error, unsigned int nr_bytes)

What do you think the "number of bytes to complete" is? It seemed natural to me that it's the number of bytes which was transferred (and thus, it can be smaller than the number of bytes remembered in the request). This is how I would design an API. But in this case, nr_bytes is the number of bytes which was in the request initially. As such, it is greater than the request->data_len, which drivers modify to indicate the residue.

I think this has something to do with Tomo's & Jens' desire to avoid modifying drivers which poke ->data_len today (indeed, the code doing so in ub remained unchanged). If so, the price is too steep, IMHO.

Curiously, the designers of the API themselves misused it when they converted ub. They called __blk_end_request() with an argument of blk_rq_bytes(rq), but since ub modifies ->data_len, it guaranteed a failure for packet requests.

Everything seems to be working now, but I suspect that 2.6.25 is going to ship with a broken ub (thank Chris Wright for the Stable Tree).

UPDATE: See also a blog article (same server, but helps if Rusty decides to reshuffle his home directory).

Thursday, March 20th, 2008

Irony of the day

Seen today:

  PID USER      PR  NI  VIRT  RES  S %CPU %MEM    TIME+  COMMAND
23015 root      20   0  311m  63m  S  1.0  7.4 148:50.50 Xorg   
23661 zaitcev   20   0  446m 9324  S  1.0  1.1 117:00.18 gnome-power-man

So, the process which is supposed to save my CPU cycles is responsible for consuming almost as much as X server, which does a heck of a lot of work. Isn't it ironic?

I suspect Gnome Power Manager loses its mind when I close the lid overnight.

Tuesday, January 29th, 2008

Today, I hate... who?

Not sure who to hate today. At first I was going to hate Karsten, but I quickly realized that he's actually on the good side, fixing problems rather than causing them. But someone has to be responsible for the abomination known as libtool, right?

It all started when I wanted to run Hercules. I extracted my old images and hercules.cnf, ran "yum install hercules", and then... Hercules starts, but recognizes no devices, starting with the 3505.

The problem is in the so-called "dynamic load": emulators for devices are shared objects, and our stock Hercules on Fedora searches for them everywhere except where necessary:

write(4, "HHCCF065I Hercules: tid=2AD76E5C"..., 72) = 72
open("/hercules/hdt3505.la", O_RDONLY)  = -1 ENOENT
open("/hercules/hdt3505", O_RDONLY)     = -1 ENOENT
open("/lib/hdt3505.la", O_RDONLY)       = -1 ENOENT
open("/usr/lib/hdt3505.la", O_RDONLY)   = -1 ENOENT
open("hdt3505.la", O_RDONLY)            = -1 ENOENT
access("/lib/hdt3505", R_OK)            = -1 ENOENT
access("/usr/lib/hdt3505", R_OK)        = -1 ENOENT
open("/etc/ld.so.cache", O_RDONLY)      = 10
fstat(10, {st_mode=S_IFREG|0644, st_size=84779, ...}) = 0
mmap(NULL, 84779, PROT_READ, MAP_PRIVATE, 10, 0) = 0x2aaaaf55b000
close(10)                               = 0
open("/lib64/tls/hdt3505", O_RDONLY)    = -1 ENOENT
open("/lib64/hdt3505", O_RDONLY)        = -1 ENOENT
open("/usr/lib64/tls/hdt3505", O_RDONLY) = -1 ENOENT
open("/usr/lib64/hdt3505", O_RDONLY)    = -1 ENOENT
munmap(0x2aaaaf55b000, 84779)           = 0
write(4, "HHCCF042E Device type 3505 not r"..., 42) = 42

The real path is /usr/lib64/hercules/hdt3505.so. Did nobody ever test?!

So I download the source, configure with --disable-dynamic-load, build, everything works. After all, the whole /usr/lib64/hercules is only 240KB, who needs dynamic modules anyway? Then, I want to be good and try to build an RPM... It bombs with "ld: undefined hdl_genhdl".

It gets even more involved. Apparently, when I run configure by hand, libtool fails, falls back to linking from .a, and then everything works. But when I build an RPM, libtool succeeds, produces .so, then linking fails because...

So, I spent a day trying to understand how libtool worked and why removal of rpath causes it to produce garbage, etc. until it was time to sleep. Today, filed a bug, moved on. Let Matthias puzzle it out.

UPDATE 20080103: Hans de Goede fixed it.

Friday, January 25th, 2008

Xen in Fedora

UPDATE: Hallelujah, Dan announced that SCT got pvops Xen booting on dom0, thus making the entry below moot. Boy, I'm glad I didn't waste too much time on this.

--- old entry follows ---

Prompted by the incompatibility between iproute and 2.6.21.7 in Rawhide (e.g. No Network In Xen on Rawhide), I glanced at Xen on Fedora. Taken charitably, this may be a fresh look by an outsider, or, otherwise, a know-nothing blogging about things he has no clue about. That said, the picture is a bit sad.

The official position by Dan (but I think he speaks for SCT and Juan too) is to give up on updating our dom0 kernel, because of "Lots of porting work", and to work on a new kernel with paravirt_ops, and a Xen shim for paravirt_ops. The result is, a hole opens as the new and shiny paravirt_ops slides, and the old and crusty 2.6.21 stagnates. The iproute thing is just the start. Wouldn't it be better to continue to maintain old Xen until the new one comes around? How hard can it be and how much is Dan's "lots"?

Here's what I found. The Xen kernel patch weighs about 96KLOC itself, and about 10KLOC in various patches like execshield. Over the week, I moved about 4KLOC, and looked at a few more. So, the way it goes it would take me about a month to have a 2.6.25 based Xen. That's how big "lots" is.

One thing I found though, the lots is far bigger than necessary because of foo-xen.c. It's a trick which allows Xen to shadow and override files, implemented in scripts/Makefile.xen. This trick helps to slap Xen on a distro like RHEL, where it guarantees that the non-Xen would not be affected, because foo.c remains how it was, and foo-xen.c is cloned from it. But the price we're paying for Fedora is enormous. Without it, all we need to do is run patch, fix the rejects. A day's work for a new -rc from Linus (in terms of Fedora, when Chuck or Dave pull a new update into Rawhide).

About 50KLOC of the Xen patch now is the cloned stuff. So, in a way, we caused a large part of "lots" ourselves.

Can I fix it? I think the Xen cabal and the kernel cabal would not mind if I did. The problem is how to clear the very first month. I have normal work to do... I suppose I can ask my manager for a sabbatical for the good of Fedora.

Tuesday, January 22nd, 2008

Today, I hate... Roland!

Actually, I don't really hate him, and he's not at fault, but I did waste a whole day on --build-id. I tried to build some kind of ancient Fedora kernel based on 2.6.21.7 for my own purposes, and it was bailing with: "ld: .tmp_vmlinux1: section `.text' can't be allocated in segment 0". The next line had the key to the problem, but it was too long and thus unreadable on a terminal or in vi, and so I missed this: "LOAD: .note.gnu.build-id .text .text.rest_init [240 kilobytes on the single line follow] __param __bug_table". Killing the magic which added --build-id to LDFLAGS_vmlinux fixed the problem. But before that, I learned way more about vmlinux.lds than I ever wanted to know, and examined every object file in the kernel with objdump(1).
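
In hindsight, a monster line like that is easier to spot if every line of the build output is cut down to its first couple of hundred characters before reading it. A small sketch, assuming nothing beyond POSIX cut; the name trim_lines and the make invocation in the comment are only illustrative.

```shell
#!/bin/sh
# trim_lines [N]: keep only the first N characters of every input line
# (default 200), so that one 240-kilobyte line cannot drown out the rest.
# Hypothetical helper name.
trim_lines() {
    cut -c "1-${1:-200}"
}

# Example (illustrative): page through a build log with long lines trimmed.
# make vmlinux 2>&1 | trim_lines 160 | less
```

The beginning of the offending line, with .note.gnu.build-id sitting in front of .text, would have fit comfortably on the screen.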

Monday, January 21st, 2008

Jujumon

Since I'm on a blogging spree, I'm going to shed another dangling patch.

At OLS two years ago I talked about usbmon, and tried to persuade fellow hackers that it was a great idea and every subsystem should have one. Ergo, scsimon, firewiremon, infinibandmon. Well... two years down the road, and usbmon is the only one.

Actually not quite. When I tinkered with porting Kristian Hogsberg's Juju stack to RHEL 4 and did some other FW hackery in service of the Queen, I regretted many times that firewiremon (then renamed jujumon) did not exist. So I wrote it.

It was very useful for me personally, but unfortunately I was unable to make it upstreamable. Juju (or Firewire) needs way more intercept points than USB, so the hookage becomes unwieldy, and I was dissatisfied. Stefan Richter and Jay Fenlason would laugh me out of the mailing list if I posted that. Kristian had retired to GNOME by this time, or he'd laugh too. I moved to other things too, so I guess jujumon has to rot at my homepage for now. Dunno if I should be sad or what.

git, the person and the SCM

From Wincent (via apenwarr):

I knew Torvalds was smart, but seeing as I was never really more than an occasional Linux user I never realized just how smart; I'd thought he was just a good programmer who happened to be in the right place at the right time and had a few good ideas.

I learned it a few years ago, when Linux had a particularly thorny bug somewhere in its process management... Something was leaking on a process exit in case of some obscure error and nobody could figure it out. IIRC even Ingo took a stab and failed, and we all know that Ingo is excellent. Eventually Linus lost patience, went in for a day, fixed it. I had an Advogato diary at the time, but did not post anything, only made a mental note. Linus likes to pretend that he is just a project manager, but it's a sham.

BTW:

Git breaks the mould because it thinks about content, not files. It doesn't track renames, it tracks content. And it does so at a whole-tree level. This is a radical departure from most version control systems. It doesn't bother trying to store per-file histories; it instead stores the history at the tree level. When you perform a diff you are comparing two trees, not two files.

I wonder if the X people (e.g. Keith) made a wrong move when they split their tree into a bunch of smaller git repositories instead of having one big repository like the kernel does. I guess that it has to do with their philosophy of being modular (remember, they have a stable module API which the kernel lacks), so they do not foresee code sharing (and thus movement) across their numerous modules and corresponding repositories. Maybe it reduces network traffic at their central git server.

Sunday, January 20th, 2008

blitz

When I read an entry at Chizumatic, I thought that it would be nice to have a PDA with a dictionary, yet use it for lookups off a PC. If we generalize the problem, all we need is to share the clipboard between two systems.

Here's how I do it. First, we write a scriptlet, called blitz:

blitzchunk=$HOME/blitz-chunk
# Use a file for trace / debugging. A pipe is too anonymous.
/bin/cat > "$blitzchunk"
# The xclip places itself into background by default, but does
# not daemonize correctly, so ssh hangs. Therefore, we background
# it by hand from shell.
# The option -quiet makes xclip run in the foreground.
DISPLAY=:0.0 /usr/bin/xclip -quiet -selection clipboard "$blitzchunk" \
  >/dev/null 2>&1 </dev/null &

Then, if you have a clipboard which you want to share, run: xclip -o | ssh targethost.domain.com blitz (normally you'd have it off a menu in your GNOME, "Share Clipboard to...").
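
The sending side can be wrapped into something a menu entry could call. This is only a sketch: the name share_clipboard is made up, and it assumes that blitz is in the PATH on the target host and that public key authentication is already set up.

```shell
#!/bin/sh
# share_clipboard HOST: push the local X selection to blitz on HOST.
# Hypothetical wrapper around the one-liner above.
share_clipboard() {
    target=$1
    if [ -z "$target" ]; then
        echo "usage: share_clipboard HOST" >&2
        return 2
    fi
    # Plain "xclip -o" prints the primary selection (whatever is
    # highlighted); add "-selection clipboard" to send the clipboard instead.
    # BatchMode makes ssh fail instead of prompting for a password,
    # which matters when there is no terminal, e.g. when run from a menu.
    xclip -o | ssh -o BatchMode=yes "$target" blitz
}
```

Called as share_clipboard targethost.domain.com, it does the same thing as the pipeline above, except that it fails quickly when key authentication is not working instead of hanging on an invisible prompt.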

The ssh with public key authentication does the job of an RPC method. Also, you need xclip on both systems.

I have to say, blitz is a godsend, although I'm the god. I only wonder if I am reinventing the wheel here...
