The hald problem received more attention, but it was actually the easy one. I should have sent a fix a couple of weeks ago, but all this time I hoped to fix the Oops RSN and send both. The Oops was caused by an attempt to call put_disk from ub_disconnect(), which pulled the rug from under the upper layers. My first reaction was to move it to ub_cleanup, which is a refcounted tail destructor, similar to scsi_disk_put. However, I immediately started to doubt that it was sufficiently bulletproof. The ub_cleanup is called from the release method, right? So the release is not finished at that point, and the structures are still in place. How is doing put_disk there any better than doing it from a disconnect? The whole construct rides on an assumption that nobody tries to do anything to the device (like an open) between the return of the release method and the teardown of all structures in the block device layer. And I just did not trust that to work and be race free.
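For readers who have not met the "refcounted tail destructor" pattern, here is a minimal user-space sketch of the idea. It is not the ub code: struct toy_dev and every toy_* function are hypothetical names made up for illustration, the disk_present flag merely stands in for the gendisk reference, and the whole thing is single-threaded, so it says nothing about the race I was worried about. In the kernel the counter would need proper atomic or locked handling.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-device structure; NOT the real struct ub_dev. */
struct toy_dev {
    int refcount;        /* one reference for the bus binding, one per open */
    bool disconnected;   /* set once the hardware has gone away */
    bool disk_present;   /* stands in for holding the gendisk reference */
    int cleanup_calls;   /* how many times the tail destructor has run */
};

/* Tail destructor: runs exactly once, when the last reference is dropped.
 * In this sketch, this is the place where the disk would be given up. */
static void toy_cleanup(struct toy_dev *dev)
{
    dev->disk_present = false;
    dev->cleanup_calls++;
}

/* Drop one reference; the last holder triggers the cleanup. */
static void toy_put(struct toy_dev *dev)
{
    assert(dev->refcount > 0);
    if (--dev->refcount == 0)
        toy_cleanup(dev);
}

/* Device appears: the bus binding holds the initial reference. */
static void toy_probe(struct toy_dev *dev)
{
    dev->refcount = 1;
    dev->disconnected = false;
    dev->disk_present = true;
    dev->cleanup_calls = 0;
}

/* An opener takes a reference, unless the device is already gone. */
static int toy_open(struct toy_dev *dev)
{
    if (dev->disconnected)
        return -1;
    dev->refcount++;
    return 0;
}

/* The release method drops the opener's reference. */
static void toy_release(struct toy_dev *dev)
{
    toy_put(dev);
}

/* Disconnect only marks the device dead and drops the bus reference.
 * It does not touch the disk directly. */
static void toy_disconnect(struct toy_dev *dev)
{
    dev->disconnected = true;
    toy_put(dev);
}
```

With one opener still holding the device, toy_disconnect() leaves the disk alone, and it is the later toy_release() that runs toy_cleanup(). That ordering is what moving put_disk out of the disconnect path buys; whether the window after the release method returns is really safe is a separate question that this toy does not answer.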
Instead of doing the obvious, I sat down and scoped an implementation where the disk was never released, ever. Only its geometry changed when media was pulled or devices were disconnected. Rock solid. However, the result was messy. First, I had to create states for the whole device, with a lot of confusion over which states were needed and what to do with them. Second, initialization had to be refactored into a sendmail-ish mess, because now I sometimes received a fresh ub_dev with no disk, and sometimes a previously owned one with a disk preallocated. And third, I had to copy the code that forces a partition rescan from dasd.c. When I did that, I understood that I had gone too far, quickly moved put_disk to ub_cleanup, and sent a patch to Greg.
The result was two weeks of angry Rawhide users. Sometimes I wonder what is wrong with me. Maybe I should start a second career in real estate.