Thierry Reding
2012-02-14 11:01:39 UTC
Hi,
I'm running a custom distribution that uses an initial ramdisk to get the
system up and running. The system uses udev and systemd and keeps the initial
ramdisk around to make use of systemd's shutdown hook to properly unmount
filesystems on shutdown.
I have to admit that the setup is probably not very common: the device has an
onboard flash with a SATA interface, and one of the partitions contains a
squashfs image that is the actual root filesystem. So in order to properly get
this up and running, the partition is mounted at /media/disk and then
/media/disk/rootfs.img can be mounted as the new root filesystem. To further
complicate things, /media/disk is then moved into the new root filesystem to
keep it available after the system is booted. So there really is a mount
dependency and the root filesystem cannot be unmounted because /media/disk is
still mounted, yet /media/disk cannot be unmounted because it contains
rootfs.img which is mounted as root filesystem.
That's where the shutdown hook comes into play: systemd jumps back to the
initial ramdisk (which is kept around for exactly this purpose) and a script
is run to unmount the filesystems in the proper order. So I first move
/oldroot/media/disk to /media/disk, which allows /oldroot to be unmounted.
Finally I can unmount /media/disk because rootfs.img is no longer mounted.
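For illustration, the unmount order described above can be sketched as below. This is only a sketch and not the actual hook script (which is not shown here and would more likely be a shell script in the initial ramdisk). The function names and the dry-run switch are made up for this example; the paths are the ones mentioned above, and `mount --move` is the util-linux way of moving a mount.

```python
import subprocess


def shutdown_unmount_commands(oldroot="/oldroot", disk="/media/disk"):
    """Return the commands for the unmount order described above.

    Hypothetical helper: the names and defaults are assumptions taken from
    the paths mentioned in the text.
    """
    return [
        # 1. Move the disk mount out of the old root, back into the ramdisk.
        ["mount", "--move", oldroot + disk, disk],
        # 2. Nothing is mounted below the old root any more, so unmount it.
        ["umount", oldroot],
        # 3. rootfs.img is no longer in use, so the disk itself can go.
        ["umount", disk],
    ]


def run_shutdown_unmount(dry_run=True):
    """Print the commands (dry run) or execute them in order (needs root)."""
    cmds = shutdown_unmount_commands()
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first failing step, e.g. EBUSY.
            subprocess.run(cmd, check=True)
    return cmds
```

Calling `run_shutdown_unmount()` with the default `dry_run=True` only prints the three commands, which makes it easy to check the order without touching any mounts.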
This all used to work properly until recently when I upgraded udev from 173
to 181. Suddenly unmounting /media/disk fails with "device or resource
busy". So I tried to find out where exactly this was introduced and found out
that things work well with 175 but not with 176 and later. However, I didn't
manage to bisect this further because none of the intermediate commits would
yield a working udev: either the keyboard didn't work and I couldn't shut the
system down (networking didn't work either in that case), or the SATA
interface wasn't properly initialized.
Somewhere in between I also tried debugging from the kernel side by looking at
debug output from the umount syscall. Apparently the reason is that the usage
count of the /media/disk mountpoint is 3 at the time the shutdown script
tries to unmount it and consequently the umount is rejected.
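One userspace check that might help narrow down who holds those extra references is to walk /proc and list processes whose cwd, root or open file descriptors point below the mountpoint. The following is a minimal sketch of that idea, not something I have run on the affected system; the function names are made up for this example. It will not show references held inside the kernel (for example by a loop device), and the link targets are reported as each process sees them.

```python
import os


def is_under(path, mountpoint):
    """True if path is the mountpoint itself or lies below it."""
    base = mountpoint.rstrip("/")
    return path == (base or "/") or path.startswith(base + "/")


def find_holders(mountpoint, proc_root="/proc"):
    """Return (pid, link, target) for each reference below mountpoint.

    proc_root is a parameter only so the function can be tried against a
    fake directory tree; on a real system it is /proc.
    """
    holders = []
    for entry in sorted(os.listdir(proc_root)):
        if not entry.isdigit():
            continue  # not a process directory
        pid_dir = os.path.join(proc_root, entry)
        links = [os.path.join(pid_dir, name) for name in ("cwd", "root")]
        fd_dir = os.path.join(pid_dir, "fd")
        try:
            links += [os.path.join(fd_dir, fd) for fd in os.listdir(fd_dir)]
        except OSError:
            pass  # process exited or permission denied
        for link in links:
            try:
                target = os.readlink(link)
            except OSError:
                continue  # link vanished or is not readable
            if is_under(target, mountpoint):
                holders.append((int(entry), link, target))
    return holders
```

Running `find_holders("/media/disk")` as root from the shutdown script's environment, just before the failing umount, would list any process that still has a file or directory open there. An empty result would suggest the references are held elsewhere.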
So I went back to look at the logs to take an educated guess about what
changes might be causing this behaviour. The only potential candidates that
seemed to jump out were the new blkid and kmod builtins. I attempted to rip
out the actual implementation (in effect making them dummy builtins) but that
didn't fix the problem either.
Now I'm pretty much running out of ideas about where to look. Does anybody
else have any ideas?
Thierry