Can’t boot 2.6.32-3 after running 2.6.32-5 for a while

OK, I’m sure it must be something simple, but I can’t figure out what.

I am running the “testing” release (Squeeze). I have two stock Debian kernels installed on my system: linux-image-2.6.32-3-686 and linux-image-2.6.32-5-686. My computer has a single hard disk, a traditional ATA IDE drive, also known as “PATA”. One important difference between these two kernels is that the -3 kernel uses the old IDE drivers, which results in my hard disk device name being /dev/hda. The -5 kernel uses the newer libata driver, which results in my hard disk device name being /dev/sda. In other words, the newer kernel uses a SCSI device naming convention instead of the traditional IDE device naming convention.

When I first installed the 2.6.32-5 kernel, it was from unstable, but this kernel has since migrated to testing. I didn’t like all the UUID changes that the dependencies of this kernel package wanted to make, so instead I changed all my system files (/etc/fstab, /etc/initramfs-tools/conf.d/resume, and /etc/lilo.conf) to use the new SCSI device naming conventions. After some initial issues with the nouveau driver, I have been running the -5 kernel successfully for some time.

Recently, however, I decided I wanted to boot my old kernel (the -3 kernel) to test some software. I knew that I would have to make some changes to the system files first, and I decided to use udev aliases in them. For example, in /etc/fstab, instead of something like

/dev/sda1

I used something like

/dev/disk/by-uuid/04db5929-51e6-424a-ac5b-a592b96b9d04

After making changes of this nature to /etc/lilo.conf, /etc/fstab, and /etc/initramfs-tools/conf.d/resume, I rebuilt the initial RAM file systems for both kernels with “update-initramfs -uk all” while running the -5 kernel. Everything appeared to work fine. I then shut down and rebooted into my current kernel (the -5 kernel). It booted just fine. I then tried to boot my old kernel (the -3 kernel). It failed. The boot loader loaded the kernel and initial RAM file system just fine, but the -3 kernel couldn’t make the switch from the initial RAM file system to the permanent root file system. I got a few “device not found” error messages, and it left me in an ash shell with “(initramfs)” as the prompt.
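The following minimal Python 3 sketch illustrates the kind of edit described above. It rewrites partition names such as /dev/sda1 into /dev/disk/by-uuid/ aliases in the text of a file like /etc/fstab. It is not what was actually run. The function `to_uuid_paths` and the device-to-UUID mapping passed to it are hypothetical. On a real system the UUIDs would come from a tool such as `blkid`, or from the existing /dev/disk/by-uuid links.

```python
from __future__ import annotations

import re

# Partition names under either naming scheme, e.g. /dev/sda1 or /dev/hda1.
# Whole-disk names such as /dev/sda are deliberately not matched.
PARTITION_RE = re.compile(r"/dev/[hs]d[a-z]+[0-9]+\b")


def to_uuid_paths(text: str, uuid_by_device: dict[str, str]) -> str:
    """Replace partition device names in config text with by-uuid aliases.

    uuid_by_device is a hypothetical mapping such as
    {"/dev/sda1": "04db5929-51e6-424a-ac5b-a592b96b9d04"}.
    Devices missing from the mapping are left unchanged.
    """
    def replace(match):
        device = match.group(0)
        uuid = uuid_by_device.get(device)
        return f"/dev/disk/by-uuid/{uuid}" if uuid else device

    return PARTITION_RE.sub(replace, text)
```

The same call works on a line such as "RESUME=/dev/sda5" from the resume file. Whole-disk names like /dev/sda are skipped because a whole disk usually carries no file system UUID of its own.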

What did I do wrong? Is there a system file that I missed? Is this a missed dependency in the dependency-based boot system? I can still boot the -5 kernel just fine, but I can’t get the -3 kernel to boot. I tried searching the internet with the search words

Debian 2.6.32-3 2.6.32-5 “(initramfs)” “won’t boot”

but none of the hits looked promising to me.

Any ideas? My hunch is that the udev aliases might not yet exist at the time they are referenced, but that’s just a wild guess at this point. The thing is, though, it works fine for the -5 kernel.


9 Comments.

  1. IIUC, linux-image-2.6.32-3-686 uses hdX and linux-image-2.6.32-5-686 uses sdX, so wouldn’t your update-initramfs have updated your linux-image-2.6.32-3-686 initrd with sdX device names?

  2. Stephen Powell

    No. I changed the /dev/sdX device names to uuid udev aliases in /etc/fstab, /etc/lilo.conf, and /etc/initramfs-tools/conf.d/resume prior to issuing “update-initramfs -uk all”. And these alias names should exist in the -3 kernel too. As an example, I changed

    /dev/sda1

    to something like

    /dev/disk/by-uuid/04db5929-51e6-424a-ac5b-a592b96b9d04

    udev under the -5 kernel creates a symbolic link by this name to /dev/sda1. udev under the -3 kernel creates a symbolic link by this name to /dev/hda1. Or at least it should.
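    You can check this on a running system by listing where each alias points. The following minimal Python 3 sketch reads the symlinks in /dev/disk/by-uuid and turns their targets, which udev usually writes as relative paths such as ../../sda1, into absolute device paths. The function names are hypothetical. If udev behaves as described, the same UUID should resolve to /dev/sda1 under the -5 kernel and to /dev/hda1 under the -3 kernel. The script has to be run from the fully booted system, because the initramfs does not include Python.

```python
from __future__ import annotations

import os


def link_target(alias_dir: str, target: str) -> str:
    """Turn a symlink target (often relative, e.g. ../../sda1) into an absolute path."""
    # os.path.join discards alias_dir when target is already absolute.
    return os.path.normpath(os.path.join(alias_dir, target))


def resolve_aliases(alias_dir: str = "/dev/disk/by-uuid") -> dict[str, str]:
    """Map each alias name (here, a UUID) to the device node its symlink points at."""
    aliases = {}
    for name in sorted(os.listdir(alias_dir)):
        path = os.path.join(alias_dir, name)
        if os.path.islink(path):
            aliases[name] = link_target(alias_dir, os.readlink(path))
    return aliases
```

    For example, printing `resolve_aliases()` under each kernel shows directly whether the UUID in /etc/fstab maps to an sdX or an hdX node.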

  3. So, are all entries in “/etc/lilo.conf” pointing to “uuid” devices, or just the ones loading the “-3” kernel?

    Greetings,

  4. Stephen Powell

    There are only two lines in /etc/lilo.conf that are relevant: boot and root. Both are in the global section, not in a per-image section. And since I have lilo’s first stage loader installed in the boot sector of a primary partition that has been formatted with a Linux file system that assigns a UUID, the partition has a /dev/disk/by-uuid alias. (If lilo were installed in the master boot record, I couldn’t do this, since the master boot record does not have a /dev/disk/by-uuid alias.) So yes, “all the entries” are pointing to “uuid” devices.
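    To confirm that the global boot= and root= lines really use UUID aliases, you could run a check along these lines. This is a simplified Python 3 sketch, not a real lilo.conf parser. It treats everything before the first image= or other= line as the global section, strips # comments, and ignores quoting. The function names are hypothetical.

```python
from __future__ import annotations


def global_options(lilo_conf: str) -> dict[str, str]:
    """Collect key=value settings from the global section of a lilo.conf.

    Simplified: the global section is taken to be everything before the
    first image= or other= line; quoting and continuation lines are ignored.
    """
    options = {}
    for raw in lilo_conf.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and indentation
        if not line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key in ("image", "other"):
            break  # first per-image section: the global section is over
        options[key] = value  # bare flags such as "prompt" get an empty value
    return options


def non_uuid_devices(lilo_conf: str, keys=("boot", "root")) -> list[str]:
    """Return the global settings among keys that do not use a by-uuid alias."""
    options = global_options(lilo_conf)
    return [key for key in keys
            if key in options and not options[key].startswith("/dev/disk/by-uuid/")]
```

    For a configuration like the one described here, `non_uuid_devices(open("/etc/lilo.conf").read())` should return an empty list.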

  5. (…)

    How is that? Can’t LILO manage a multiboot menu, or is that a personal configuration choice? :-?

    I ask because I have never used LILO, only GRUB, and there it is quite normal to have several lines/sections allowing the user to boot the desired kernel/system.

    Can you copy/paste the relevant “lilo.conf” boot entries you are using for both: the ones that boot the “-3” kernel and the ones that boot the “-5” kernel?

    Greetings,

  6. Theoretically. But don’t you think that your -3 initrd is creating links to sda1, etc., because you built it while booted into -5?

    You should unpack your -3 initrd to check its udev rules; symlinks to hda1, etc., are probably not being created.

  7. One more thought.

    Rather than unpack the initrd, boot into -3 and run “cd /dev; ls” at the (initramfs) prompt. If udev has run and your initrd has bailed out at the equivalent of “break=mount”, you should be able to see whether the partitions show up as sda1-style or hda1-style device nodes.
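    There is no Python at the (initramfs) prompt. If you copy down the output of ls there, though, a small helper like the following hypothetical Python 3 function can report which naming scheme the listing contains. It only looks at partition nodes, meaning a disk letter followed by a partition number.

```python
import re

# Partition nodes under the two naming schemes discussed above.
IDE_RE = re.compile(r"^hd[a-z]+[0-9]+$")
LIBATA_RE = re.compile(r"^sd[a-z]+[0-9]+$")


def naming_schemes(dev_entries):
    """Report which partition naming schemes appear in a /dev listing."""
    schemes = set()
    for entry in dev_entries:
        if IDE_RE.match(entry):
            schemes.add("ide (hdXN)")
        elif LIBATA_RE.match(entry):
            schemes.add("libata (sdXN)")
    return schemes
```

    An empty result would suggest that udev had not created any partition nodes at the point where the initramfs gave up.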

  8. Stephen Powell

    Of course it can. But the only thing that differs between the two kernels is the kernel image and its corresponding initial RAM file system image. They share the same permanent root file system.

    The same is true of lilo.

    If you want the exact configuration file, that will have to wait about seven hours, until I have access to that machine again. But perhaps you will be satisfied with an approximation for now. Here is an approximation of what it looks like, based on another machine. The UUIDs used in this example are from this substitute machine:

  9. Mmmm, understood.

    But it should be easier to configure two different root filesystems to try different setups, just in this case and just for testing.

    For example (quick and dirty sample):

    boot=/dev/disk/by-uuid/04db5929-51e6-424a-ac5b-a592b96b9d04
    ### LILO -5 kernel
    image=/boot/vmlinuz
        label=Linux
        root=/dev/disk/by-uuid/04db5929-51e6-424a-ac5b-a592b96b9d04
        initrd=/boot/initrd.img
    #
    ### LILO -3 kernel
    image=/boot/vmlinuz.old
        label=LinuxOld
        root=/dev/sda2
        initrd=/boot/initrd.img.old
        optional

    Or:

    boot=/dev/sda2
    ### LILO -5 kernel
    image=/boot/vmlinuz
        label=Linux
        root=/dev/disk/by-uuid/04db5929-51e6-424a-ac5b-a592b96b9d04
        initrd=/boot/initrd.img
    #
    ### LILO -3 kernel
    image=/boot/vmlinuz.old
        label=LinuxOld
        root=/dev/disk/by-uuid/04db5929-51e6-424a-ac5b-a592b96b9d04
        initrd=/boot/initrd.img.old
        optional

    Then check whether booting the old kernel works with either of those configurations.

    (…)

    I do not know LILO at all, but GRUB can fail with “device not found” when it has problems locating the root device node. And I’m not sure to what extent the various combinations of GRUB and kernel versions support “ID” or “UUID” naming :-?

    I’m interested in seeing how your testing goes.

    Greetings,
