Description of problem:
After a fresh install of Fedora 24 on bare metal using the latest live/boot image, the system hangs before the boot loader menu is even displayed.

Version-Release number of selected component (if applicable):
anaconda-24.13-1.fc24

How reproducible:
Always

Steps to Reproduce:
1. Perform a fresh install of current Fedora 24 on bare metal.
2. Reboot system.

Actual results:
System hangs after displaying a blinking cursor.

Expected results:
System boots into freshly installed system.

Additional info:
- Issue affects a Lenovo ThinkPad T400 with a boot partition in /dev/sda1 and an LVM partition in /dev/sda2. After installing Fedora 24 from an older live image dated 2016-02-25, the system is bootable and remains so after updating to current Fedora 24; probably a problem with dracut.
- Issue does not appear when current Fedora 24 is installed in a virtual machine.
Is the system booted via UEFI? Please attach the logs from /var/log/anaconda to this bug as individual, text/plain attachments.
Created attachment 1137077 [details] anaconda.log
Created attachment 1137078 [details] dnf.log
Created attachment 1137079 [details] dnf.rpm.log
Created attachment 1137080 [details] ifcfg.log
Created attachment 1137082 [details] journal.log
Created attachment 1137083 [details] ks-script-8zgpnsq7.log
Created attachment 1137084 [details] ks-script-fghrbqqu.log
Created attachment 1137085 [details] ks-script-frdy5c3e.log
Created attachment 1137087 [details] lvm.log
Created attachment 1137088 [details] packaging.log
Created attachment 1137090 [details] program.log
Created attachment 1137091 [details] storage.log
The Lenovo ThinkPad T400 is a normal BIOS based system.
1. This issue also affects Fedora-Workstation-24_Alpha-1.5.
2. Downgrading to grub2-2.02-0.24.fc24 restores normal behaviour.
A bare metal install from the Alpha 1.5 KDE live works fine for me.
(In reply to Adam Williamson from comment #16)
Installing Fedora 24 in a virtual machine with an empty disk works as expected. On my Lenovo ThinkPad T400 with a single 160 GB SATA hard drive, I reuse the only two existing partitions: /dev/sda1 for /boot and /dev/sda2 for LVM. In that case, installing the latest Fedora 24 Alpha 1.6, the new system hangs with a blinking cursor before displaying the boot menu unless I replace grub2-2.02-0.26.fc24 by dumping the contents of package grub2-2.02-0.24.fc24 into /mnt/sysimage -before- anaconda writes the boot loader to the disk.
That disqualifies this as an Alpha blocker, then, as re-using partitions is only in the Beta criteria...
Discussed at today's blocker review meeting [1]. Voted as punt (delay decision) - this is certainly potentially a blocker bug, but currently single-sourced and slightly vague, we can make a decision only with more information and tests [1] http://meetbot-raw.fedoraproject.org/fedora-blocker-review/2016-03-21
I have a similar failure on a Dell T1600 with an install from a USB stick. It hangs with a blinking cursor immediately on boot. Using an F23 recovery disk and doing a grub2-install from there produces a bootable system.
I have additional failures on Dell T1500s. The same resolution usually works. There are, however, circumstances where the rescue system reports that there are no Linux partitions on the hard disk; in that case, the grub2-install trick does not work. This occurs seemingly at random across re-installations. I do not have the problem on older Dell hardware, such as the Optiplex 960. The SATA controllers giving trouble appear this way in lspci:
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family SATA AHCI Controller (rev 04)
for the Dell T1600, and
00:1f.2 SATA controller: Intel Corporation 5 Series/3400 Series Chipset 6 port SATA AHCI Controller (rev 06)
for the Dell T1500.
Same problem on Thinkpad x200, with both netinstall and live workstation x64 images from 22nd of March (.4). I will try the workarounds suggested.
Additional comments: Here's my sysinfo: http://paste.fedoraproject.org/345495/59012768/
For the record, the initial date delta (02-25 to 03-15) and the fact that downgrading grub2 seems to solve the problem strongly indicate that grub2-2.02-0.26.fc24, changelog "Rebased to newer upstream (grub-2.02-beta3) for fedora-24", is the cause here: something in that code bump causes the problem. Not sure how much change there is to look through there; we may need to do some bisecting.
(In reply to Joachim Frieben from comment #15)
> 1. This issue also affects Fedora-Workstation-24_Alpha-1.5.
> 2. Downgrading to grub2-2.02-0.24.fc24 restores normal behaviour.

-1 for blocker
+1 for freeze exception

Reason: We know a doable workaround (downgrade grub2) and can document that for affected users.
It's not a very easy workaround to do, though.
I'm +1 Beta blocker here. Seems like a pretty clear violation of "A system installed without a graphical package set must boot to a state where it is possible to log in through at least one of the default virtual consoles." (among other assorted failures)
Discussed at today's blocker review meeting [1]. Voted as punt (delay decision) - we're generally inclined to take this as a Beta or Final blocker, but we'd like to try and gather some more precise data to be sure how wide the impact is first [1] http://meetbot-raw.fedoraproject.org/fedora-blocker-review/2016-03-29
For me the documented workaround (rescue mode, grub install) does not work. I used the F23 x64 netinstall. Unfortunately I was not able to try it more than once: in the following attempts the rescue mode failed to find the F24 system.
I'm still looking for a completely reproducible failure. The rescue system sometimes, but not always, reports that there are no Linux partitions in the installed system when I tell it "1-Continue". In this circumstance, the lvdisplay shows the logical volumes as NOT AVAILABLE. This varies between the home, root and swap LVs, usually more than one. I've tried waiting, issuing "sync" and various forms of terminating the actual installation thinking something about the LVM stuff wasn't working. However, the same problem occurs when I use the "Standard Disk Layout" instead. If I ignore that error (or didn't check), I find that I can do what the rescue environment would have done: I mount the root LV on /mnt/sysimage, then the boot partition over /mnt/sysimage/boot, then the home LV over /mnt/sysimage/home. All of this with the F23 Server DVD installation. At this point, I do grub2-install --root-directory=/mnt/sysimage /dev/sdX (It's almost always /dev/sda...) from the F23 rescue environment. So far, this has always produced a bootable system.
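The manual rescue sequence described in the comment above can be written down as a short dry-run sketch. It only builds the command lines and prints them; nothing is executed. The function name `rescue_commands` and the device names in the usage part are hypothetical placeholders, not values from any affected system; only the mount order, the /mnt/sysimage paths and the grub2-install invocation come from the comment.

```python
from typing import List


def rescue_commands(root_lv: str, boot_part: str, home_lv: str, disk: str,
                    sysimage: str = "/mnt/sysimage") -> List[List[str]]:
    """Return the rescue commands in the order described, without running them."""
    return [
        ["mount", root_lv, sysimage],                 # root LV first
        ["mount", boot_part, f"{sysimage}/boot"],     # then /boot on top of it
        ["mount", home_lv, f"{sysimage}/home"],       # then /home
        ["grub2-install", f"--root-directory={sysimage}", disk],
    ]


if __name__ == "__main__":
    # Hypothetical device names; substitute whatever lvdisplay reports.
    for argv in rescue_commands("/dev/mapper/fedora-root", "/dev/sda1",
                                "/dev/mapper/fedora-home", "/dev/sda"):
        print(" ".join(argv))
```

Printing the commands rather than running them keeps the sketch safe to try outside a rescue environment.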
For the purposes of this bug I don't think we care about the issues with the rescue image being able or not able to recognize the installed system. This bug seems to quite clearly indicate that grub2-2.02-0.26.fc24 fails very early in the boot process on some systems where earlier grub2 packages worked fine. That is what we are tracking here and what may constitute a blocker bug. That is what we need to focus on testing (i.e. isolating as closely as possible what change caused the problem and what systems are affected).
This is correct, Adam. Still, I want to thank Robert for providing these details. @Robert, if this can be reproduced in F24 rescue mode, please be so kind as to create a new ticket for it.
It's 2 years between beta2 and beta3, per upstream's summary, with hundreds of insertions/deletions. http://git.savannah.gnu.org/cgit/grub.git On the off chance that it is enlightening: someone with an affected system could boot alternate media, find /boot/grub2/grub.cfg, add 'set debug=all' right under 'set pager=1' (line 9 in my grub.cfg), save, then reboot. The debug output will be really verbose; for now try pressing spacebar to keep scrolling until it doesn't work anymore, which may give a clue where and why it's hanging. An alternative is to build from upstream git and see if that reproduces the problem; then at least we know whether the problem is upstream or in the Fedora patches.
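For anyone who prefers to script the grub.cfg edit suggested above, here is a minimal sketch. It operates on the text of the file only and assumes the file contains a line reading exactly 'set pager=1' (ignoring indentation); the function name `enable_grub_debug` is made up for this illustration.

```python
def enable_grub_debug(cfg_text: str) -> str:
    """Insert 'set debug=all' on the line after the first 'set pager=1'."""
    lines = cfg_text.splitlines(keepends=True)
    if any(line.strip() == "set debug=all" for line in lines):
        return cfg_text  # already enabled, leave the text untouched
    out = []
    inserted = False
    for line in lines:
        out.append(line)
        if not inserted and line.strip() == "set pager=1":
            # Reuse the indentation of the pager line for the new line.
            indent = line[: len(line) - len(line.lstrip())]
            if not line.endswith("\n"):
                out[-1] = line + "\n"
            out.append(f"{indent}set debug=all\n")
            inserted = True
    if not inserted:
        raise ValueError("no 'set pager=1' line found")
    return "".join(out)
```

Reading /boot/grub2/grub.cfg from the installed system, passing its contents through this function and writing the result back would have the same effect as the manual edit; keep a copy of the original file first.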
(In reply to Chris Murphy from comment #33) >try pressing spacebar to keep scrolling until it doesn't work anymore And then take a cell phone photo and attach to this bug.
I have an affected system. I did as you asked. It's still just a flashing cursor. No text appears at all. This is very early in the boot process. Other things you want me to try? I've dumped the boot sector from a working (old grub) and a non-working (new grub) system. There are 96 bytes of zeros in the non-working one. In the working one, there are fewer than 16. I haven't disassembled it yet. Of course, there may be improvements in what goes into the MBR.
(In reply to Robert Knight from comment #35)
> I've dumped the boot sector from a working (old grub) and non-working (new
> grub) system. There's 96 bytes of zeros in the non-working one. If the
> working one, there's less than 16. I haven't disassembled it yet. Of
> course, there may be improvements in what goes into the MBR.

Can you attach the file created by 'dd if=/dev/sdX of=mbr.bin count=1' and then also the /var/log/anaconda/program.log? sdX is the F24 install target.
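The zero-byte comparison mentioned above can be repeated on any 512-byte dump produced by the dd command. The sketch below finds the longest run of 0x00 bytes in the boot code area, which in a classic MBR is the first 440 bytes (the disk signature follows at offset 0x1B8). The function name and the file name in the usage part are illustrative assumptions.

```python
BOOT_CODE_SIZE = 440  # boot code area of a classic MBR, before the disk signature


def longest_zero_run(sector: bytes, limit: int = BOOT_CODE_SIZE):
    """Return (offset, length) of the longest run of 0x00 bytes in sector[:limit]."""
    best_off, best_len = 0, 0
    run_off, run_len = 0, 0
    for i, b in enumerate(sector[:limit]):
        if b == 0:
            if run_len == 0:
                run_off = i  # a new run starts here
            run_len += 1
            if run_len > best_len:
                best_off, best_len = run_off, run_len
        else:
            run_len = 0
    return best_off, best_len


if __name__ == "__main__":
    # 'mbr.bin' is the dump created with dd; adjust the path as needed.
    with open("mbr.bin", "rb") as fh:
        off, length = longest_zero_run(fh.read(512))
    print(f"longest zero run: {length} bytes at offset {off:#x}")
```

Running it on a dump from a working and a non-working system gives a quick first comparison before disassembling anything.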
Created attachment 1141424 [details] mbr.bin from affected system
Created attachment 1141425 [details] program.log from affected system
Here you go. As a further experiment, I copied the "good" MBR to the affected system, wrote it back as the MBR, and rebooted. The system began to show the debugging output. With enough space bar action, it's actually up.
(In reply to Robert Knight from comment #37)
> Created attachment 1141424 [details]
> mbr.bin from affected system

Looks the same as the jump code (boot.img) I've got in a working VM. So I guess until we hear from pjones we need to look for changes to the MBR code. The computer's firmware is up to date?
Firmware is up to date. I find it's the same MBR code in the working Dell Optiplex system (F24), too.
grub2-2.02-0.26.fc24 produces this gap in the MBR, as Robert mentioned earlier:
00000150 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000001b0 00 00 00 00 00 00 00 00 9e 32 94 59 00 00 80 20 |.........2.Y... |
Building from upstream tag 2.02-beta3 shows no gap and a substantial code difference starting at offset 0x8E and running all the way to 0x177. Since I don't have an affected system, I can't test whether this different code works. If it does work, then it seems there's a Fedora-specific patch causing this problem.
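The extent of a code difference like the one described above can be checked with a few lines of Python. This sketch compares two boot sector dumps byte by byte within the 440-byte boot code area and reports the first and last differing offsets; the function name and the two file names are placeholders chosen for this example.

```python
def diff_range(a: bytes, b: bytes, limit: int = 440):
    """Return (first, last) offsets where a and b differ within [0, limit), or None."""
    n = min(len(a), len(b), limit)
    diffs = [i for i in range(n) if a[i] != b[i]]
    if not diffs:
        return None
    return diffs[0], diffs[-1]


if __name__ == "__main__":
    # Placeholder file names for a Fedora-built and an upstream-built boot sector.
    with open("mbr-fedora.bin", "rb") as f1, open("mbr-upstream.bin", "rb") as f2:
        result = diff_range(f1.read(512), f2.read(512))
    if result is None:
        print("boot code is identical")
    else:
        print(f"boot code differs from {result[0]:#x} to {result[1]:#x}")
```

The comparison stops before offset 0x1B8 so that the disk signature and partition table, which legitimately differ between machines, are ignored.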
@Chris Murphy: I would be glad to perform tests. Perhaps connecting offline rather than having all of our exchanges in the ticket would make more sense.
Looking at the source for grub2 that I believe is used (grub2-2.02-0.26.fc24.src.rpm), the code for the boot block (grub-2.02~beta3/grub-core/boot/i386/pc/boot.S) has conditional assembly on a "TPM" compile-time parameter that removes the code missing from the non-working boot block. This is #define'd near the top of the same source file. It looks like this was added by 0080-Add-BIOS-boot-measurement.patch.
it should be pretty easy to do a grub2 build with just that patch dropped out - could you do that and see if it works? if you'd like me to do it for you I can, but it's not rocket science...
Sure, I'll do it. The machines that don't work are at the office, so it won't happen until tomorrow, but I've already got a grub2 built without that patch (hack spec and run mock, as you say, not rocket science). OK to test by doing a grub2-install with the replacement grub2 on a F24 installed system or do you want an insert into an anaconda image as proof-of-concept?
grub2-install would be a good enough indicator, I think, especially if you've already verified that that fails with the current grub2 package.
I have indeed verified that the current grub2 package produced an unbootable system with the same symptoms. A grub2 built without that patch allows the system that didn't work before to now be bootable as well as an F24 workstation that was working (on different hardware) to continue to be bootable.
Sounds pretty conclusive! Thanks a lot for digging this out. I'll poke pjones about it if he doesn't respond here, but you could also file an upstream issue now it's precisely identified?
I don't see that code in the git clone that I made from what I believe is upstream. Aren't those patches from Fedora? Or am I just confused about what upstream is? I got the source from git://git.savannah.gnu.org/grub.git.
oop, my bad - I was blindly assuming the patch was an upstream backport (grub2 patches usually are). If not, though, the spec should have some kind of indication of its origin and purpose. Lemme look into it quickly.
so, that patch looks like it comes from Matthew Garrett...you can find it in http://github.com/coreos/grub : http://github.com/coreos/grub/commit/1e32d63145bd1eab33da7866dc34eb8d246ba212 which is, I guess, CoreOS's grub? Reading between the lines I get the impression we are treating that as our upstream lately. I'll have to chat to Peter and Matthew about it. Anyhow, I see a couple of rather interesting commits later in that repo which have not yet made it downstream to us: http://github.com/coreos/grub/commit/c2eee36ec08f8ed0cd25b8030276347680be4843 "Fix boot when there's no TPM" http://github.com/coreos/grub/commit/bb3473d7c8741ad5ef7cf8aafbbcf094df08bfc9 "Rework TPM measurements" could you perhaps try building with those and see what happens? I'd suggest trying with just the first, then with both, and reporting the result in both cases.
Ah yes that version of the patchset will be broken on any systems that export TPM support in the BIOS but don't have an enabled TPM. c2eee36 should fix that. I would strongly recommend pulling in bb3473d7c8741ad5ef7cf8aafbbcf094df08bfc9 and aab446306b8a78c741e229861c4988738cfc6426 as well, that way we'll maintain consistency of event log and PCR use.
How is the Fedora system boot (supposed to be) related to TPM?
Grub measures boot components into the TPM
Thank you. Then we can access this from the booted system?
Yes, /sys/class/tpm/tpm0/device/pcrs and /sys/kernel/security/tpm0
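A quick way to see what these paths actually provide on a given machine is to probe them. The sketch below classifies a path as missing, a directory, empty, readable, or failing with an I/O error; the `probe` helper and its return strings are inventions of this example, and only the two paths come from the comment above.

```python
import os


def probe(path: str) -> str:
    """Classify a path: 'missing', 'directory', 'empty', 'readable' or 'error: ...'."""
    if not os.path.exists(path):
        return "missing"
    if os.path.isdir(path):
        return "directory"
    try:
        with open(path, "rb") as fh:
            data = fh.read()
    except OSError as exc:
        # Pseudo files can exist and still fail on read.
        return f"error: {exc.strerror}"
    return "readable" if data else "empty"


if __name__ == "__main__":
    for p in ("/sys/class/tpm/tpm0/device/pcrs", "/sys/kernel/security/tpm0"):
        print(p, "->", probe(p))
```

On a system without an enabled TPM the results may well be "missing" or "empty"; that by itself says nothing about whether the boot loader is working.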
Thank you.
@Adam Williamson: I'm afraid that exotic a build is beyond my present skills. I've cloned the original, branched on the commit that Matthew says we need to start from and checked out the two commits he strongly recommended. Gluing that into the spec file, the build does not work. The errors have to do with the other patches that are also included in the spec file. I have no experience with git format-patch. Since we know what configuration probably causes this, it might be faster to just configure a system closer to home base. I've tried to get there before your blocker review meeting, but didn't make it.
*** Bug 1323488 has been marked as a duplicate of this bug. ***
Robert: no problem, I'll throw together a scratch build for you shortly.
Discussed at 2016-04-04 blocker review meeting: [1]. This bug was accepted as Beta blocker: this is a conditional violation of "A system installed without a graphical package set must boot to a state where it is possible to log in through at least one of the default virtual consoles." and from the info gathered it seems significant enough of a violation to accept as a blocker [1] http://meetbot.fedoraproject.org/fedora-blocker-review/2016-04-04/f24-blocker-review.2016-04-04-16.05.html
http://koji.fedoraproject.org/koji/taskinfo?taskID=13566544 - grub2-2.02-0.27.1.aw.fc24 - has just "Fix boot when there's no TPM" applied. http://koji.fedoraproject.org/koji/taskinfo?taskID=13566546 - grub2-2.02-0.27.2.aw.fc24 - also has "Rework TPM measurements" applied. I tried to apply the full series of patches up to that point also, as a .3.aw, but it looks like they don't all backport cleanly. Looks like pjones ultimately pulls from http://github.com/vathpela/grub2-fedora , where he rediffs things if necessary. I'll just leave it at those two builds for now. Can you please try those and see how they do? Thanks!
I've repeated the test mentioned in #47 (just do a grub2-install). Both of the versions from koji still boot without trouble. Both of the pseudo files mentioned in #57 exist. The first is empty. The second produces an I/O error (trying to access the ascii_bios_measurements component).
Joachim: awesome, thanks for testing! Robert, if you want to confirm, the 'real' build is http://koji.fedoraproject.org/koji/buildinfo?buildID=751409 , it's pretty similar to the scratch builds I had you test. Peter, could you create an update? If you're too busy let me know and I can do it.
I can confirm that the system is still bootable after running grub2-install with the grub2 and grub2-tools rpms from that build.
Early adopters may add http://frieben.fedorapeople.org/testing as an additional channel in their network install in order to get the latest grub2-2.02-0.28.fc24.
I can additionally confirm that an alpha installation with that channel added does now work on a previously failing system.
Yay! Thanks for confirming *all the things* :)
grub2-2.02-0.28.fc24 has been submitted as an update to Fedora 24. http://bodhi.fedoraproject.org/updates/FEDORA-2016-aa14d8a1b9
Ah, pjones filed an update, just forgot to mark it as fixing this bug. Fortunately I seem to have the powahz.
(In reply to Adam Williamson from comment #63)
> http://koji.fedoraproject.org/koji/taskinfo?taskID=13566544 -
> grub2-2.02-0.27.1.aw.fc24 - has just "Fix boot when there's no TPM" applied.
>
> http://koji.fedoraproject.org/koji/taskinfo?taskID=13566546 -
> grub2-2.02-0.27.2.aw.fc24 - also has "Rework TPM measurements" applied.

(In reply to Fedora Update System from comment #70)
> grub2-2.02-0.28.fc24 has been submitted as an update to Fedora 24.
> http://bodhi.fedoraproject.org/updates/FEDORA-2016-aa14d8a1b9

All three work for me.
grub2-2.02-0.28.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.
Can you also rebuild the ISO image, please? http://kojipkgs.fedoraproject.org/compose/24/latest-Fedora-/compose/Workstation/x86_64/iso/
(In reply to Mikhail from comment #74) A network install pulls in the latest grub2 package, so no updated boot image is required. A rebuilt image is needed only if you intend to install from the live media.
ISO rebuilds happen automatically.