IRC channel logs

2024-01-16.log

<lrvick>After days of debugging, the final result is pretty simple, as is tradition. Working deterministic docker builds: https://github.com/fosslinux/live-bootstrap/pull/413
<fossy>lrvick: that's how things often tend to go around here, we are working with many moving parts :P
<fossy>thanks for your PR. I'll take a look
<lrvick>Now in my own setup, I am trying to build stage0 by itself using a 3-way distro compare setup that I can only easily do in docker, then drop that into the target and run "/x86/bin/kaem --verbose --strict --file ./after.kaem" on my own. It explodes as follows: https://dpaste.org/aTU6j/raw
<lrvick>Any debugging advice welcome. Here is the dockerfile where I create the "target" dir by hand: https://git.distrust.co/public/packages/src/branch/main/src/bootstrap/live-bootstrap/Dockerfile
<lrvick>I don't think I have had to debug script generator yet. Probably something obvious
<fossy>can you link your steps/manifest? is it the same one in your git repo?
<fossy>(is it changed at all)
<lrvick>it is not changed at all.
<lrvick>I only touch finalize_hs.sh in steps (your patch)
<fossy>oh, script-generator expects /steps/lwext4-1.0.0-lb1/files/fiwix-file-list.txt to exist, which is made by rootfs.py usually, but that's not needed.
<fossy>i'll push a fix in a moment. in the meantime, i'd suggest just touching that file, even though you don't need it
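The workaround suggested above amounts to creating an empty placeholder file. A minimal sketch, assuming the path is relative to the root of the hand-built target tree; the function name `touch_placeholder` and the `target_root` parameter are hypothetical and not part of live-bootstrap:

```python
from pathlib import Path

# Path quoted in the discussion above, taken relative to the target root
# (assumption: script-generator looks for it at /steps/... inside the target).
PLACEHOLDER = Path("steps/lwext4-1.0.0-lb1/files/fiwix-file-list.txt")


def touch_placeholder(target_root: str) -> Path:
    """Create an empty fiwix-file-list.txt under target_root if it is missing."""
    path = Path(target_root) / PLACEHOLDER
    # Create parent directories only if they do not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    # touch() leaves the contents of an already existing file untouched.
    path.touch(exist_ok=True)
    return path
```

Calling `touch_placeholder("target")` before running kaem should be equivalent to touching the file by hand.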
<lrvick>indeed, that got me further. got me to my first missing distfile.
<lrvick>oh, it expects them in /external
<lrvick>easy enough
<lrvick>stuff is deep in compile land now
<lrvick>ACTION lets it cook
<fossy>hopefully it should be fairly straightforward from here
<fossy>i have fairly recently done a bwrap bootstrap without rootfs.py
<muurkha>some notes on how 86-DOS (aka QDOS, MS-DOS) was bootstrapped in 01980: http://www.os2museum.com/wp/86-dos-revisited/
<muurkha>I don't remember what TRANS was.
<muurkha>and naturally when I search for information I get prostitute ads
<muurkha>aha, it translated Z80 code to 8086. https://www.tomshardware.com/software/operating-systems/oldest-known-version-of-dos-unearthed-ms-dos-ancestor-86-dos-version-011-is-now-available-on-the-internet-archive
<pder>I was wondering if anyone had any other ideas on how to troubleshoot Fiwix failing to boot in live-bootstrap. The one thing that has worked for me is disabling kvm, but that is extremely slow.
<pder>Currently with latest live-bootstrap master, it stops at jumping to trampoline...
<rickmasters>pder: I think there were two lines of debugging to pursue.
<rickmasters>pder: One is that you have a different memory map that was causing problems.
<rickmasters>pder: This is hard coded (which is a hack of mine) here: https://github.com/fosslinux/live-bootstrap/blob/490bc621a53b46b1ba57fe09921e0bc1ddaef789/steps/kexec-fiwix-1.0/src/kexec-fiwix.c#L129
<rickmasters>You could try altering that code to match your memory map. I know you posted your map a few days ago but I haven't looked at it closely to see if it was likely a problem.
<rickmasters>Note that if you alter kexec-fiwix then you'll want to run with --update-checksums to avoid a checksum error
<rickmasters>pder: The other line of debugging was to trace into Fiwix to see exactly how far it is getting in the boot process.
<rickmasters>pder: I suggested before using __asm(".byte 0xEB, 0xFE"); to see if you reach that statement.
<rickmasters>pder: However that presumes you know how to alter Fiwix, create a test package, detect the spin loop, and so forth.
<rickmasters>pder: I didn't hear back on that so I'm wondering if you need pointers on that.
<rickmasters>By the way, the kexec-fiwix.c file is in steps/kexec-fiwix-1.0/src/kexec-fiwix.c
<rickmasters>A stab in the dark is to remove the line '-machine', 'kernel-irqchip=split', from rootfs.py
<rickmasters>I found that I needed to add that option to qemu to make kvm work but I never figured out why. Maybe you have the opposite problem (although I doubt it).
<rickmasters>More context: I determined that serial port interrupts were not firing in Fiwix with kvm enabled and so I got no output on boot up.
<rickmasters>This only happened when Fiwix was launched by builder-hex0. When Fiwix is launched as the first kernel the serial port worked fine.
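To compare runs with and without the irqchip option, the QEMU command line can be assembled with a toggle. This is only a sketch and not the code in rootfs.py: the function name, the `qemu-system-x86_64` binary, the memory size, and the drive arguments are assumptions for illustration; `-enable-kvm` and `-machine kernel-irqchip=split` are the options under discussion.

```python
def qemu_args(disk_image: str, kvm: bool = True, split_irqchip: bool = True) -> list[str]:
    """Build a QEMU argument list, optionally dropping kernel-irqchip=split."""
    # Assumed baseline; the real invocation in rootfs.py may differ.
    args = [
        "qemu-system-x86_64",
        "-m", "4G",
        "-nographic",
        "-drive", f"file={disk_image},format=raw",
    ]
    if kvm:
        args.append("-enable-kvm")
        # The irqchip setting only matters when KVM is in use.
        if split_irqchip:
            args += ["-machine", "kernel-irqchip=split"]
    return args
```

Running once with `split_irqchip=True` and once with `split_irqchip=False` isolates that one option while keeping everything else identical.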
<pder>rickmasters: The Fiwix booting problem has evolved for me a bit. Back then the .byte 0xEB, 0xFE did work for me but then it was getting further into the Fiwix boot. Now I don't see any boot messages from Fiwix
<pder>I think with Fiwix 1.5 I don't see any boot messages from Fiwix.
<pder>I should also add, that everything works fine with kvm on a 10 year old laptop, but with a relatively new machine I am getting stuck at the point Fiwix boots
<rickmasters>Well, you can still use the spin loop trick to see if Fiwix is running any code at all. You could put .byte 0xEB, 0xFE right here:
<rickmasters> https://github.com/mikaku/Fiwix/blob/10014c5cb31edc7ea5bef3e9a13a675da618b99b/kernel/boot.S#L90
<rickmasters>or:
<rickmasters>spinloop:
<rickmasters> jmp spinloop
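Why the two bytes EB FE are the same thing as `jmp spinloop`: 0xEB is the x86 short-jump opcode and 0xFE is a signed 8-bit displacement of -2, measured from the end of the two-byte instruction, so the jump lands on itself. A small worked check; the load address used here is a made-up value for illustration:

```python
def short_jmp_target(address: int, code: bytes) -> int:
    """Return the target of a two-byte x86 short jump (EB rel8) located at address."""
    if len(code) != 2 or code[0] != 0xEB:
        raise ValueError("expected a two-byte short jmp (EB rel8)")
    # Interpret the second byte as a signed displacement: 0xFE is -2.
    rel8 = int.from_bytes(code[1:2], "little", signed=True)
    # The displacement is relative to the address of the next instruction.
    return address + len(code) + rel8


addr = 0x00100000  # hypothetical address of the spin loop
target = short_jmp_target(addr, bytes([0xEB, 0xFE]))
print(hex(target))  # prints 0x100000, i.e. the jump targets itself
```

If the CPU reaches these bytes it spins forever at that address, which can then be observed from outside the guest (for example as a vCPU stuck at one instruction pointer).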
<pder>Thank you, I will give these ideas a try. I wonder if I am the only one encountering these issues
<rickmasters>pder: Sure. I understand you have a 64GB machine. I tried to spin up and test an AWS 64GB bare metal machine but they don't offer that configuration.
<pder>I did try booting my machine with mem=8G as a linux kernel parameter to limit the amount of available memory but that made no difference with the Fiwix booting issue
<pder>I at least have one machine that works, so I can try to compare the output of both machines
<pder>I also wondered about this: the order in which builder-hex0 loads the files is not deterministic. Could that have any effect?
<pder>Actually it is the creation of the hard disk image in which the files may be in different orders
<rickmasters>pder: that's not likely a problem.
<rickmasters>pder: The fact that it works without kvm sounds important
<pder>I did get slightly different e820 output when booting a 32 bit linux kernel both with kvm enabled and disabled
<pder>I also ran memtest86 all weekend without errors
<rickmasters>pder: If you're not comfortable with it, I could try constructing a kexec-fiwix.c with the memory map you posted. I assume that was with kvm?
<pder>I actually posted both memory maps: without kvm first, followed by with kvm
<rickmasters>darn, the pastebin seems to not be there
<pder>One sec, I should have the local file
<pder> https://paste.debian.net/1304321/
<rickmasters>there are a couple of improvements to the map that are worth trying but I don't see anything obviously different
<rickmasters>The first range starts like:
<rickmasters>pmultiboot_memory_map->addr = 0x00000000;
<rickmasters>we could try:
<rickmasters>pmultiboot_memory_map->addr = 0x00001000;
<rickmasters>Linux seems to exclude the first page so maybe we need to as well.
<rickmasters>There is a line below with this:
<rickmasters> pmultiboot_memory_map->len = 0xBC000000;
<rickmasters>I think it should be:
<rickmasters> pmultiboot_memory_map->len = 0xBBF00000;
<rickmasters>I think the previous value was specifying the ending address, not the length, so it was wrong.
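The length fix is plain end-minus-start arithmetic. A sketch of the check, assuming the region in question starts at 0x00100000 (1 MiB); that start address is not quoted in the log and is inferred only from the difference between the two values, so treat it as an assumption:

```python
def region_length(start: int, end: int) -> int:
    """Length in bytes of the half-open memory region [start, end)."""
    if end <= start:
        raise ValueError("end must be greater than start")
    return end - start


START = 0x00100000  # assumed start of the region (not stated in the log)
END = 0xBC000000    # the old 'len' value, which was really the end address

length = region_length(START, END)
print(hex(length))  # prints 0xbbf00000
```

With that assumed start, the corrected `len` of 0xBBF00000 makes the region end exactly at 0xBC000000 instead of running 1 MiB past it.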
<pder>I see
<pder>I can try changing those two lines and report back
<rickmasters>ok
<pder>Changes made and I started a run
<rickmasters>ok don't forget --update-checksums
<rickmasters>I'm also testing. Those fixes seem like a good idea regardless
<pder>I did run with --update-checksums but same issue. It stops at kexec-fiwix: jumping to trampoline..
<rickmasters>Well it looks like the memory map provided should work with your machine.
<rickmasters>We can't be 100% sure until builder-hex0 is getting the e820 map from the BIOS but my guess is that won't make a difference.
<rickmasters>The most straightforward direction is to start instrumenting Fiwix with the spin loops. I can explore some other options as well.
<rickmasters>Fiwix can output to the QEMU debug port but I'd have to figure out how to make that work and it may not help...
<pder>rickmasters: thanks for your help, I will continue to dig into this when I have some more time
<rickmasters>pder: ok, sorry you're having so much trouble. I hope we figure it out.
<Mikaku>rickmasters: enable CONFIG_QEMU_DEBUGCON, build a new Fiwix kernel and then add the parameter '-debugcon stdio' to QEMU
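On the QEMU side, the suggestion above comes down to appending two arguments to whatever command line is already in use. A minimal sketch; the helper name `with_debugcon` and the base argument list are hypothetical, and the kernel still has to be rebuilt with CONFIG_QEMU_DEBUGCON for anything to appear:

```python
def with_debugcon(base_args: list[str], sink: str = "stdio") -> list[str]:
    """Return a copy of base_args with QEMU's debug console routed to sink."""
    # '-debugcon stdio' is the form given above; a sink such as
    # 'file:debug.log' is an alternative if stdio is already in use.
    return [*base_args, "-debugcon", sink]


# Hypothetical base command line, for illustration only.
base = ["qemu-system-x86_64", "-enable-kvm", "-m", "4G"]
cmd = with_debugcon(base)
```

One caveat worth checking: if another QEMU character device already claims stdio (for example the serial console), QEMU may refuse to start, in which case writing the debug console to a file is a reasonable fallback.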
<rickmasters>Mikaku: thanks
<rickmasters>trying it now
<rickmasters>pder: also, if you're comfortable providing access to your machine I'm willing to try debugging it remotely. No worries if you'd rather not.