IRC channel logs

2023-12-08.log


<stikonas>hmm, so somehow EDK2 on my machine gets already confused by just executing mov rbp, rsp followed by ret
<stikonas>perhaps it expects both rbp and rsp preserved...
<oriansj>stikonas: is EDK2 touchy about rbp? sounds like an exploitable defect.
<stikonas>oriansj: I think so
<stikonas>well, EDK2 is touchy about a lot of things...
<stikonas>it really doesn't do any sanity checking
<stikonas>oriansj: or maybe it's better to say that some variants of edk2 are touchy about rbp...
<oriansj>I guess I shouldn't be surprised; basic sanity checking would be basic good practice when programming.
<stikonas>cause it did work in QEMU...
<stikonas>and it works on 8 year old laptop...
<stikonas>but somehow coreboot with newer EDK2 payload fails...
<oriansj>so software quality goes down in direct relation to focus on new features?
<stikonas>maybe...
<stikonas>well, I don't think I have more features in it...
<stikonas>but anyway...
<stikonas>everything points to me needing to preserve rbp too
<stikonas>which is a bit annoying
<stikonas>as I was using it to save original value of rsp
<stikonas>well, given that I mostly track rsp in each function anyway
<stikonas>I could probably rewrite it to work
<stikonas>(without changing rbp)
<stikonas>well, maybe it was my mistake
<stikonas>calling convention says
<stikonas>The registers RBX, RBP, RDI, RSI, R12, R13, R14, R15, and XMM6-XMM15 are considered nonvolatile and must be saved and restored by a function that uses them.
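A minimal sketch of the rule quoted above, assuming the Microsoft-style x64 convention that UEFI uses on x86-64 (the register list comes straight from the quote; `must_preserve` is a hypothetical helper). Given the registers a routine writes, it reports which ones the routine has to save and restore:

```python
# Nonvolatile (callee-saved) registers, as listed in the quoted calling convention.
NONVOLATILE = {"rbx", "rbp", "rdi", "rsi", "r12", "r13", "r14", "r15"} | {
    f"xmm{i}" for i in range(6, 16)
}


def must_preserve(written):
    """Return the registers in `written` that the callee must save and restore."""
    return sorted(r for r in {reg.lower() for reg in written} if r in NONVOLATILE)


# `mov rbp, rsp` writes rbp, so rbp has to be restored before `ret`.
print(must_preserve(["rbp", "rax", "rcx"]))  # ['rbp']
```

Volatile registers such as rax, rcx, rdx and r8-r11 can be clobbered freely, which is why only rbp shows up for the `mov rbp, rsp; ret` case.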
<oriansj>so over half the registers are reserved? wow
<oriansj>they could have just had a fixed memory address block which had the UEFI stack and a pointer to it and a word to store rsp
<stikonas>anyway, I think my hex0.efi is running for longer now
<stikonas>(as I'm going over it and making sure rsp is always popped back after push)
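The push/pop bookkeeping described here can be sketched as a tiny checker over a simplified instruction list (a hypothetical text format, not real hex0 input). Every `push` moves rsp down by 8 bytes, every `pop` moves it back, and the net change must be zero when `ret` is reached:

```python
def check_balance(instructions):
    """Track the rsp offset through a list of simplified x86-64 instructions.

    Supported forms (a hypothetical subset): "push <reg>", "pop <reg>",
    "sub rsp, <n>", "add rsp, <n>", "ret". Anything else is assumed not to
    touch rsp. Returns (index, offset) for each `ret` reached with rsp unbalanced.
    """
    offset = 0  # bytes rsp has moved below its value at function entry
    problems = []
    for i, insn in enumerate(instructions):
        op, _, args = insn.partition(" ")
        if op == "push":
            offset += 8
        elif op == "pop":
            offset -= 8
        elif op in ("sub", "add") and args.startswith("rsp,"):
            n = int(args.split(",")[1], 0)  # accepts decimal or 0x-prefixed hex
            offset += n if op == "sub" else -n
        elif op == "ret" and offset != 0:
            problems.append((i, offset))
    return problems


print(check_balance(["push rbx", "push rsi", "pop rsi", "ret"]))  # [(3, 8)]
```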
<stikonas>hmm, something more complicated is going on, it's not just rbp or other non-volatile registers not being saved...
<stikonas>in fact the problem is that my first read call does not return
<stikonas>hmm, it's even more complicated...
<stikonas>there are probably 2 issues, one with rbp, the other is with read...
<Googulator>I'd suggest (on amd64) building a 32-bit native compiler & a 32->64-bit cross compiler in Fiwix, and then using the cross compiler to make the kernel
<Googulator>(in the 2nd GCC 4.0.4 build, that is)
<Googulator>& then, we boot into a 64-bit Linux kernel with a mostly 32-bit userspace, and then recompile everything that's still relevant to 64-bit
<Googulator>that way, no need to make a 64-bit version of Fiwix
<Googulator>as for 64-bit UEFIs, I'd go with a 32-bit builder-hex0 that switches to 64-bit for UEFI calls, similar to how it switches to 16-bit for BIOS calls (I hope x86 allows this)
<Googulator>of course, RISC-V (& ARM, which I'd love to see happen) is a whole different can of worms
<oriansj>and we are still feeling out ways of doing things. Lots of learning still to go
<Googulator>the big issue there for moving beyond POSIX bootstrap is hardware/platform description
<Googulator>there are 2 competing platform description languages there: DTB (which is owned by the Linux kernel, and whose maintainers not only refuse to properly standardize/stabilize it, but actively sabotage attempts to do so, see the situation around ARM EBBR) and ACPI (which involves running unaudited 3rd party bytecode inside the OS, not great for trusted bootstrapping, and is probably really difficult to bootstrap as well)
<Googulator>Hopefully bootstrap can be the driver for an effort to make a properly standardized and versioned DTB
<Googulator>In other news, the bootstrap flash drive passed H2testw, and appears to be healthy - no idea why it just stalled out in the "making swap" step
<fossy>that is a way of doing an amd64 bootstrap - and is also possible at the end of the x86 bootstrap chain
<fossy>an amd64 *native* bootstrap would also be nice though, especially for the y2038 problem
<matrix_bridge><Andrius Štikonas> Yeah, amd64 native bootstrap would be nice for y2038...
<matrix_bridge><Andrius Štikonas> Perhaps we can workaround it in other ways (backport 64 bit time_t to older musl...)
<matrix_bridge><Andrius Štikonas> But still, making bootstrap multi-arch generally helps with finding and understanding current bugs
<matrix_bridge><Jeremiah Orians> That only works if the kernel also supports 64bit time syscalls but we do have 14 years to sort that
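For reference, the y2038 limit comes from a signed 32-bit `time_t`: its largest value, 2^31 - 1 seconds after the Unix epoch, falls in January 2038, roughly 14 years after this December 2023 log. A quick check:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit time_t can hold

last = EPOCH + timedelta(seconds=INT32_MAX)
print(last.isoformat())  # 2038-01-19T03:14:07+00:00

# One more second wraps a two's-complement int32 around to -2**31.
wrapped = (INT32_MAX + 1 + 2**31) % 2**32 - 2**31
print(wrapped, (EPOCH + timedelta(seconds=wrapped)).isoformat())  # -2147483648 1901-12-13T20:45:52+00:00
```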
<rickmasters>Googulator: I'm working on your PR for builder-hex0. I'm going to need to work through troubles.
<rickmasters>Googulator: Running make in your builder-hex0 branch no longer works - builder-hex0.hex0 fails building itself in qemu.
<matrix_bridge><Andrius Štikonas> Jeremiah Orians: oh indeed... Fiwix probably doesn't support that yet
<rickmasters>Googulator: This is the "monolithic" variation (no stage1) that loads at 7C00.
<rickmasters>Googulator: I'm trying to debug it. It looks like the kernel size increased which requires increasing starting read sector in internalshell.
<Googulator>Weird, I'm pretty sure I did increase that to sector #8
<Googulator>(LBA, so equivalent to CHS 0/0/9)
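The LBA 8 = CHS 0/0/9 equivalence follows from the standard conversion, where CHS sectors are numbered from 1 but LBA counts from 0. A sketch, assuming a geometry of 16 heads and 63 sectors per track (any geometry with more than 8 sectors per track gives the same answer for LBA 8):

```python
def lba_to_chs(lba, heads=16, sectors_per_track=63):
    """Convert a 0-based LBA to (cylinder, head, sector); CHS sectors start at 1."""
    cylinder = lba // (heads * sectors_per_track)
    head = (lba // sectors_per_track) % heads
    sector = lba % sectors_per_track + 1
    return cylinder, head, sector


def chs_to_lba(cylinder, head, sector, heads=16, sectors_per_track=63):
    """Inverse mapping, the kind of translation a BIOS does before an LBA access."""
    return (cylinder * heads + head) * sectors_per_track + (sector - 1)


print(lba_to_chs(0))  # (0, 0, 1): the MBR
print(lba_to_chs(8))  # (0, 0, 9)
```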
<rickmasters>Googulator: Oh, maybe that's not the problem. I increased it and it didn't work - so probably something else.
<Googulator>Is 55AA in the right place?
<Googulator>I struggled a lot with that, and maybe I messed up.
<rickmasters>Googulator: yes, 55 AA appears in right place
<Googulator>booting it in qemu as a HDD image?
<rickmasters>Googulator: I'm surprised you got as far as you did. My documentation was poor. Wasn't expecting any help...
<Googulator>It probably won't work when booted as a floppy image, unless SeaBIOS is really lenient
<rickmasters>Googulator: yes, hdd image - just running `make`
<rickmasters>Googulator: just cloned your repo, git checkout hardware-compat;make
<rickmasters>Googulator: Oh, also changed 3584 to 4096 in the Makefile
<rickmasters>Googulator: I can probably figure it out but just wanted to let you know the reason for delay.
<Googulator>B8 06 00                # mov ax, 6        ; num_sectors = 6
<Googulator>this is probably it
<Googulator>needs to be 7
<rickmasters>Googulator: and the starting sector 1 -> 2 ?
<Googulator>1 is correct
<Googulator>because it's LBA
<Googulator>LBA counts from 0
<Googulator>MBR is LBA #0
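With the MBR at LBA 0 and the kernel starting right after it at LBA 1, the `num_sectors` value in `mov ax, 6` has to cover the whole kernel rounded up to 512-byte sectors. A sketch of that arithmetic; the kernel sizes used below are made-up examples, not the real builder-hex0 sizes:

```python
SECTOR_SIZE = 512
KERNEL_START_LBA = 1  # LBA 0 is the MBR itself


def sectors_needed(kernel_bytes):
    """Round a byte count up to whole 512-byte sectors."""
    return -(-kernel_bytes // SECTOR_SIZE)  # ceiling division


def last_lba(kernel_bytes):
    """Last LBA the loader must read for a kernel starting at KERNEL_START_LBA."""
    return KERNEL_START_LBA + sectors_needed(kernel_bytes) - 1


# Hypothetical sizes: growing past 6 * 512 = 3072 bytes needs a 7th sector.
print(sectors_needed(3072), sectors_needed(3073))  # 6 7
```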
<rickmasters>ok. Just changing 6 -> 7, not working yet
<Googulator>did some debugging, and what I see really doesn't make sense
<Googulator>I can see (E)IP point to "call resume_32bit_mode"
<Googulator>& then I "step into" - and it ends up 2 bytes past the beginning of that "call" instruction, at which point if you disassemble, you get nonsense
<Googulator>did we somehow trigger a qemu bug?
<Googulator>oh...
<Googulator>near/far call mismatch
<Googulator>read_sectors_16 and write_sectors_16 end in CB instead of the correct C3
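Why a `CB` (far `ret`) after a near `call` goes wrong, as a toy 16-bit stack model (hypothetical, tracking only the stack words, IP and CS; the return address 0x7C50 is made up). A near `call` pushes only IP, but a far return pops IP and then CS, so it consumes 2 extra bytes of stack and loads CS from whatever word happens to sit above the return address:

```python
def near_call(stack, return_ip):
    """Near CALL: push only the 16-bit return IP."""
    stack.append(return_ip)


def ret_near(stack):
    """Near RET (C3): pop IP; CS is unchanged (None here)."""
    return stack.pop(), None


def ret_far(stack):
    """Far RET (CB): pop IP, then pop CS."""
    ip = stack.pop()
    cs = stack.pop()
    return ip, cs


# The caller's stack already holds an unrelated word (e.g. a saved register).
stack = [0x1234]
near_call(stack, return_ip=0x7C50)
ip, cs = ret_far(stack)
print(hex(ip), hex(cs), len(stack))  # 0x7c50 0x1234 0
```

With `C3` the saved word would still be on the stack and CS untouched; with `CB` execution continues at 0x1234:7C50 and the caller's stack is one word short.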
<Googulator>how on Earth did this work in live-bootstrap then?
<Googulator>fixing that, it now fails at the very end, when trying to write the result back
<Googulator>I of course never tested that feature
<Googulator>hmm, error 01h
<Googulator>invalid command
<Googulator>does SeaBIOS not implement LBA write?
<Googulator>ok, this makes sense
<Googulator>addr_packet is 1000010000A00000000000000000000001
<Googulator>crazy high LBA
<Googulator> https://github.com/coreboot/seabios/blob/master/src/disk.c#L171C26-L171C26
<Googulator>but why?
<rickmasters>Googulator: actually it looks like the LBA is zero in the addr_packet you posted. There is an extra byte on the end 01 ?
<Googulator>oh, right...
<Googulator>that 01 is actually the error code from the next write attempt
<Googulator>10 00 0100 00A0:0000 0000000000000000 is the actual address packet (broken up for readability)
<Googulator>& that looks correct
<Googulator>it's "write 1 sector from memory address 0000:A000 to sector 0"
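The 16-byte disk address packet taken by int 13h AH=42h (extended read) and AH=43h (extended write) can be built and decoded with `struct`. This sketch follows the usual layout (size, reserved, sector count, buffer offset, buffer segment, 64-bit starting LBA, all little-endian), which is why the raw bytes `00 A0 00 00` read as the address 0000:A000. The helper and field names are illustrative:

```python
import struct

# Disk address packet: size, reserved, count, offset, segment, LBA (little-endian, no padding).
DAP = struct.Struct("<BBHHHQ")


def build_dap(count, segment, offset, lba):
    return DAP.pack(0x10, 0, count, offset, segment, lba)


def parse_dap(raw):
    # Only the first 16 bytes belong to the packet; anything after is ignored.
    size, _reserved, count, offset, segment, lba = DAP.unpack(raw[:16])
    return {"size": size, "count": count, "buffer": f"{segment:04X}:{offset:04X}", "lba": lba}


# "write 1 sector from memory address 0000:A000 to sector 0"
packet = build_dap(count=1, segment=0x0000, offset=0xA000, lba=0)
print(packet.hex().upper())  # 10 00 01 00 00 A0 00 00 followed by eight zero bytes
print(parse_dap(packet))  # {'size': 16, 'count': 1, 'buffer': '0000:A000', 'lba': 0}
```

The BIOS reads this packet from DS:SI; on return, CF set means failure, with the status code (e.g. 01h, invalid command) in AH.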
<rickmasters>Googulator: if I comment out the line to check the carry flag, it works
<rickmasters>I mean, replace with two no-ops, 90 90
<GoogulatorMobile>"works"
<GoogulatorMobile>Carry flag is how int 13h indicates success vs failure
<rickmasters>right, I was thinking it seemed to have written OK, but now that I think about it, it's building itself, so it probably just looks like it wrote OK
<GoogulatorMobile>Looking at the SeaBIOS source, there's no reason why CHS would work and LBA would fail
<GoogulatorMobile>SeaBIOS internally translates everything to LBA
<GoogulatorMobile>Even floppy access
<rickmasters>Googulator: I compiled a debug version of seabios and the LBA is indeed wrong inside the BIOS. It's correct for reads.
<rickmasters>dop.lba = 7b00bcd08ec08ed8
<rickmasters>Offhand it looks like you're setting disk packet to di but it should be si
<rickmasters>Googulator: mov di, addr_packet
<rickmasters>Googulator: both reads and writes use si: https://github.com/coreboot/seabios/blob/a6ed6b701f0a57db0569ab98b0661c12a6ec3ff8/src/disk.c#L166
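One way to read the bogus `dop.lba = 7b00bcd08ec08ed8`: the LBA field is a little-endian qword at offset 8 of the packet, and in memory those bytes are `d8 8e c0 8e d0 bc 00 7b`. They decode as the tail of `mov ds, ax` (`8e d8`) followed by `mov es, ax` (`8e c0`), `mov ss, ax` (`8e d0`) and `mov sp, 0x7b00` (`bc 00 7b`). That is consistent with SI pointing into the boot sector's own setup code instead of at addr_packet, though this is an interpretation, not something confirmed in the log:

```python
lba = 0x7B00BCD08EC08ED8  # value printed by the debug SeaBIOS build

# Bytes as they sit in memory (x86 is little-endian).
raw = lba.to_bytes(8, "little")
print(raw.hex(" "))  # d8 8e c0 8e d0 bc 00 7b

# 16-bit encodings of the instructions these bytes resemble.
SEGMENT_SETUP = {
    bytes.fromhex("8ec0"): "mov es, ax",
    bytes.fromhex("8ed0"): "mov ss, ax",
    bytes.fromhex("bc007b"): "mov sp, 0x7b00",
}

# Skip the first byte (d8, the ModRM byte of a preceding 8e d8 = mov ds, ax).
pos = 1
decoded = []
while pos < len(raw):
    for enc, text in SEGMENT_SETUP.items():
        if raw[pos:pos + len(enc)] == enc:
            decoded.append(text)
            pos += len(enc)
            break
    else:
        break  # unknown byte sequence; stop decoding
print(decoded)  # ['mov es, ax', 'mov ss, ax', 'mov sp, 0x7b00']
```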
<GoogulatorMobile>Wait, really?
<GoogulatorMobile>Reading the docs again - indeed, both use es:di
<GoogulatorMobile>Sorry, ds:si
<GoogulatorMobile>Also, 02h and 03h both use es:bx... Why do I remember reading the docs for int 13h and seeing ds:si for 02h and 42h, and es:di for 03h and 43h?
<GoogulatorMobile>...well, that explains it
<rickmasters>it seems to really work with mov si, addr_packet
<rickmasters>There is a push di before the int 13h and a couple of pop di's that I think aren't necessary?
<rickmasters>... after changing to mov si, addr_packet
<rickmasters>Googulator: I think we got it figured out. I'm stepping away for an hour or so...
<stikonas>oriansj, GoogulatorMobile: I have a new theory why I have some problems in UEFI...
<stikonas>found this in the docs: A caller must always call with the stack 16-byte aligned. And at least with the first read call it seems to help
<stikonas>that is quite an annoying restriction...
<stikonas>I guess I just have to manually track stack alignment...
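A sketch of that bookkeeping, assuming the UEFI x64 (Microsoft-style) convention. On entry to a function rsp is 8 mod 16, because the caller was aligned at its `call` and the return address was then pushed. Each `push` moves rsp another 8 bytes, and before calling a UEFI service the caller also reserves 32 bytes of shadow space. `call_frame_adjust` is a hypothetical helper that computes the `sub rsp, N` needed:

```python
def call_frame_adjust(pushes, extra=0, shadow=32):
    """Bytes to subtract from rsp before a `call` so rsp is 16-byte aligned there.

    Assumes rsp % 16 == 8 on function entry, `pushes` 8-byte pushes since entry,
    `extra` bytes of additional stack space, and `shadow` bytes of shadow space
    (32 for the Microsoft/UEFI convention; the System V ABI has none).
    """
    misalignment = (8 + 8 * pushes) % 16
    needed = shadow + extra
    # Pad so that misalignment + adjustment is a multiple of 16.
    return needed + (-(misalignment + needed)) % 16


print(call_frame_adjust(0))  # 40, i.e. sub rsp, 0x28
print(call_frame_adjust(1))  # 32
```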
<mihi>fossy, stikonas: re y2038: I don't think we have that many time sources when bootstrapping (probably only the RTC), so the 32-bit kernels we use could just be patched to subtract e.g. 40 years from the current date when reading from the RTC. I don't think anybody cares about timestamps of bootstrapped files :) A better reason for native 64-bit bootstrap is that at some point in the future, 32-bit compatibility of current CPUs might go away...
<stikonas>mihi: indeed, that is another reason
<stikonas>we are already losing legacy BIOS mode...
<mihi>stikonas, that's a weird coincidence. Current gcc on Linux also requires 16-byte stack alignment, and some libs (e.g. SDL) segfault if you don't obey it in your custom assembly code calling into SDL :D
<stikonas>it's probably for performance reasons