IRC channel logs

2025-05-11.log


<matrix_bridge><Andrius Štikonas> gtker: hmm, interesting... Not sure I understand why it doesn't like allocated but not touched stack though...
<matrix_bridge><Andrius Štikonas> in UEFI the original stack is fairly small and not sufficient to run M2-Planet, so actually all programs on startup allocate a bigger block of memory (I think 8 MiB) and move the rsp pointer there
<matrix_bridge><Andrius Štikonas> so all that memory should in principle already belong to the application
<matrix_bridge><Andrius Štikonas> (this is the code that prepares new stack area: https://github.com/oriansj/M2libc/blob/0247ef9b18945d29f20433a15c7b6a4729e07673/amd64/uefi/libc-full.M1#L79)
<matrix_bridge><Andrius Štikonas> ok, the latest versions of submodules + your patch seem to fix everything
<matrix_bridge><Andrius Štikonas> so luckily that was the only bug so far
<matrix_bridge><Andrius Štikonas> though I still don't understand why it was happening...
<matrix_bridge><Andrius Štikonas> all the calls to UEFI functions (and hence printing too) should realign stack before passing control to UEFI
<matrix_bridge><Andrius Štikonas> and the spec (https://uefi.org/specs/UEFI/2.9_A/02_Overview.html) only says: The stack must be 16-byte aligned. Stack may be marked as non-executable in identity mapped page tables.
<matrix_bridge><Andrius Štikonas> so needed 32-byte alignment is strange
<agg1>isn't 32byte alignment less strict than 64byte? seems not plausible, because if something is 64byte aligned it's also 32byte aligned
<matrix_bridge><Andrius Štikonas> hmm, yes
<matrix_bridge><Andrius Štikonas> still, the stack should be aligned by uefi call wrappers, e.g. here https://github.com/oriansj/M2libc/blob/0247ef9b18945d29f20433a15c7b6a4729e07673/uefi/uefi.c#L316
<matrix_bridge><Andrius Štikonas> push rsp; push [rsp]; and_rsp, %-16 should align the stack to 16-bytes already
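[Editor's note] The alignment arithmetic discussed above can be sketched in a few lines. This is only a model of the address math, written in Python for convenience; the stack pointer value and the helper names `align_down` and `is_aligned` are made up for illustration and do not come from M2libc. It shows that AND-ing with -16 rounds an address down to a 16-byte boundary, and that any 64-byte-aligned address is automatically 32-byte aligned, as agg1 points out.

```python
def align_down(addr, alignment):
    """Round addr down to a multiple of alignment (alignment must be a power of two)."""
    return addr & -alignment


def is_aligned(addr, alignment):
    """Return True if addr is a multiple of alignment."""
    return addr % alignment == 0


# Hypothetical, deliberately misaligned stack pointer value.
rsp = 0x7FF81238
aligned_rsp = align_down(rsp, 16)
print(hex(rsp), "->", hex(aligned_rsp))

# A stricter alignment implies every weaker power-of-two alignment.
print(all(is_aligned(a, 32) for a in range(0, 4096, 64)))
```

The reverse does not hold: an address such as 0x10 is 16-byte aligned but not 32-byte aligned.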
<agg1>it's probably not an alignment issue as such, but uefi expecting routines placed at a specific address (don't know what the call-flow of this is)
<agg1>then it could be a somewhat lucky hit or miss, sometimes it's working with this alignment, some other time it's another one
<matrix_bridge><gtker> stikonas: I really don't understand it either. I could also make it output things semi-correctly by deliberately adding additional stack space. The effects weren't really consistent and it did things that I would consider extremely weird like printing a space (or other invisible byte) before every character. I'm not sure if UEFI expects UTF-16 or something like that or what else could go wrong.
<matrix_bridge><gtker> I'm also not 100% fully convinced that it's entirely because we need to touch the stack, it might also have something to do with the values written to the stack in an unpredictable way
<matrix_bridge><gtker> The patch effectively does the same thing as pushing REGISTER_ZERO to the stack to allocate more stack space so I'm thinking that there might be an uninitialized variable or something that gets a good-enough value with the patch
<matrix_bridge><Andrius Štikonas> anyway, I don't have time to look at it today or even tomorrow...
<matrix_bridge><Andrius Štikonas> maybe later in the week
<matrix_bridge><Andrius Štikonas> generally UEFI is much pickier about things than Linux kernel
<matrix_bridge><gtker> Alright. If we're adding my patch then we might want to be able to detect when we're compiling for UEFI so that we don't have to touch the stack for all architectures unnecessarily
<matrix_bridge><Andrius Štikonas> well, we could spend some more time to try to understand
<matrix_bridge><Andrius Štikonas> there isn't really any rush to push it
<matrix_bridge><Andrius Štikonas> up to you...
<matrix_bridge><Andrius Štikonas> though it's not ideal to push a workaround without understanding
<matrix_bridge><Andrius Štikonas> I might connect gdb to it a bit later next week
<matrix_bridge><gtker> Yeah, although we do want UEFI to have parity with the others in regards to enums/CONSTANTs, so we might need to push it
<matrix_bridge><Andrius Štikonas> well, yeah, though short term non-parity is ok
<matrix_bridge><gtker> stikonas: I tried writing different immediate values to the stack instead of just using whatever was in register zero. Couldn't find a value that made it not work, so I'm leaning on it being just needing to touch the stack. How does paging work in UEFI? Do we get a catchable interrupt if we try to read/write wrong memory?
<matrix_bridge><cosinusoidally> I'm not too familiar with uefi, but I wonder if stack allocation is similar to windows where it grows the stack using a guard page. On win32 if you have more than a page worth of local variables you need to emit code to touch each stack page in turn to grow the stack. If you don't do that then the app may crash. https://devblogs.microsoft.com/oldnewthing/20220203-00/?p=106215 explains it in diagrams
<matrix_bridge><cosinusoidally> The scenario I have hit is when I essentially did something like foo(){ int bar[2000]; bar[0]=100;} that caused a write below the stack page and a crash. This only happened because I was trying to run code generated by the Linux tcc backend on win32. In win32 mode tcc would insert stack probes that mitigated the issue.
<matrix_bridge><gtker> cosinusoidally: That's what I think is happening. I don't really remember, but isn't UEFI very Windows/Microsoft inspired?
<matrix_bridge><gtker> It's also low level enough that there could realistically be an interrupt that we just aren't catching when it happens
<matrix_bridge><cosinusoidally> Interesting https://wiki.osdev.org/GNU-EFI also shows "-fshort-wchar" which does sound quite windows centric. As I mention I don't know much about uefi, but the bits I've seen seem weirdly windows specific (like using PE files).
<mihi>stikonas, gtker: One difference of <https://github.com/oriansj/M2-Planet/commit/413c69f4a58ff3e294572549875195df9ccfb5a8> is that now, when short values are pushed to the stack, the upper bits are not cleared (as push does). I double checked calling convention of UEFI, and no, it was not that simple (they do *not* need to be zeroed). On the other hand, I think the signature of https://uefi.org/specs/UEFI/2.9_A/12_
<mihi>Protocols_Console_Support.html#efi-simple-text-output-protocol-outputstring is the issue - it takes a pointer to a CHAR16 null-terminated string (i.e. terminated by two null bytes), but you pass it a pointer to a single CHAR8. So unless the next three bytes are zero, it will cause unexpected behaviour.
<mihi>and probably depending on the actual value on the (uninitialized) stack, it manifests differently depending on how you touch the stack :)
<mihi>Link again in clickable form: https://uefi.org/specs/UEFI/2.9_A/12_Protocols_Console_Support.html#efi-simple-text-output-protocol-outputstring
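[Editor's note] To illustrate mihi's point, here is a small model of how a consumer of a null-terminated CHAR16 string walks memory. It is a sketch in Python, not UEFI firmware code; `read_char16_string` and the sample byte values are invented for illustration. It reads little-endian 16-bit code units until it finds 0x0000, so whatever bytes follow a lone 8-bit character become part of the output.

```python
def read_char16_string(memory, start=0):
    """Collect little-endian 16-bit code units until a 0x0000 terminator."""
    units = []
    offset = start
    while offset + 1 < len(memory):
        unit = memory[offset] | (memory[offset + 1] << 8)
        if unit == 0:
            break
        units.append(unit)
        offset += 2
    return units


# 'A' followed by three zero bytes: exactly one code unit, as intended.
clean = bytes([ord("A"), 0x00, 0x00, 0x00])

# 'A' followed by leftover stack bytes (made-up values): the first code unit
# is no longer 'A', and a stray extra code unit appears before the terminator.
dirty = bytes([ord("A"), 0x20, 0x42, 0x00, 0x00, 0x00])

print([hex(u) for u in read_char16_string(clean)])
print([hex(u) for u in read_char16_string(dirty)])
```

With the clean buffer the reader sees a single 'A'; with the dirty one it sees two code units, neither of which was intended, which would fit the inconsistent output described earlier in the log.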
<matrix_bridge><gtker> mihi: I think you might be right. From looking at https://github.com/oriansj/M2libc/blob/0247ef9b18945d29f20433a15c7b6a4729e07673/uefi/unistd.c#L222 write for UEFI not only do we pass it a "char", but we also pass it a pointer to the stack that only contains a single "char"?
<matrix_bridge><gtker> Shouldn't we pass it "buf + i" instead?
<mihi>a pointer to buf+i will never work if there are any following chars. Probably the correct fix would be to allocate a 4-byte (or two-16bit-word) array and fill the first one only.
<matrix_bridge><gtker> Ah right, we're looping over the count. I assumed that we could just write the entire string at once like a normal write. I _believe_ that currently the "char c" is padded to 8 bytes with zeroes which is why it currently works
<mihi>you could use a single write in case you converted it to UTF-16 first :)
<matrix_bridge><gtker> Don't think we'll do that, it's probably easier to just print one char at a time 😄
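[Editor's note] The single-write alternative mihi mentions would need the buffer widened to UTF-16 first. A minimal sketch of that conversion follows, assuming the input is plain 7-bit ASCII; `ascii_to_utf16le` is a hypothetical helper written in Python for illustration and is not part of M2libc. Each input byte becomes a little-endian 16-bit code unit, followed by a two-byte null terminator.

```python
def ascii_to_utf16le(buf):
    """Widen ASCII bytes to UTF-16LE code units and append a 16-bit terminator."""
    out = bytearray()
    for byte in buf:
        out.append(byte)  # low byte carries the ASCII value
        out.append(0)     # high byte is zero for ASCII
    out += b"\x00\x00"    # CHAR16 null terminator
    return bytes(out)


print(ascii_to_utf16le(b"Hi"))
print(ascii_to_utf16le(b"A"))
```

For a single character this produces the four bytes char, 0, 0, 0, which is the same layout the one-character-at-a-time approach has to provide.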
<matrix_bridge><gtker> The "__uefi_3" below where we supply the count as a pointer and the buffer as chars would also be wrong, right?
<matrix_bridge><gtker> Right now I'm wondering how this even worked at all, unless the normal write just accepts a char buffer?
<mihi>file write accepts just raw bytes, which are stored to the file, not interpreted in any encoding. As long as whatever tool reading the file can cope with ASCII/latin1 encoded files, it should be fine.
<mihi>and I assume the tools reading the files will be M1/hex, which are designed to work with 8-bit characters anyway.
<matrix_bridge><gtker> That's what I'm thinking, but I can't find the file IO part of the UEFI docs
<matrix_bridge><gtker> Does UEFI have a concept of a file system outside of the actual storage devices?
<mihi> https://uefi.org/specs/UEFI/2.10/13_Protocols_Media_Access.html#file-protocol
<mihi>the concept is called EFI_SIMPLE_FILE_SYSTEM_PROTOCOL and it supports FAT32 only by default
<matrix_bridge><gtker> Nice, thanks
<mihi>(in case you wonder why my links point to different versions of the UEFI specification - I just used the first result when googling for EFI_SIMPLE_FILE_SYSTEM_PROTOCOL and then clicked through to the write function)
<matrix_bridge><gtker> mihi: Do you know what the endianness of the UTF-16 is? Never really done development on Windows but I'm guessing it's just LE?
<mihi>gtker, yes it is UTF-16-LE
<mihi>otherwise the old code would have produced mojibake anyway :)
<mihi>(just like some old versions of CDex cd ripping software did in their Unicode ID3 tags)
<matrix_bridge><gtker> mihi: Just to make sure I'm not stupid: A single char followed by a nullterminator would be char, 0, 0, 0?
<mihi>I can confirm that you are not stupid.
<matrix_bridge><gtker> That's good 😄
<matrix_bridge><Andrius Štikonas> gtker: UEFI boots with identity map between physical and virtual memory
<matrix_bridge><Andrius Štikonas> And at the moment we are not touching it
<matrix_bridge><Andrius Štikonas> It is allowed to set up different paging map as long as you restore identity map before calling UEFI functions
<matrix_bridge><Andrius Štikonas> Anyway, I can't look much more right now as I'm taking flight home soon
<matrix_bridge><gtker> stikonas: No worries. I'm trying out a few things. Haven't gotten anything working other than my previous fix
<matrix_bridge><Andrius Štikonas> But yes, it might be bug there...
<matrix_bridge><gtker> stikonas: Do you know which write functions are definitely used for UEFI? I've tried just inserting "exit(1)" into the ones I suspect were used but it doesn't do anything
<matrix_bridge><Andrius Štikonas> gdbing that write function might make sense...
<matrix_bridge><Andrius Štikonas> int write should be used for both stdout and file output
<matrix_bridge><gtker> https://github.com/oriansj/M2libc/blob/0247ef9b18945d29f20433a15c7b6a4729e07673/uefi/unistd.c#L222 in M2libc should be used in M2-Mesoplanet, and bootstrap.c in M2-Planet, right?
<matrix_bridge><Andrius Štikonas> bootstrap.c is used in M2-Planet --bootstrap-mode only
<matrix_bridge><Andrius Štikonas> So only until we have M1
<matrix_bridge><gtker> I meant for building M2-Planet, sorry
<matrix_bridge><Andrius Štikonas> Full M2-Planet.efi uses https://github.com/oriansj/M2libc/blob/0247ef9b18945d29f20433a15c7b6a4729e07673/uefi/unistd.c#L222
<matrix_bridge><Andrius Štikonas> But M2.efi uses smaller bootstrap.c
<matrix_bridge><gtker> Weird. If I insert "exit(1)" calls into those functions it just keeps on working...
<matrix_bridge><Andrius Štikonas> Strange...
<matrix_bridge><Andrius Štikonas> And you restarted qemu?
<matrix_bridge><gtker> Yeah, I run a new "make qemu" for every change
<matrix_bridge><Andrius Štikonas> Hmm
<matrix_bridge><Andrius Štikonas> That should have worked
<matrix_bridge><gtker> Even if I put an "exit(1)" inside "_write_stdout" it still keeps working...
<matrix_bridge><gtker> Tried with an exit inside main of M2-Planet and it exited correctly...
<matrix_bridge><Andrius Štikonas> How about _exit(1)?
<matrix_bridge><gtker> Still works
<matrix_bridge><gtker> Are the binaries cached somewhere?
<matrix_bridge><Andrius Štikonas> No, shouldn't be
<matrix_bridge><Andrius Štikonas> exit(1) still calls cleanup function
<matrix_bridge><Andrius Štikonas> https://github.com/oriansj/M2libc/blob/0247ef9b18945d29f20433a15c7b6a4729e07673/amd64/uefi/libc-full.M1#L49
<matrix_bridge><Andrius Štikonas> Which calls kill_io
<matrix_bridge><Andrius Štikonas> So might still flush buffers
<matrix_bridge><Andrius Štikonas> Try __exit(1) then
<matrix_bridge><gtker> But then it shouldn't be able to keep working and run M2-Mesoplanet and so on?
<matrix_bridge><Andrius Štikonas> Probably not
<matrix_bridge><Andrius Štikonas> Hmm
<matrix_bridge><Andrius Štikonas> Not 100% sure
<matrix_bridge><Andrius Štikonas> There is a chance that it would work but would leak some resources
<matrix_bridge><gtker> Trying now to just call a function called "sdafsdgasdg(1)" which also just works. Maybe it's not finding the exit function?
<matrix_bridge><Andrius Štikonas> Hmm, strange though
<matrix_bridge><Andrius Štikonas> I'm sure I used exit() and __exit()
<matrix_bridge><gtker> Is M2libc cached somehow?
<matrix_bridge><Andrius Štikonas> For debugging things
<matrix_bridge><Andrius Štikonas> M2libc isn't but maybe you are editing the wrong copy?
<matrix_bridge><Andrius Štikonas> Top level M2libc is used
<matrix_bridge><Andrius Štikonas> Not the one in M2-Planet/M2libc
<matrix_bridge><gtker> Ah, for all of submodules?
<matrix_bridge><Andrius Štikonas> Yes
<matrix_bridge><Andrius Štikonas> Always top level
<matrix_bridge><gtker> That's probably why then 😄
<matrix_bridge><Andrius Štikonas> Same in stage0-posix
<matrix_bridge><Andrius Štikonas> OK, mystery solved
<matrix_bridge><gtker> Yup, that was why, thanks 😄
<matrix_bridge><gtker> Will do more testing, but it seems like changing the "char c = 0" in https://github.com/oriansj/M2libc/blob/0247ef9b18945d29f20433a15c7b6a4729e07673/uefi/unistd.c#L226 to "int c = 0" fixes this...
<matrix_bridge><gtker> Thanks mihi 😄
<matrix_bridge><Andrius Štikonas> I guess it used to be zeroed by accident?
<matrix_bridge><Andrius Štikonas> And yes, thanks to mihi
<matrix_bridge><gtker> I believe it's because (as mihi said) that "push" causes the entire register to be zeroed, but "store_value" depends on the size of the type
<matrix_bridge><Andrius Štikonas> Yeah, OK, makes sense
<matrix_bridge><Andrius Štikonas> (I was just quickly skimming through earlier conversation on my phone)
<matrix_bridge><gtker> It makes sense that changing the values loaded into uninitialized variables didn't affect it since the variable was initialized
<matrix_bridge><gtker> I think what made my fix work was changing "store_value(type_size->size)" to "store_value(register_size)"
<matrix_bridge><Andrius Štikonas> Yeah, but that was a hack
<matrix_bridge><Andrius Štikonas> Now we understand this and have a proper fix
<matrix_bridge><Andrius Štikonas> Anyway good find both of you :)
<matrix_bridge><gtker> It was a hack, but it effectively does the same thing as changing the variable to an "int" since it zeros out the entire memory. Not sure why adding more to the stack space made it work. Maybe the memory is just naturally zeroed out and we were lucky that we didn't overwrite the zeroes in exactly the right places?
<matrix_bridge><Andrius Štikonas> Probably int16_t would work too
<matrix_bridge><gtker> I think we'll need at least a 4 byte size since the first 2 bytes are the char and the other 2 bytes are the null terminator
<matrix_bridge><Andrius Štikonas> Oh indeed
<matrix_bridge><Andrius Štikonas> So int32 then
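[Editor's note] The reasoning about variable sizes can be checked with a toy model of one stack slot. This is a sketch under stated assumptions: the slot starts out full of made-up garbage bytes, stores are little-endian, and `store_bytes` only loosely mimics a size-dependent store. It is not M2-Planet's actual code generation. It asks which store widths leave the four bytes char, 0, 0, 0 at the start of the slot.

```python
GARBAGE = 0xAA  # made-up stand-in for leftover stack contents


def store_bytes(slot, value, size):
    """Write the low `size` bytes of value, little-endian, at the start of slot."""
    slot[0:size] = value.to_bytes(size, "little")


def is_single_char16_string(slot, char):
    """True if slot starts with one CHAR16 code unit followed by a CHAR16 terminator."""
    return bytes(slot[0:4]) == bytes([char, 0, 0, 0])


def simulate(size):
    """Store 'A' with the given width into a garbage-filled 8-byte slot."""
    slot = bytearray([GARBAGE] * 8)
    store_bytes(slot, ord("A"), size)
    return is_single_char16_string(slot, ord("A"))


results = {size: simulate(size) for size in (1, 2, 4, 8)}
print(results)
```

In this model a 1-byte or 2-byte store leaves garbage where the terminator should be, while a 4-byte or 8-byte store does not, which matches the conclusion that the variable needs to be at least 32 bits wide.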
<mihi>gtker: regardless what variable size you choose, I'd suggest to add a comment so that the next person debugging it knows what's happening there :)
<matrix_bridge><gtker> mihi: Definitely, and thanks for the help 🙂
<mihi>you are very welcome :)
<mihi>[<Andrius Štikonas> It is allowed to set up different paging map as long as you restore identity map before calling UEFI functions] : Any source for that (does not need to be now)?
<mihi>as far as I know, as long as you are in boot services mode, you may not touch any processor registers (like enable paging or more processor cores or changing interrupts). Once you switched to Runtime Services mode, you either need to call SetVirtualAddressMap right after ExitBootServices to declare your desired mapping, or you always need to switch back to identity mapping before calling any runtime function.
<mihi>but being able to enable paging in boot services mode would definitely make life easier for us (otherwise we'd have to cope without disk IO, keyboard, text output until we load our own drivers)
<mihi>So query EFI_GRAPHICS_OUTPUT_PROTOCOL to enable linear framebuffer mode, and also query the memory map and manage memory ourselves. And especially important, after exiting boot services you cannot return to EFI shell any longer..
<matrix_bridge><Andrius Štikonas> @irc_libera_mihi:stikonas.eu: https://uefi.org/specs/UEFI/2.10/02_Overview.html#enabling-paging-or-alternate-translations-in-an-application
<mihi>ah ok, makes sense. When you disable or wrap interrupts, the boot services cannot notice it anyway :)