IRC channel logs

<Googulator>USB is orders of magnitude slower than SATA here with BIOS too

<Googulator>srcfs load time from a SATA SSD here is 5 minutes on a bad day, probably even faster

<Googulator>from USB drive, it's over 2 hours

<stikonas>yeah, but to get to hex1.efi you only need to read a few KiB

<Googulator>CHS vs LBA made no difference in performance (using SATA)

<stikonas>though I guess that also included time to rebuild itself and kaem

<Googulator>of course, LBA is mandatory for USB

<Googulator>at least if hard drive emulation is used

<Googulator>(with floppy emulation, it's the opposite, only CHS is supported - but we take steps to avoid floppy emulation)

<fossy>ok that is just taking far too long and idk if it's hanging or what

<fossy>retrying on SSD

<fossy>seems to be going a bit faster at least?

<fossy>still probably about an order of magnitude slower than stage0-posix

<stikonas>perhaps the way stage0-uefi does I/O is not optimized in your UEFI implementation...

<stikonas>it's hard to debug these things though

<stikonas>basically impossible on real HW and somewhat annoying in qemu

<fossy>quite likely

<fossy>it is a lot faster on SSD

<fossy>ok, running the usb on my laptop now, it is WAYYY faster

<fossy>at least 100x faster on the usb than it was on the other PC

<fossy>deffo implementation reasons

<fossy>yep finished in 45 seconds LOL and the other PC is still on cc_amd64 i think

<fossy>unless it hung but i can't tell cause it's too slow

<fossy>will wait a couple hrs :P

<fossy>stikonas: actually, there is a problem; the building of M2-Planet.efi should have --little-endian in the hex2 invocation, and without it M2-Planet.efi fails on my system

<stikonas>hmm, it doesn't have it?

<fossy>nope

<stikonas>strange...

<fossy>no idea how it worked on any system tbh

<fossy>oh

<fossy>hmmm

<fossy>ENDIAN_FLAG appears to be empty

<stikonas>fossy: 2nd rebuild of M2-Planet?

<stikonas>maybe something else went wrong

<stikonas>let me check

<fossy>yeah, something else did go wrong

<stikonas>sounds like environmental variables are not working...

<fossy>yeah, only ENDIAN_FLAG though, oddly

<stikonas>because it is set here https://git.stikonas.eu/andrius/stage0-uefi/src/branch/main/amd64/kaem.run

<stikonas>odd

<stikonas>so other stuff is there

<fossy>yep

<stikonas>e.g. ARCH

<stikonas>hmm

<stikonas>that is very strange

<stikonas>might be up to you to debug :D

<fossy>yeah i'll have a play around

<fossy>not much you can do if you can't repro

<stikonas>exactly :(

<stikonas>well, you can at least add print stuff in kaem

<stikonas>print more debug info

<fossy>yeah, thankfully it's not a hex2 stage that's problematic or something, i would give up then

<stikonas>yeah, if it's a hex2 stage and you are not on qemu

<stikonas>basically the only way to debug that I found was to run it in UEFI shell and rely on exit status

<Googulator>rickmasters: sent some juicy stage1 optimization PRs

<Googulator>we're back to 192 bytes of actual code+data space used

<Googulator>(excluding the boilerplate MBR & signature)

<lrvick>Anyone had issues getting binutils 2.41 to build deterministically? I get this in one out of 3 builds: https://dpaste.org/J7WeM

<fossy>never seen that, seems to be an object ordering thing

<fossy>what --jobs are you running at

<lrvick>20 on one machine and 30 something on another

<lrvick>the only linking flag that feels like it might impact this is separate-code vs --disable-separate-code

<lrvick>noticing this now, all the binutils builds without any issues have --disable-separate-code

<lrvick>which I don't have in this one

<lrvick>Makes me think the codepath for separate-code linking layout is not deterministic.

<lrvick>also it saves like 2 bytes or something so who cares

* lrvick tries building with --disable-separate-code

<janneke>lrvick: it's been identical between the three builds i have -- http://paste.debian.net/1305245

<lrvick>Yeah I got the same hash on two different systems, but then a different hash on a third, and that's the diff

<janneke>that's on guix core-updates branch, fwiw we're not using --disable-separate-code

<lrvick>interesting.

<lrvick>only 1/3 builds being different for me is making this heisenbug status. Will see if I can reproduce in another set of rounds with --disable-separate-code on the same set of machines.

<janneke>yeah, tricksy

<fossy>lrvick: hm, suspect it's a race (possibly when separate-code is enabled), i usually am only running with -j6 to -j10

<oriansj>fossy: the hex2 used to do the first build of M2-Planet would be the one written in assembly and not the one in mescc-tools, so it would only support the host architecture.

<oriansj>and yes, caching of the reads/writes would have a significant performance difference in the early stage0 steps. (literally seconds vs hours depending on your firmware's ability to cache pages)

<muurkha>wow

<oriansj>microseconds latency vs nanoseconds latency is kinda a big delta and if they do a read from the USB flash for every byte and a write for every byte and wait for completion, then yes that is the sort of difference compared to just copying a byte into a block of RAM until a flush or close.
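A minimal C sketch of the write-side difference oriansj describes. The in-memory `disk` array, the `device_write()` routine standing in for a slow firmware write call, the `device_writes` counter and the 512-byte `BLOCK_SIZE` are all hypothetical, chosen only to make the round-trip cost visible. It also assumes purely sequential output, which matches the "from beginning to end" byte I/O of the early stage0 tools.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 512
#define DISK_SIZE (8 * BLOCK_SIZE)

/* Hypothetical in-memory "disk" standing in for a slow firmware write path. */
static unsigned char disk[DISK_SIZE];
static unsigned long device_writes; /* counts simulated device round trips */

static void device_write(size_t offset, const unsigned char *data, size_t len)
{
    assert(offset + len <= DISK_SIZE);
    device_writes++;
    memcpy(disk + offset, data, len);
}

/* Unbuffered: one device round trip per byte written. */
static void put_byte_unbuffered(size_t offset, unsigned char c)
{
    device_write(offset, &c, 1);
}

/* Buffered: collect bytes in RAM and hand whole blocks to the device. */
static unsigned char wbuf[BLOCK_SIZE];
static size_t wbuf_len;   /* bytes currently held in wbuf */
static size_t wbuf_start; /* disk offset where wbuf's contents belong */

/* Called when the buffer fills, and once more at "close". */
static void flush_writes(void)
{
    if (wbuf_len > 0) {
        device_write(wbuf_start, wbuf, wbuf_len);
        wbuf_start += wbuf_len;
        wbuf_len = 0;
    }
}

/* Assumes sequential output: each byte goes right after the previous one. */
static void put_byte_buffered(unsigned char c)
{
    wbuf[wbuf_len++] = c;
    if (wbuf_len == BLOCK_SIZE)
        flush_writes();
}
```

For 1000 sequential bytes, the unbuffered path makes 1000 device calls, while the buffered path makes 2: one full 512-byte block plus the final flush of the remaining 488 bytes.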

<muurkha>I didn't realize it was doing byte reads from the USB flash

<muurkha>presumably that involves, as a subroutine, doing a block read from the USB flash

<oriansj>and then clearing the block after the read and having to redo the block read again at the next read call

<oriansj>which again just reads another single byte from that block

<oriansj>as the stage0 steps leading up to M2libc all do single-byte reads/writes from beginning to end (this is to reduce complexity in the pieces, but it really kills performance on badly written firmware/kernels)

<muurkha>I feel like you could mostly close the nanosecond gap with if (requested_block == current_block) return;

<muurkha>well, I guess it's not quite that simple!

<muurkha>because then you also need

<muurkha>current_block = requested_block;

<muurkha>so it's like three or four machine instructions and a static variable
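muurkha's check, fleshed out into a minimal C sketch of a one-block read cache. The names `device_read_block()`, `disk` and `device_reads`, the 512-byte `BLOCK_SIZE`, and the `(size_t)-1` "nothing cached yet" sentinel are illustrative assumptions, not stage0 code. The hit path really is just a compare and a load; the extra cost is the miss path that refills the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 512
#define DISK_BLOCKS 8

/* Hypothetical in-memory "disk" standing in for a slow firmware read call. */
static unsigned char disk[DISK_BLOCKS * BLOCK_SIZE];
static unsigned long device_reads; /* counts simulated device round trips */

static void device_read_block(size_t block, unsigned char *out)
{
    assert(block < DISK_BLOCKS);
    device_reads++;
    memcpy(out, disk + block * BLOCK_SIZE, BLOCK_SIZE);
}

static unsigned char cache[BLOCK_SIZE];
static size_t current_block = (size_t)-1; /* sentinel: nothing cached yet */

static int read_byte(size_t offset)
{
    size_t requested_block = offset / BLOCK_SIZE;
    if (requested_block != current_block) { /* miss: go to the device */
        device_read_block(requested_block, cache);
        current_block = requested_block;
    }
    return cache[offset % BLOCK_SIZE]; /* hit: a compare and a load */
}
```

Reading a file front to back touches the device once per block. Alternating between two different blocks, however, misses on every single access, since only one block is ever held.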

<muurkha>?

<deesix>A small talk about the "JustBuild project" https://fosdem.org/2024/schedule/event/fosdem-2024-2690-build-distribution-for-maintaining-the-famous-gcc-4-7/

<matrix_bridge><Andrius Štikonas> They start with tinycc, POSIX shell and coreutils

<matrix_bridge><Andrius Štikonas> That's not a small set though

<janneke>*and* a C library

<matrix_bridge><Andrius Štikonas> Generally by then bootstrap is fairly simple and resembles distro packaging

<matrix_bridge><Andrius Štikonas> Yeah they assume c library too

<matrix_bridge><Andrius Štikonas> C library is often one of the trickier bits

<matrix_bridge><Andrius Štikonas> As it does have some assembly

<oriansj>muurkha: well no, because there are writes between the reads to a different block, so you would need to support at least 3 pages of disk being cached and it'll take a bit more logic to do even a last in last out cache.

<oriansj>4 pages to cover the case of new read page and new write page but what is an extra 4KB of cache when you have a 100+MB firmware blob involved.

<muurkha>oriansj: in some cases yes. Forth only guaranteed 2

<muurkha>but in lots of cases you're reading bytes to put them somewhere other than immediately another disk block

<oriansj>true buffering (caching) is rather handy for good performance and one doesn't need much memory to get 90% of the benefit

<muurkha>but it's true that you do need more than three machine instructions. maybe something like if (num == keys[0]) return 0; if (num == keys[1]) return 1; last ^= 1; keys[last] = num; /* proceed to block loading logic */

<muurkha>for two block buffers
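muurkha's two-buffer lookup turned into a compilable C sketch. Note that the `last ^= 1` step makes replacement round-robin (for two slots, the same as FIFO), not LRU. The device stub, `BLOCK_SIZE`, and the `(size_t)-1` empty-slot sentinel are assumptions, as in a one-block version.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 512
#define DISK_BLOCKS 8
#define NBUF 2

/* Hypothetical in-memory "disk" standing in for a slow firmware read call. */
static unsigned char disk[DISK_BLOCKS * BLOCK_SIZE];
static unsigned long device_reads; /* counts simulated device round trips */

static void device_read_block(size_t block, unsigned char *out)
{
    assert(block < DISK_BLOCKS);
    device_reads++;
    memcpy(out, disk + block * BLOCK_SIZE, BLOCK_SIZE);
}

static size_t keys[NBUF] = { (size_t)-1, (size_t)-1 }; /* block held by each slot */
static unsigned char bufs[NBUF][BLOCK_SIZE];
static unsigned last; /* slot filled most recently; the next miss evicts the other */

/* Returns the slot holding block `num`, loading it on a miss. */
static unsigned get_block(size_t num)
{
    if (num == keys[0]) return 0;
    if (num == keys[1]) return 1;
    last ^= 1;           /* round-robin: overwrite the older slot */
    keys[last] = num;
    device_read_block(num, bufs[last]);
    return last;
}

static int read_byte(size_t offset)
{
    unsigned slot = get_block(offset / BLOCK_SIZE);
    return bufs[slot][offset % BLOCK_SIZE];
}
```

With two slots, ping-ponging between a read block and a write block costs two device reads in total instead of one per access; a third distinct block evicts whichever slot was filled first.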

<oriansj>muurkha: well doing 4 pages would be something like: and eax, 0xFFFFF000 ; mov ebx, 1; cmp eax, [$buf1]; je done; cmp eax, [$buf2]; je done; cmp eax, [$buf3]; je done; cmp eax, [$buf4]; je done; (boring page load logic and ); :done mov eax, [ebx+$buf1]; return;

<oriansj>not very complicated or slow but will be slightly less likely to hit a page fault than a 2 buffer cache
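A C rendering of the idea behind oriansj's four-page snippet: mask the address down to a 4 KiB page base (the `and eax, 0xFFFFF000` step), compare it against four cached tags, and load the page on a miss. The asm leaves the loading to "boring page load logic", so the round-robin `next_victim` replacement here is an assumption, as are the hypothetical `device_read_page()` stub and the 64 KiB in-memory `disk`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u
#define PAGE_MASK (~(uint32_t)(PAGE_SIZE - 1)) /* same as and eax, 0xFFFFF000 */
#define NPAGES 4
#define DISK_SIZE (16u * PAGE_SIZE)

/* Hypothetical in-memory "disk" standing in for a slow firmware read call. */
static unsigned char disk[DISK_SIZE];
static unsigned long device_reads; /* counts simulated device round trips */

static void device_read_page(uint32_t base, unsigned char *out)
{
    assert(base + PAGE_SIZE <= DISK_SIZE);
    device_reads++;
    memcpy(out, disk + base, PAGE_SIZE);
}

/* UINT32_MAX is never page-aligned, so it can never match a masked base. */
static uint32_t tags[NPAGES] = { UINT32_MAX, UINT32_MAX, UINT32_MAX, UINT32_MAX };
static unsigned char pages[NPAGES][PAGE_SIZE];
static unsigned next_victim; /* round-robin replacement (assumed policy) */

static int read_byte(uint32_t addr)
{
    uint32_t base = addr & PAGE_MASK;
    for (unsigned i = 0; i < NPAGES; i++)
        if (tags[i] == base)
            return pages[i][addr - base]; /* hit: no device access */
    unsigned slot = next_victim;          /* miss: evict the oldest slot */
    next_victim = (next_victim + 1) % NPAGES;
    device_read_page(base, pages[slot]);
    tags[slot] = base;
    return pages[slot][addr - base];
}
```

Cycling through four distinct pages costs four device reads in total, however many bytes are read from them; touching a fifth page evicts the slot that was filled first.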

2024-01-25.log