IRC channel logs

2024-01-25.log

<Googulator>USB is orders of magnitude slower than SATA here with BIOS too
<Googulator>srcfs load time from a SATA SSD here is 5 minutes on a bad day, probably even faster
<Googulator>from USB drive, it's over 2 hours
<stikonas>yeah, but to get to hex1.efi you only need to read a few KiB
<Googulator>CHS vs LBA made no difference in performance (using SATA)
<stikonas>though I guess that also included time to rebuild itself and kaem
<Googulator>of course, LBA is mandatory for USB
<Googulator>at least if hard drive emulation is used
<Googulator>(with floppy emulation, it's the opposite, only CHS is supported - but we take steps to avoid floppy emulation)
<fossy>ok that is just taking far too long and idk if its hanging or what
<fossy>retrying on SSD
<fossy>seems to be going a bit faster at least?
<fossy>still probably about an order of magnitude slower than stage0-posix
<stikonas>perhaps the way stage0-uefi does I/O is not optimized in your UEFI implementation...
<stikonas>it's hard to debug these things though
<stikonas>basically impossible on real HW and somewhat annoying in qemu
<fossy>quite likely
<fossy>it is a lot faster on SSD
<fossy>ok, running the usb on my laptop now, it is WAYYY faster
<fossy>at least 100x faster on the usb than it was on the other PC
<fossy>deffo implementation reasons
<fossy>yep finished in 45 seconds LOL and the other PC is still on cc_amd64 i think
<fossy>unless it hung but i cant tell cause it's too slow
<fossy>will wait a couple hrs :P
<fossy>stikonas: actually, there is a problem; the building of M2-Planet.efi should have --little-endian in the hex2 invocation, that causes M2-Planet.efi to fail on my system
<stikonas>hmm, it doesn't have it?
<fossy>nope
<stikonas>strange...
<fossy>no idea how it worked on any system tbh
<fossy>oh
<fossy>hmmm
<fossy>ENDIAN_FLAG appears to be empty
<stikonas>fossy: 2nd rebuild of M2-Planet?
<stikonas>maybe something else went wrong
<stikonas>let me check
<fossy>yeah, something else did go wrong
<stikonas>sounds like environmental variables are not working...
<fossy>yeah, only ENDIAN_FLAG though, oddly
<stikonas>because it is set here https://git.stikonas.eu/andrius/stage0-uefi/src/branch/main/amd64/kaem.run
<stikonas>odd
<stikonas>so other stuff is there
<fossy>yep
<stikonas>e.g. ARCH
<stikonas>hmm
<stikonas>that is very strange
<stikonas>might be up to you to debug :D
<fossy>yeah i'll have a play around
<fossy>not much you can do if you cant repro
<stikonas>exactly :(
<stikonas>well, you can at least add print stuff in kaem
<stikonas>print more debug info
<fossy>yeah, thankfully its not a hex2 stage thats problematic or something, i would give up then
<stikonas>yeah, if it's hex2 stage and you are not on qemu
<stikonas>basically the only way to debug that I found was to run it in UEFI shell and rely on exit status
<Googulator>rickmasters: sent some juicy stage1 optimization PRs
<Googulator>we're back to 192 bytes of actual code+data space used
<Googulator>(excluding the boilerplate MBR & signature)
<lrvick>Anyone had issues getting binutils 2.41 to build deterministically? I get this in one out of 3 builds: https://dpaste.org/J7WeM
<fossy>never seen that, seems to be an object ordering thing
<fossy>what --jobs are you running at
<lrvick>20 on one machine and 30 something on another
<lrvick>the only linking flag that feels like it might impact this is separate-code vs --disable-separate-code
<lrvick>noticing this now, all the binutils builds without any issues have --disable-separate-code
<lrvick>which I don't have in this one
<lrvick>Makes me think the codepath for separate-code linking layout is not deterministic.
<lrvick>also it saves like 2 bytes or something so who cares
<lrvick>ACTION tries building with --disable-separate-code
<janneke>lrvick: it's been identical between the three builds i have -- http://paste.debian.net/1305245
<lrvick>Yeah I got the same hash on two different systems, but then a different hash on a third and that diff
<janneke>that's on guix core-updates branch, fwiw we're not using --disable-separate-code
<lrvick>interesting.
<lrvick>only 1/3 builds being different for me is making this heisenbug status. Will see if I can reproduce in another set of rounds with --disable-separate-code on the same set of machines.
<janneke>yeah, tricksy
<fossy>lrvick: hm, suspect it's a race (possibly when separate-code is enabled), i usually am only running with -j6 to -j10
<oriansj>fossy: the hex2 used to do the first build of M2-planet would be the one written in assembly and not the one in mescc-tools; so it would only support the host architecture.
<oriansj>and yes, caching of the reads/writes would have a significant performance difference in the early stage0 steps. (literally seconds vs hours depending on your firmware's ability to cache pages)
<muurkha>wow
<oriansj>microseconds latency vs nanoseconds latency is kinda a big delta and if they do a read from the USB flash for every byte and a write for every byte and wait for completion, then yes that is the sort of difference compared to just copying a byte into a block of RAM until a flush or close.
<muurkha>I didn't realize it was doing byte reads from the USB flash
<muurkha>presumably that involves, as a subroutine, doing a block read from the USB flash
<oriansj>and then clearing the block after the read and having to redo the block read again at the next read call
<oriansj>which again just reads another single byte from that block
<oriansj>as the stage0 steps leading up to M2libc are all single byte read/write from beginning to end; (this is to reduce complexity in the pieces but it really kills performance on badly written firmware/kernels)
<muurkha>I feel like you could mostly close the nanosecond gap with if (requested_block == current_block) return;
<muurkha>well, I guess it's not quite that simple!
<muurkha>because then you also need
<muurkha>current_block = requested_block;
<muurkha>so it's like three or four machine instructions and a static variable
<muurkha>?
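
The single-block check muurkha describes above could be sketched in C roughly as follows. This is only an illustration, not stage0 or firmware code: `fake_disk`, `block_loads`, `load_block`, `read_byte` and the 512-byte `BLOCK_BYTES` are hypothetical names and values, and an in-memory array stands in for the USB device.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_BYTES 512u /* assumed block size, for illustration only */

/* Hypothetical stand-in for the slow device: 8 blocks held in RAM. */
static uint8_t fake_disk[8 * BLOCK_BYTES];
static unsigned long block_loads; /* counts simulated device reads */

static uint8_t block_buf[BLOCK_BYTES];
static uint32_t current_block = UINT32_MAX; /* sentinel: nothing loaded yet */

/* Make sure block_buf holds requested_block, loading it only on a miss. */
static void load_block(uint32_t requested_block)
{
    if (requested_block == current_block)
        return; /* hit: the block is already in the buffer */

    for (size_t i = 0; i < BLOCK_BYTES; i++)
        block_buf[i] = fake_disk[(size_t)requested_block * BLOCK_BYTES + i];

    current_block = requested_block;
    block_loads++;
}

/* Byte-at-a-time read, the access pattern discussed above. */
static uint8_t read_byte(uint32_t offset)
{
    load_block(offset / BLOCK_BYTES);
    return block_buf[offset % BLOCK_BYTES];
}
```

With this sketch, reading 1024 consecutive bytes one at a time triggers two block loads instead of 1024. It does not handle writes, which is the case oriansj raises further down.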
<deesix>A small talk about the "JustBuild project" https://fosdem.org/2024/schedule/event/fosdem-2024-2690-build-distribution-for-maintaining-the-famous-gcc-4-7/
<matrix_bridge><Andrius Štikonas> They start with tinycc, POSIX shell and coreutils
<matrix_bridge><Andrius Štikonas> That's not a small set though
<janneke>*and* a C library
<matrix_bridge><Andrius Štikonas> Generally by then bootstrap is fairly simple and resembles distro packaging
<matrix_bridge><Andrius Štikonas> Yeah they assume c library too
<matrix_bridge><Andrius Štikonas> C library is often one of the trickier bits
<matrix_bridge><Andrius Štikonas> As it does have some assembly
<oriansj>muurkha: well no, because there are writes between the reads to a different block, so you would need to support at least 3 pages of disk being cached and it'll take a bit more logic to do even a last in last out cache.
<oriansj>4 pages to cover the case of new read page and new write page but what is an extra 4KB of cache when you have a 100+MB firmware blob involved.
<muurkha>oriansj: in some cases yes. Forth only guaranteed 2
<muurkha>but in lots of cases you're reading bytes to put them somewhere other than immediately another disk block
<oriansj>true buffering (caching) is rather handy for good performance and one doesn't need much memory to get 90% of the benefit
<muurkha>but it's true that you do need more than three machine instructions. maybe something like if (num == keys[0]) return 0; if (num == keys[1]) return 1; last ^= 1; keys[last] = num; /* proceed to block loading logic */
<muurkha>for two block buffers
<oriansj>muurkha: well doing 4 pages would be something like: and eax, 0xFFFFF000 ; mov ebx, 1; cmp eax, [$buf1]; je done; cmp eax, [$buf2]; je done; cmp eax, [$buf3]; je done; cmp eax, [$buf4]; je done; (boring page load logic and ); :done mov eax, [ebx+$buf1]; return;
<oriansj>not very complicated or slow but will be slightly less likely to hit a page fault than a 2 buffer cache
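
A four-page cache along the lines discussed above might look like the following C sketch. It is not a translation of the assembly and not taken from any firmware: `sim_disk`, `page_loads`, `get_page`, `cached_read`, `cached_write`, `cache_flush` and the sizes are hypothetical, the device is simulated with an array, and round-robin replacement with write-back of dirty pages is an assumed policy chosen for brevity.

```c
#include <stdint.h>
#include <string.h>

#define CACHE_PAGE_BYTES 4096u /* assumed page size */
#define NUM_SLOTS 4u           /* four cached pages, as in the discussion */
#define NUM_PAGES 16u          /* size of the simulated device */

/* Hypothetical stand-in for the slow device. */
static uint8_t sim_disk[NUM_PAGES * CACHE_PAGE_BYTES];
static unsigned long page_loads; /* counts simulated device reads */

struct slot {
    uint32_t page;  /* page number held in this slot */
    int valid;      /* slot contains a loaded page */
    int dirty;      /* slot was written and not yet flushed */
    uint8_t data[CACHE_PAGE_BYTES];
};

static struct slot slots[NUM_SLOTS];
static unsigned next_victim; /* round-robin eviction pointer */

/* Return the slot holding `page`, loading it (and evicting) on a miss. */
static struct slot *get_page(uint32_t page)
{
    for (unsigned i = 0; i < NUM_SLOTS; i++)
        if (slots[i].valid && slots[i].page == page)
            return &slots[i];

    struct slot *s = &slots[next_victim];
    next_victim = (next_victim + 1) % NUM_SLOTS;

    if (s->valid && s->dirty) /* write the evicted page back first */
        memcpy(&sim_disk[(size_t)s->page * CACHE_PAGE_BYTES], s->data,
               CACHE_PAGE_BYTES);

    memcpy(s->data, &sim_disk[(size_t)page * CACHE_PAGE_BYTES],
           CACHE_PAGE_BYTES);
    s->page = page;
    s->valid = 1;
    s->dirty = 0;
    page_loads++;
    return s;
}

static uint8_t cached_read(uint32_t offset)
{
    return get_page(offset / CACHE_PAGE_BYTES)->data[offset % CACHE_PAGE_BYTES];
}

static void cached_write(uint32_t offset, uint8_t value)
{
    struct slot *s = get_page(offset / CACHE_PAGE_BYTES);
    s->data[offset % CACHE_PAGE_BYTES] = value;
    s->dirty = 1;
}

/* Write all dirty pages back, as a flush or close would. */
static void cache_flush(void)
{
    for (unsigned i = 0; i < NUM_SLOTS; i++)
        if (slots[i].valid && slots[i].dirty) {
            memcpy(&sim_disk[(size_t)slots[i].page * CACHE_PAGE_BYTES],
                   slots[i].data, CACHE_PAGE_BYTES);
            slots[i].dirty = 0;
        }
}
```

Copying one page to another byte by byte, with a `cached_write` after every `cached_read`, keeps the source and destination pages in separate slots, so each is loaded once instead of being reloaded on every byte.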