IRC channel logs

2024-01-13.log


<fossy>GoogulatorMobile: seems like my theory about /dev/shm was correct given that "run up to Python bootstrap" stage of CI completed
<stikonas>hmm, looks like random hangs on qemu are not related to forking at all... It's probably a race condition between some UEFI timer interrupt kicking in after syscall and me fixing the value of SS register with "mov_ss,eax"
<fossy>these UEFI timer interrupts seem to be causing so much undesirable behaviour :\
<stikonas>yeah, though I don't see it on baremetal
<stikonas>either they are not happening there
<stikonas>(on my system)
<stikonas>or the race condition is much rarer
<stikonas>the thing is that UEFI itself sets up CS and SS registers as if it was in ring 3
<stikonas>(or something like that)
<stikonas>and the syscall instruction does a lot of things (https://www.felixcloutier.com/x86/syscall), but it changes the values of CS and SS in a way that ends up with different values than what UEFI needs
<stikonas>so I had this hack to fix it up... https://git.stikonas.eu/andrius/stage0-uefi/src/commit/b61add87cbb2ee193f82b1db214a03a5b2e09e76/posix-runner/posix-runner.c#L421
<fossy>eek, that does look a little bit hacky
<fossy>what's the significance of 0x30? i am almost completely unfamiliar with memory segmentation (that's what cs & ss are used for, right?)
<stikonas>well, I think these values specify where in memory the stack is
<stikonas>but anyway, UEFI starts with cs = 0x38 and ss = 0x30
<stikonas>and for it to function, I think it needs these values
<fossy>uh huh
<stikonas>then syscall instruction sets them to
<stikonas>CS.Selector := IA32_STAR[47:32] AND FFFCH
<stikonas>and SS.Selector := IA32_STAR[47:32] + 8;
<stikonas>so whatever I write to IA32_STAR register, I'll still have SS = CS + 8
<fossy>yeah
<stikonas>(and I need vice versa...)
<stikonas>I'm trying to disable interrupts...
<stikonas>but that is not helping... :(
<stikonas>at least a call to https://uefi.org/specs/UEFI/2.10/07_Services_Boot_Services.html?highlight=raise_tpl#efi-boot-services-raisetpl didn't work...
<fossy>how do interrupts work in uefi?
<fossy>normally?
<stikonas>I think it only has timer interrupt...
<stikonas>and you can also create your own events...
<stikonas> https://uefi.org/specs/UEFI/2.10/07_Services_Boot_Services.html?highlight=raise_tpl#efi-boot-services-createevent
<stikonas>but I don't really know much about that
<stikonas>oh, we are back to single digits in live-bootstrap pull requests...
<stikonas>fossy: as for the significance of 0x30, the significance is in the last digit (0)
<stikonas>that signifies that we are in ring 0
<stikonas>maybe I need to somehow steal this from UEFI... https://en.wikipedia.org/wiki/Interrupt_descriptor_table
<stikonas>no idea how any of that works...
<fossy>i'm not particularly familiar with it either, apart from hearing that it exists
<fossy>the IDT appears to exist in a location in memory, but Wikipedia doesn't detail the structure of the IDT in memory...
<stikonas>there is also this https://uefi.org/specs/UEFI/2.10/02_Overview.html#enabling-paging-or-alternate-translations-in-an-application
<stikonas>I think it implies that it should be possible
<stikonas>anyway, it's getting late today...
<oriansj>well I found a segfault in mescc-tools-extra's match.c but it can only be triggered by badly behaving shells
<stikonas>oriansj: can you add those defines to M2libc https://github.com/oriansj/M2libc/pull/54 ?
<stikonas>I think I've figured out how to fix those random hangs on UEFI
<stikonas>(i.e. disable/restore HW interrupts when necessary)
<stikonas>and nice work finding segfault!
<oriansj>stikonas: nice
<stikonas>so my next bug in posix-runner is something to do with free()
<stikonas>at least sometimes kaem exits with exit code 1
<oriansj>and merged
<stikonas>I debugged this to happen due to fclose() at the end of the program
<stikonas>which does free(stream->buffer)
<stikonas>which for some reason fails with an error saying no chunk is allocated at that address
<stikonas>(might be something to do with bugs in fork code)
<stikonas>oriansj: thanks!
<stikonas>interrupt fix pushed too: https://git.stikonas.eu/andrius/stage0-uefi/commit/5f7ebfd46c108aa5d82366c0756193fb1bc5d522
<oriansj>great work on the UEFI kernel; it is shaping up nicely
<stikonas>so now I can run mescc-tools-mini-kaem.kaem to completion
<stikonas>though I think it exits with 1 due to the bug above
<stikonas>and AMD64/run.kaem also runs till the first kaem script finishes just before building mescc-tools-extra
<stikonas>yeah, it's almost able to run stage0-posix now
<stikonas>well, not kaem-optional-seed, as that is not position independent
<stikonas>(making it PIE would add 3 extra bytes I think)
<oriansj>which is fine if it helps
<stikonas>well, we don't strictly need it for UEFI bootstrapping
<oriansj>would adding --mode to mkdir in mescc-tools-extra be helpful?
<stikonas>hmm, I don't think we need it
<stikonas>but it's not harmful either
<stikonas>oh there are quite a few attempts at hobbyist kernels (though not in a bootstrappable context), e.g. just saw this one https://github.com/Kaj9296/PatchworkOS/tree/main
<oriansj>well yeah, like writing a compiler or an interpreter. it is a journey of learning about reality
<oriansj>and bootstrapping is always a group activity in the long term
<oriansj>because there is always too much to do in a FULL bootstrap and no single person can do it all but collectively and with years of fun together; we achieved far more together than anyone could do alone.
<lrvick>Okay, back from way too much travel, and back to trying to get live-bootstrap to seed an OCI-based linux distro.
<lrvick>Latest failing attempt makes it to, weirdly, bc: https://dpaste.org/XyUdg/raw
<lrvick>Not much to go on here. Ideas for debugging welcome
<lrvick>I stripped my docker setup to the bone so can just drop this example in root of master to test
<stikonas>hmm, that looks like the end of former sysa
<stikonas>i.e. just before Linux kernel (which is skipped in non qemu/baremetal modes)
<stikonas>lrvick: a bit hard to tell from just your log what is happening...
<lrvick>I used the following small patch to rootfs.py to generate the target directory with "--container": https://dpaste.org/aQpkS/raw
<stikonas>looks like kaem-optional-seed is not happy with exit code of its child program
<stikonas>which I guess is full kaem
<stikonas>it might be useful to strace it...
<lrvick>The only thing I skip from "container" vs "chroot" is the actual call to chroot.
<lrvick>which does nothing special afaict
<lrvick>but yeah, I suppose I will try an strace pass
<stikonas>in particular see which program first spits non-zero exit code
<stikonas>hopefully it's not kaem-optional itself but some child
<stikonas>(kaem-optional is written in hex0, so annoying to edit)
<stikonas>if it's something higher level, then one can start inserting more debug print statements, etc..
<lrvick>There is a matrix bridge to this channel?
<lrvick>ACTION looks
<stikonas>lrvick: I run it
<stikonas>though right now I have some router issues (it's rebooting frequently, I probably need to buy a new one), which is why I (and the matrix bridge) occasionally lose connection for a short time
<lrvick>What is the room?
<stikonas>#bootstrappable:stikonas.eu or #bootstrappable:kde.org
<matrix_bridge>ACTION Lance R. Vick waves
<matrix_bridge><Lance R. Vick> test
<matrix_bridge><Lance R. Vick> bridge does not seem happy yet
<matrix_bridge><Lance R. Vick> oh there it goes. Just delayed.
<matrix_bridge><Andrius Štikonas> shouldn't be too delayed now, it was just delayed when the bridge was reconnecting after the router reboot
<stikonas>yeah, it arrived fast now
<matrix_bridge><Lance R. Vick> Makes sense
<lrvick>Pushed my --docker branch to: https://github.com/lrvick/live-bootstrap
<lrvick>doing a from scratch test again, then if it also fails I will try to somehow inject strace
<lrvick>Probably can build a static strace binary and copy it in to the scratch layer and run it in front of kaem
<stikonas>lrvick: you don't need to inject it into the bootstrap
<stikonas>you can try to run the whole thing with strace...
<stikonas>though perhaps worth filtering list of syscalls
<stikonas>well, I guess exit, execve to start with...
<lrvick>Oh, you mean run strace in front of docker with follow forks. I remember you can do that now from our last convo a few weeks ago.
<stikonas>yeah, you can do that
<stikonas>anyway, I suggest first finding which process misbehaves
<stikonas>and then debug it with whatever tools we have...
<stikonas>(strace, gdb or edit program and print more debug info)
<fossy>lrvick: based on your logs, i'd say there's a chance that some command from helpers.sh is silently failing, which would cause the entire bootstrap to exit without printing a message. you could change line 459 of seed/script-generator.c from set -e to set -ex; that would cause all scripts to print out commands as they are run, which might show you what is failing without the junk of strace
<matrix_bridge><Lance R. Vick> That makes sense. Giving it a shot now.
<fossy>stikonas: your comment about the checksum prepending being a pain to maintain is very fair. but now that we have script-generator i think i will be able to work around that, actually -- i should have a reasonable solution in a few hours
<fossy>stikonas: also, https://github.com/fosslinux/live-bootstrap/issues/411, i think we talked about this before and the answer is that it is correct as is, but can you confirm
<stikonas>fossy: no, that's correct
<stikonas>ok, I've closed that issue now and left a short comment
<fossy>thank you :D
<stikonas>fossy: not sure if you saw the logs, but I've sorted out those UEFI interrupt issues
<stikonas>my first attempt didn't work but later I figured it out, the trick was to restore back interrupts while syscalls are being processed by UEFI
<stikonas>(the whole change is fairly small: https://git.stikonas.eu/andrius/stage0-uefi/commit/5f7ebfd46c108aa5d82366c0756193fb1bc5d522)
<stikonas>far simpler than dealing with that interrupt descriptor table
<stikonas>fossy: didn't we have an option to rename files on download?
<stikonas>(last argument of sources)
<stikonas>I'm just trying to understand the logic behind those hashes...
<fossy>stikonas: we do, i added this because i thought that, compared to the arbitrary renaming of files, this is a way to ensure that a particular distfile is always identified by something that doesn't change & isn't arbitrarily chosen like a filename. it also allows multiple files that sensibly share the same filename to coexist
<fossy>i'm not super attached to it, i'm happy to remove it if you think it's not going to provide sufficient benefit
<stikonas>well, let's see what others think, if others think it's a good idea then I'm fine
<stikonas>in particular, let's see what Googulator says, he has been doing a lot of work too recently
<fossy>sure