IRC channel logs

2024-06-11.log

<stikonas>fossy: were you able to reproduce qemu issue?
<stikonas>I haven't yet captured what causes it, but probably what happens is at some point something causes reboot
<stikonas>so instead of kexec, it just restarts into initial image
<stikonas>ok, it looks like Fiwix at least starts booting
<stikonas>I haven't been able to capture exact error, reboot happened too fast
<stikonas>hmm, I should run it in non-interactive mode...
<Googulator>Sounds like yet another iteration of the annoying Fiwix freeze/crash bug that affects Hyper-V and some newer(?) qemu versions
<Googulator>do some random changes to the code, e.g. debug prints to try and figure it out, and it just goes away
<stikonas>well, I've now started it without --interactive
<stikonas>and tee'ed output to log file
<stikonas>though I might finish tomorrow
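The capture step stikonas describes (run non-interactively and tee the output to a log file) can be sketched as follows. This is only an illustration: `run_and_tee` is a hypothetical helper, not part of live-bootstrap, and the command to run is left to the caller.

```python
import subprocess
import sys


def run_and_tee(cmd, log_path):
    """Run cmd, echoing its combined stdout/stderr to the console and to log_path.

    Returns the exit code of the command.
    """
    with open(log_path, "w", encoding="utf-8", errors="replace") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into the same stream
            text=True,
            errors="replace",  # tolerate non-UTF-8 bytes from a serial console
        )
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
            log.flush()  # keep the file current if the run dies abruptly
        return proc.wait()
```

Calling `run_and_tee([...], "boot.log")` with the actual emulator command line would leave the last lines printed before a sudden reboot in `boot.log`, which is the part that scrolls by too fast to read interactively.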
<Googulator>for a while I thought it was related to newer generation Intel CPUs, but has since been reproduced on AMD and older Intel systems
<stikonas>in principle fiwix shouldn't even be affected by meslibc...
<stikonas>it's kernel
<Googulator>did the hash change?
<stikonas>I think so
<Googulator>my theory is that it's some weird memory management bug that's extremely sensitive to the exact memory layout of each kernel's code
<stikonas>hmm, but I also upgraded tinycc...
<stikonas>that's one thing I can try later
<stikonas>if logs don't show anything
<stikonas>just undo tinycc upgrade
<stikonas>but that's only used to bootstrap tcc 0.9.27
<Googulator>In that case, it shouldn't change the checksum of the final tcc
<Googulator>if it does, then either the old one or the new one is miscompiling in such a way that it propagates across subsequent builds
<stikonas>well, final tcc changes due to meslibc upgrade
<stikonas>hmm
<stikonas>well, there are things to try
<Googulator>but if you revert only the tccboot change, it should leave the final tcc unchanged from the one with everything changed
<stikonas>well, libc also changed
<stikonas>so final tcc binary will be different too
<stikonas>(due to static linking of libc)
<stikonas>anyway, I still have some things to try
<stikonas>just annoying to have an issue where I didn't expect any...
<Googulator>I mean, mes 0.26.1 plus old tccboot should produce the same tcc checksum as mes 0.26.1 plus new tccboot
<Googulator>if not, something is veeeery wrong
<stikonas>why is that?
<stikonas>tccboot changes might change codegen a bit...
<stikonas>though I think in the end we rebuild tcc-0.9.27 twice
<stikonas>so it shouldn't propagate that far
<stikonas>anyway, I'll debug further tomorrow
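The check Googulator proposes above (the final tcc built with the old tccboot should be byte-identical to the one built with the new tccboot, all else equal) comes down to comparing digests of two build outputs. A minimal sketch, assuming SHA-256 and two hypothetical file paths supplied by the caller:

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_binary(path_a, path_b):
    """True if both files have identical contents (by SHA-256)."""
    return sha256_of(path_a) == sha256_of(path_b)
```

If `same_binary()` returns False for two builds that differ only in the bootstrap compiler, that points at a miscompilation propagating through the rebuilds, which is the scenario discussed above.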
<lanodan>Maybe dev86 would work for an x86_16 replacement? Seems to build fine with tcc but haven't thrown real payloads at it yet (likely lilo since it explicitly uses it).
<oriansj> https://xorvoid.com/forsp.html
<oriansj>well found a LISP/FORTH hybrid
<aggi>finally, some little progress with tccboot
<aggi>compilation of it is sensitive to the BUILDHOST used... even when it's isolated by cross-compilation (and -nostdlib/-nostdinc of course)
<aggi>i moved this setup from BUILDHOST=aarch32 onto BUILDHOST=x86 (slackware11), and voilà, tccboot successfully compiled a few linux source files, and then hit a compilation error
<aggi>at least, i got something now, to debug and work with
<aggi>and it is rather concerning: something was/is leaking from BUILDHOST into BUILDTARGET, yet i do not see which header or object could have leaked with tccboot
<aggi>furthermore, i noticed an extremely weird error: a specific version of tcc.o/tccboot reported a missing symbol for memcpy()
<aggi>depending on how that's approached, it is highly suspicious
<aggi>because then i quickly tested supplying a memcpy() inside lib.c of tccboot - all of a sudden i386-tcc (depending on version) freaked out with an internal parsing error
<aggi>yet, neither tccboot nor tcc.o supply a memcpy() nor could i understand why adding such a particular symbol would make a compiler freak-out with internal parsing errors
<aggi>renaming memcpy() to whatever my_memcpy() didn't trigger this... so what?
<aggi>there's some incomprehensible magic involved, with the memcpy() symbol for example, and other defects when cross-compiling tcc.o/tccboot
<aggi>something else seems to make a difference: compiling tccboot with i386-tcc itself, or an i686-gcc3 (non-cross), or an i686-gcc6 (cross)
<aggi>which isn't plausible either, because at least recent tcc-version does cleanly separate cross-compilation from the BUILDHOST
<aggi>really confusing, and i am referring to compiling tccboot/tcc.o, which seems to break already, with weird side-effects on kernel-compilation
<aggi>and i suspect the AoT kernel-compilation variant with i386-tcc (instead of tccboot JIT) is affected by this too
<aggi>some memory alignment problem for example, mis-matched PTR_SIZE, all of a sudden, an interrupt-service routine or anything doesn't match
<aggi>since that much i could diagnose last night again, kernel crashes as soon as an interrupt shall be processed
<aggi>as far as tccboot was concerned, i didn't change anything with my test-setup, other than moving i386-tcc from aarch32 onto a x86 buildhost to compile tccboot
<aggi>et voilà, it does at least compile a few kernel c sources and assembly
<aggi>and i can conclude something else already, that is, even if i succeed with repairing/re-producing tccboot (for JIT kernel compile) or AoT kernel compilation with tcc
<aggi>that's merely a lucky hit, an old slackware11 seems to more closely resemble some BUILDHOST leakage into BUILDTARGET that was known-working in year 2004
<aggi>and i am not willing to accept such quality issues, when bootstrapping kernel with tcc (and gcc alike)
<aggi>so, sorry for the noise, but this craziness has cost me a few weeks and months of lifetime
<aggi>the memcpy() issue with tccboot/tcc.o is highly suspicious, and i got a few test-cases for this
<aggi>i see.. usr/src/linux/include/asm/string-486.h
<aggi>i would summarize, defining the task as "kernel bootstrapping": it's compromised for 20 years already
<aggi>besides, thanks for the hint pointing to fiwix, which to my understanding couldn't be compiled by tcc, _and_ loaded and executed by syslinux for example
<aggi>hence the kexec() magic you were discussing
<aggi>question: would tccboot/linux-2.4 have any benefit for #bootstrappable?
<aggi>as far as tccboot itself is concerned, its #include <linux/kernel.h> etc. inside tccboot.h are a showstopper already, for the reasons mentioned above
<stikonas>aggi: fiwix can be compiled by tcc
<stikonas>kexec is just used because it is better for our purposes
<stikonas>we compile kernel, put it somewhere in the memory and boot
<stikonas>which is much simpler than dealing with bootloaders
<stikonas>bootloaders immediately force you to deal with I/O devices...
<stikonas>so you immediately have lots of different drivers (depending on your storage)
<stikonas>for live-bootstrap tccboot/linux-2.4 probably wouldn't have any benefit
<stikonas>since we got fiwix working
<stikonas>but the rule here is, if you enjoy doing something, please do so
<stikonas>so if you like tinkering with tccboot/linux-2.4, then we are not stopping you :)