IRC channel logs

2022-12-18.log

<stikonas>sam_: I don't know whether it's better to put gentoo integration into its own repo or somehow integrate as an optional part of live-bootstrap
<fossy>i feel like LFS wouldn't mind a reference to live-bootstrap somewhere in the book
<fossy>oh, oriansj talked to them lol
<fossy>i strongly doubt they would want anything much more than that, since they are reasonably fixed on their one process
<fossy>stikonas: my thoughts for integration were having live-bootstrap finish on a hook, and then people can chuck a variety of things onto the end, so imo integration fits better into its own repo
<stikonas>well, hook is already there https://github.com/fosslinux/live-bootstrap/blob/master/sysc/after.sh
<fossy>well, yes, but i was thinking something a little bit more sophisticated, possibly reading from a different disk/tarball/etc, so it exists in a little more isolation from live-bootstrap (and you don't have to wrangle the initramfs/disk)
<sam_>this is where I need to learn more about how live-bootstrap works to answer properly
<sam_>I don't get how the integration/incorporation could work yet, but I want to know & am interested in it
<stikonas>that's a bit of chicken and egg problem
<stikonas>hard to tell what to do until somebody starts integrating
<fossy>effectively live-bootstrap is IN: binary seed + kernel (currently hex0, kaem, linux, in the hopefully near future builder-hex0) and source code. OUT: a modern C/C++ toolchain with an accompanying linux system that can be used to build packages.
<fossy>to be fair, that is true, that it is reasonably hard to tell, but at the most basic level; (taking the example of gentoo), live-bootstrap's linux system would be used to build up a stage3
<fossy>how that works more precisely i'm not really sure
<stikonas>yes, stage3 is a sensible point (not necessarily in tarball form, can be running system) but the question is indeed how to transition from live-bootstrap to gentoo steps
<stikonas>and also some people asked for nicer format of OUT (rather than just a running shell)
<sam_>stikonas: we also have bootstrap-prefix.sh which kind of has an interesting intersection with this
<sam_>it goes from a system with some basic utilities -> stage3
<sam_>(but it's for gentoo prefix)
<stikonas>well, gentoo prefix is something you can chroot into
<stikonas>I did hear about bootstrap-prefix.sh but haven't used it...
<stikonas>I once bootstrapped prefix with emerge itself without scripts, but that was somewhat tricky (build failures, etc...)
<sam_>yeah, we need to document how to do that a lot better
<sam_>someone was interested in doing it recently
<sam_>bootstrap-prefix.sh is neat but I would prefer to be able to do it raw from an existing system with emerge
<sam_>to not rely on trusting newly downloaded sources
<stikonas>I think I had to try to build everything with minimal flags but then I had to do something manual to be able to chroot (possibly adjust some links not to point to host system)
<stikonas>after that I was able to chroot in
<stikonas>anyway, that's still some time away
<stikonas>until we can do gentoo integration
<stikonas>I wouldn't be surprised if we first get fiwix bootstrap working
<fossy>prefix does seem like the most sensible path
<fossy>sam_: how do you build the usual stage3s?
<stikonas>i think there's a tool to build them on an existing Gentoo
<fossy>i see
<stikonas>(catalyst)
<oriansj>sam_: interesting material on top of your channel
<rickmasters>War story ahead...
<rickmasters>I finally tracked down the root cause for autoconf-2.64 failing on Fiwix.
<rickmasters>(An autoconf macro was failing so I had temporarily hard-coded the right value.)
<rickmasters>Turns out autoconf-2.64 failed because autoreconf-2.61 failed on configure.ac
<rickmasters>while running aclocal-1.10 which launched autom4te-2.61 which failed running m4-1.4.7
<rickmasters>to expand the AM_INIT_AUTOMAKE macro because AC_INIT had previously failed to set
<rickmasters>the package version because the AC_INIT macro used the m4_esyscmd macro to
<rickmasters>launch a script to parse the version, but m4_esyscmd was a built-in function in m4
<rickmasters>which used popen to read from the script but the file descriptor popen returned
<rickmasters>couldn't be read from because musl implemented popen with posix_spawn which
<rickmasters>created a pipe with O_CLOEXEC but there was a bug in Fiwix with that pipe option
<rickmasters>that incorrectly altered the flag on the underlying pipe instead of the file descriptor,
<rickmasters>causing the read end of the pipe to be write-only (because FD_CLOEXEC == O_WRONLY).
<rickmasters>That was probably the most layers I've had to work through so far but at least the fix was easy!
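The last link in that chain depends on two separate flag namespaces: FD_CLOEXEC is a per-descriptor flag (read with F_GETFD), while the access mode O_RDONLY/O_WRONLY belongs to the open file (read with F_GETFL). A minimal sketch of how to inspect both on Linux follows. The helper names `inspect_fd` and `inspect_cloexec_pipe_read_end` are made up for illustration and are not taken from musl or Fiwix.

```c
#define _GNU_SOURCE /* for pipe2() */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* What we learn about one descriptor (hypothetical helper type). */
struct fd_info {
    int cloexec;     /* nonzero if FD_CLOEXEC is set on the descriptor */
    int access_mode; /* O_RDONLY, O_WRONLY or O_RDWR of the open file */
};

/* Query both flag namespaces of a descriptor. Returns 0 on success, -1 on error. */
int inspect_fd(int fd, struct fd_info *out)
{
    int fd_flags = fcntl(fd, F_GETFD); /* descriptor flags: FD_CLOEXEC lives here */
    int fl_flags = fcntl(fd, F_GETFL); /* status flags: access mode lives here */
    if (fd_flags < 0 || fl_flags < 0)
        return -1;
    out->cloexec = (fd_flags & FD_CLOEXEC) != 0;
    out->access_mode = fl_flags & O_ACCMODE;
    return 0;
}

/* Create a close-on-exec pipe and report the state of its read end. */
int inspect_cloexec_pipe_read_end(struct fd_info *out)
{
    int fds[2];
    if (pipe2(fds, O_CLOEXEC) < 0)
        return -1;
    int rc = inspect_fd(fds[0], out);
    close(fds[0]);
    close(fds[1]);
    return rc;
}
```

On a correct kernel the read end reports close-on-exec set and access mode O_RDONLY. Since FD_CLOEXEC and O_WRONLY are both 1 on Linux, storing the former in the status-flag word turns the read end into a write-only file, which matches the failure described above.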
<muurkha>haha holy shit
<muurkha>congratulations!
<rickmasters>thanks muurkha
<oriansj>rickmasters: good work
<oriansj>another reminder of how hard kernel work can be
<Christoph[m]>Wow! WOW!
<muurkha>I think that wasn't kernel work that was the problem mostly
<muurkha>even though the final bug was in the Fiwix kernel
<oriansj>well indirect kernel work, as things have to work well enough to figure out what went wrong with the kernel code.
<rickmasters>It's nice to have the source code of all layers. Ultimately I diagnosed the read error in m4 by instrumenting the read syscall rather than the application.
<rickmasters>I modify execve to look for the program in question and set a variable debug_pid and then I can catch syscalls from that pid
<rickmasters>(I mean I modified sys_execve in the kernel)
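The filtering idea can be modelled in a few lines of ordinary C. This is only a user-space sketch of the approach described, not Fiwix code: apart from `debug_pid`, every name here (`debug_target`, `trace_on_execve`, `trace_syscall`) is hypothetical, and a real kernel would call printk rather than fprintf.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical tracing state. */
static int debug_pid = -1;              /* pid being traced, -1 means none */
static const char *debug_target = "m4"; /* program name to watch for */

/* Return the last component of a path. */
static const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Hook for the execve path: start tracing when the target program is launched. */
void trace_on_execve(int pid, const char *path)
{
    if (strcmp(base_name(path), debug_target) == 0)
        debug_pid = pid;
    else if (pid == debug_pid)
        debug_pid = -1; /* traced pid replaced its image with another program */
}

/* Hook for syscall return: log only the traced pid. Returns 1 if a line was logged. */
int trace_syscall(int pid, const char *name, long result)
{
    if (pid != debug_pid)
        return 0;
    fprintf(stderr, "[pid %d] %s -> %ld\n", pid, name, result);
    return 1;
}
```

Keying the filter on a pid set at exec time keeps the output small enough to read, which is the point of not enabling logging system wide.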
<oriansj>so i am guessing no instrumentation for debugging out of the box
<rickmasters>oriansj: i'm not using a debugger if thats what you're referring to
<oriansj>I was thinking more auditd logging with additional debug info
<oriansj>à la dtrace
<rickmasters>autoconf had a debug log but it didn't help much because the failure was too low level
<oriansj>yes, that I gathered. I was just asking if Fiwix did not have kernel hooks available at runtime to collect that information and it resulted in you having to do a custom build to get that information
<rickmasters>Oh, I see. Fiwix has a __DEBUG__ macro that operates system wide but it's way too verbose and extends run time like 10X
<rickmasters>So, I'll define it per syscall, but even that is too much - so yeah writing code to filter output was needed
<rickmasters>By the way, Fiwix compiles in two seconds (with parallel make) so that's nice
<rickmasters>And its a fraction of a second if its just a file or two.
<stikonas>oh good, another fiwix pitfall solved
<rickmasters>to clarify, __DEBUG__ is a preprocessor variable used with #ifdef __DEBUG__ around code with printk
<rickmasters>Thought I'd follow up a loose end: m4 uses getc to read from popen and stops on EOF but does not check ferror/feof to see whether EOF was caused by an error.
<rickmasters>when getc returns EOF, that's not necessarily end-of-file. TIL
<rickmasters>If m4 had reported an error - EBADF in my case, that probably would have saved me a lot of time.
<rickmasters>It just silently returned an empty string to the next layer up.
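The getc pitfall is easy to show in isolation. The sketch below reads a stream until getc returns EOF and then asks ferror whether that EOF was a real end of file or a read failure. The function name `read_all_checked` is invented for this example and is not how any m4 version spells it.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read up to cap-1 bytes from fp into buf and NUL-terminate it.
 * Returns the number of bytes read, or -1 if the stream reported an error.
 */
long read_all_checked(FILE *fp, char *buf, size_t cap)
{
    size_t n = 0;
    int c;

    if (cap == 0)
        return -1;
    while (n + 1 < cap && (c = getc(fp)) != EOF)
        buf[n++] = (char)c;
    buf[n] = '\0';

    /* getc returns EOF for both end-of-file and errors; ferror tells them apart. */
    if (ferror(fp))
        return -1;
    return (long)n;
}
```

Calling it on a stream that cannot be read, for example one opened with fopen(path, "w"), leaves an empty string in buf just like the silent case above, but the -1 return gives the caller a chance to report the failure.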
<stikonas>it might very well be due to us using old versions
<stikonas>some of these bugs might be fixed in new m4
<stikonas>hmm, hitting many of my bugs in m2libc on uefi :(
<stikonas>after I've added reading of current environmental variables, started hitting lock up (I guess either infinite loop, or something deeper in UEFI) somewhere inside _free_allocated_memory() that I call on program exit
<rickmasters>stikonas: yes, just checked. m4_esyscmd is much more thorough in m4-1.4.19 and checks ferror and would report "cannot read pipe"
<stikonas>but it's also much harder to build on minimal system that we have at the beginning
<rickmasters>Not surprised. I find that most errors I track down for days are fixed in the latest version. :)
<stikonas>which shows that software quality has improved a lot over the years but bootstrappability got much harder
<rickmasters>stikonas: have you set up building UEFI so you can debug in that code if necessary? Probably not easy, but may pay off...
<stikonas>rickmasters: no, I haven't...
<stikonas>I've been mostly debugging with printing things, early exit, commenting things out, etc...
<stikonas>so even with this I have already learned a bit about current issue
<stikonas>commenting out _free_pages() call in _free_allocated_memory() function does not help. Which means something is going wrong in my while loop
<stikonas>it never finishes
<stikonas>possibly caused by some corruption somewhere (out of bounds write...) that overwrote memory tracking pointers
<stikonas>yes, gdb would be nice to have for debugging
<stikonas>although M2 compiled applications are not easy to debug in gdb either
<stikonas>as debug symbols are only written up for function names
<rickmasters>I ended up building my own qemu to help figure out some problems which took many hours of futzing with build dependencies, but frankly it didn't help as much as I'd hoped...
<rickmasters>But it did help.
<stikonas>well, we'll see how long I spend on this issue
<stikonas>(as I just started encountering it)
<rickmasters>"random" memory corruption is one of the more difficult kinds of problems.
<stikonas>yeah, though I have some leads
<stikonas>printed size of what I'm calloc'ing and it's not always what I would expect