IRC channel logs

2022-05-14.log

<janneke>oriansj: we just found out that it's actually unused
<stikonas>fossy: some cleanup for live-bootstrap https://github.com/fosslinux/live-bootstrap/pull/160, swapped sed and make, so we can eliminate another package built with custom kaem scripts (custom makefiles are arguably better)
<stikonas>I'll later do that for tar and gz as well
***nimaje1 is now known as nimaje
<doras>I found the issue that was causing the object files to be ordered differently.
<doras>It's an issue in libtool's m4 scripts.
<doras>Already fixed upstream, too.
<doras>I'll dig up the original commit fixing it and backport it.
<doras>Fixed originally by a Google employee in 2010. Interesting.
<doras>Same guy that wrote the other patch we backported.
<stikonas>doras: alternatively we rebuild m4...
<stikonas>if it's affecting only util-linux
<stikonas>although that depends when it was fixed
<stikonas>m4-1.4.16 needs newer automake
<stikonas>(perhaps we can move automake 1.11 to sysa then)
<doras>stikonas: the fix is in libtool itself. As far as I can tell, it should affect any case where archives are "merged" (use of the automake "_LIBADD=" option). It introduces dependency on the readdir iteration order of a filesystem (through "find"), which isn't guaranteed to be the same. In fact, in my case, it's not.
<doras>It's potentially related to the ordering issue with guile, but I haven't checked yet.
<doras> https://git.savannah.gnu.org/cgit/libtool.git/commit/?id=74c8993c178a1386ea5e2363a01d919738402f30
<doras>The second change in "func_extract_archives" solves our issue.
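A minimal Python sketch of the ordering problem doras describes: directory iteration order is not guaranteed, so anything that feeds it straight into an archive can differ between filesystems, and sorting the names removes that dependency. The function name and file names are hypothetical, and this is an illustration of the idea, not the libtool patch itself.

```python
import os
import tempfile


def list_objects_deterministically(directory):
    """Return the .o files in `directory` in a stable, sorted order.

    os.listdir() returns entries in arbitrary (filesystem-dependent) order,
    much like an unsorted `find`, so we sort explicitly.
    """
    return sorted(name for name in os.listdir(directory) if name.endswith(".o"))


with tempfile.TemporaryDirectory() as tmp:
    # Create a few placeholder files in a deliberately unsorted order.
    for name in ("zeta.o", "alpha.o", "mid.o", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    members = list_objects_deterministically(tmp)
    print(members)
```

With the sort in place, the member list no longer depends on how the filesystem happens to enumerate the directory.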
<stikonas>I see...
<stikonas>well, there is an option to upgrade libtool too
<stikonas>(after sysc we end up with the latest autoconf and automake but somewhat oldish libtool)
<stikonas>if not, then patch should work
<doras>It can hurt us in other cases if we ever introduce something which is affected by it. Probably worth avoiding the reproducibility issue in the first place. Upgrading to a newer libtool can be done anyway though.
***littlebo1eep is now known as littlebobeep
<doras>stikonas: are you familiar with other cases where the order of objects in archives wasn't persistent, in addition to guile?
<doras>I see that guile also uses LIBADD with libguile.a, so my change will potentially fix it as well.
<doras>So just wondering if there are other cases that I should look at.
<doras>Found that we also have a similar workaround for libstdc++.a in gcc-4.7.4.
<doras>I see LIBADD is used for libstdc++.a too, so potentially the same issue there.
<rickmasters>stikonas: i put a debug put_hex on every mes syscall. it does a link syscall fairly early, which is not implemented
<rickmasters>i tried to find that call but i am not familiar with the code base
<stikonas[m]>doras: those are probably all
<stikonas[m]>rickmasters: I can try to check when I'm back home
<rickmasters>stikonas: ok, i'll keep looking and i can try to output the filenames in the mean time
<oriansj>rickmasters: that would create a soft link to another file; so you would just find that filename and point to it
<rickmasters>orianjs: right, so you are saying get the filenames a create the link manually to see if that helps?
<rickmasters>s/a/and
<rickmasters>oriansj: oops typo'd your username
<unmatched-paren>OrianJS(TM): A javascript engine in hex0! :P
<oriansj>no worries, also you probably should be able to just return 0 as I don't see how mescc would need it given that it leverages mescc-tools for building the final binaries
<oriansj>unmatched-paren: now that would be a cruel parody. I love it
<oriansj>janneke: mescc just builds a tempfile M1 and links with the libc files using M1 (or hex2) right? so it doesn't actually use link itself but mes.c supports it for any scheme program running on mes.c that needs it right?
<rickmasters>oriansj: hmm, it may be something else, first stab shows null file names. also builder-hex0 already returns 0 for unimplemented syscalls
<oriansj>rickmasters: sounds like a userspace bug passing invalid strings with the syscall
<rickmasters>oriansj: looks like it, but likely caused by prior bad kernel behavior
<rickmasters>oriansj: i'll keep looking
<rickmasters>oriansj: so this is mes-m2, not sure if that matters, honestly not sure of the distinction, i'm running blind but it'll clear up
<stikonas[m]>mes-m2 is mes built with M2-Planet
<stikonas[m]>So it has somewhat simpler code
<stikonas[m]>For some library functions
<oriansj>rickmasters: well the difference between mes-m2 and mes.c is that mes-m2 was a fork of mes.c in a rush to finish the bootstrap of mescc and thus complete the chain. So there are several minor differences and ultimately will be dropped with mes.c being properly bootstrapped by M2-Mesoplanet
<stikonas[m]>oriansj: mes upstream also has mes-m2 binary
<stikonas[m]>Which can run mescc and build a slightly faster and more stable mes
<oriansj>that reminds me, I need to setup mes.c for fuzzing
<rickmasters>oriansj: ok. i see, so can mes-m2 run mescc?
<rickmasters>stikonas: looks like you said mes-m2 can run mescc
<janneke>oriansj: can't completely follow, but yes, mescc writes .M1 and linking is done using hex2
<janneke>possibly with a blood-elf phase when using -g
<oriansj>janneke: why would you need the link syscall in the blood-elf phase?
<oriansj>or are we mistaking each other slightly and you were clarifying that you also leverage blood-elf when the -g option is passed.
<oriansj>and that the link syscall isn't needed for mescc to successfully compile C code
<rickmasters>oriansj: i don't think he was talking about file links (i.e the link syscall) but the call binding that hex2 does
<oriansj>rickmasters: I believe you are right
<rickmasters>sorry for the confusion everyone, it's not calling syscall link
<rickmasters>i only printed the lower byte, its calling 0x109 which is clock_gettime
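The mix-up comes down to truncation: 0x109 with only its low byte printed shows up as 0x09. A small sketch of that arithmetic follows; the syscall names in the table are an assumption (the 32-bit x86 Linux numbering, where 9 is link and 265 is clock_gettime).

```python
# Assumed 32-bit x86 Linux syscall numbers (only the two relevant here).
SYSCALL_NAMES = {0x09: "link", 0x109: "clock_gettime"}


def low_byte(value):
    """Keep only the lowest 8 bits, as a one-byte debug print would."""
    return value & 0xFF


actual = 0x109
shown = low_byte(actual)
print(f"actual: {SYSCALL_NAMES[actual]}, debug output suggested: {SYSCALL_NAMES[shown]}")
```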
<unmatched-paren>i suppose that's for one of those __???__ macros that expands to the date/time/etc?
<rickmasters>main -> init_time -> clock_gettime
<rickmasters>the kernel does not implement that, but not sure if that is the root cause of "assert fail: eval/apply unknown continuation"
<rickmasters>i'll keep digging
<Hagfish>it's an intriguing mystery all right
<rickmasters>Hagfish: yeah, besides that unimplemented call it's just allocating memory, opening, reading a file before failing, simple stuff
<stikonas>rickmasters: yes, mes-m2 can run mescc
<stikonas>and yes, clock_gettime shouldn't be in bootstrap kernel, we should keep it as small as possible
<stikonas>(hence stage0-posix does not even have rm/unlink)
<stikonas>possibly it can be replaced with a stub in meslibc
<stikonas>rickmasters: i.e. replace this line https://git.savannah.gnu.org/cgit/mes.git/tree/kaem.run#n92 with this file https://git.savannah.gnu.org/cgit/mes.git/tree/lib/stub/clock_gettime.c
<stikonas>janneke: just spotted a typo: https://git.savannah.gnu.org/cgit/mes.git/tree/lib/stub/clock_gettime.c#n30
<stikonas>"umask stub" -> "clock_gettime stub"
<rickmasters>stikonas: i can try that, right now i'm on the hunch that mes-m2 reads from stdin even though it has code on the command line, stdin is funky in builder-hex0
<stikonas>not sure, I'm not that familiar with how mes-m2 works
<oriansj>rickmasters: ok for clock_gettime; you should be able to just increment by 1 on each call
<rickmasters>oriansj: ok
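oriansj's suggestion can be sketched as follows, in Python rather than anything resembling the kernel's own code: a fake clock with no real time source that simply returns a value one larger on every call. The class name and the (seconds, nanoseconds) return shape are hypothetical.

```python
class FakeClock:
    """Hypothetical stand-in for clock_gettime: no real time source,
    just a counter that advances by one second per call."""

    def __init__(self):
        self._seconds = 0

    def clock_gettime(self):
        self._seconds += 1
        return (self._seconds, 0)  # (tv_sec, tv_nsec)


clock = FakeClock()
first = clock.clock_gettime()
second = clock.clock_gettime()
print(first, second)
```

A counter like this keeps time strictly increasing for callers that only compare timestamps, and it stays deterministic across runs.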
<doras>stikonas: it worked. I got identical hashes in all modes.
<doras>This includes the removal of the workarounds for gcc-4.7.4 and guile-3.0.7.
<doras>So it also means we don't have to build guile twice.
<stikonas>oriansj, rickmasters: do we really want to add more syscalls to bootstrap kernel?
<stikonas>I would say keep hex0 kernel as small as possible
<stikonas>doras: ok, good, was it that patch?
<oriansj>stikonas: I agree but it is ultimately rickmasters' decision
<rickmasters>stikonas: it's not my preference until i know what will fix mes, currently pursuing memory issues
<doras>stikonas: yes. I backported it to both of our libtool versions.
<oriansj>rickmasters: well mes is not the sort of program you'll want to support with a clever hack
<doras>But my local branch is still based om mes-m2 because of the SUID issue in mes upstream, which is annoying.
<doras>on*
<oriansj>as mescc and mes.c are very clever hacks in themselves; so you'll want what runs and builds it to be as obvious and robust as possible
<doras>Do you think we could base our mes in live-bootstrap on the mes commit where this is solved?
<rickmasters>oriansj: i'm hoping it's a bug fix, i already found that the hex2 command uses a bad base-address, but that didn't fix it, hoping for something like that
<stikonas>doras: well, I guess we can update, but let me see if I can first do a couple of patches for mes
<oriansj>rickmasters: oh no; mes.c is hard enough to build with GCC and Linux; the M2-Planet build was months of work. it isn't an easy fix sort of program
<doras>stikonas: alright. In the meantime, I'll run another QEMU bootstrap on the tip of live-bootstrap + my fixes and submit a PR or two.
<oriansj>you'll want proper paging, filesystem support and fault tolerant multiprocessing to support mescc
<doras>We could then update mes, and then I'll finally be able to attend to the final changes needed for the bwrap-based bootstrap mode.
<rickmasters>oriansj: yeah, this is mostly just an exploration out of curiosity, i know we need a better kernel to get much farther
<oriansj>I know work was done on https://gitlab.com/myunix/myunix/ to run mescc and tcc (bauen1 correct me if I am wrong)
<doras>Apparently my patch backport changes every single SHA256 hash we have. Oops? :)
<doras>Wait, perhaps not...
<doras>False alarm, it only changes the packages that we expect it to change. I was doing a diff with something else. Everything is fine :)
<oriansj>doras: good to hear
<stikonas>well, now with --update-checksums mode it's easier to deal with changes that change all checksums
<oriansj>janneke: running ./configure for mes I am getting the following missing dependency: guild but it isn't referenced in the mes reference manual so you might want to fix that (either remove from the configure requirements or update the manual)
<stikonas>oriansj: by the way, which guile are you using 2 or 3 for mes?
<stikonas>I think guile 2 doesn't work for me
<stikonas>I can only build mes with configure if I source guix environment
<oriansj>stikonas: I am currently trying to build and test with guile 3.0
<oriansj>yeah, that is an issue
<stikonas>my configure crashes... https://paste.debian.net/1240891/
<oriansj>stikonas: looks like it was expecting M2-Planet to be installed
<oriansj>here is mine: https://paste.debian.net/1240897/
<stikonas>I did add M2-Planet to PATH...
<stikonas>maybe I should install older one from guix
<stikonas>no, same crash...
<Hagfish>was bauen1 around to hear the question about https://gitlab.com/myunix/myunix/ ?
<bauen1>what question ?
<Hagfish>oriansj said "I know work was done on https://gitlab.com/myunix/myunix/ to run mescc and tcc (bauen1 correct me if I am wrong)"
<Hagfish>that's maybe not technically a question :)
<bauen1>oriansj: i only ensured the kernel can be compiled using tcc, iirc it's still missing processes and has a semi-finished vfs implementation
<bauen1>so quite a bit from anything that can run posix
<Hagfish>that's still impressive. i wonder if it's a good stepping stone to follow on from rickmasters's work
<bauen1>and the pmm is still a bitmap, so perhaps you want to start from scratch if you want a kernel compilable by tinycc, but it's not very hard to reach something that can load ELF, handle syscalls, has a vfs and supports a tmpfs ; the things that come after that are a bit harder
<stikonas>well, there is no way tcc can run on builder-hex0
<stikonas>we might not even get mescc working
<Hagfish>hmm, good point
<Hagfish>does there need to be some middle stepping stone?
<unmatched-paren>i wonder if you could compile openbsd or something with mescc or tcc
<unmatched-paren>and whether you could compile linux on openbsd?
<stikonas>well latter definitely yes
<stikonas>not sure about the former
<bauen1>Hagfish: if you want to start solving the "linux kernel problem", i would suggest you to write a simple C kernel (i.e. myunix) that is compilable by tinycc, it can even use all features of C as long as it works with a recent tinycc (and take a look at osdev.org and other hobby kernels like toaruos)
<janneke>oriansj: guile is a requirement and guild comes with guile
<stikonas>janneke: any idea about configure issue with m2-planet ?
<janneke>(when listing gcc as a requirement, you don't mention cpp separately either)
<bauen1>Hagfish: once you have that done, the problem becomes: how do I get tinycc to run without a kernel, or how do I compile the kernel using something that doesn't require a kernel that is this complex
<stikonas>I wanted to do a bit of stuff in meslibc but struggling to configure mes
<Hagfish>bauen1: yeah, that clarifies the problem, thank you
<janneke>oriansj: possibly guild is put in a separate "development" package?
<bauen1>Hagfish: i suspect that for a true hardware bootstrap you will end up with a micro kernel / library kernel that supplies you with memory management, threads, very basic IO where you then add modules as you compile them (including VFS, device drivers, ...) until you can compile a better kernel (e.g. by using tinycc and a POSIX kernel, or you get far enough for linux)
*unmatched-paren cloning openbsd
<Hagfish>oh nice. i like the idea of a kernel extending itself
*janneke looks
<janneke>"<stikonas> "umask stub" -> "clock_gettime stub"" -- thanks, duh
<stikonas>fossy didn't use kernel modules for his kernel build
<janneke>but that's not your question/problem, right ;)
<stikonas>janneke: no, I wanted to do more serious stuff...
<stikonas>possibly add a couple of string functions to libc
<stikonas>my problem right now is https://paste.debian.net/1240891/
<bauen1>Hagfish: there is an earlier version of myunix that could actually run processes, had syscalls and a tmpfs + vfs, even a network stack supporting ARP ; but it was written when I was still a few years from 18 iirc, so the code quality is what you expect and it's even more buggy
<stikonas>janneke: possibly guix is too old?
<janneke>hmm, a backtrace
<stikonas>if I do it outside guix, configure succeeds, but my guile is too old
<bauen1>Hagfish: I don't really have the time to work on a kernel myself, but if you start with something I'd love to hear about it (and maybe contribute a thing or 2) :)
<stikonas>maybe the easiest solution would be to install newer guile...
<Hagfish>bauen1: i feel like i at least need to read up on https://wiki.osdev.org/Modular_Kernel
<unmatched-paren>huh, openbsd seems to support wayland now, til
<unmatched-paren> https://github.com/openbsd/ports/tree/master/wayland
<janneke>stikonas: could be that your guix is too old, but why not update guix?
<janneke>you can always use guix time-machine to use the commit you're on right now?
<janneke>anyway, the bug (thanks!) occurs when there is no gcc installed
<unmatched-paren>the openbsd repo is huge
<bauen1>Hagfish: and i really recommend to give https://github.com/klange/toaruos/ a look, i think it should be enough to run live-bootstrap
<unmatched-paren>it contains the full source code of the entire system...
<janneke>stikonas: so your guix shell / guix environment -l guix.scm must have failed?
<stikonas[m]>janneke: yeah, I need to update it...
*unmatched-paren gives up with openbsd
<janneke>stikonas: how did you create the environment in which you run ./configure?
<unmatched-paren>ok, apparently they DON'T support wayland, those packages don't work https://old.reddit.com/r/openbsd/comments/r3h8vw/wayland_in_future/
<unmatched-paren>they seem to have a very condescending attitude towards it
<janneke>(it is improbable your problem was caused by a too-old guix)
<stikonas[m]>janneke: just sourced my guix profile
<stikonas[m]>Anyway you gave me some ideas to try
<Hagfish>bauen1: interesting, thank you. it says a current goal is "get the OS to a state where it is self-hosting with just the addition of a C compiler."
<janneke>stikonas: anyway, fix for the backtrace pushed to master; thanks!
<bauen1>Hagfish: the author klange also hangs around #osdev
<oriansj>bauen1: doesn't sound like they are in a state which can support running a Compiler powerful enough to self-host
<oriansj>hmmm, looks like I am still on the hook for getting gfk off the ground
<bauen1>oriansj: they are
<bauen1>"Previously, with a capable compiler toolchain, ToaruOS 1.x was able to build its own kernel, userspace, libraries, and bootloader, and turn these into a working ISO CD image through a Python script that performed a similar function to the Makefile."
<bauen1>you might have to patch tinycc a bit though
<oriansj>bauen1: well patching compilers and tools chains we are quite good at
<Hagfish>world experts, i'd say :)
<bauen1>in particular take a look at https://github.com/toaruos/gcc/commits/66860610d488c9501b3f0013d599df902fb31bf5 and https://github.com/toaruos/binutils-gdb/commits/facad00e10bb66e7647be3540a33c978721803cb so only minor adjustments to add new targets
<bauen1>there's also https://github.com/kuroko-lang/kuroko by the same author, a dialect of python 3.x which could be interesting to someone
<oriansj>janneke: yeah, no. Guile on debian doesn't include guild and it doesn't look like they have it packaged either
<oriansj>yet somehow they do have mescc packaged... hmm
<stikonas>hmm, kuroko might be potentially interesting if it can run python3 bootstrap scripts...
<stikonas>but it's likely that it can't
<bauen1>also keep in mind that that thing has its own (multiboot2) bootloader, so it doesn't depend on grub
<bauen1>or might be multiboot1
<doras>stikonas: would it make sense to increase the number of cores that QEMU emulates? It can save a lot of time at least later in the bootstrap where builds can make use of multiple threads.
<stikonas>doras: none of the builds use multiple threads
<stikonas>everything is single threaded
<stikonas>you can try to play with that and enable it as long as everything is still reproducible
<doras>Really? I could have sworn I've seen multiple cores being utilized when using chroot/bwrap.
<muurkha>multithreading is an easy way for nondeterminism to enter
<doras>muurkha: a bit more challenge, that's all :)
<doras>stikonas: do we force gcc to use only a single thread or similar?
<stikonas>doras: there is no forcing
<stikonas>all compilers are single threaded
<stikonas>including gcc
<muurkha>doras:
<doras>Well, I guess I meant the build system (make or otherwise), not the compiler itself.
<doras>But maybe they default to a single thread.
<muurkha>oops
<muurkha>I meant doras: or a bit less challenge for the hypothetical adversary
<stikonas>doras: yes, they all default to a single thread
<doras>Well, this means that we should optimize our workflow by working on a few things in parallel :)
<doras>The more cores at one's disposal, the more tasks they should take on.
<stikonas[m]>Well, as long as reproducibility is not sacrificed
<doras>I highly recommend using git worktrees, by the way. For those not familiar with the feature, it allows checking out separate branches into separate directories that are still linked to the same git database. It allows you to work on multiple local branches as usual, but with complete parallelism. For example, you can run a lengthy bootstrap on one local branch while performing git operations or code changes on another local branch.
<doras>I have two live-bootstrap builds running in parallel at the moment, and I'm working on preparing a third branch in parallel.
<vagrantc>yeah, worktrees are pretty awesome
<muurkha>you don't need a special git feature for that, you can just make two clones of a bare checkout
<vagrantc>it wastes diskspace for no particularly good reason and you get less accurate disk caching
<muurkha>it uses the same amount of disk space
<muurkha>to within epsilon
<muurkha>try it
<muurkha>at least as long as they're on the same Unix filesystem
<vagrantc>git clone foo.git 1 is ~400MB, git clone foo.git 2 is ~100MB ... git worktree 3-worktree is ~100MB and git worktree 4-worktree is ~100MB
<vagrantc>hardlinks?
<vagrantc>looks like
*vagrantc still finds worktrees easier to work with...
<vagrantc>yeah, deleting 1 then du -smc 2 3* 4* ... shows 2 as having 400MB
<doras>stikonas: https://github.com/fosslinux/live-bootstrap/pull/162
<muurkha>yeah, you can see the link count go up on files inside .git/objects/*
<muurkha>but you do have an extra push/pull step that I think worktrees avoid
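A small sketch of the hard-link behaviour muurkha and vagrantc observed: a hard link is a second name for the same data, so the link count reported by stat goes up while the disk usage does not double. The file names are hypothetical, and the sketch assumes a filesystem that supports hard links.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "object-in-clone-1")
    linked = os.path.join(tmp, "object-in-clone-2")
    with open(original, "wb") as f:
        f.write(b"x" * 4096)

    links_before = os.stat(original).st_nlink
    os.link(original, linked)  # hard link: a second name for the same inode
    links_after = os.stat(original).st_nlink
    same_inode = os.stat(original).st_ino == os.stat(linked).st_ino

print(links_before, links_after, same_inode)
```

This also matches vagrantc's observation that deleting one clone leaves the data in place: the blocks are freed only when the last name for them is removed.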
<stikonas[m]>You don't need work trees to run 2 bootstraps. rootfs.py takes tmpdir argument
<muurkha>oh thanks stikonas[m]
<doras>stikonas: I always feel bad changing things while a build runs. For live-bootstrap it should be fairly safe because everything is copied, but it's not the same for every project.
<muurkha>yeah, you might make the build work by changing just the right thing at just the right time
<muurkha>and then never be able to figure out what you did
<doras>Or fail :)
<muurkha>yeah, but that's less scary
<muurkha>just annoying