IRC channel logs

2022-09-16.log

<oriansj>I guess I could make M2libc more standards compatible by having errno and actually setting it
<muurkha>errno can be pretty handy for debugging
<rickmasters>oriansj: it would be easier to just map any negative eax to -1 in the assembly
<rickmasters>i'm not against errno but it appears we've been working without it
<stikonas>but if we do it in C code then it works for all arches rather than having to code it n times
<oriansj>well right now we don't have a fcntl.c
<oriansj>and fcntl.h just loads the $arch/linux/fcntl.c
<stikonas>and I'm still busy updating kaem-minimal for uefi...
<rickmasters>stikonas: the implementations of open are all arch-specific right now
<oriansj>so if I just make a fcntl.c, I can do open in the .c and the $arch/linux/fcntl.c can be converted to using _open
<stikonas>rickmasters: oh ok
<oriansj>yeah, $arch/linux/fcntl.c is just basically a collection of asm() blocks inside of functions with the desired names
<rickmasters>oriansj: integrating open with C in an arch-independent way exceeds my knowledge
<oriansj>I can handle that
<oriansj>have to change a couple M2-Planet -f bits but it'll work without much change
<rickmasters>oriansj: so would you make an arch independent open that calls arch-specific _open and then handles return values?
<oriansj>yes and sets a global errno variable
<oriansj>which I guess I can define in the fcntl.c file until we plan on doing a more proper errno.h library
<muurkha>IIRC the Linux syscall interface pretty much works that way too; system calls return negative (negated) errno values on error, and libc takes care of testing that and storing the (absolute) error number into (typically thread-local) errno
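A minimal sketch of the wrapper being discussed, assuming the convention muurkha describes (the raw call returns a negated error number on failure). All names here (`sketch_errno`, `raw_open`, `sketch_open`, `SKETCH_ENOENT`) are hypothetical and are not M2libc's actual identifiers; `raw_open` is a stand-in for the arch-specific `_open`, faked so that the example is self-contained:

```c
/* Hypothetical error number used only by this sketch. */
#define SKETCH_ENOENT 2

/* Global error variable, playing the role of errno. */
int sketch_errno = 0;

/* Stand-in for the arch-specific _open (normally an asm() block).
 * Like a raw Linux syscall, it returns a file descriptor >= 0 on
 * success or a negated error number on failure. Here an empty path
 * "fails" and every other path "succeeds" with descriptor 3. */
static int raw_open(const char *path, int flags, int mode)
{
    (void)flags;
    (void)mode;
    if (path[0] == '\0') {
        return -SKETCH_ENOENT;
    }
    return 3;
}

/* Arch-independent wrapper: maps any negative raw result to -1 and
 * stores the positive error number in the global variable. */
int sketch_open(const char *path, int flags, int mode)
{
    int r = raw_open(path, flags, mode);
    if (r < 0) {
        sketch_errno = -r;
        return -1;
    }
    return r;
}
```

With this split, only `raw_open` has to be written once per architecture; the return-value handling lives in one C file.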
<oriansj>and M2libc updated
<rickmasters>oriansj: thanks!
<oriansj>(i'll be updating stage0-posix shortly)
<rickmasters>oriansj: do you need to remove open from <arch>/linux/bootstrap.c?
<oriansj>I converted it to _open
<rickmasters>yes, but open is now in M2libc/fcntl.c and also still in <arch>/linux/bootstrap.c
<oriansj>one never uses fcntl.c and <arch>/linux/bootstrap.c at the same time
<oriansj>as it is to solve the lack of octal support in cc_*
<rickmasters>oriansj: ok
<rickmasters>stikonas: live-bootstrap errors out on automake-1.11.2: configure: error: Autoconf 2.62 or better is required.
<stikonas>rickmasters: that might be fixed by updating submodules
<stikonas>a few months ago I've reworked autotools, so now they are always versioned
<stikonas>rickmasters: I assume this is with linux and not builder-hex0
<rickmasters>I had done git submodule update --init --recursive, right after clone
<stikonas>hmm
<rickmasters>yes linux
<stikonas>maybe some race condition
<stikonas>automake 1.11.2 is definitely built after autoconf 2.64
<rickmasters>i'm running from a fork from today, i'll try the main repo. will take a while...
<stikonas>rickmasters: any idea which step?
<stikonas>https://github.com/fosslinux/live-bootstrap/blob/master/sysc/automake-1.11.2/automake-1.11.2.sh
<stikonas>is it main configure?
<stikonas>sometimes make step also reruns configure
<rickmasters>i think it's the second configure
<rickmasters>it's ten checks after: running CONFIG_SHELL=/bin/sh /bin/sh ./configure --prefix=/usr --no-create --no-recursion
<stikonas>well, I've just successfully finished running live-bootstrap
<stikonas>though in ./rootfs.py -bw mode (rootless)
<rickmasters>it's strange, it appears it is actually running autoconf-2.64
<rickmasters>automake-1.11.2: compiling source.
<rickmasters>CDPATH="${ZSH_VERSION+.}:" && cd . && autoconf-2.64
<rickmasters>then: /bin/sh ./config.status --recheck
<rickmasters>running CONFIG_SHELL=/bin/sh /bin/sh ./configure --prefix=/usr --no-create --no-recursion
<stikonas>hmm, if you are still in that system
<stikonas>you can try to run that make command and see if it is reproducible
<rickmasters>sorry, I already restarted it but I'll check that when I come back from dinner
<oriansj>wow, #warning support wasn't committed yet >.<
<oriansj>well free C features today too I guess
<oriansj>well looks like https://savannah.gnu.org/ is down
<pabs3>wget eventually connects, so probably just loaded
<oriansj>git however is timing out
*pabs3 poked on #savannah
<rickmasters>stikonas: same error on new build and running make reproduces it
<rickmasters>autoconf-2.64 exists but running autoconf is command not found
<fossy>hmm, i wonder if we could drop /usr/lib/musl entirely, i think the original reasoning was separation from mes libc, but it doesn't really matter since we compile everything statically, and by the time anyone actually cares about there being junk in /usr/lib, it's all gone..
<doras>fossy: I'm actually working on a few PRs that will end in using /usr/lib/musl/i386-linux-musl.
<doras>Regardless of the functional reason for this (some projects like cpython expect libraries to be in either `/lib` or `/lib/<triplet>`), I personally like the explicit nature of this naming convention.
<doras>stikonas, rickmasters: there's some sort of issue (race condition?) in live-bootstrap at the moment when building automake-1.11.2. I get the failure you mentioned as well around 30% of the time in the bwrap bootstrap mode, and around 5% of the time in the qemu bootstrap mode.
<doras>I haven't got around looking at what's causing it, but it's indeed quite annoying.
<doras>The issue seems to start with "./config.status --recheck" being triggered during the compilation. For some reason it doesn't always occur.
<doras>For reference, this is how a successful compilation starts: https://gitlab.gnome.org/-/snippets/4155/raw/main/automake-1.11.2-build.log
<doras>No "CDPATH="${ZSH_VERSION+.}:" && cd . && autoconf-2.64" nor "./config.status --recheck".
<doras>I updated the snippet to include the entire build output.
<doras>stikonas: unsurprisingly, not exporting the variables did result in some package hash changes, and also in a removal of at least one workaround we previously needed.
<doras>I'll need to run diffoscope to figure out what changed exactly and whether any of these changes are undesirable.
<rickmasters>doras: thanks, that's very helpful info
<fossy>doras: i like the triplet, i think, however i think the /musl is unnecessarily descriptive. there isn't anything to really make explicit and it's non-standard
<fossy>doras, rickmasters: do either of you have a log of a failed automake 1.11.2 build? i have never seen this issue
<stikonas[m]>I have not seen it either
<doras>fossy: sorry, I mistyped above. I meant that I'm working towards having `/usr/lib/i386-linux-musl`.
<doras>fossy: I bet I'll see this failure soon enough. I'll share the log when I do.
<doras>stikonas: removing the environment variable export seems to be a net win. It solved a few issues and exposed a few others.
<stikonas>doras: ok, that's good
<stikonas>what are the solved issues?
<doras>stikonas: it removed a bunch of files that we currently package that were never supposed to be installed/packaged. Build time files.
<stikonas>oh ok
<doras>And it allowed us to remove the workarounds we had for deleting some of those files in some packages.
<stikonas>yeah, I remember we occasionally had some weirdness with that
<doras>I bet there are more workarounds that I didn't find that are no longer necessary.
<fossy>doras: oh, that makes much more sense! sounds good
<stikonas>ran automake 1.11.2 a few times, still no error...
<stikonas>I think people who can reproduce it (so not fossy or me) will have easier time debugging and fixing it
<Hagfish>thanks for your dedication. non-reproducible bugs are maybe even worse than non-reproducible builds :)
<stikonas>oh they are not that uncommon
<stikonas>we had plenty of them in live-bootstrap
<stikonas>sometimes they were indicator of pregenerated files
<Hagfish>oh, if they lead to discovering pregenerated files then maybe we can forgive such bugs
<rickmasters>fossy, stikonas: Log of the automake build failure: https://gist.github.com/rick-masters/49c2be35a2165634698a50ee65d4ed6a#file-automake-log
<stikonas>rickmasters: try to run make again there, see if it's not in bad state
<stikonas>and reproducible
<stikonas>if it is, then see if adding AUTOCONF=autoconf-2.64 helps
<rickmasters>stikonas: running make again fails with the same output, starting with /bin/sh ./config.status --recheck
<rickmasters>running AUTOCONF=autoconf-2.64 make fails later with a complaint about makeinfo: command not found
<doras>rickmasters: the last one is expected. Use "MAKEINFO=true" too.
<doras>(as done in sysc/automake-1.11.2/automake-1.11.2.sh)
<rickmasters>yeah, I tried that already but unfortunately my build is in a strange state now
<rickmasters>i saved the build directory with tar so I could try more aggressive commands, but when I restored it, it wasn't restored to the same state
<rickmasters>seems like tar changed something but I'm not sure, I may have to start over
<rickmasters>looks like tar did not save the file aclocal, I think
<stikonas>anyway, it looks like adding AUTOCONF=autoconf-2.64 before make should fix the problem
<rickmasters>I started over and it failed again. I'm 3 for 3. Takes 43 minutes.
<doras>I only ever had it fail two times in a row. I guess your system reproduces it even more often than mine :)
<rickmasters>stikonas: so, are we thinking this is the command to restart?: AUTOCONF=autoconf-2.64 make MAKEINFO=true
<rickmasters>if that works, it's an interesting data point, but the root cause is still a mystery...
<stikonas>most likely triggered by some differences in timestamps
<stikonas>and some make rules are triggered only sometimes
<oriansj>non-deterministic behavior when not using /dev/random or /dev/urandom should just be a bug
<muurkha>that is, invoking time() or gettimeofday() or examining atime, ctime, or mtime should be a bug?
<muurkha>well, I guess there are other ways to get non-deterministic behavior, so that is not an exhaustive list
<stikonas>yes, you can't avoid things like stuff takes different time to execute
<stikonas>and make is definitely using timestamps
<muurkha>some implementations of redo use secure hashes instead of timestamps
<muurkha>nondeterministic execution time doesn't have to result in nondeterministic execution results
<stikonas>yes, but nondeterministic execution time results in nondeterministic timestamps
<stikonas>mostly observed in autotools projects though....
<stikonas>and especially autotools itself
<doras>And nondeterministic timestamps apparently result in nondeterministic build failures :)
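A rough sketch of why timing alone could change what make does, assuming the usual make-style rule that a target is rebuilt when a prerequisite is strictly newer than it. The helper names and the nanosecond representation are hypothetical, and the one-second truncation is only an assumed model of a coarse-granularity file system, not a confirmed root cause:

```c
#include <stdint.h>

/* Hypothetical representation: nanoseconds since the epoch. */
typedef int64_t ns_time;

/* make-style staleness check: rebuild only if the prerequisite
 * is strictly newer than the target. */
int needs_rebuild(ns_time target_mtime, ns_time prereq_mtime)
{
    return prereq_mtime > target_mtime;
}

/* Model a file system that stores timestamps with 1 s granularity
 * by dropping the sub-second part. */
ns_time truncate_to_seconds(ns_time t)
{
    return (t / 1000000000LL) * 1000000000LL;
}
```

For example, a target written at 10.2 s and a prerequisite touched at 10.7 s compare as stale with full precision, but as up to date once both are truncated to 10 s; a write that lands just after a second boundary flips the result again. Whether a rule such as `./config.status --recheck` fires would then depend on how fast the preceding steps ran.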
<aggi>it's confusing too, autoconf/automake versions are not backwards compatible
<doras>I actually have a commit that adds `AUTOCONF=autoconf-2.64` to the `make` command to work around the issue, but since it felt mostly like a work around I didn't test it thoroughly.
<doras>like a workaround*
<aggi>had encountered various weird side-effects, and concluded, away with it, no more GNU-buildsystem and GNU-toolchain
<aggi>scraped various old ebuilds from the gentoo archives, didn't expect an autoreconf -if caused that much trouble
<aggi>furthermore, the logfiles often, do not reveal precisely what's broken and when and why, with libtool involved etc.
<aggi>toybox doesn't need any autoconf/automake, suckless components don't need it
<muurkha>autoconf is probably a lot less important now than it was 30 years ago when I started using it
<aggi>how so? gnu-toolchain (gcc/binutils) does need it
<muurkha>in 01991-01992 I would frequently use, in the same day, two or more of SunOS 4, MS-DOS, VAX VMS, Solaris (SunOS 5), Ultrix, AIX, IRIX, Linux, and BSDI BSD/OS
<aggi>and i mean, the patchset required, to rollback to gcc47 with gentoo... and see autoreconf -if succeed...
<muurkha>the problem autoconf solves was that if the program's author was using SunOS 4 and you were using Solaris (much less IRIX) you would generally have to hack the source for a while to get it to compile
<aggi>that's why, i don't accept any GNU-toolchain/build-system with my cross-compile/bootstrapping setup anymore
<rickmasters>stikonas: I suspect it is timestamp related as well. Older file systems only have 1 second precision. I'm running on a 4GHz machine.
<aggi>tcc-toolchain integration, still is, incomplete, guess why? GNU.
<rickmasters>stikonas: if make decides to re-run configure, it won't have all the env vars set for configure, in automake-1.11.2.sh
<aggi>muurkha: with autotools, the cure is worse than the problem solved with it, nowadays.
<stikonas>yes, that I understand
<muurkha>nowadays basically only GNU/Linux, interchangeable *BSDs, and occasionally Cygwin are relevant platforms, and old machines with outdated OSes are much less common, so automating away that porting effort is much less important
<muurkha>so that's the sense in which I mean that autoconf is a lot less important
<stikonas>rickmasters: but the alternative would be exporting them and that would propagate...
<stikonas>I don't think doras would like that...
<muurkha>arguably the cure was *always* worse than the disease ;)
<muurkha>oh I should have mentioned MacOS, iOS, Android, and Win64, which I guess are things that a lot of people do want stuff to compile on
<rickmasters>stikonas: i think you prefix the same env vars to make MAKEINFO=true as are prefixed to the configure command in automake-1.11.2.sh
<muurkha>but autoconf doesn't actually help much with Android or Win64 (as opposed to WSL)
<stikonas>rickmasters: yes, that we can obviously do
<rickmasters>i can test that.
<rickmasters>i've only found sed to perform the edit, but that should work
<stikonas>rickmasters: I was just saying that we can't export it once at the beginning of the build script. That would require export and ideally then you would want to build each script in its own subshell. But subshells are a bit broken in early bash
<stikonas>yeah, we don't have editors...
<rickmasters>right, no export. the vars are there for configure so what's good for the goose is good for the gander
<rickmasters>test started
<rickmasters>also: running the command manually seems to have worked!
<rickmasters>AUTORECONF=autoreconf-2.64 AUTOM4TE=autom4te-2.64 AUTOHEADER=autoheader-2.64 AUTOCONF=autoconf-2.64 make MAKEINFO=true
<rickmasters>I need to step away for an hour or so but I'll report back later with results of testing a full build
<rickmasters>The automake build succeeded on the full build, which is still running. I'll create a PR with that fix.
<stikonas>doras: actually, I've noticed that all new builds after musl are now dynamic
<stikonas>I think fossy would like to only use dynamic linking where it is strictly necessary
<doras>stikonas: perhaps we're missing --disable-shared in some packages?
<stikonas>yes, I think all binaries after musl
<stikonas>well, for libraries I think we can build both .a and .so
<stikonas>doras: I'm now trying with the new package that I added (CFLAGS="-static" seems to help)
<stikonas>so I think the same would work with others, I can probably fix it...
<oriansj>muurkha: well with tools like cmake, I see even less reason for autotools to be used outside of a few niche environments like Z/OS which are still around and kicking.
<oriansj>although I doubt I can afford to do a stage0-Z/OS port given the current IBM license bit
<vancz>out of context question; can kaem set environment variables
<oriansj>vancz: kaem-optional-seed => No; kaem => yes
<vancz>aha
<muurkha>oriansj: yeah, agreed. and I'm not sure autotools really helps much with zOS
<vancz>oriansj: context is I'm wondering why this was done but I don't expect you to know https://github.com/ngi-nix/mes/blob/4a787cdade16b01f705649dbb0c48c9c47e89247/stage5/mes-m2.nix#L61
<oriansj>muurkha: actually most programs running on a Z/OS system these days are POSIX binaries using the POSIX subsystem (which has a boatload of old unix vulnerabilities)
<vancz>I guess environment manipulation may be difficult in general in this context?
<oriansj>vancz: I get a 404 error with that link
<vancz>Oh oops
<vancz>I don't think I can change that right now
<vancz>oriansj: from line 50 https://bpa.st/2D2Q
<oriansj>vancz: not really; in fact you just need to do: https://github.com/oriansj/stage0-posix-x86/blob/master/kaem.run#L23
<vancz>hm ok, giess I'll ask
<vancz>*guess I'll ask the author
<oriansj>vancz: looks like the author was having a real hard time trying to make mes.c and basically dumped the mes-m2 block in
<stikonas>maybe mes is not the easiest thing to build, we now have plenty of examples now how to build it...
<stikonas>s/we/but we/
<oriansj>and M2-Planet is far from the easiest C compiler to work with.
<oriansj>and i still never put enough time into making M2libc Mes.c compatible enough
<oriansj>as there isn't a big gap between M2libc and meslibc in terms of functionality (needed to build mes.c)
<stikonas>yeah, it would be easier if M2libc + M2-Planet could build mes.c
<stikonas>since now we need to basically port part of meslibc to M2-Planet anyway
<stikonas>so need to write another libc entry point, etc...
<oriansj>I really should just kill a weekend and knock that out
<doras>stikonas: I'm checking whether the addition of --libdir to dhcpcd somehow broke the `qemu` bootstrap. I keep getting "Timeout reached for internet to become accessible".
<doras>Just a heads up if you run into a similar issue.
<stikonas>sure, so far I haven't seen it
<muurkha>oriansj: that'd be super cool!
<stikonas>it would be even nicer if we could just have one libc but that's probably wishing too much
<oriansj>muurkha: although the fact that gcc fails to build with simple.sh in the latest commit of mes concerns me
<oriansj>gcc (GCC) 12.1.0 for reference by the sway
<oriansj>^sway^way^
<oriansj>but I probably should just use the mes-m2 version in live-bootstrap because that will work and is a fixed target
<doras>stikonas: I'm testing without --external-sources, are you?
<doras>I usually use --external-sources, so it's possible that this issue was introduced even earlier.
<stikonas>doras: I tried both today
<stikonas>speaking of --external-sources, I like to run with it too
<stikonas>maybe it should be default...
<stikonas>fossy, what do you think?
<stikonas>though I don't really like external sources name either
<stikonas>it's not very descriptive
<doras>stikonas: do you remember on top of which commit? I want to test a good one to see that the failure isn't some strange internet connectivity issue.
<stikonas>doras: on top of current master
<doras>Oh.
<stikonas>so with all PRs merged in
<stikonas>(I'm trying to add another package)
<doras>Then something is definitely off here...
<doras>Ah
<doras>I think I have a DNS issue
<doras>The IP address of example.com can't be resolved, and apparently this is the domain we probe in live-bootstrap to detect internet connectivity.
<stikonas>doras: is that just your DNS server?
<stikonas>what about some other one e.g. dig @8.8.8.8 example.com
<doras>stikonas: `systemctl restart systemd-resolved.service` solved it :\
<stikonas>ok, so something local
<doras>Yes... a bug, it would seem.
<muurkha>ugh
<vancz>oriansj: so apparently the complexity of the file I linked is supposed to come from the fact that if we want to use mes/etc to bootstrap, we can't use the nixpkgs stdenv
<vancz>now, I can't stop thinking that there has to be a way to do this without duplicating a bunch of code from upstream (?) or replicating a bunch of stuff that live-bootstrap does, because it won't be maintained well, if at all
<vancz>I can't figure out how to pose the question well, but the bottom line is - isn't there a way for your project, that has all the knowhow to actually write these bootstrap processes, to maintain such scripts
<vancz>or is that already a thing and we're really missing something here
<stikonas>well, given that live-bootstrap does not duplicate bunch of code from upstream, I think it must be possible for other projects to avoid duplication too
<stikonas>as long as they can unpack some files into certain directories and run a single command
<vancz>I mean, it looks to me like live-bootstrap has all these kaem scripts that actually do the compilations and such
<stikonas>exactly
<vancz>so is there / is live-bootstrap the canonical implementation of how to bootstrap an environment?
<oriansj>and there has to be a 1:1 translation between kaem and nix
<stikonas>live-bootstrap is not canonical implementation, just one attempt at it
<oriansj>vancz: right now it is the only pure implementation of bootstrapping an environment
<vancz>is there anything like a canonical implementation?
<vancz>oriansj: ok
<stikonas>yes, it is the only one...
<vancz>I'm trying to figure out how to avoid writing our own
<stikonas>at least the only one working
<oriansj>but not the only one (for example the guix bootstrap is another)
<oriansj>but guix allows pregenerated files and has the guile binary blob in its root
<stikonas>well, guix bootstrap is another bootstrap but starting from different starting point
<vancz>oriansj: e.g., I think you linked me this as a reference https://github.com/fosslinux/live-bootstrap/blob/master/sysa/mes-0.24/mes-0.24.kaem - it would be nice if we didn't need to write something that does the same thing ourselves
<stikonas>vancz: feel free to refactor this a bit
<vancz>Im mostly acting elephant in chinashop
<stikonas>split out the reusable stuff from live-bootstrap specific stuff (unpacking, etc)
<oriansj>there is also: https://github.com/t184256/bootstrap-from-tcc https://github.com/schierlm/FullSourceBootstrapFromGit/
<oriansj>and probably more I don't know about
<stikonas>oh yes, mihi was also playing with it
<stikonas>I think it was mihi that wanted to refactor mes build script in live-bootstrap to make it more reusable
<stikonas>though I guess mihi didn't find time for that
<stikonas>but on the other hand we got autogen bootstrap
<stikonas>(which I still need to finish checking)
<oriansj>and a guile bootstrap
<stikonas>yes, guile bootstrap too, though that is earlier
<vancz>this whole stack seems rather involved so it just seems to me like it's going to take a long time for someone not deep into it to figure it out
<oriansj>but unfinished projects are kind of natural while exploring the solution space
<vancz>(and end up getting neck deep in it in the process)
<stikonas>vancz: most of the things are fairly simple
<stikonas>especially live-bootstrap
<stikonas>it's mostly just a bit of scripting
<vancz>ok
<oriansj>vancz: you don't have to figure it out, you can talk to us and we will help you understand
<vancz>my experience so far has mostly been "m2planet is sensitive with the header / c files"
<stikonas>stage0 stuff might be slightly more involved, as it requires assembly
<vancz>though that can arguably be user error, but the issue is more: "ok i need to figure out all these h and c files i need to pass to the compiler"
<stikonas>we do have M2-Mesoplanet now too which is a bit smarter with header / c files
<stikonas>but I don't think anybody has tried to use it for mes
<vancz>I keep telling my peeps to come here and talk to yall but *shrug*
<oriansj>well that is because M2-mesoplanet doesn't work with meslibc and M2libc isn't up to speed yet
<vancz>I'm just trying to figure out how I can make this whole thing easier to get through (for them) because I don't have any extra capacity
<oriansj>but i guess I really should just get off my arse and solve that one once and for all
<vancz>I hop in with helping debug sometimes
<oriansj>vancz: we all do what we can, and sometimes even the smallest bit of help really matters some days
<stikonas>vancz: see how M2-Mesoplanet compile command is much simpler https://github.com/oriansj/mescc-tools-extra/blob/e1e501c4a9811242b8c2c09709834924389f248c/mescc-tools-extra.kaem#L29
<oriansj>or saying, this failed and some information for others to figure it out is very useful
<oriansj>the first step is always get the program into a state which GCC can build the program in a single command
<oriansj>after that M2-Planet conversion becomes a great deal simpler
<stikonas>and it is much simpler than C-> hex0 conversion
<oriansj>stikonas: hopefully you mean C->asm->hex0
<oriansj>because there is no way I could make that leap, without first working out the assembly in a reasonable assembly language first
<stikonas>yeah, I'm now usually doing C->asm (GAS/NASM syntax) -> M1 -> hex2 -> hex0
<stikonas>so all tiny steps...