IRC channel logs

2024-08-15.log


<mid-kid> https://mid-kid.root.sx/git/mid-kid/bootstrap/src/branch/master/gentoo-2024.8/gentoo.txt
<mid-kid>Alright, I consider this document complete now
<mid-kid>At some point I might want to fix some things in live-bootstrap and gentoo individually to remove some unnecessary steps
<mid-kid>Stuff like figuring out why `make -j` isn't working within the emerge environment without rebuilding "make"
<mid-kid>I'd also want the bzip2 package in live-bootstrap to install the libbz2 library
<mid-kid>And figure out what's up with live-bootstrap's version of `find` lacking a bunch of command-line switches
<mid-kid>And I think live-bootstrap would be better off using pkgconf rather than pkg-config, would avoid having to rebuild it here too.
<mid-kid>I also think it'd be a good idea to install `meson` and `ninja` in live-bootstrap. More and more programs are starting to require it.
<mid-kid>As for the gentoo side of things, I've reported two bugs I encountered during the bootstrap: https://bugs.gentoo.org/937637 and https://bugs.gentoo.org/937918
<mid-kid>fixing those would help remove a few lines but nothing big
<mid-kid>And I also want to figure out how to checksum and make sure everything's reproducible.
<mid-kid>It'd be great to end up being able to just create a bunch of reproducible stage0 tarballs for different architectures that people can build new gentoo installs from.
<Googulator>mid-kid: I'll need to verify this with kernel bootstrap :)
<Googulator>for https://mid-kid.root.sx/git/mid-kid/bootstrap/src/branch/master/gentoo-2024.8/gentoo.txt#L18 it's enough just to back up /external/repo
<Googulator>& then you can use the --repo and --early-preseed options (early preseed is repo/base.tar.bz2)
<mid-kid>yeah I figured that one out but I still found it more useful to create a tarball to quickly start over
<Googulator>also, #gentoo-releng might be interested in this too
<mid-kid>mayhaps
<mid-kid>I wanted to ask them about how to nicely resolve the app-alternatives kludge I have there near the end (really how does catalyst do it)
<mid-kid>but rn I just fell down this rabbit hole because I got a new computer and wanted to install gentoo like this on a new machine :^)
<mid-kid>so it might be a while before I continue this work again
<Googulator>Another interesting direction to go is rather than just installing gentoo locally, try to generate stage and livecd images (which I did with the old, LFS-based bootstrap)
<Googulator>Doing that on bare metal with kernel bootstrap amounts to completely breaking the binary propagation chain from prior Gentoo versions
<mid-kid>I have some catalyst instructions here: https://mid-kid.root.sx/git/mid-kid/bootstrap/src/branch/master/gentoo-2024.8/catalyst.txt
<mid-kid>But like
<mid-kid>Right now I'm going with the assumption that you'll get an x86_64 kernel from *somewhere*
<Googulator>producing a truly trustworthy Gentoo guaranteed free of Karger-Thompson backdoors, even if upstream is compromised (so long as the sources are clean)
<Googulator>(& also the hardware/firmware is clean)
<mid-kid>So I don't think a kernel bootstrap will work if you don't add extra instructions before switching to the x86_64 rootfs (you'll have to cross-compile a kernel)
<mid-kid>and catalyst can't be used to cross compile, so that one's out as an option
<mid-kid>Unless you build qemu-user-x86_64 I guess
<Googulator>I did that in https://gist.github.com/Googulator/86af97ed078eb9e6c18c6eb49e46c96d
<Googulator>(was also needed for Guix, since x86 Guix is currently broken upstream)
<mid-kid>I see!
<mid-kid>Would be cool to integrate a linux kernel build here
<mid-kid>anyway I'm not sure it's worth ensuring a no-binary-at-all chain to gentoo until we can hash the outputs and actually rebuild them reproducibly
<mid-kid>I can try using gentoo binpkgs but I'm not convinced those are a great idea either
<mid-kid>that said
<mid-kid>good luck and lmk how it goes, I'm interested
<Googulator>yeah, it's kind of a dilemma
<Googulator>Gentoo can build entirely binary-free, but isn't reproducible, and is not even meant to be
<mid-kid>gentoo's kinda the antithesis to reproducibility lol
<mid-kid>the way it's meant to be used no two machines will ever be the exact same
<Googulator>Guix is fully reproducible, but the 5 bootstrap binaries unfortunately provide a retroviral propagation path
<Googulator>& so far, I haven't been able to replace those binaries in a way that doesn't invalidate the entire hash tree
<Googulator>ideally, we would just reproduce those very same binaries (assuming they aren't backdoored), but the version of Guix that was used to build those predates the time machine system
<mid-kid>dang
<mid-kid>for guix it might be good to think about just invalidating the entire hash tree at some point
<Googulator>if it was ever a clean version of Guix, that is - I suspect it wasn't one, but rather some ad-hoc Git repository on a developer's box, possibly even with uncommitted changes
<Googulator>I know for sure that the binaries were compiled years before they were committed to Git
<mid-kid>if you can prove you can get the same binaries reproducibly and the entire path to creating them is documented, then you would only need to invalidate the entire hash tree *once*
<Googulator>& they weren't built at the same time
<mid-kid>and if you do it during a glibc upgrade or something I doubt people will notice
<sam_>[11:24] <mid-kid> the way it's meant to be used no two machines will ever be the exact same <-- this isn't quite right; there's a lot of people who use gentoo and deploy exact configurations with their _own-built_ binpkgs, and shared profiles and such
<Googulator>unfortunately it's still noticeable, because it means going through Scheme-only bootstrap again, which is d o g   s l o w
<sam_>but yes, in general, it's hard because the focus on our end is customisability
<aggi>i almost finished the _complete_ no-c++/tcc-toolchain/static gentoo build, maintained with crossdev
<Googulator>(the reason why I campaigned against Scheme-only bootstrap in live-bootstrap)
<mid-kid><sam_> but yes, in general, it's hard because the focus on our end is customisability <- yeah that's what I meant with "how it's meant to be used" lol
<sam_>my point is i don't agree with the how it's meant to be used
<mid-kid>ah
<sam_>because people do very much use it like that, and we help them out if they need changes or something
<sam_>it's just not #1 in people's minds either
<aggi>problem is, python/perl/autotools/portage are not self-hosting with tcc-toolchain
<mid-kid>I mean I can point to quite a few things that are "normal" to customize and sometimes influence the builds of packages in emerge
<Googulator>there's a massive speed difference between rebuilding glibc & beyond with Bash as the shell (which is what happens on a glibc upgrade), vs rebuilding with Gash - with Gash, configure script execution actually dominates the time to build
<mid-kid>and stuff like people modifying env.d and sticking files in /usr/local also influences builds to different degrees
<sam_>mid-kid: it's not about whether or not you CAN customise it (you can do that anywhere?), it's about whether we also make it possible to control things via the PM
<nimaje>wait, guix uses the hash of a package in dependencies too, so you have to do a build of each supported configuration when updating anything? or how should I understand 'invalidating the entire hash tree'?
<sam_>i can say that you can pass random env vars to dpkg and now it's non-reproducible, same thing
<Googulator>also, doesn't emerge timestamp packages, preventing reproducibility even with identical settings?
<sam_>yes, it does at the moment, someone's proposed a way to change that
<Googulator>nimaje: it uses the hash of the package's definition, but not the output
<Googulator>definition + hashes of the dependencies' definitions
<mid-kid>sam_: I mean I agree I just meant that it's really really easy to make binaries that are slightly different on every machine, unless you use binpkgs and a controlled environment of course.
<Googulator>nimaje: bootstrap binaries are an exception, because the actual hash of the binaries is used there
<mid-kid><sam_> yes, it does at the moment, someone's proposed a way to change that <- that's great! I was thinking of proposing/implementing something like that for this bootstrap thing, but good to know people are already interested
<Googulator>so if I replace bootstrap-guile without doing some magic, every package needs to be rebuilt
<Googulator>because everything indirectly depends on bootstrap-guile, which is a binary
<Googulator>My plan there is to produce some fully reproducible replacements, and then patch the hashing code to special-case on the hashes of the replacements, returning the originals' hashes instead
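A toy sketch of the hashing scheme Googulator describes above, not Guix's actual implementation: an ordinary package is hashed from its definition plus the hashes of its dependencies, while a bootstrap binary is hashed from its contents. The `packages` layout, the `package_hash` function, and the `overrides` table are all hypothetical names for illustration. The `overrides` table models the proposed special-casing, where the hash of a reproducible replacement is mapped back to the original binary's hash.

```python
import hashlib


def h(*parts):
    """SHA-256 over the given strings, NUL-separated to avoid ambiguity."""
    digest = hashlib.sha256()
    for part in parts:
        digest.update(part.encode())
        digest.update(b"\0")
    return digest.hexdigest()


def package_hash(name, packages, overrides=None):
    """Hash a package in a toy dependency graph (hypothetical model).

    A package with a "binary" entry is hashed by content; any other package
    is hashed by its definition plus its dependencies' hashes.
    """
    overrides = overrides or {}
    pkg = packages[name]
    if "binary" in pkg:
        content_hash = h(pkg["binary"])
        # Special case: report the original's hash for a known replacement.
        return overrides.get(content_hash, content_hash)
    dep_hashes = [package_hash(dep, packages, overrides)
                  for dep in sorted(pkg["deps"])]
    return h(pkg["definition"], *dep_hashes)


original = {
    "bootstrap-guile": {"binary": "original-bytes"},
    "glibc": {"definition": "glibc recipe", "deps": ["bootstrap-guile"]},
    "hello": {"definition": "hello recipe", "deps": ["glibc"]},
}
# Same graph, but with the bootstrap binary swapped for a rebuilt one.
replaced = dict(original, **{"bootstrap-guile": {"binary": "rebuilt-bytes"}})

# Maps the replacement's content hash to the original's content hash.
overrides = {h("rebuilt-bytes"): h("original-bytes")}
```

In this model, `package_hash("hello", replaced)` differs from `package_hash("hello", original)` because the change propagates through every dependent, whereas passing `overrides` makes the two agree again.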
<nimaje>well, it should use the quasi-standard SOURCE_DATE_EPOCH https://reproducible-builds.org/docs/timestamps/
<mid-kid>I like how alpine's abuild sets SOURCE_DATE_EPOCH to the modification date of the APKBUILD file
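A minimal sketch of the idea mid-kid mentions, assuming the timestamp is simply taken from the recipe file's modification time. This is not abuild's or portage's actual code, and `source_date_epoch_from` and `build_env` are hypothetical helper names.

```python
import os


def source_date_epoch_from(path):
    """Return the file's mtime as the integer string SOURCE_DATE_EPOCH expects."""
    return str(int(os.stat(path).st_mtime))


def build_env(recipe_path, base_env=None):
    """Copy an environment and pin SOURCE_DATE_EPOCH to the recipe's mtime.

    The result could be passed as env= to subprocess.run() for the build.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["SOURCE_DATE_EPOCH"] = source_date_epoch_from(recipe_path)
    return env
```

Tying the value to the recipe file means that two builds of the same unmodified recipe see the same timestamp, provided the file's mtime itself is preserved (for example by the checkout or unpack step).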
<Googulator>the challenge is that the replacements must be identically buildable in guix (to avoid introducing an external dependency) and live-bootstrap with no guix installed (otherwise binaries built under control of the original bootstrap-guile will handle the replacement one, providing a propagation path)
<sam_>nimaje: yes, the question is how to handle interaction with our specification
<sam_>mid-kid: that's the plan for one of the toggles
<mid-kid>sick! do you have any bug tracker numbers or is this mailing list talk?
<matrix_bridge><Andrius Štikonas> mid-kid: yes it would be good to switch to pkgconf...
<matrix_bridge><Andrius Štikonas> We just need to check if anything breaks, perhaps some of the packages don't work with it
<matrix_bridge><Andrius Štikonas> E.g. something like autogen
<sam_>mid-kid: see bug 913920
<aggi>i think it would simplify bootstrapping (including what's considered M3-planet), if a complete development host was available with tcc-toolchain
<aggi>that could keep gentoo/package-management (python) and GNU-buildsystem/automake (perl)
<aggi>then, an entire gnu-toolchain can be spawned with almost no effort, since gentoo maintained it for example, in the past
<aggi>it's not a big problem, to pull back gcc-4.7 ebuild with a few hacks to toolchain.eclass
<aggi>ironically, it is python and perl that are blocked against tcc, and with it take down autotools and gentoo
<aggi>just finished, full static-linking support (that isn't recommended nor maintained by gentoo, but it's needed for the remaining bootstrapping path that i got)
<aggi>i'll take a nap, and then once temperatures are down to normal during the night i'll cope with the perl and python issue
<mid-kid>godspeed
<mid-kid>o7
<Googulator>sam_: It just occurred to me, it would technically be enough to ensure reproducibility of Catalyst's output, for a bootstrap to be verifiable against the existing built package base
<Googulator>i.e. if Catalyst builds identical stage3 and install CD images on bootstrapped and standard Gentoo, then simply starting a fresh install from the reproducible CD & stage3 ensures there's nothing propagating
<Googulator>It seems there was considerable progress to achieving reproducibility at some point, albeit without Catalyst: https://wiki.gentoo.org/wiki/User:OstCollector/Reproducible_Build
<mid-kid>if you can achieve it with catalyst you can achieve it without catalyst - it doesn't really matter
<Googulator>If this can be ported to Catalyst, I assume it won't be hard to add a postprocessing step to "manually" regularize the remaining offending files before generating the final image
<mid-kid>most of what's listed in that wiki page should really just be added to portage itself
<mid-kid>or the ::gentoo tree in the cases where configure flags matter
<sam_>obviously, the point of the page however was to document their notes as they went
<sam_>they're still working on it though
<sam_>or were not that long ago
<sam_>it's not meant to replace or prevent efforts to upstream or integrate stuff
<sam_>more just notes on an experiment
<Googulator>I suggest targeting Catalyst because it already tries to reduce configuration variance to a minimum (i.e. you build from standard definition files with Catalyst, as opposed to customizing for every system)
<sam_>(and I told them that)
<sam_>Googulator: ah, an interesting point
<Googulator>all of those configuration options that are "all over" a standard install, which introduce variability to the build - they are gathered in one location with Catalyst
<mid-kid>Googulator: oh, yeah it's a decent common denominator for a standard config
<mid-kid>but then again, imo so is an empty (except make.profile) /etc/portage dir
<Googulator>won't that default to -march=native?
<sam_>no
<sam_>users set that, we don't default to it at all
<sam_>unlike other distros we do not e.g. tell the compiler to do that
<sam_>you can configure gcc s.t. it's the default but we don't
<Googulator>hmm, I thought that was the biggest source of variability, defaulting to optimizing for the local hardware
<sam_>no :)
<sam_>see, this is what I mean, people can opt-in to do this stuff, but a lot of it is their choice
<sam_>the default absolutely is not to do that
<sam_>stages ship with CFLAGS="-O2 -pipe"
<sam_>we just tell people they can do -march=native if they want
<sam_>(and many do, but it's not done by default)
<mid-kid>the biggest source of variability imo is the order in which stuff is compiled, which depends on which moments you choose to update your system
<sam_>so this reminds me and i wanted y'alls thoughts on this
<mid-kid>"emerge -e @world" ensures everything is compiled against the latest glibc, but regular upgrades don't
<mid-kid>and this source of variability exists in the default working of catalyst as well - it defaults to using cached binpkgs and only building updated packages
<sam_>I saw https://blog.josefsson.org/2024/07/10/towards-idempotent-rebuilds/ posted on HN and I found this fascinating
<sam_>because to me, this is so obvious
<sam_>I ASSUMED the reproducible build efforts in distros were talking about this kind of reproducibility
<sam_>but apparently they're not
<mid-kid>yeah no reproducible builds in most distros currently means "if you get this set of packages at *these* exact versions you can rebuild a package identically"
<sam_>yeah
<sam_>but what i also found interesting is
<sam_>> However as far as I know, they do not confirm or deny that their rebuilds match the official packages. In fact, typically their rebuilds do not match the official packages, even when they say the package is reproducible, which had me surprised at first
<sam_>???
<mid-kid>not sure what the author means tbh
<sam_>I think what they mean is like
<sam_>if you go on the debian reproducible build site, then look at a reproducible build it did
<sam_>it doesn't compare it to the official archive
<mid-kid>oh
<sam_>(the author is a Debian developer as well)
<mid-kid>oh yeah that sounds bad
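A small sketch of the missing check discussed above: comparing rebuilt packages against the officially published ones by digest, rather than only comparing two rebuilds with each other. The `mismatches` function and the in-memory name-to-bytes mappings are hypothetical; a real check would read package files from an archive mirror and a rebuild directory.

```python
import hashlib


def digest(data):
    """SHA-256 hex digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def mismatches(official, rebuilt):
    """Return the sorted names that are missing on one side or differ by digest.

    Both arguments map a package file name to its contents as bytes.
    """
    names = set(official) | set(rebuilt)
    return sorted(
        name for name in names
        if name not in official
        or name not in rebuilt
        or digest(official[name]) != digest(rebuilt[name])
    )
```

An empty result would mean every rebuilt package is bit-for-bit identical to the published one, which is the stronger property the blog post is asking about.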
<sam_>btw, wrt your glibc point: --rebuild-if-new-slot --rebuild-if-new-rev --rebuild-if-new-ver --rebuild-if-unbuilt in portage are kind of related to this
<sam_>(not entirely, but they are related to this idea of rebuilding-even-if-not-strictly-needed)
<mid-kid>the dependency tree in @system would need to become more complete/correct for that to work effectively I think
<sam_>yeah
<sam_>which brings me onto the other point
<notgull>Hello! I left a comment on the Miraheze wiki, but I figure I should ask here as well.
<sam_>i think the #1 thing to make better first in gentoo is to make it easier to bootstrap with external tools somehow
<sam_>like to say "ok, here's the fedora gcc or whatever, please try build a minimal root from it"
<sam_>which is obviously what your script/instructions do, just from a slightly different angle
<sam_>right now it involves a lot of hackery
<notgull>I think Rust and Zig aren’t actually bootstrapped. Rustc’s source code contains a lot of pre-generated code that I don’t think we work around, and zig-bootstrap is a binary blob.
<mid-kid>sam_: yeah that'd be really cool
<notgull>Are there plans to work around this?
<mid-kid>I've played around with scripts/bootstrap.sh while using the ROOT= variable to not a lot of success
<notgull>At the moment I'm working on a bootstrap plan for Rust to build the compiler without any pre-generated sources, so I’m mostly asking about Zig.
<mid-kid>honestly I'm not entirely sure what purpose scripts/bootstrap.sh even serves
<sam_>nor me :D
<mid-kid>it seems like a legacy thing from when stage1 still existed, and it's been (rightfully) pulled out of catalyst recently.
<sam_>even trying to be generous about what it should maybe do, it still doesn't make any sense
<sam_>like even if i accept it's nothing to do with what i want
<mid-kid>yeah
<sam_>I think we should remove it to stop confusing people
<sam_>there's a place for real work there but that script is not it
<sam_>it is just misleading
<sam_>i will ask around
<mid-kid>it doesn't work correctly anyway - the "emerge -e @system" step it tells you to do at the end cannot resolve dependencies without fiddling with USE=
<mid-kid>sam_: the *one* use it had for me is teaching me about BOOTSTRAP_USE and USE=build
<mid-kid>anyway, tracing back a second
<mid-kid>I wonder if the script I made would work on other distros as-is already
<mid-kid>I reckon it could if you install enough build deps
<mid-kid>But it'd be nice to eventually turn it into a bootstrap-prefix.sh-esque script which just figures everything out as it goes.
<sam_>yeah
<sam_>I was just thinking about that
<sam_>(the relationship w/ bootstrap-prefix)
<mid-kid>yeah I haven't tried but I wouldn't be surprised if bootstrap-prefix would've just worked out of the box here
<notgull>Wait, the web client doesn’t keep you logged in? That’s not good
<mid-kid>the only thing is bootstrap-prefix doesn't allow you to build into a new prefix-less ROOT
<mid-kid>(and I don't really think it's a good idea to clutter that script further with logic)
<mid-kid>(instead of moving some of that logic into portage/::gentoo)
<mid-kid>now I feel bad for talking over notgull
<mid-kid>I'm curious what those pregenerated files in rustc could be
<matrix_bridge><Andrius Štikonas> @irc_libera_notgull:stikonas.eu: I don't think anybody looked at pregenerated files in rust
<matrix_bridge><Andrius Štikonas> There used to be pregen binaries there (I think in third_party/vte) but that is now fixed
<mid-kid>they left
<lanodan>Doesn't look cleaned up to me… https://hacktivis.me/tmp/rustc-1.80.1-src_deblob.log
<lanodan>llvm and winapi makes it quite noisy but there's also some blobs for linux (like librustix_outline_x86_64.a) and some wasm files.
<matrix_bridge><Andrius Štikonas> Hmm, true there are a few files...
<matrix_bridge><Andrius Štikonas> llvm files are for tests, so those can be ignored...
<matrix_bridge><Andrius Štikonas> Though given xz backdoor, one should be careful with tests too
<lanodan>Yeah I think if they really need blobs for tests it ought to be something like optional tests in a separated tarball.
<lanodan>That said I still somewhat tolerate test blobs, in fact it's why deblob has a -e option to not remove them.