IRC channel logs

2021-05-08.log

<stikonas>melg8: (cc fossy): maybe pragmatic thing for now would be to put timestamp erasing code into a separate script?
<stikonas>then you can run it after bootstrap is done
<stikonas>melg8: by the way, thanks again for looking at coreutils 8.32. Please don't be discouraged by high number of comments on the PR
<stikonas>all those GNU packages ship a lot of pre-gen files, so it takes a bit of experience to get familiarity with it
<melg8>It would be nice if we discuss place and form of my reproducibility script (because i would like to develop it more with addition of some rm-s of logs and other stuff to make reproducible builds which is not fixable inline) so it would be nice to have some decision now, so i can work on it not fearing it will not survive code review.
<melg8>I'm fine with review so far, and really glad that you and other people from community help and answer my questions, even though, will not lie, a single touch command takes a lot more than i thought it would)
<stikonas>oh yeah, I understand you, live-bootstrap is harder than normal distro packages, I know it first hand...
<Hagfish>the fact it takes a lot of effort is proof of how valuable the work is :)
<Hagfish>if it were easy, someone would have done it already :)
<stikonas>as for reproducibility script, I can try to think a bit about it, but it's just my opinion, it would be nice to get other opinions too
<melg8>i want at least some "dump" of ad-hoc fixes which are not affecting behaviour of build, but just ignore/alter some parts of final image - so i can check final sha of whole thing and detect if something changed when we did not expect it to change + add it under CI testing
<Hagfish>there's certainly some merit to having all the "reproducibility-only" fixes in a script that's dedicated to that task
<stikonas>so I think most people would care that output of bootstrapping would be reproducible. But then we need to define what is output. It does not have to be the whole file system there. At the moment we have mix of 3 things there 1) installed files which we want to be reproducible, 2) tarred sources which are checksummed already and 3) build directories/transient files, etc, which I think we don't really need to keep fully reproducible
<Hagfish>normally i'd say that repro fixes should be separate patches so that they can be upstreamed, but it doesn't make sense to apply a patch to a deprecated version
<Hagfish>stikonas: good point, yes
<stikonas>but at the moment all those 3 categories are in one pot, not easy to separate them
<stikonas>Hagfish: well, that particular script is removing timestamps
<Hagfish>and the timestamps are only visible to the filesystem, not the contents of the files, right?
<stikonas>well, yes, but they are accessible to programs
<stikonas>so something can read them
<Hagfish>maybe that "something" should be responsible for fixing them?
<stikonas>and it affects checksum of the whole tar of fs
<melg8>timestamps affect final hash - if you tar the whole thing
<stikonas>so that's why I think for now it might be easiest to have those few lines to clean up timestamps in a separate script
<stikonas>something like cleanup-timestamps.sh
<Hagfish>does modern tar let you specify a date to set the timestamps to?
<stikonas>which you can run if you want
<stikonas>we can still push it into the repo
<stikonas>then we would still be able to inspect timestamps for debugging purposes
<stikonas>e.g. when we were looking at gforce_de1977's issues, we did look a bit at timestamps
<Hagfish>yeah, is this just for debugging? what's the first step which checks the hash of a generated tar?
<stikonas>yeah, timestamps are only useful for debugging
<stikonas>it's just some metadata about file
<vagrantc>yes, newer tar supports some features regarding timestamps: https://reproducible-builds.org/docs/archives/
<vagrantc>as well as a few other things
<Hagfish>if the timestamps only need to be right in the unhappy path, then yeah, it sounds like the code for setting them should be in a separate script which isn't usually run
<vagrantc>to get all the reproducibility features i'm aware of requires gnu tar 1.28+
<Hagfish>in the happy path, we can presumably use newer tar and do one final repro check at the end
<stikonas>well, even live-bootstrap has tar1.34
<Hagfish>oh, nice
<stikonas>but I think if you want to tar the whole thing up, you have to do it from outside
<stikonas>or is tar smart enough
<stikonas>not to include the archive that is being written
<stikonas>hmm, maybe you can just do something with find "everything" -xargs | tar cvf ...
<vagrantc>can probably exclude itself
<vagrantc>tar --exclude=somefile.tar
<Hagfish>yeah, that's smart
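
The tar options mentioned above can be combined into a rough sketch like the following. It assumes GNU tar 1.28 or newer; the function name `make_repro_tar`, its arguments, and the example exclude pattern are hypothetical and not taken from live-bootstrap.

```shell
#!/bin/sh
# Sketch only: archive a directory tree so that two runs over the same
# file contents give byte-identical tarballs. Assumes GNU tar >= 1.28.
# make_repro_tar and its arguments are hypothetical names.
make_repro_tar() {
    root=$1      # directory to archive
    out=$2       # archive to write (kept outside of $root here)
    skip=$3      # glob for transient entries to leave out
    # --sort=name       stable member order regardless of directory order
    # --mtime=@0        clamp every timestamp to the epoch
    # --owner/--group   drop host-specific ownership
    # --format=gnu      avoid pax headers that can carry atime/ctime
    tar --sort=name --mtime=@0 \
        --owner=0 --group=0 --numeric-owner \
        --format=gnu \
        --exclude="$skip" \
        -cf "$out" -C "$root" .
}
```

Called as `make_repro_tar /after /tmp/after.tar 'build-*'`, this would hash the same on every run as long as file contents and modes match; timestamps and anything matching the exclude pattern no longer influence the result.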
<Hagfish>in terms of the reproducibility of the live-bootstrap process, is there an "end" point at which it makes sense to calculate the "final" hash?
<vagrantc>melg8: really glad to hear you're looking at the reproducibility of the bootstrapping stuff :)
<stikonas>one way, we can have a bit more order in fs, is to try to move /after/package-* directories into /after/sources/package-*
<stikonas>then all temp stuff is basically inside subdirectory
<stikonas>which we can exclude
<Hagfish>yeah, nice
<stikonas>although, that requires a fair bit of work
<vagrantc>what are the outputs of "live-bootstrap" ? a gcc binary? several other tools?
<stikonas>vagrantc: at the moment it's just the whole filesystem
<stikonas>mix of binaries and installed files, temporary build directories and also original sources
<stikonas>it would be nice to have just installed files in outputs
<Hagfish>i'm worried that use cases like this, and the Stow work, will imply the need for some sort of custom overlay filesystem
<vagrantc>so then a tar file or other archive makes some sense to be the "final" thing to hash (obviously after purging temporary files)
<Hagfish>you can tell tar to exclude based on a pattern, apparently
<stikonas>bauen1: will probably know more about Stow work
<stikonas>I don't really know much about it yet
<Hagfish>hopefully it doesn't require extra layers of abstraction
<Hagfish>just being able to exclude stuff, and whitelist stuff, should be enough
<stikonas>maybe we can try to exclude after/*-* pattern from tar?
<Hagfish>what is the significance of the dash?
<stikonas>that mostly catch installed package files (binaries and some data files)
<stikonas>Hagfish: oh, that's just how we name packages
<stikonas>name-version
<Hagfish>oh, cool
<stikonas> https://github.com/fosslinux/live-bootstrap/tree/master/sysa
<stikonas>with a few exceptions
<stikonas>maybe mes, untar
<stikonas>that might need special handling
<stikonas>melg8: what do you think about this way?
<stikonas>i.e. taring stuff but excluding temporary build dirs
<stikonas>that might not even need touch
<stikonas>although, coreutils work is useful in any case
<melg8>I get the idea of tar - that could be useful as part of final products outcome, but i'm more interested in full image products being checked for bit per bit reproducibility, and only where we all agree it's irrelevant timestamps stuff - it is ignored on per file basis, but not like exclude_stuff* masks, why? because i want to setup some ci with self check
<melg8>at the end, so we never ever encounter stuff like automake/autoconf producing failures 0.5 percent of the time. Because this stuff can be observed only by logs (which start to differ) and can not be observed in highly stripped result
<stikonas>well, ok, I can see how your full checksum might sometimes be useful for debugging too
<vagrantc>melg8: if the tarball is bit-for-bit identical, all the individual bits inside are too, though ...
<stikonas>vagrantc: well, if not, that's when the problem is
<stikonas>if we don't tar build directories
<stikonas>we'll not be able to tell what went wrong
<stikonas>just checksum doesn't match
<stikonas>although, if something went wrong, then I think build aborts, so you need to invoke building image manually anyway
<vagrantc>but automake/autoconf aren't likely to produce reproducible output
<stikonas>melg8: said it's not too bad
<stikonas>just a few kb of differences
<stikonas>but I haven't tried
<vagrantc>well, if you want to keep that for debugging, sure
<stikonas>well, melg8 wants that :)
<vagrantc>alternately, you could build two tarballs, one with the toolchain, one with the transient build artifacts (if you need to debug it)
<melg8>it's more like - 1 defect is not detected, logs differ, final tar ok - nobody cares, second defect, third, final tar okay, then some day some refactoring triggers cascade of that baggage and we need to figure out that, oh wait, some gnu tool at the start of the road is not properly working in some situations
<stikonas>but I think fossy wants to put those timestamp stripping commands into a separate script
<melg8>i'm fine with separate script too, but i think it should be at least inside of bootstrappable itself - so it can selfcheck on different runs (qemu, chroot, baremetal someday etc)
<stikonas>yeah, I'm fine keeping it inside live-bootstrap
<stikonas>and I think fossy will agree, but we'll see
<stikonas>I'm not fossy :)
<stikonas>well, like fossy said, try to keep it optional
<stikonas>if it's a separate script, then it's automatically optional, if you want, you can start it
<stikonas>or we can even add some config file to enable automatically running optional stuff
<Hagfish>nice
<melg8>yea but one thing i'm not sure - can ci trigger it if qemu finished building part?
<stikonas>well, depends on how you set up your ci
<stikonas>e.g. you can create some file "config" that is put inside initramfs by rootfs.py which has some config options. Then run2.sh reads config file and if it finds that optional config option, it runs another script
<stikonas>so "config" file then is per user bootstrapping customization
<stikonas>(some other option might be, start bootstrapping particular distro)
<stikonas>after core-bootstrap is done
<stikonas>well, there are a few things where optional stuff might be useful
<stikonas>now regarding current CI, it will need rethinking anyway
<stikonas>because we started hitting 2h limit
<melg8>i for some reason think that this project is more like pathfinder/demonstrator/trusted check and other distros will maybe use it as reference for rewrite with their own infrastructure/methods/languages/packages, but will see.
<stikonas>maybe...
<stikonas>well, guix has their own implementation
<stikonas>although, guix is what started it and predates live-bootstrap
<Hagfish>live-bootstrap is quite nice in terms of creating outputs with known hashes, which could be used to create the input for a separate process (from the perspective of the CI)
<Hagfish>finding a nice place to split it down the middle, though, might not be easy
<stikonas>yes
<stikonas>and creating nice outputs is not easy either
<stikonas>but maybe will be better if Stowed packages
<Hagfish>yeah, i was wondering about the "artefacts"
<stikonas>initially, I thought everything is much simpler
<Hagfish>it would be kinda cool if it could produce, say, Debian packages early on, so that later CI jobs could just pull in those packages from Debian repos
<melg8>once live-bootstrap stable and produce results reliably - i would try to repeat steps but with guidance of nix system, and full control of what steps, what deps, and what hashes are there - using nix store + maybe testing. But live-bootstrap will bring me trust, that compiled nix executable is not compromised itself - if hashes are the same at the
<melg8>end of day
<stikonas>once you bootstrapped to tcc, (maybe gcc), things will be really simple
<Hagfish>it would be much simpler if each step only relied on one binary that was produced in the previous step
<stikonas>but once you start doing it, there are all kinds of things and issues you don't expect in advance
<Hagfish>right
<stikonas>well, in practice we need multiple binaries
<stikonas>but at the moment we can't even track which ones are used
<stikonas>because everything is in one pot
<Hagfish>wow, yes
<stikonas>that's why packages and eventually chrooted builds inside live-bootstrap might make process less messy
<Hagfish>yup, i can see the appeal of that now
<Hagfish>bootstrapping (or perhaps more accurately "the problem of not having a bootstrap") is starting to seem like a "hyperobject" (a term i recently learned)
<OriansJ>Hagfish: there is a very good reason why once we get GCC and Guile, I am strongly in favor of punting the rest of bootstrapping the world work into guix and nix's hands. The porting and tweaking and simplifying work we have to do could take decades; without even looking at all the extra bootstrapping pieces the other distros would need to have done.
<OriansJ>Let someone else figure out how to make a guix/nix package to bootstrap Debian
<OriansJ>or slack or gentoo or arch or etc
<OriansJ>The pieces we already have will need lots of love to work on PowerPC64le, RISC-V and other architectures
<OriansJ>Heck, think of the C features that need work or implementation in M2-Planet that would be really helpful (Like pointer arithmetic and array[i][j] behavior etc) supporting more foundational pieces.
<OriansJ>or imagine the possibilities of a binutils built at the M2-Planet stage, that MesCC could use. (Then janneke can just use standard binutils to port MesCC, rather than having to deal with the limitations of mescc-tools)
<Hagfish>yeah, there's a natural division of labour between pre-gcc and post-gcc work, but i'm sure that won't stop people from having a foot on each side of the line, if there's work on both they find interesting
<melg8>OriansJ hi, where can i find c equivalent to this stage https://github.com/oriansj/stage0/blob/master/stage2/cc_x86.s ?
<stikonas>melg8: https://github.com/oriansj/stage0/tree/master/stage2/High_level_prototypes/cc_x86
<OriansJ>melg8: https://github.com/oriansj/stage0/tree/master/stage2/High_level_prototypes/cc_x86
<OriansJ>in the same folder you just linked to
<melg8>am i right that https://github.com/oriansj/stage0/blob/master/stage2/cc_x86.s (in binary form) can build c version of itself? and will it produce itself from this c code? ( i mean will hashes be equal?)
<OriansJ>Hagfish: of course. everyone here gets to choose what fun they want to have. But I only suggest that having the later work in guix as a way of making their lives easier long term.
<OriansJ>melg8: yes
<OriansJ>The output from the C code and the assembly should be byte for byte identical
<melg8>btw i'm from nix community- but having hard time to find someone working on same idea as guix - with reduced bootstrap seeds
<OriansJ>melg8: well it never hurts to make a name by leading a change.
<melg8>yea) so i came here to learn first how it all works)
<OriansJ>also a minor note the cc_x86.s in that repo is written in knight assembly and the version written in x86 assembly is in stage0-posix (x86 folder)
<OriansJ>But the output between the two should also be identical
<OriansJ>(unless I forgot to backport something simple like a type definition )
<melg8>are you just really good at assembly or did you compile something like c variant and then produce assembly with additional comments for each byte?
<fossy>stikonas: melg8 I mean, a flag in rootfs.py enabling it
<fossy>Eg it could touch a file in the rootfs which causes the script to run
<melg8>so file based signal from host, that we want or doesn't want to run it inside of image at the end?
<melg8>like if sysa created file named make_results_reproducible - it will run cleanups script at the end of build?
<melg8>suggestions for name of file/script/option are welcome
<Hagfish>OriansJ: that reminds me of a saying about designing interfaces (e.g. graphical or APIs). it's something like: "make the recommended approach easy, while keeping the unusual approaches possible"
<Hagfish>the same idea probably applies to communities: offer people ways to contribute that you think will be the most rewarding, while still accepting contributions from someone motivated to do things differently
<fossy>yeah basically
<bauen1>currently my plan with stow / upkg-build would be to tar up the result reproducible and checksum that
<bauen1>the result of a build can then also contain some metadata describing the inputs (and their hash)
<bauen1>stikonas[m]: so i don't really bother with differentiating between the 3
<bauen1>stikonas[m]: and stow won't require (or use) any sort of overlayfs, but bind mount/chroot support from the kernel makes things easier
<bauen1>but i haven't started on the chroot build thingy yet
<bauen1>on the other hand the stow part and installing things into /upkg and symlinking instead of directly into / is almost entirely working no
<bauen1>*now
<bauen1>except i've hit a bit of a road block with my usernamespace setup and tar-1.34 as detailed above
<stikonas[m]>bauen1: --no-same-owner
<stikonas[m]>That should help
<stikonas[m]>And can be integrated into default_src_unpack
<bauen1>stikonas[m]: well, except earlier tar doesn't recognize it
<bauen1>and as such exits != 0
<stikonas>bauen1: well, yes, but we can branch
<stikonas>we probably don't have grep early enough, but there are other ways
<stikonas>something like
<stikonas>if test -e /after/libexec/rmt ; then echo "this is new tar"; else echo "this is old tar"; fi
<stikonas>and if tar is new, we can even use simpler unpack syntax... no need for that --use-compress-program argument
<stikonas>tar -xv --no-same-owner "$archive" would do
<stikonas>tar -xf --no-same-owner "$archive"
<stikonas>I think that would solve your problem
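
A minimal sketch of that branching idea follows. It is an illustration only, not the change that was later committed: `unpack_archive` and its marker-path argument are hypothetical names, and the old-tar branch assumes an uncompressed archive. The option is placed before `-xf` because `-f` takes the next argument as the archive name.

```shell
#!/bin/sh
# Sketch only: pick tar arguments based on a marker file that is
# installed together with the newer tar. unpack_archive and the marker
# argument are hypothetical; live-bootstrap's real helper may differ.
unpack_archive() {
    archive=$1   # archive to extract into the current directory
    marker=$2    # file that only exists once the newer tar is installed
    if test -e "$marker"; then
        # Newer tar: do not try to restore the uid/gid stored in the
        # archive, which fails inside a user namespace.
        tar --no-same-owner -xf "$archive"
    else
        # Older tar: does not know --no-same-owner, so leave it out.
        tar -xf "$archive"
    fi
}
```

With the path from the discussion this would be called as `unpack_archive "$archive" /after/libexec/rmt`.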
<bauen1>yes, i haven't looked too closely yet
<stikonas>bauen1: I can try to quickly implement something like that
<stikonas>it's just a few lines anyway
<bauen1>stikonas: you can even try out my usernamespace setup if you want, it should be included on my stow wip branch `./rootfs.py --unshare --chroot --preserve`
<bauen1>that'd be nice, if not i'll take a longer look later today
<bauen1>though i did not push recently
<bauen1>done
<stikonas>well, I made a change locally, I'll test it for now with old setup
<stikonas>which files were broken?
<stikonas>gmp was unpacked as 1006?
<bauen1>`tar: gmp-6.2.1/mpn/pa32/rshift.asm: Cannot change ownership to uid 1006, gid 1006: Invalid argument`
<bauen1>yes
<stikonas> https://github.com/stikonas/live-bootstrap/commit/864c6315a732d36ff2d3b0207fa6572af1e91351
<stikonas>testing it locally first, then will do PR
<bauen1>oh yeah i like that solution
<bauen1>except maybe that i'd rather do a check against tar --version, but whatever if it works it works :D
<stikonas>bauen1: tar --version is a bit harder
<stikonas>I don't have awk/grep yet
<stikonas>PR here https://github.com/fosslinux/live-bootstrap/pull/116
<bauen1>stikonas: thanks, i'll test it now
<stikonas>argh, it failed I need to update the order of arguments
<stikonas>bauen1: grab a new commit...
<OriansJ>Hagfish: that is perhaps the biggest problem. I don't know where one can make the biggest possible contribution relative to effort. For example let say there is a c/c++ programmer like melg8 who would want to contribute to a non-posix dependent part. I could suggest helping to improve M2libc or M2-Planet or helping convert mes-m2 to use M2libc (thus solving the porting of MesCC to more architectures instantly) but honestly what seems
<OriansJ>most interesting and fun would probably be the better recommendation.
<OriansJ>fossy: doesn't untar.c do the correct thing in regards to userids and datetime stamps?
<gforce_de1977>yeah! i managed it to store QEMU-snapshots. the idea is to have ready images where 'bootstrapping is completed' and we have a shell, without having to wait 45 minutes. images/ram-snapshots are ~200 megabytes (compressed), but load instantly 8-)
<bauen1>stikonas: thanks, it's working
<fossy><OriansJ> fossy: doesn't untar.c do the correct thing in regards to userids and datetime stamps?
<fossy>whats the "right thing"?
<stikonas>untar.c is quite similar to old gnu tar 1.12 (I think from 1994)
<stikonas>I think both unpack as current user (so root)
<stikonas>and timestamps are the same as in archive
<stikonas>it's the newer tar 1.34 that has more options and tries to be smarter
<stikonas>but was interfering with user namespaces
<fossy>re the checksumming of the filesystem. I agree with stikonas, while we can theoretically strip it down, removing build dirs, changing timestamps etc etc, im not sure I see the utility in that right now when we already have a method to ensure reproducibility for generated files and binaries
<fossy>when live bootstrap is complete and more compartmentalized I think we should look at it though
<stikonas>well, once we have upks, I think we'll have something better
<fossy>yeah
<stikonas>we'll be able to checksum all files
<stikonas>not just the ones we do manually
<stikonas>but I guess we can still include a file to strip timestamps...
<fossy>Also I do care about intermediate packages that are not included in the final fs
<stikonas>so that one can run it optionally
<fossy>yeah sure
<fossy>just not by default
<stikonas>yeah, but I think I care more about build "outputs" rather than "intermediate build dirs"
<fossy>same, im not too worried abt build dirs but I am about binaries/outputs not included with the final rootfs
<fossy>that are superseded or removed