IRC channel logs

2021-12-26.log

<fossy>doras: your understanding of skipping the python stuff is correct, and I have intentionally designed it as such so that it can be done reasonably easily
<fossy>really, rootfs.py shouldn't be used in production, it's mostly just a development wrapper, although some complexity particularly with moving of sources etc can be improved
<fossy>doras: can you explain why you need to skip most of live-bootstrap? I don't exactly follow
<doras>fossy: consider BuildStream to be a tool which allows one to automate different types of actions in the context of a sandbox environment. Each BuildStream element basically prepares the sandbox environment (input), performs an action or set of actions inside the environment (build, copy files, run commands, filter out file types, etc.), and collects the result (output). Chaining these elements using build-time and runtime dependencies
<doras>allows one to easily automate and maintain very complex sets of operations, and run them in parallel where possible. Combined with the fact that each element type (called a plugin) is fully programmable, it's a versatile tool.
<doras>However, BuildStream doesn't have a "built-in" sysroot for its sandbox environment. An element must always provide one as a build-dependency.
<doras>This is important because one of BuildStream's main purposes is reproducibility. You want a specific element to run under the exact same conditions regardless of the host that runs it.
<doras>So this is simple when you build an element which depends on a set of existing elements, since they must also depend eventually on something resembling a basic sysroot.
<doras>The "eventually" part is the tricky part. You have to start with something.
<doras>So in our project this "something" is a previous image of a small subset of our runtime (called PreBootstrap). Each release depends on the PreBootstrap image of the previous release.
<doras>The first release depended on a Yocto sysroot if I'm not mistaken.
<doras>The PreBootstrap image is only used as a base system which bootstraps a new bootstrap image, which is what we use to actually build the project.
<doras>So obviously all of this is not very bootstrappable, as we rely on a chain of bootstraps that we can't really follow or reproduce entirely.
<doras>So to make this bootstrappable, we starting thinking about trying to use your project as a base of our PreBootstrap image, instead of our own images.
<doras>we started*
<doras>To achieve this we need to create an element whose input is bootstrap seeds and sources alone, and whose output is something that brings us closer to a sysroot.
<doras>However, if we use live-bootstrap "as documented" by running rootfs.py, we'll need to be able to run Python, which suggests that the input to our sandbox is not only bootstrap seeds and sources, but also binaries that can run Python scripts. Obviously this would mean that we would no longer be bootstrappable, so we shouldn't do it.
<doras>The best thing live-bootstrap could do to help us achieve this is to provide a git repo that includes everything necessary for a complete sysa+sysc bootstrap (with git submodules, if needed) already located in all the right places, with a single requirement to fully bootstrap: execution of a seed.
<doras>So it would only require: git clone, git submodule update, chroot to the repo root + execute seed.
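The three steps doras lists could be driven roughly as below. This is only a sketch: Python is used purely for illustration (the whole point of the request is that no scripting language should be needed), the seed path is an assumption, and a repo root that can be chrooted into directly is what is being asked for, not how live-bootstrap is laid out at the time of this log.

```python
import subprocess

# Repo URL as linked elsewhere in this log.
REPO = "https://github.com/fosslinux/live-bootstrap"
# Assumed location of the seed inside the checkout (not confirmed by the log).
SEED = "/bootstrap-seeds/POSIX/x86/kaem-optional-seed"


def bootstrap_commands(checkout="live-bootstrap", seed=SEED):
    """Return the three commands: clone, fetch submodules, chroot + run seed."""
    return [
        ["git", "clone", REPO, checkout],
        ["git", "-C", checkout, "submodule", "update", "--init", "--recursive"],
        ["chroot", checkout, seed],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(bootstrap_commands())
```

By default the script only prints the commands; the chroot step needs root privileges when actually executed.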
<doras>You decided to split the chroot-based bootstrap process into two phases: sysa and sysc; this is entirely fine and also makes sense. However, the fact that each of the two requires a different input to its sandbox (chroot) means that we need two separate elements for it.
<doras>So this means that we can't have sysa's run.sh do a chroot to sysc's since we have to do it ourselves. So ideally, we would have each of the two live in a separate git repo (or a separate directory within the same repo), and be fully bootstrappable in source-form without any manipulation necessary: git clone, git submodule update, chroot to the sysa/sysc directory + execute seed.
<doras>Of course, the sandbox sysroot of sysc would be the bootstrap products of sysa.
<doras>So I guess it's not really skip most of live-bootstrap, but more of... miss the point of live-bootstrap trying to make lives easier to bootstrap.
<stikonas[m]>oriansj: I'll take a look at #include functionality tomorrow
<doras>Because its current structure and the fact that many manual preparations are needed is kind of user-friendly, but not automation-friendly.
<stikonas>doras: and yes, everything is statically linked but the reason for that
<doras>It would have been nice if it had an automation-friendly "core" repo as I described above which is self-contained and has 0 external dependencies as one would hope from a bootstrappable system, and then maybe a user-friendly "helper scripts" repo that wraps it nicely for those who wish to do things manually.
<stikonas>is that I couldn't get dynamic linker to work
<stikonas>musl in principle fully supports dynamic linker, but for some reason I was getting a crash
<stikonas>doras: and yes, we have more things than necessary in the final image
<stikonas>we basically ship all old versions
<stikonas>including gcc 4.0.4 even though we have gcc 4.7.4
<stikonas>the reason is that right now it's hard to remove things, there is no easy way to say, remove all gcc 4.0.4 files
<stikonas>that can be implemented but it needs more work
<stikonas>so it's not done yet
<stikonas>ideally, we could create some kind of packages from inside the bootstrap
<stikonas>well, you need to build to a certain point before that becomes feasible
<stikonas>at the very least we need coreutils/bash and make
<stikonas>doras: why is it strange that /bin does not point to /usr/bin?
<stikonas>symlinking /bin to /usr/bin is optional, some distros do that, some don't
<stikonas>but you need at least some stuff in /bin, hence those symlinks
<stikonas>e.g. without /bin/sh symlink you can't just run scripts with #!/bin/sh shebang
<doras>stikonas: because every binary seems to be in /usr/bin, yet there are a few symlinks inside /bin that point at executables in /usr/bin.
<stikonas>well, without those symlinks inside /bin some things would break
<stikonas>doras: also regarding external tools for initramfs
<doras>But why not create one big symlink from /bin to /usr/bin instead of individual ones inside /bin?
<stikonas>doras: because we can't create symlinks early in the bootstrap
<stikonas> /bin is created earlier
<stikonas>but we can do it your way
<stikonas>once ln is available
<stikonas>delete /bin and run ln -s /usr/bin /bin
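As a rough illustration of that last step, the sketch below replaces a /bin directory full of individual symlinks with a single symlink to usr/bin. It operates on a throwaway directory tree rather than a real root, and the helper names are made up for this example. It uses a relative link target so the link also resolves from outside a chroot, whereas the command in the log uses the absolute form.

```python
import os
import shutil
import tempfile


def merge_bin_into_usr_bin(root):
    """Replace root/bin (a directory of symlinks) with one symlink to usr/bin."""
    bin_dir = os.path.join(root, "bin")
    # Delete the real bin directory together with the per-file symlinks in it.
    # rmtree removes the links themselves, never the files they point at.
    if os.path.isdir(bin_dir) and not os.path.islink(bin_dir):
        shutil.rmtree(bin_dir)
    # Equivalent in spirit to: ln -s /usr/bin /bin (relative target here).
    os.symlink("usr/bin", bin_dir)


def demo():
    """Build a tiny fake rootfs, merge bin into usr/bin, return its path."""
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "usr", "bin"))
    os.makedirs(os.path.join(root, "bin"))
    with open(os.path.join(root, "usr", "bin", "sh"), "w") as f:
        f.write("#!/bin/sh\n")
    # The early-bootstrap layout: an individual symlink inside bin.
    os.symlink("../usr/bin/sh", os.path.join(root, "bin", "sh"))
    merge_bin_into_usr_bin(root)
    return root
```

After `demo()`, `bin/sh` still resolves to the same file, but through the directory symlink instead of a per-file one.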
<stikonas>so regarding that initramfs image, it is uncompressed
<stikonas>so should be possible to create even without cpio
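To show why no cpio binary is strictly needed, here is a minimal sketch of writing such an archive by hand. It assumes the "newc" cpio variant (magic 070701), which is the header layout the Linux kernel accepts for an initramfs; the log itself only says "uncompressed". The function names are invented for this example, and only regular top-level files are handled (no directories, links or device nodes).

```python
def _newc_entry(name, data, mode, ino):
    """Encode one newc record: 110-byte ASCII header, name, data, 4-byte padding."""
    name_bytes = name.encode() + b"\0"
    fields = [
        ino,              # inode number
        mode,             # file type and permission bits
        0, 0,             # uid, gid
        1,                # nlink
        0,                # mtime
        len(data),        # filesize
        0, 0, 0, 0,       # devmajor, devminor, rdevmajor, rdevminor
        len(name_bytes),  # namesize, counting the trailing NUL
        0,                # check (unused in this variant)
    ]
    out = b"070701" + b"".join(b"%08X" % f for f in fields)
    out += name_bytes
    out += b"\0" * (-len(out) % 4)   # pad header + name to a multiple of 4
    out += data
    out += b"\0" * (-len(out) % 4)   # pad file data to a multiple of 4
    return out


def make_initramfs(files):
    """Build an archive from a mapping of top-level file name -> bytes."""
    out = b""
    for ino, (name, data) in enumerate(sorted(files.items()), start=1):
        out += _newc_entry(name, data, 0o100755, ino)  # executable regular file
    out += _newc_entry("TRAILER!!!", b"", 0, 0)        # end-of-archive marker
    return out
```

Every header field is plain hexadecimal text, so an image like this could in principle be assembled or inspected with very simple tools.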
<stikonas>anyway, you still need kernel for baremetal mode
<doras>stikonas: these were a few of my observations, as a form of constructive feedback to the project :)
<stikonas>so that's the bigger problem...
<stikonas>well, stage0-posix is kind of easier start but better for automation
<stikonas>more pure approach would be to start bootstrap on baremetal
<stikonas>even without kernel
<stikonas>but then you would have more manual steps
<stikonas>and more hw specific manual work
<stikonas>so stage0-posix is kind of compromise to get something working
<stikonas>anyway, tar and cpio files are fairly human readable
<stikonas>oh, but you don't need cpio for chroot mode
<stikonas>well, there is still rootfs.py that downloads sources
<doras>stikonas: we want to create a system whereby any user with the necessary seeds and sources and access to a system running BuildStream, even offline in a cave, or using a cloud-hosted container that they can't reboot, would be able to reproduce freedesktop-sdk fully, and as a result also gnome-build-meta and GNOME OS, the KDE runtimes, and potentially other interesting use cases.
<doras>All they need is to run a single BuildStream build after cloning the repo.
<doras>BuildStream would automatically bootstrap everything locally for that user from seeds all the way to GTK4 and whatever else.
<stikonas>well, somehow we need to supply to BuildStream that directory layout that rootfs.py prepared
<stikonas>well, one of the issues is that stage0-posix has to be starting rootfs point in the bootstrapping system
<stikonas>since there is no "cd" command, we can only run a single "init" script
<doras>I personally and naively believe that rootfs.py doesn't need to prepare things. Things need to come prepared, or at least with a simple recipe for their preparation.
<stikonas>well, it is a fairly simple recipe
<stikonas>it mostly moves things around a bit
<stikonas>maybe some of that can be done in the repo
<stikonas>fossy: what do you think?
<stikonas>in that case we would have to move things around in live-bootstrap repo to closely match where things are in bootstrapped system
<stikonas>in principle it might work
<stikonas>with some scripting changes
<stikonas>e.g. /after becomes /sysa
<stikonas>hmm...
<stikonas>although the problem then is that root of the repo is still stage0-posix...
<stikonas>so live-bootstrap repo would be a bit of a mess :(
<stikonas>I guess at least a bit of preparation is unavoidable
<stikonas>there are some configuration parameters, e.g. which arch we are bootstrapping
<oriansj>stikonas: well stage0-posix by default puts its binaries in $ARCH/bin and once it is done; we can move everything to anywhere you want.
<fossy>stikonas: doras I think that my packaging plans are pretty reasonable wrt fixing most problems. if we have a set of packages that can be extracted from the end of sysa, sysa and sysc can be operated completely separately
<fossy>sysc does have /usr/bin symlinked to /bin
<fossy>Iirc
<fossy>doras: thank you for your observations so far, this is very helpful in determining how we can improve live bootstrap for end users :)
<fossy><doras> I personally and naively believe that rootfs.py doesn't need to prepare things. Things need to come prepared, or at least with a simple recipe for their preparation.
<stikonas>well, that's how stage0-posix works, there is no preparation there
<stikonas>you just kick off init process
<fossy>FWIW, I completely agree with this
<fossy>it's just that currently, there are shortcuts we have taken in the meantime to make this easier to develop
<stikonas>indeed
<stikonas>and rootfs.py should be simple enough to be doable manually without python
<fossy>We could (rather trivially) write in words what needs to be prepared now, however, including splitting out sources information from sys*.py files
<doras>fossy: it would be a very good start indeed.
<doras>Right now I'm reverse engineering it :)
<doras>Quite slowly
<stikonas>why reverse engineering, it's in python...
<stikonas>but yes, we can try to start simplifying some things
<stikonas>hmm, probably one thing that can be simplified and removed from rootfs.py
<stikonas>is to just put /sources directory in rootfs
<stikonas>and have scripts inside bootstrap do the copying
<stikonas>that would remove the need for this line https://github.com/fosslinux/live-bootstrap/blob/master/lib/sysgeneral.py#L138
<doras>stikonas: for example, I think we create a copy of "kaem-optional-seed" called "init" inside sysa/tmp, and then we actually execute its old name in the chroot case.
<stikonas>oh init is for the qemu mode
<stikonas>Linux kernel runs /init file when it boots
<stikonas>for chroot mode that copy doesn't matter
<stikonas>hence we execute kaem-optional-seed
<stikonas>we could just as well run init, it would result in the same thing
<doras>This is just an example for things that aren't trivial to figure out when looking at the source.
<stikonas>oh, I think we can already remove mkbuild parameter from rootfs.py
<stikonas>let me do that
<stikonas>it's a tiny simplification, but still
<stikonas>that option is from times before stage0-posix had simple mkdir implementation
<stikonas>so we had to prepare some empty directories in advance
<stikonas>that's no longer necessary
<stikonas>pushed: https://github.com/fosslinux/live-bootstrap/commit/6e3fab4da2e9c4b7ccc0625b83d70f7ccb19f6f4
<fossy><stikonas> although the problem then is that root of the repo is still stage0-posix...
<fossy>yes, this is the main problem
<fossy>and something I am not prepared to really change
<stikonas>yes, that's indeed the complicated thing
<stikonas>we can't easily change it without making live-bootstrap repo really messy
<stikonas>but maybe at least simplify things enough, so that rootfs.py is a bit simpler
<stikonas>although even now it's not that complicated
<stikonas>but there are a few things that can be removed
<stikonas>but at least half of rootfs.py code is list of sources and their checksums...
<fossy>which can be trivially represented by eg. json
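A sources list of that kind might look roughly like the sketch below: a JSON array of URL and checksum pairs, plus a check of downloaded bytes against the recorded hash. The field names, the example URL and the choice of SHA-256 are assumptions made for illustration, not the format live-bootstrap uses. The one hash shown is the well-known SHA-256 of empty input, so the example stays self-contained.

```python
import hashlib
import json

# Hypothetical manifest; example.org and the file name are placeholders.
MANIFEST = """
[
  {"url": "https://example.org/src/empty-0.tar",
   "sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"}
]
"""


def load_manifest(text):
    """Map each file name (last URL component) to its expected SHA-256."""
    return {e["url"].rsplit("/", 1)[-1]: e["sha256"] for e in json.loads(text)}


def verify(name, data, manifest):
    """Return True if the bytes match the checksum recorded for that name."""
    return hashlib.sha256(data).hexdigest() == manifest[name]
```

The data itself is trivial; the cost being weighed in the surrounding messages is that something inside the bootstrap would then have to parse JSON.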
<stikonas>yeah, but then you need json reader...
<stikonas>it's nothing that can't be manually done
<stikonas>but automation will need some kind of preparation
<stikonas>unless we really move to stage0-posix in root (/)
<stikonas>which is sigh...
<oriansj>well we could add CD support to kaem-optional-seed with some effort
<stikonas>not sure if that would help
<stikonas>well, maybe
<stikonas>but it will blow up the size of the seed
<stikonas>which is really not ideal
<oriansj>by 70-120 bytes more (approximate guess)
<stikonas>kaem-optional-seed is already one of the hardest bits to write
<oriansj>yeah and we want to convert to kaem-micro-seed when possible
<stikonas>(the other being hex1)
<oriansj>as it'll shrink things once everything is fully stabilized.
<stikonas>well, it will shrink the seed
<stikonas>although kaem-mini will still be written in hex0
<stikonas>and that is still ideally minimized
<stikonas>once you have .hex1, things become much better
<stikonas>anyway, going to bed now, you all can continue discussing
<oriansj>also given that stage0-posix is only a handful of folders (all self-contained) it is trivial to just move everything out of the way as part of after.kaem
<fossy>I really really really don't want stage0-posix at root of live-bootstrap
<fossy>It would be OK to be able to copy paste everything in live-bootstrap to the root of stage0-posix and then go from there IMO
<oriansj>fossy: ok, what change can I make in stage0-posix that would help you in that regard?
<fossy>mm, not certain just yet
<fossy>I don't think anything yet
<fossy>but I will let you know if I find something useful later
<oriansj>well here is an unfinished idea: break up stage0-posix into separate git repos for each Architecture
<oriansj>so that live-bootstrap can just do git submodule for the architectures it wants to support and then throw a kaem.$arch file into its root
<oriansj>it would need to include bootstrap-seeds, M2libc, M2-Planet, mescc-tools and mescc-tools-extra as git submodules as well.
<fossy>yes, but we still have the problem of not supporting cd early, so we require a great amount of random files in live-bootstrap root
<oriansj>it has the added benefit of allowing me to give people specific access to a single Arch that they work on
<oriansj>fossy: well not exactly
<oriansj>only init and kaem.$arch; everything else will be in folders
<fossy>hmm
<oriansj>the answers.$arch file isn't actually required
<oriansj>nor the makefile
<oriansj>after.kaem can be put anywhere
<fossy>ok, yes, this is a reasonable path forward, I think
<oriansj>as the kaem at that stage does support CD
<fossy>particularly if we can make the folder named something like zstage0_x86 or something, just so it appears at the end of the list in a directory listing
<oriansj>fossy: git submodules allows arbitrary naming
<oriansj>it's why stage0-posix is the submodule with a folder named Linux in stage0
<oriansj>just set path = blah in .gitmodules
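To make the `path = blah` remark concrete, the sketch below embeds a hypothetical .gitmodules fragment and reads the folder name back out of it. The submodule name follows the stage0-posix-$arch idea floated in this log, the folder name is fossy's zstage0_x86 suggestion, and the URL is a placeholder; none of these repositories exists as written. Python's configparser is used only because it happens to read this INI-like syntax.

```python
import configparser

# Hypothetical .gitmodules content: the checkout folder ("path") is chosen
# independently of the submodule's name and of the remote repository name.
GITMODULES = """\
[submodule "stage0-posix-x86"]
    path = zstage0_x86
    url = https://example.org/stage0-posix-x86.git
"""


def submodule_paths(text):
    """Return {submodule name: checkout path} from .gitmodules text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # Section names look like: submodule "stage0-posix-x86"
    return {s.split('"')[1]: parser[s]["path"] for s in parser.sections()}
```

Here the submodule named stage0-posix-x86 would be checked out into a folder called zstage0_x86.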
<oriansj>I was going to do stage0-posix-$arch but keep the folder names in stage0-posix
<fossy>yeah, exactly, I can't see any reason why this wouldn't be possible
<oriansj>just requires a block of time for me to do it
<fossy>well, no extreme rush, I would like to finish packaging in live-bootstrap first
<oriansj>of course
<oriansj>plus I am long overdue for figuring out the bus factor plan for my pieces.
<oriansj>hmmm
<oriansj>stikonas: I managed to reduce it down to a trivial test: https://paste.debian.net/1224800/
<stikonas>oh, ok, let's see. I've just started looking at it but with the test it would be clearer what to look for
<oriansj>and I found a few other preprocessor bugs along the way
<stikonas>well, that's not surprising. preprocessor is still a very new thing in M2 world
<oriansj>indeed
<oriansj>and I'm thinking preserving comments in -E output is probably a good idea so I've altered comment behavior to save all comments as a single token for easier clean up later if we change our mind
<stikonas>that's fine
<stikonas>oriansj: I can see in gdb what goes wrong
<stikonas>oriansj: so this define consists of more than one token (which we should support)
<stikonas>we have "-" and then "1"
<stikonas>but because we are in the #else block this kicks in after the first token https://github.com/oriansj/M2-Mesoplanet/blob/b34a2528c8efb95e0f0970d208fbb0d3ffb4c4ec/cc_macro.c#L588
<stikonas>and sets hold = NULL
<stikonas>and then on the next iteration of the while loop hold is NULL and things go wrong
<stikonas>maybe just if(NULL == hold) continue; ?
<stikonas>well, continue after moving to the next token
<stikonas>oriansj: something like https://paste.debian.net/1224801/
<stikonas>or am I missing something else here
<oriansj>stikonas: might work
<oriansj>I'll give it a try
<oriansj>see if any knock on problems
<oriansj>doesn't look like it causes problems.
<oriansj>so now M2-Mesoplanet -E -f foo.c -o bar.c should work correctly
<stikonas>ok, that's good
<oriansj>so why is mkstemp throwing a segfault for me now
<oriansj>stikonas: can you pull the latest and see if M2-Mesoplanet -f foo.c -o bar works for you
<oriansj>(to rule out local problems on my side)
<stikonas>pulling
<stikonas>seems to work on that EOT testcase you + int main() {return 0;}
<stikonas>s/EOT/EOF/
<stikonas>oh, but I have built M2-Mesoplanet with gcc
<stikonas>so maybe that's the issue
<stikonas>or are you also building it with gcc
<oriansj>no; my issue is with GCC built M2-Mesoplanet
<stikonas>so at least simple files work
<oriansj>yeah but complex input files shouldn't change the mkstemp behavior
<stikonas>yeah...
<oriansj>but since your test worked, we know that M2-Mesoplanet works
<oriansj>and #include is actually working
<oriansj>brb
<oriansj>yeah, it is a bug with the glibc I built.
<stikonas>ok
<stikonas>hopefully it works with M2libc...
<gbrlwck>i'm confused about MEScc allocating local variable space on the stack, but as far as i can tell does not deallocate that space. looking at the M1 output of lib/tests/scaffold/04-call-0.c (which results in https://termbin.com/ri5p ): wouldn't two calls to testi result in twice the stack space allocated instead of an alloc-dealloc-realloc pattern?
<nimaje>wasn't the point of a call stack that when you return the current stackframe gets discarded?
<gbrlwck>nimaje: i have no idea! but how could a CPU know where to (re-)increase the SP to after calling a function?
<doras>Would it be possible for after.kaem to be made aware of the architecture in which it is running?
<doras>Currently this is external knowledge which requires access to cp or mv from the user of live-bootstrap.
<doras>Actually, I think we have get_machine at this stage.
<doras>Hmmm...
<gbrlwck>doras: isn't it in the ARCH env var?
<nimaje>how does ret know where to jump back to? (modern?) cpus have a register pointing at the current stackframe and that frame should begin shortly after the SP at the time the call was
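The frame-pointer idea nimaje describes can be modelled with a toy machine like the one below. It is a sketch of the generic x86-style convention (push return address, save the old frame pointer, reserve locals, then undo all of it on return), not a statement about the code MesCC actually emits; the 4-byte word size and the class and attribute names are assumptions for the example.

```python
WORD = 4  # assumed 32-bit word size


class Machine:
    """Toy stack machine tracking the stack pointer and frame pointer."""

    def __init__(self, stack_top=0x1000):
        self.sp = stack_top         # stack pointer, grows downwards
        self.bp = stack_top         # frame (base) pointer
        self.mem = {}               # address -> stored word
        self.lowest_sp = stack_top  # deepest point the stack ever reached

    def push(self, value):
        self.sp -= WORD
        self.mem[self.sp] = value
        self.lowest_sp = min(self.lowest_sp, self.sp)

    def pop(self):
        value = self.mem[self.sp]
        self.sp += WORD
        return value

    def call(self, locals_size, return_address=0):
        self.push(return_address)  # call: push the return address
        self.push(self.bp)         # prologue: save the caller's frame pointer
        self.bp = self.sp          # prologue: the new frame starts here
        self.sp -= locals_size     # prologue: reserve space for locals
        self.lowest_sp = min(self.lowest_sp, self.sp)
        # ... the function body would run here ...
        self.sp = self.bp          # epilogue: drop all locals in one step
        self.bp = self.pop()       # epilogue: restore the caller's frame pointer
        return self.pop()          # ret: pop the return address
```

Calling `call(8)` twice in a row leaves `sp` back at its starting value each time, and `lowest_sp` is the same after the second call as after the first: the second call reuses the space the first one released.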
<doras>gbrlwck: currently live-bootstrap injects this information to the bootstrap directory as part of its preparation stage.
<doras>It basically renames after.kaem.x86 or after.kaem.amd64 to after.kaem
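That preparation step amounts to something like the sketch below, which is a guess at the shape of the operation rather than the actual rootfs.py code. The function name is invented, and it copies instead of renaming so that the per-architecture files stay in place.

```python
import os
import shutil


def select_after_kaem(sysa_dir, arch):
    """Install after.kaem.<arch> as after.kaem inside sysa_dir."""
    src = os.path.join(sysa_dir, "after.kaem." + arch)
    if not os.path.isfile(src):
        raise FileNotFoundError("no after.kaem for architecture: " + arch)
    dst = os.path.join(sysa_dir, "after.kaem")
    shutil.copyfile(src, dst)  # copy, so the per-arch original is kept
    return dst
```

This is the piece that needs cp or mv on the host, which is why it counts as "external knowledge" in the messages above.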
<gbrlwck>doras: currently live-bootstrap is only ready for x86; i've added some changes to get it to build up to mes-m2 for riscv64. only thing i needed to do was to set an ARCH=riscv64 in sysa/after.kaem.riscv64
<stikonas>doras: get_machine does print information needed but you can't consume it in kaem scripts
<gbrlwck> https://github.com/gbrlwck/live-bootstrap/commit/875f5d1ac42452023fa596b3396dd283af67f0cf
<stikonas>you can't really pass that information into kaem without adding additional syscalls (dup)
<stikonas>hence for now arch is injected
<stikonas>you can probably write a binary like get_machine that sets those for you
<stikonas>and launches the next script
<gbrlwck>nimaje: so you're saying in x86 this is done automatically? i thought for ret to work we only needed the previous stack pointer (+ offset to the next instruction)?. in rv64 we sacrifice one of the "general purpose" registers for the RA (return address).
***nckxmas is now known as nckx
<stikonas>doras: still, how would even get_machine help?
<stikonas>get_machine would always print amd64 on my system
<stikonas>even if I'm running x86 bootstrap
<doras>I see.
<stikonas>in some sense like you said arch is an external parameter
<stikonas>although right now I think we input that external parameter twice
<stikonas>once, when we pick kaem-optional-seed that we run, and second time is that after.kaem.arch file
<stikonas>in principle, second one should be deducible from the first
<stikonas>but I can't see how to do that with get_machine
<stikonas>in C, maybe...
<stikonas>e.g. that new tool M2-Mesoplanet is aware of its own arch
<stikonas>so in principle other binaries, e.g. kaem could be made aware of that too
<doras>stikonas: that second time is an issue for me. Just complicates things.
<stikonas>so I think the way to go then
<stikonas>is to implement $ARCH built-in env variable in kaem
<stikonas>the other STAGE0_ARCH variable can then be worked out automatically using some scripting...
<stikonas>using a few if statements in .kaem files
<stikonas>doras: and the first time is fine for you?
<stikonas>(choice of kaem-optional-seed)
<stikonas>we can't really get rid of that one
<stikonas>since that one is a real choice of which chain to run
<doras>I'm still not sure because I don't have the full picture, but I think it's fine since it's the only executable we're actually executing directly as part of sysa, it's easy to give different paths for different architectures.
<doras>Moving files is a different story.
<gbrlwck>doras: what exactly are you doing/trying to do (if i may ask)?
<doras>gbrlwck: I basically want to build live-bootstrap, including preparations, with nothing but a kernel providing chroot and the execution of the seed.
<gbrlwck>on which architecture?
<doras>Currently live-bootstrap can build all the way to sysc only on x86 as far as I know.
<doras>But the kernel system doing the bootstrap would be running on x86_64.
<doras>We would want to bootstrap amd64, aarch64, riscv and ppc64le once they are supported, but for now we'll have to cross-compile them.
<gbrlwck>i thought `./rootfs.py --chroot` was to run live-bootstrap on a host-arch (without providing a separate kernel) and the default used qemu (where providing a kernel is necessary)
<doras>Well, I'm running live-bootstrap on a host, not booting to initramfs or similar.
<doras>But I want to prepare live-bootstrap for building without depending on anything from the host.
<doras>stikonas: would having a different stage0-posix git repo for each architecture also remove the need for the $ARCH and $STAGE0_ARCH variables?
<doras>Assuming we can stage the correct repo in sysa/tmp, of course, and that each repo would be largely identical in its file names and structure.
<stikonas>doras: it's probably not necessary
<stikonas>it looks a bit too complicated
<stikonas>to split arch repos but we'll see
<doras>stikonas: would it though?
<stikonas>hmm, maybe not, they are quite independent...
<stikonas>but still, we only need to define $ARCH variable in further steps
<stikonas>anyway, it's up to oriansj whether to split stage0-posix...
<doras>stikonas: could kaem read those from a config file if we had one?
<stikonas>not out of the box right now but shouldn't be too hard to add
<stikonas>basically you need to add support for the . / source command
<stikonas>right now kaem only supports env variables, cd, pwd, set, unset, exec and echo
<stikonas>and this is full kaem that is written in C
<stikonas>kaem-optional-seed is a much smaller program written in hex0 and only supports running commands with arguments and removing comments
<stikonas>in principle it would be nicer if everything can just run with the config file
<stikonas>which can then be either handwritten or created using python helper
<doras>But at the after.kaem stage we already have the full kaem, don't we?
<stikonas>doras: yes
<stikonas>full kaem is maybe step 9 out of 18 or so in stage0-posix
<doras>stikonas: it may also require "export", no?
<stikonas>doras: just VAR=value should work
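A config file made of plain VAR=value lines is simple enough to read with a few lines of code. The sketch below shows one possible reader; it is not kaem's implementation, and the comment handling and the variable names in the usage note are assumptions for the example.

```python
def parse_config(text):
    """Parse VAR=value lines into a dict, skipping blanks and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comment, trim
        if not line:
            continue
        name, sep, value = line.partition("=")
        if not sep or not name.isidentifier():
            raise ValueError("not a VAR=value line: " + raw)
        env[name] = value
    return env
```

For instance, `parse_config("ARCH=x86\n# comment\nFOO=bar\n")` gives a dict with the two keys `ARCH` and `FOO`; `ARCH=x86` mirrors the value mentioned in this log, while `FOO` is a placeholder.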
<doras>So kaem currently exports every variable?
<stikonas>probably, but I'm not 100% sure
<stikonas>in any case we only need those variables in kaem script itself
<stikonas>config file that bash later consumes also does not export
<stikonas>since config variables only affect branches in the script itself
<stikonas>so we define some variables like GUILE_LOAD_PATH in mes build script and it works
<stikonas>so I think it's basically automatic export
<stikonas>at least until we reach next scripting engine (bash)
<doras>I guess so.
<fossy>doras, stikonas: yes, kaem currently exports every variable