IRC channel logs
2023-12-18.log
<fossy>thank you Googulator! i'll take a look at these changes
<fossy>Googulator: regarding creating external.img only when needed, what is your vision here? (not to transition to a disk?)
<fossy>Ah, alright, that makes sense
<fossy>yes, that should work quite fine
<stikonas>fossy: while you are there, any thoughts how we can try to integrate (future) UEFI bootstrap into live-bootstrap?
<Googulator>Especially important on bare metal, where it means not only an extra physical disk (with extra firmware and corresponding extra risks involved), but also a motherboard that's able to handle 2 disks properly in both BIOS and Linux *and* present them to Linux in the anticipated order
<stikonas>Googulator and I were thinking that UEFI bootstrap will start in UEFI, then after M2-Planet it could build something like builder-hex0 (but written in C) that would then try to execute POSIX binaries
<fossy>yeah, i did read the thoughts about that
<fossy>i think that is the best option
<Googulator>fossy: Are there any patches to sfdisk itself in the final merged version?
<stikonas>anyway, for now I'm just trying to fix-up stage0-uefi to work on my baremetal machine...
<Googulator>Because if this is the same version of sfdisk that we used pre-simplify, it's gonna crash and burn on bare metal
<fossy>i think the most ideal scenario is that UEFI stage is solely in stage0-uefi and then live-bootstrap wouldn't need toooo much special logic for uefi version
<fossy>which should be possible with that idea
<stikonas>yeah, needing just 1 disk is quite an important feature
<fossy>I don't think there's any technical limitations to 1 disk. just not something i implemented in simplify PR for the sake of the PR's simplicity
<fossy>Googulator: i haven't patched sfdisk, no
<fossy>nope.. it hasn't got any of your bare metal changes
<fossy>i wasn't expecting it to work on bare metal yet either
<fossy>that PR had very very minimal functionality changes
<Googulator>I'm trying to make the individual PRs independent as much as possible, but all will get merged here so I have something to actually test
<fossy>that's a good workflow, i did something very similar for simplify branch
<fossy>is the 1GB unpartitioned fairly arbitrary? it might make more sense to calculate that on the fly
<stikonas>even though those smaller PRs had mostly the same content
<Googulator>The 1GB is fairly arbitrary, but I do want to keep it a power of 2
<fossy>hmm, isn't 4K alignment generally sufficient
<fossy>or do SSDs particularly like power of 2 alignment
<Googulator>Real SSDs with a proper controller usually don't care much about alignment, as the controller will take care of it
<fossy>i was about to say -- i'm a bit surprised SSD controllers don't do this
<Googulator>But if you're using something like a USB drive or SD card, then you want to stay on an erase block boundary
<fossy>but i guess there are bad/nonexistent controllers out there
<stikonas>generally the recommendation is to keep partitions 1 MiB aligned
<stikonas>which I think is still valid for SSDs...
<Googulator>Erase blocks can be quite big - I've personally seen 32MiB erase blocks on an SD card
<Googulator>Also, 1GiB leaves space after the srcfs to create a boot partition at the end, without overwriting srcfs
<Googulator>assuming a sane BIOS that doesn't require the boot partition to reside in the first 504MB or worse
<fossy>hm, as in, after the 1GiB or before the 1GiB but after the srcfs?
<fossy>a boot partition which does what?
<Googulator>It can be done without one, but better to have a dedicated /boot
<fossy>i think this had something to do with your trusting trust drive - what was the benefit of keeping srcfs around?
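The alignment argument above reduces to rounding a partition's start up to a boundary. What follows is a minimal C sketch. It assumes 512-byte logical sectors and a power-of-2 boundary, such as 1 MiB, a 32 MiB erase block or the 1 GiB gap. `align_up_sectors` and `is_pow2` are hypothetical helpers, not code from live-bootstrap or sfdisk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 512-byte logical sectors. */
#define SECTOR_SIZE 512u

/* True if x is a non-zero power of two. */
static bool is_pow2(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Round a starting LBA up to the next multiple of align_bytes.
 * align_bytes must be a power of two and at least one sector, so
 * align_sectors is also a power of two and a bit mask can replace
 * the division. */
static uint64_t align_up_sectors(uint64_t lba, uint64_t align_bytes)
{
    assert(is_pow2(align_bytes) && align_bytes >= SECTOR_SIZE);
    uint64_t align_sectors = align_bytes / SECTOR_SIZE;
    return (lba + align_sectors - 1) & ~(align_sectors - 1);
}
```

Any 1 MiB (or larger power-of-2) boundary is also a multiple of 4 KiB, so the bigger alignment never breaks the smaller one. The power-of-2 requirement is what makes the mask trick valid.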
<Googulator>It's not about that, in fact, for the trusted Flash drive, I explicitly want to make the srcfs inaccessible once Linux starts
<Googulator>Also, I was thinking of getting the previously erased sources in Linux again by reading srcfs
<Googulator>It's an alternative path, vs. including distfiles in the initramfs
<stikonas>probably doesn't have to be part of automation
<Googulator>What's important is to have the real on-disk file system start in a sector divisible by 8 (to avoid issues with AF drives) and to have some space to install Grub's early stages into
<Googulator>& of course, all of that needs to be done while keeping sfdisk satisfied about CHS correctness
<fossy>yeah that's fine, it's just the 1GiB padding i'm not so sure about, i think the padding will cause more confusion for end users than the debugging benefits it provides (which can always be done in other ways)
<Googulator>also, if you make /boot FAT32, it can also be used as the ESP
<fossy>OHH wait i was misinterpreting this a bit
<Googulator>so you bootstrap in legacy/CSM mode (for now, until stage0-uefi is ready), then install grub, and reboot using UEFI
<fossy>the partition table itself still fills the whole disk...
<Googulator>Perfect for /boot or preserving srcfs, if you care about those
<fossy>ok in that case my actual preference would be making a partition of type 0 covering the srcfs region followed by empty space followed by the partition
<fossy>that way it's really clear to end users once the bootstrap completes what's going on
<Googulator>I would love to do that - if we had a sane tool to do so
<fossy>i might try building parted but not sure how that will go
<fossy>will get back to you on that
<Googulator>already, I'm fighting fdisk's insistence on IBM PC-XT compatibility to just get things to work at all on bare metal
<Googulator>otherwise nasty surprises can result when you bootstrap twice from the same physical HDD
<stikonas>parted tries to be a bit higher level tool...
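To see why "divisible by 8" and "CHS correctness" have to hold at the same time, here is a sketch of the classic LBA-to-CHS mapping. It assumes the common translated geometry of 255 heads and 63 sectors per track. The geometry sfdisk actually uses depends on the disk and its options, so `lba_to_chs` and `is_af_aligned` are hypothetical helpers for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed translated geometry; sfdisk may use a different one. */
#define HEADS 255u
#define SECTORS_PER_TRACK 63u

struct chs {
    uint32_t cylinder;
    uint32_t head;
    uint32_t sector; /* 1-based, as stored in MBR CHS fields */
};

/* Classic LBA -> CHS conversion. MBR entries can only hold cylinders
 * 0..1023; beyond that, tools conventionally store 1023/254/63. */
static struct chs lba_to_chs(uint64_t lba)
{
    struct chs r;
    r.cylinder = (uint32_t)(lba / (HEADS * SECTORS_PER_TRACK));
    r.head = (uint32_t)((lba / SECTORS_PER_TRACK) % HEADS);
    r.sector = (uint32_t)(lba % SECTORS_PER_TRACK) + 1;
    return r;
}

/* Advanced Format drives have 4096-byte physical sectors, i.e. 8 logical
 * 512-byte sectors, so a partition should start on an LBA divisible by 8. */
static bool is_af_aligned(uint64_t lba)
{
    return lba % 8 == 0;
}
```

The old DOS-style start at LBA 63 is track-aligned (head 1, sector 1) but fails the AF check. A start such as LBA 2048 passes the AF check but is not on a track boundary.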
<Googulator>without -F -F, mkfs checks for an existing superblock and errors out if it finds one - it wants to play it safe and not overwrite an important FS
<Googulator>but of course, our error handling at that point is "ABORTING HARD"
<stikonas>whereas sfdisk really just deals with partitioning without caring for file systems
<Googulator>& several hours of work irrecoverably lost at that point
<Googulator>> find / -xdev -type c -or -type b -not -name "ram*" -printf "nod %p %m %U %G %y " -exec stat -c '%Hr %Lr' {} \; >> /initramfs.list
<Googulator>btw, another quick tip for enterprising bare-metallurgists: don't bootstrap with a GeForce GT 730
<fossy>yeah i was slightly surprised that worked but was glad to see it did
<Googulator>linux-4.9.10 doesn't like it for some reason, consistently locks up with a white-on-green screen
<Googulator>probably incomplete implementation for that generation of cards
<Googulator>or maybe some incompatibility between the much older chipset and it
<Googulator>NV3x (GeForce FX/PCX generation) works perfectly though, and luckily I had one of those at hand
<Googulator>Intel integrated also works-ish, but it won't actually get a high res console
<Googulator>probably something wrong with my kernel configuration (maybe uvesafb taking priority, and then not finding the needed userspace tools)
<Googulator>but it doesn't sound like a sane idea to copy /dev from Fiwix to Linux
<fossy>technically no, we could just rerun populate_device_nodes
<fossy>are we actually using devtmpfs
<Googulator>it's mounted by default in initramfs in recent-ish Linuxes
<Googulator>although it could be disabled in the current Linux kernel config
<Googulator>it's definitely enabled in mine, I used it when I was bringing up bare-metal on the WIP PR
<fossy>"If CONFIG_DEVTMPFS_MOUNT is set to y when building the kernel, the resulting kernel will automatically attempt to mount devtmpfs to /dev after mounting a root filesystem - unless the kernel is using an initramfs for the initial root filesystem"
<fossy>and we have CONFIG_DEVTMPFS_MOUNT=y
<Googulator>because "the kernel is using an initramfs for the initial root filesystem"
<fossy>as far as i can tell CONFIG_DEVTMPFS=y just means that devtmpfs *exists*, not that it is automounted
<Googulator>but it's always automounted in an initramfs iirc, if /dev exists in it
<Googulator>this is so you don't need to include systemd in your (regular, non-bootstrap) initramfs
<fossy>oh ok, devtmpfs automount in initramfs is post-linux 4.9.10
<stikonas>(though perhaps kernel upgrade is out of scope now...)
<stikonas>so hex1 no longer gets stuck, but now it just doesn't create any output on baremetal... (still works fine in qemu)
<Googulator>fossy: is it safe to use something like "\"" in script-generator.c?
<Googulator>I'm trying to do something like ( SWAP_SIZE != DISK_SIZE ) in manifest predicates - I've already implemented != support, but need a way to distinguish between variable-to-constant vs variable-to-variable comparisons
<fossy>i think i've escaped quotes in m2-planet before
<Googulator>( VARIABLE1 == VARIABLE2 ) vs. ( VARIABLE == " VALUE " ) seems to be the obvious choice
<fossy>why not just start with quote means string, no quote means variable?
<fossy>instead of spaces about the quote
<Googulator>same reason why you can't have (VARIABLE1 && VARIABLE2)
<Googulator>you would need to use something like strncpy(target, tok->val + 1, strlen(tok->val + 1) - 1)
<Googulator>also, right now, VARIABLE == " VALUE WITH SPACE " won't work
<fossy>hold up, what's the actual use case for variable-constant comparisons?
<Googulator>variable-variable is just nice to have, what I'm after is backing up a variable under another name
<fossy>hmm, that won't really work at the moment; defines occur globally. that will just make JOBS whatever it was originally
<Googulator>(simplified, because all of those also need to be transferred into bootstrap.cfg)
<fossy>currently a variable cannot hold two different values at different parts of the bootstrap is what i mean
<Googulator>I understand it's not scoped, but why wouldn't redefining an existing value work?
<Googulator>I actually got something like this to work in the last iteration, just with slightly different syntax
<Googulator>I used define: VARIABLE = VALUE vs define: VARIABLE1 = $ VARIABLE2 there
<Googulator>but I now think it's cleaner to mark literals explicitly and have everything else be a name, vs. have plain text be names in one context, values in another, and having to mark when you do want a name even though it would by default be a value
<Googulator>there's actually code in script-generator.c for updating an existing variable
<fossy>yes, you can update an existing variable, but only the new value is ever used
<fossy>the only way that variables are passed through to live-bootstrap is through output_config function
<Googulator>the only trick to keep in mind is that bootstrap.cfg will initially contain the values valid at the end
<fossy>so how does backing up the variable help?
<Googulator>right, that's what I meant by having to also use an improve step to transfer the desired value into bootstrap.cfg
<Googulator>It helps if you're using that variable for predicates
<fossy>oh okay... but in this context of JOBS, what did you do to make JOBS = 1 ever actually apply?
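Below is a sketch of fossy's suggested rule for predicate operands: a leading quote marks a string literal, and anything else is a variable name. One caution about the `strncpy(target, tok->val + 1, strlen(tok->val + 1) - 1)` form mentioned above: it writes no terminating NUL, so the target has to be terminated explicitly. The variable table, `resolve_operand` and `eval_compare` are hypothetical stand-ins, not script-generator.c's real structures. The sketch also assumes the tokenizer already keeps a quoted value containing spaces together as one token, which the log says is not currently the case.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flat variable table. */
struct var {
    const char *name;
    const char *value;
};

static const char *lookup(const struct var *vars, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(vars[i].name, name) == 0)
            return vars[i].value;
    return NULL;
}

/* Resolve one operand into buf: "quoted" tokens are literals with the
 * quotes stripped; anything else is looked up as a variable name.
 * Returns false for a malformed literal, overflow, or unknown variable. */
static bool resolve_operand(const struct var *vars, size_t n,
                            const char *tok, char *buf, size_t bufsz)
{
    size_t len = strlen(tok);
    if (tok[0] == '"') {
        if (len < 2 || tok[len - 1] != '"' || len - 2 >= bufsz)
            return false;
        memcpy(buf, tok + 1, len - 2);
        buf[len - 2] = '\0'; /* memcpy/strncpy do not terminate for us */
        return true;
    }
    const char *v = lookup(vars, n, tok);
    if (v == NULL || strlen(v) >= bufsz)
        return false;
    strcpy(buf, v);
    return true;
}

/* Evaluate "lhs == rhs" or "lhs != rhs"; returns false on a bad operand
 * or operator, otherwise stores the comparison outcome in *result. */
static bool eval_compare(const struct var *vars, size_t n, const char *lhs,
                         const char *op, const char *rhs, bool *result)
{
    char a[256], b[256];
    if (!resolve_operand(vars, n, lhs, a, sizeof a) ||
        !resolve_operand(vars, n, rhs, b, sizeof b))
        return false;
    bool equal = strcmp(a, b) == 0;
    if (strcmp(op, "==") == 0)
        *result = equal;
    else if (strcmp(op, "!=") == 0)
        *result = !equal;
    else
        return false;
    return true;
}
```

With this rule, `( SWAP_SIZE != DISK_SIZE )` compares two variables and `( ARCH == "x86" )` compares a variable with a literal. No spaces inside the quotes are needed to tell the two apart.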
<Googulator>I reset JOBS to 1 as the very last step in the manifest
<Googulator>then I use an improve step immediately after setting JOBS = ORIGINAL_JOBS that appends the correct value of JOBS to bootstrap.cfg, so it gets used from then on
<fossy>okay that makes a lot more sense
<Googulator>All of this is needed to reenable bootstrapping on multiple cores
<Googulator>fossy: the script-generator uninitialized variable bug I reported on the 5th came back to haunt me now...
<Googulator>that's gonna get used uninitialized if the very first directive in the manifest is a define
<Googulator>...and after some fiddling with the memory modules, it booted
<Googulator>seems to have an issue cold-booting with 4 different sticks of RAM installed
<Googulator>boots fine with 2, then adding 2 more (while standby power is on, but the board is not running) works
<Googulator>but all 4 installed + multiple days without power = black screen
<Googulator>reminds me of my old Acer which wouldn't boot after a CMOS clear with 4GB or more installed
<Googulator>take #2: forgot to patch memory map & ramdisk size in kexec-fiwix...
<Googulator>(the commit I just pushed to simplify-playground already has these fixed, for anyone trying at home)
<Googulator>shutil.copytree expects its target directory to _not_ exist yet when called
<fossy>oh totally forgot about that script-generator bug Googulator
<fossy>hm, not sure when that external_sources bug was added, because i did test external-sources near the end
<Googulator>looks like it's not the only bug in bwrap either: mknod: `/dev/sda': Operation not permitted
<Googulator>Booted into Linux... and it failed on creating swap
<Googulator>thanks to the new Bash trap feature, it's salvageable
<Googulator>(I just need to create the swap by hand, and then drop back to the script)
<fossy>i'll retest bwrap, not sure i tested it sufficiently toward the end
<Googulator>manually creating the swap worked, now building curl
<Googulator>meanwhile, pushed the missing script to simplify-playground
<Googulator>Just had a chance to check up on the bare metal test system again
<Googulator>good, because it's the last step and it hasn't failed
<Googulator>bad, because it has been running for 13 hours now
<Googulator>This same system completed the bootstrap in 7 hours before simplify
<matrix_bridge><Andrius Štikonas> By the way, mes or looks OK, I had something similar but not rebased after fossy's merges
<matrix_bridge><Andrius Štikonas> Hmm, is it due to parallelism not working?
<Googulator>It is, but it's not the only place where it's visible
<Googulator>The regression was already well apparent during guile's BOOTSTRAP(phase0)
<Googulator>needs HAVE_RENAME defined since the new mes now supports rename()
<Googulator>I tested in bwrap with --build-kernels, so fiwix was at the very least built
<Googulator>might need to switch from linux/rename.c to stub/rename.c though, if the real kernel doesn't support the syscall
<Googulator>oddly, each mes version prints different error messages during the tcc build
<Googulator>->type--: not a <type>: (typename "BufferedFile")
<Googulator>->type--: not a <type>: (typename "BufferedFile")
<Googulator>rank--: not a pointer: #<<type> type: signed size: 1 description: #f>
<Googulator>rank--: not a pointer: #<<type> type: signed size: 1 description: #f>
<Googulator>->type--: not a <type>: (typename "BufferedFile")
<Googulator>->type--: not a <type>: (typename "BufferedFile")
<Googulator>rank--: not a pointer: #<<type> type: signed size: 1 description: #f>
<Googulator>rank--: not a pointer: #<<type> type: signed size: 1 description: #f>
<Googulator>->type--: not a <type>: (typename "BufferedFile")
<Googulator>->type--: not a <type>: (typename "BufferedFile")
<Googulator>rank--: not a pointer: #<<type> type: signed size: 1 description: #f>
<Googulator>rank--: not a pointer: #<<type> type: signed size: 1 description: #f>
<Googulator>->type--: not a <type>: (typename "BufferedFile")
<Googulator>->type--: not a <type>: (typename "BufferedFile")
<Googulator>rank--: not a pointer: #<<type> type: signed size: 1 description: #f>
<Googulator>rank--: not a pointer: #<<type> type: signed size: 1 description: #f>
<matrix_bridge><Andrius Štikonas> I know mes 0.25 disabled some of these false warnings
<Googulator>before simplify, this machine could bootstrap in half that time
<Googulator>bwrap test of mes 0.26 got to populate_device_nodes, where it fails due to a permission issue (known on my part, not related to mes upgrade)
<Googulator>otherwise it will do the substitution when env is created, not when it's read
<Googulator>Bootstrap started again on bare metal, with this bug fixed, and mes upgraded
<Googulator>...and again, because I forgot to set swap on the rootfs.py command line
<matrix_bridge><Andrius Štikonas> Googulator: I guess at least x86 checksums should be updated too?
<Googulator>but even worse, I forgot about pre-network-sources
<Googulator>(because the branch I'm testing on no longer has it)
<Googulator>+> /external/distfiles/mes-0.26.tar.gz: No such file or directory
<Googulator>never heard of a situation where qemu with full emulation was faster than a real CPU... :)
<matrix_bridge><Andrius Štikonas> Googulator: it was qemu with user mode emulation...
<matrix_bridge><Andrius Štikonas> Still probably caused by ram bandwidth...
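This sketch covers two points from the exchange above. First, the variable table's head pointer starts out NULL, so a manifest whose first directive is a `define` never reads an uninitialized pointer. Second, redefinition updates in place and only the latest value survives. That is why `ORIGINAL_JOBS` has to copy the *value* of `JOBS` at the moment it is defined. The list layout and function names are hypothetical, not the actual script-generator.c code. Per the log, `bootstrap.cfg` is written from the final values only, which is why the extra improve step is needed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical global variable table: a singly linked list. */
struct variable {
    char *name;
    char *value;
    struct variable *next;
};

/* Static storage guarantees this is NULL before the first define. */
static struct variable *variables;

static struct variable *find_variable(const char *name)
{
    for (struct variable *v = variables; v != NULL; v = v->next)
        if (strcmp(v->name, name) == 0)
            return v;
    return NULL;
}

static char *dup_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* define: NAME = VALUE. The value is copied before any old value is
 * freed, so defining a variable from itself is safe. Redefinition
 * replaces the old value (no scoping). Returns 0 on success. */
static int define_variable(const char *name, const char *value)
{
    char *copy = dup_string(value);
    if (copy == NULL)
        return -1;
    struct variable *v = find_variable(name);
    if (v != NULL) {
        free(v->value);
        v->value = copy;
        return 0;
    }
    v = malloc(sizeof *v);
    if (v == NULL || (v->name = dup_string(name)) == NULL) {
        free(v);
        free(copy);
        return -1;
    }
    v->value = copy;
    v->next = variables;
    variables = v;
    return 0;
}

static const char *get_variable(const char *name)
{
    struct variable *v = find_variable(name);
    return v != NULL ? v->value : NULL;
}
```

Backing up `JOBS`, forcing it to 1 and later restoring it then behaves as described. The backup holds its own copy, so the later redefinitions do not clobber it.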
<Googulator>of course, it's mes, the Scheme interpreter that thinks it's llama.cpp, so memory bandwidth is indeed everything
<Googulator>in fact, it may be even worse than llama.cpp in this regard - I've never seen that saturate RAM bandwidth on just 1 thread, unlike mes
<matrix_bridge><Andrius Štikonas> Googulator: we should also recheck mes 0.26 for any new pregen files...
<matrix_bridge><Andrius Štikonas> All those new guile modules might have introduced non-source stuff...
<matrix_bridge><Andrius Štikonas> fossy is especially good at finding those...
<Googulator>Probably the PATH fixes were not applied to amd64
<Googulator>I remember the same issue on x86 when I was working on the make-3.82 PATH issue in the draft simplify PR
<matrix_bridge><Andrius Štikonas> It calls mkdir before it is installed to PATH
<matrix_bridge><Andrius Štikonas> Hmm, PATH should have that from after.kaem...
<Googulator>Ouch. Seems like the new mes uses more memory than what builder-hex0 is able to give it...
<Googulator>when tcc-mes tries to build tcc-boot0, it just prints "tcc version 0.9.26 (i386 linux)" and dies
<matrix_bridge><Andrius Štikonas> Googulator: later in the evening I'll try to compare with my unrebased patch
<Googulator>I know it's after mes - but it appears that mes corrupts memory as it runs, causing tcc to subsequently fail
<Googulator>probably because it overruns the memory block builder-hex0 gives it
<Googulator>right now, I'm running amd64 mes (building tcc) in bwrap, VSZ is 1132564
<Googulator>x86 is probably less than that, but still likely exceeding builder-hex0's limit
<stikonas>there is something wrong going on with sign vs zero extension when outputting some 32-bit constants
<stikonas>I think o(0x81234567) gets sign extended to 64-bits
<stikonas>and then probably loop never finishes...
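stikonas' hypothesis can be illustrated in portable C. The helpers below are hypothetical, not the code generator in question. They only show how the same 32-bit pattern widens differently depending on signedness. One plausible route to a loop that never finishes goes like this. Suppose an emitter shifts a *signed* value right until it reaches zero. On typical compilers a negative value then converges to -1 instead, because the shift is arithmetic. Right-shifting a negative signed value is implementation-defined in C.

```c
#include <stdint.h>

/* Count how many low-order bytes remain before the value is zero.
 * An unsigned right shift always terminates. */
static int bytes_until_zero(uint64_t v)
{
    int n = 0;
    while (v != 0) {
        v >>= 8;
        n++;
    }
    return n;
}

/* 0x81234567 has bit 31 set, so as a signed 32-bit int it is negative
 * (-2128394905); widening copies the sign bit into the upper 32 bits. */
static uint64_t widen_signed(int32_t x)
{
    return (uint64_t)(int64_t)x; /* sign extension */
}

/* Widening from an unsigned 32-bit type fills the upper bits with zeros. */
static uint64_t widen_unsigned(uint32_t x)
{
    return (uint64_t)x; /* zero extension */
}
```

Sign extension turns 0x81234567 into 0xFFFFFFFF81234567: eight non-zero bytes instead of four. Keeping such constants in an unsigned 32-bit type before widening avoids the extra bytes.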
<Googulator>hmm, x86 version of mes tops out at 566808KiB memory usage in bwrap
<Googulator>so then why is it that tcc prints its version banner and then locks up in builder-hex0?
<Googulator>(it's not supposed to print a version banner there at all)
<stikonas>I think my changes pre-simplify PR ran till tcc 0.9.27
<stikonas>But they are basically same as your PR...
<Googulator>meanwhile, checking out a memory dump from the failed bootstrap in qemu
<Googulator>I wrote "/usr/bin/tcc-mes is identical to the version built in bwrap" ... which worked
<[exa]>Googulator: messages starting with / may be interpreted as commands
<Googulator>I think I may have figured out what's going on..
<Googulator>all of those new /lib files I had to include for mes-0.26 to successfully build itself came from /lib/linux
<Googulator>including for syscalls builder-hex0 doesn't support
<Googulator>and when builder-hex0 doesn't support a syscall... it pretends to
<stikonas>Googulator: so why does it work on my branch?
<stikonas>neither of us fully sorted the list there...
<Googulator>Retrying with the new files taken from lib/stub instead
<stikonas>Googulator: I had some issues with lib/stub
<stikonas>so you have something to try, bisect the differences...
<stikonas>in the meantime I need to figure out why my hex1.efi does not output anything on my machine
<stikonas>and even hex C prototype seems to be misbehaving...
<Googulator>The obvious differences are read, sleep and utime - testing with these removed
<stikonas>also only stuff that tcc calls can matter...
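Here is a minimal sketch of why a stub that reports failure is more useful to a caller than one that feigns success. `stub_utime`, `feigned_utime` and `set_times_or_skip` are hypothetical. This says nothing about how mes' own lib/stub files are actually written.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stub for an unimplemented syscall: fail with ENOSYS. */
static int stub_utime(const char *path, const void *times)
{
    (void)path;
    (void)times;
    errno = ENOSYS;
    return -1;
}

/* For contrast: a "stub" that swallows the error and reports success,
 * the behaviour described for builder-hex0 above. */
static int feigned_utime(const char *path, const void *times)
{
    (void)path;
    (void)times;
    return 0;
}

/* Returns 1 if timestamps were (reportedly) set, 0 if the feature is
 * unsupported and was skipped, -1 on any other error. */
static int set_times_or_skip(const char *path,
                             int (*set_times)(const char *, const void *))
{
    if (set_times(path, NULL) == 0)
        return 1;
    return errno == ENOSYS ? 0 : -1;
}
```

With the failing stub, the caller can see that the feature is missing and take another path. With the feigned one, it believes the work was done, and any later breakage shows up far from the real cause.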
<fossy>if/when builder-hex0 is split into a lower and higher level kernel, it would be nice for the higher level kernel to print an error when a nonexistent syscall is given
<fossy>i will take a look for new pregen files in mes 0.26
<fossy>i'm working on binutils 2.41
<Googulator>also, it would be nice if unsupported syscalls returned failure, to at least give programs a chance to use alternate paths
<fossy>is that usual POSIX behaviour?
<Googulator>swallowing an error and feigning success certainly isn't POSIX
<Googulator>and now, 3 simultaneous bootstrap tests in progress (simplify-playground x86 on baremetal, mes-0.26 x86 in qemu, mes-0.26 amd64 in bwrap)
<Googulator>stikonas: the current code in my repo should be good for riscv64 checksum update
<Googulator>(pending fossy's OK w.r.t. generated files, of course)
<stikonas>I did start looking through those scheme dirs
<stikonas>well, there is that older file we found mes/module/mes/psyntax.pp but we are already removing it in the script
<fossy>Googulator: could you point me to commits that aren't in the main tree that you have found to be needed for bare metal bootstrap?
<Googulator>well, not quite - it's built on top of the script improvements
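The two suggestions above can be combined: a syscall dispatcher that logs unknown syscall numbers and returns failure instead of pretending to succeed. This is a hypothetical C sketch, not builder-hex0 code; the log notes builder-hex0 is not written in C. The handler signature and the table are assumptions. The return-negated-errno convention (`-ENOSYS`) is the one Linux uses, and it lets a libc wrapper set `errno = ENOSYS` and return -1.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical handler signature for a small kernel. */
typedef long (*syscall_fn)(long a, long b, long c);

static long sys_getpid_demo(long a, long b, long c)
{
    (void)a;
    (void)b;
    (void)c;
    return 1; /* toy kernel with a single process */
}

/* Sparse table indexed by syscall number; NULL means "not implemented".
 * 20 is getpid on i386 Linux; all other slots are left empty here. */
static const syscall_fn syscall_table[] = {
    [20] = sys_getpid_demo,
};

#define NR_SYSCALLS (sizeof syscall_table / sizeof syscall_table[0])

/* Dispatch a syscall; unknown or unimplemented numbers are logged and
 * fail with -ENOSYS instead of silently "succeeding". */
static long dispatch_syscall(unsigned long nr, long a, long b, long c)
{
    if (nr >= NR_SYSCALLS || syscall_table[nr] == NULL) {
        fprintf(stderr, "unsupported syscall %lu\n", nr);
        return -ENOSYS;
    }
    return syscall_table[nr](a, b, c);
}
```

The log line covers fossy's wish for a visible error. The `-ENOSYS` return covers Googulator's point that programs need a failure they can react to.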