IRC channel logs

<ludocode>stikonas: Yeah, this is one reason (of many) I wrote Onramp. I'm not sure there's much value in going from C to Scheme back to C; the mescc detour is really complicated and I couldn't get anything working when I tried it (granted this was a couple years ago so I'm sure it has improved a lot.) Onramp's bootstrap is overall a lot simpler

<ludocode>I've heard suggestions that bootstrapping from C to Scheme to C improves security but I think this is a misunderstanding of diverse double compilation

<ludocode>Although if by "quicker" you mean just in the time it takes, it's almost certainly not, because Onramp is really slow. But it's simpler, more portable and less code

<lanodan>Took about 15 minutes on my ryzen 3700X desktop, seems quite faster than what I remember of live-bootstrap

<lanodan>Also wrote a C implementation of the shell https://hacktivis.me/tmp/onramp_sh-oe.c compiles to 27K with `x86_64-pc-linux-musl-clang -std=c99 -Os -s -static`

<aggi>i'm heading towards pentium-m single core clocked down to 600Mhz

<aggi>although i'm happy smp seems stable now, md-raid doesn't panic anymore either... needs alot of runtime testing still

<aggi>if anytime all the fun moved to some FPGA, a single-core 600Mhz is rather luxurious (FPGA is expensive), i rather ensure bootstrapping/compilation are tailored to this

<lanodan>I think I'd more see ~600MHz x86_32 as interesting for full non-hardware-accelerated emulated virtual machines like 86Box.

<homo>several years later: bootstrap computer from stone tools :)

<ludocode>lanodan: Wow that's awesome! Thanks for implementing that! Does it work with the Onramp build scripts? This is going to be really helpful for debugging them. I'm going to try it now

<ludocode>Wow it works. Well done. I'm surprised there aren't more bugs in my build scripts, I didn't actually have an implementation of the Onramp shell to test them until now

<ludocode>I did get this error when it got to the libc/common build script: `mkdir: error: Failed creating directory: build/output/include/__onramp`

<lanodan>Ah yeah I have some mkdir calls prior in the init.c, the cmd_mkdir() implementation is incomplete

<lanodan>namely, build build/intermediate build/intermediate/hex-0-onramp build/intermediate/sh build/output/ build/output/include

<ludocode>It's funny, in an older version of the shell spec, I had `mkdir -p` not be recursive, the -p was just to prevent errors on existing directories when running with a POSIX shell

<aggi>homo: an FPGA with 600MHz and sufficient amount logic-cells for x86_32 is expensive

<homo>yeah, I've read that

<lanodan>Ah yeah that could probably help on keeping a small implementation, recursive mkdir being a bit big

<ludocode>And all the build scripts manually created all the intermediate directories, but with -p, which just looked too wrong. So I decided to change it, figured I'd have to bite the bullet and make a recursive mkdir in the shell

<ludocode>Could always change it back...

<ludocode>Maybe I'll see how hard it is to implement in bytecode

<ludocode>Found it, here's the commit: https://github.com/ludocode/onramp/commit/aa942bbacd3c3024d60e38206281a766dc3e3c39

<lanodan>Ah yeah with each build.sh having ~2 more mkdir calls it does makes things bigger

<ludocode>Yeah. I was mostly concerned with getting bug reports about the scripts being obviously wrong since -p is supposed to be recursive :)

<lanodan>grabbed recursive mkdir from my utils, doesn't seems too bad: https://hacktivis.me/tmp/onramp_sh-oe_1.c

<lanodan>Also relicensed it to MIT, like onramp is

<ludocode>Nice, the bootstrap works flawlessly with your shell

<ludocode>FYI you can setup just the VM and other platform-specific stuff with `scripts/posix/build.sh --setup`. Then running your shell on `core/build.sh` completes the bootstrap

<lanodan>Well right now I build hex in make-root.sh and init.c then builds /bin/onrampvm and build/intermediate/hex-0-onramp/hex.oe

<lanodan>And yeah after that it launches /bin/sh-oe core/build.sh

<ludocode>Nice. Glad to hear it works

<ludocode>I'm still some number of months away from getting it building TinyCC but I'm hoping to get there ASAP.

<oriansj>well the expensive part is the 600Mhz not the x86_32 core.

<Larhzu>Hello! I heard rumors that having a simple-to-build .xz file decompressor might be useful here.

<Larhzu>I made one from XZ Embedded. (A variant of that is in Linux kernel too.)

<Larhzu> https://tukaani.org/xz/xzminidec_standalone_2024-12-30.c

<Larhzu>It builds even with -std=gnu89 so I suppose there's a chance to make it build with limited compilers.

<Larhzu>If it needs tweaking, please don't edit that file. Use https://github.com/tukaani-project/xz-embedded instead, there's a script to create the above .c file.

<Larhzu>Sorry for the noise if I understood the rumors wrong. :)

<efraim>I'm going to try pulling that in for guix's bootstrap, waiting for the long decompression on some tarballs is enough to try it out

<Larhzu>xzminidec is slower than xz, especially on x86-64 now that there is assembly code. But the C code in xzminidec was slower too (but more readable).

<Larhzu>For bootstrapping I suppose the difference isn't relevant.

<Larhzu>Anyway, thanks for testing. :) I'll step away for an hour or three now.

<efraim>I think we use bootar for our initial tar/bzip2/gzip/xz https://git.ngyro.com/bootar

<Larhzu>Oh, I'll check now

<Larhzu>Ah, there is the whole decompressor written in Scheme.

<Larhzu>efraim: Did I understand correctly that you hope xzminidec (C code) to be faster than the Scheme implementation? At that point of bootstrap you have a C compiler? Sorry for silly questions.

<efraim>we get a C compiler a few steps later, at which point we could use a faster xz decompressor

<Larhzu>OK :)

<mid-kid>ludocode: the reason mescc is written in scheme is mostly because writing a scheme interpreter is simpler

<mid-kid>Or, I shouldn't say "the reason", I'm not in the dev's brain

<mid-kid>But that's the main benefit afaik

<stikonas>Larhzu: we have xz decompressor in mescc-tools-extra

<stikonas> https://github.com/oriansj/mescc-tools-extra/blob/master/unxz.c

<aggi>is there any difference between unlzma and unxz?

<aggi>iv'e just stumbled upon a collision between what busybox-1.2.2.1 got and the additional userspace such as xz-utils merged

<aggi>busybox seems a decent place to bundle (most) archival utilities into

<stikonas>it's probably a symlink

<stikonas>though lzma and xz are sligthtly different containers

<aggi>ah, ok.

<stikonas>Larhzu: that unxz is based on some other xz implementation, possibly libarchive, but adjusted to be buildable with very simplistic M2-Planet compiler

<stikonas>so that we get xz support really early in the bootstrap chain

<stikonas>(way before we have C99 compiler such as tcc)

<aggi>any chance busybox source could be digested by mesc?

<stikonas>oriansj: I think you added that unxz?

<stikonas>unlikely

<aggi>no problem

<stikonas>it would also take ages to build

<stikonas>mescc takes maybe 10 minutes on fast x86_64 machine to build tcc

<aggi>in this case it's not a task done each day, bootstrapping a C-compiler with some scheme interpreter

<aggi>meanwhile i could isolate anothre root cause for various linux24 kernel panics during boot with SMP, that is some of those were/are timer-subsystem related

<aggi>although my system-profile with tinycc is sufficiently sanitized to fit into a nosmp performance envelope, it's a nice smp is available again since last night

<aggi>currently i consider backporting various things from linux2.6/3.x back onto linux2.4 a feasible approach

<aggi>situation with kernel now is, it isn't trapped into GNU-toolchain and does permit an alternative system-integration for a fully functional development host, optional

<aggi>thing is, for example, for an ao486 fpga deployment tinycore linux exists, but this system wouldn't be self-hosting on a resource-constrained free-hardware FPGA system

<aggi>you could run such software on the type of free-hardware, but i couldn't imagine to cope with gcc/g++ on a single-core/nosmp system to maintain a complete distribution including bootstrapping from source

<Larhzu>stikonas: That unxz.c is obviously based on LZMA SDK. The SDK is the upstream of LZMA code. :)

<Larhzu>stikonas: I didn't test but I would guess that unxz.c to be a little faster than xzminidec.

<stikonas>Larhzu: hmm, haven't tested though it is definitely far slower than proper xz

<Larhzu>I haven't looked at BusyBox source in many years. I helped a bit with adding XZ Embedded and thus unxz tool into BusyBox well over ten years ago.

<stikonas>also if you build with M2-Planet, it will be even slower....

<aggi>hasn't xz been designed for multi-core parallelism?

<tmg1|michelson>Larhzu: is that available outside of github?

<Larhzu>aggi: With big files, yes. It just splits the file in pieces and processes them in parallel.

<aggi>interesting aspect with de/compression, most algorithms miss the sweet-spot between compression-ratio and runtime-performance anyway

<Larhzu>aggi: Which makes compression worse. Usually it's not significant but one can have cases where it's significant.

<Larhzu>tmg1|michelson: git clone https://git.tukaani.org/xz-embedded.git or browse at https://git.tukaani.org/?p=xz-embedded.git

<tmg1|michelson>Larhzu: thanks

<Larhzu>tmg1|michelson: Short home page: https://tukaani.org/xz/embedded.html

<aggi>with regards to runtime performance: gzip, xz and various other aren't the best choice anyway

<Larhzu>aggi: In 2005, I found LZMA hitting that sweetspot really well. It was much faster to decompress than bzip2. But nowadays with faster connections, zstd hits the spot better.

<aggi>lz4 seemed closest to the sweet spot with the compression-ratio and runtime-performance iirc, maybe lz77 too but i've not tested this one

<stikonas>well, zstd wasn't available back then

<stikonas>and also a lot of software on the bootstrap path to GCC are from early 2000

<aggi>i've blacklisted zstd on gentoo due to system-integration issues and it's dependency graph

<stikonas>so they mostly provite tar.gz and tar.bz2 with just occasional tar.xz once we get to newer software

<Larhzu>Is lz77 a tool? I know LZ77 is principle behind a lot of compressors (not in bzip2 though).

<Googulator>Larhzu: stage0-posix does now include an XZ decompressor, derived from muxzcat, but a more "official" one is still welcome

<Larhzu>aggi: lz4 is awesome for some tasks but for distributing packages it's not.

<Larhzu>Googulator: LZMA SDK is official too. muxzcat I don't know, there's a lot I haven't heard about.

<aggi>Larhzu: lz77 was the earliest compressors available, decades ago, but wasn't widely adopted as a "standard", and who knows, patent issues, don't recall the details

<Larhzu>Googulator: Whatever works well is good. :)

<Larhzu>aggi: gzip, LZMA, lz4, ... are all LZ77.

<aggi>Larhzu: that's not how i looked at it, i've merely evaluated the runtime-performance/compression-ratio balance, and if a stable imlementation exists for any

<aggi>imo, lz4 was the best

<Larhzu>stikonas: Back in 2005, even if zstd had been available, LZMA could have been better. If you had 256 kbit/s download speed, smaller file saved more time than slower decompression took.

<Larhzu>aggi: Different use cases have different sweetspots. For example, I think of "compress once, decompress many times" case when distributing software packages.

<aggi>Larhzu: which introduces another tradeoff, in practice, ideally, sticking to one and only one algorithm/implementation available on every platform, from tiniest embedded up to hpc workstations

<aggi>gzip, xz and various other are far distant to any ideal

<aggi>gzip wasn't too bad, if _all_ distfiles were distributed with this container format, but they aren't

<nimaje>I would understand lz77 as reference to the paper "A Universal Algorithm for Sequential Data Compression" (Lempel Ziv 1977), which seems to only define the base compression algorithm

<Larhzu>aggi: gzip uses Deflate which uses 32 KiB LZ77 dictionary followed by Huffman coding. If I had to guess, the 32 KiB limit might have something to do with 16-bit CPUs making it slower to handle buffers bigger than 64 KiB (the encoder data structures are bigger than just the dictionary).

<Larhzu>aggi: zstd and LZMA allow far bigger dictionaries = history buffers, so they can find more repeated sequences.

<Larhzu>nimaje: Yes

<Larhzu>LZ4 can use up to 64 KiB dictionary. One could extend the format to support more, of course. So it would be a new format again. ;)

<aggi>5 years ago i was searching for any compressor implementation to apply some LFSR scrambling with it, to remain fast and efficient

<aggi>i picked LZ4 because quality implementation existed, in linux kernel and userspace

<Larhzu>I downloaded unxz.c and also M2libc/bootstrappable.[hc] but it didn't compile due to uint32_t* being assigned to uint8_t*. Changing that made it build but it doesn't work.

<Larhzu>aggi: As I said, LZ4 is awesome for many things. I just think it's not for package distribution. gzip makes smaller files, yes, slower, but one needs really fast download speed to compensate for bigger file size at that point.

<Larhzu>aggi: So you chose well, there just are many use cases and thus many compressors exist. :)

<lanodan>ludocode: btw while I think about it, might make sense to use a non-printable character (like how ELF files use DEL) for onramp bytecode format, specially as with the wrap-header feature means it's not always at start of the file.

<Larhzu>stikonas: Re: "so they mostly provite tar.gz and tar.bz2" -- From trust point of view, I don't understand why one couldn't decompress before starting the bootstrap. I know it's about trusting only the original files but... somehow you have to get the original files.

<Larhzu>With the pointer type fix, unxz.c works for tiny files.

<stikonas>Larhzu: yeah, in principle from the trust point of view decompressed files are fine

<stikonas>just inconvenient to distribute

<stikonas>Larhzu: alternatively, you could (recursively clone stage0-posix) and run `make test-amd64`

IRC channel logs

2024-12-30.log