IRC channel logs

2022-05-19.log

<stikonas>Hagfish: well, yes, basically you can't strictly prove anything from within, there should be some external check
<stikonas>though of course those extra checks from within are not useless
<stikonas>as they could still make a backdoor impractical
<oriansj>bauen1: well given that expectation, yes it would be quite a bit simpler than network support. The issue I was thinking about was how live-bootstrap requires 651MB of tarballs
<oriansj>and as they are compressed tarballs, even if I were to add support for a compressed filesystem, there isn't much to gain.
<stikonas[m]>oriansj: in sysa?
<oriansj>./sources
<stikonas[m]>Live-bootstrap puts sysc tarballs onto a separate disk
<stikonas[m]>sources is gone now
<oriansj>ok
<oriansj>so how much downloaded source is needed to build GCC+Linux?
<stikonas[m]>There is /sysa/distfiles and /sysc/distfiles
<stikonas>413 MB in /sysa/distfiles
<stikonas>and 262 MB in /sysc/distfiles (but I guess eventually that will grow)
<stikonas>sysc needs much bigger disk anyway, since it is not in memory
<stikonas>oriansj: but do you think 2nd stage kernel (after builder-hex0) can go all the way to Linux?
<stikonas>it might be tricky to write everything in the M2 subset
<oriansj>stikonas: I am uncertain but it sounds like a serious problem
<oriansj>enough to get TCC bootstrapped however is something doable
<stikonas>yeah, that might be more realistic
<stikonas>it's more stages which complicates things a bit, but tcc is way more powerful
<oriansj>and there are already kernels that are buildable by TCC
<oriansj>but are there any that could bootstrap Linux?
<stikonas>I don't know
<oriansj>nor do I
<stikonas>fossy looked at tilk
<oriansj>hence I need to think more about this
<oriansj>stikonas: what if we read the tarballs off say a DVD ISO?
<stikonas>doesn't matter where from...
<stikonas>though it's simpler from the same media
<stikonas>but I was thinking of USB sticks
<stikonas>oriansj: how much can you read easily without complications?
<oriansj>well I was thinking we boot the GFK filesystem + kernel with none of the extra source files, mount an ISO, do the steps up to TCC, and build a more advanced kernel. The sources ISO would stick around for the second kernel to continue the rest of the way
<stikonas>and by iso you mean image? or real dvd?
<oriansj>well a real DVD would be what we would be using on real hardware probably
<oriansj>(assuming ATA DVD drives can be found in a trustworthy manner)
<oriansj>as that would be much simpler than USB support
<stikonas>well, builder-hex0 relies on BIOS to do reads
<stikonas>but I guess we could keep doing that in later stages too
<stikonas>it's not ideal though
<stikonas>though it's a bit unfortunate that we would severely limit where the bootstrap can run
<muurkha>you need a different bootstrap for different hardware is all
<oriansj>yeah, I am thinking of builder-hex0 only supporting 3 more builds (new bootloader, gfk-create and gfk kernel), then just dumping mescc-tools, M2-Planet/M2-Mesoplanet and mescc-tools-extra, and letting the loading of additional sources be done entirely by the new kernel, to minimize the work needed to port the builder-hex0 kernel to new platforms
<stikonas>well, you need to port drivers to bootstrap kernel
<oriansj>yes but that is much easier to do in C than in hex0
<stikonas>oh for sure
<stikonas>well, in hex0 we don't even do that
<stikonas>like I said it's basically calls into BIOS to do that work
<stikonas>and drivers are in BIOS
<oriansj>and other platforms don't have BIOS so we can't depend upon that
<stikonas>yes, I know
<stikonas>that's why other platforms can't easily have something like builder-hex0
<stikonas>you need some other driver integrated (maybe spi)
<oriansj>well they might be harder to port builder-hex0 to but even if they were twice as complicated (6KB) they still would only be months of work and not years
<oriansj>especially now that builder-hex0 can be used as a roadmap
<stikonas>oh, out of 413 MB in distfiles , 311 MB was linux
<stikonas>-rw-r--r-- 1 andrius andrius 136M 2022-05-05 19:00 linux-4.9.10.tar.gz
<stikonas>-rw-r--r-- 1 andrius andrius 175M 2022-05-05 19:00 linux-headers-5.10.41.tar.gz
<stikonas>so we have two different versions of linux shipped
<stikonas>though possibly just 4.9.10 might be enough
<muurkha>just two would be great
<stikonas>well, what I mean is we build one linux kernel but use headers from a different version
<stikonas>headers are used to build util-linux
<oriansj>although a weird thought occurred to me; we don't actually need network/internet access to download files. We could do a BBS or FTPmail download
<muurkha>sure, or you could load them off a DVD
<muurkha>or have an army of volunteers type them in from a book
<oriansj>muurkha: 413MB compressed data is a huge book (a library one might say)
<achaninja>sneakernet
<muurkha>yeah, it's about 100 volumes
<achaninja>carrier pigeons :P
<muurkha>oriansj: probably most other platforms will be less of a pain to get a bootstrap kernel running on than IBM PC compatibles
<oriansj>but yeah, DVD is probably the simplest but I did want other fun options to be possibly considered to inspire some chaotic surprise
<muurkha>my favorite candidate is laser-damaged glass
<oriansj>well floppy disks would take a bit; about 287 of them
<muurkha>but that's probably because I'm unhealthily obsessed with longevity
<oriansj>To be honest, once we get this whole bit sorted, I'd be tempted to do a book with every bit required to bootstrap to the GFK kernel included as sources, and have a DVD in the back with the rest of the bootstrap tarballs ready to go
<oriansj>as the only book you need to rebootstrap the world of software
<muurkha>I think that would be awesome
<muurkha>the sequel can include logic circuit designs
<oriansj>probably how to build all of the major parts out of ICs, with a follow-up book about making those same ICs via lithography
<oriansj>after that I hope someone much better at chemistry writes how to get the materials required to the level of purity required for IC manufacture
<muurkha>I think that's just a matter of zone melting
<muurkha>but you don't need IC manufacture or even transistors
<muurkha>well, probably to build GCC you do
<muurkha>but if you have a reasonably stable sequential recording medium you can make do with 4 KiB or so of RAM
<oriansj>muurkha: maybe to bootstrap to M2-Planet (with a couple tweaks) but there is no way Mescc runs in 4MiB of RAM let alone 4KiB
<oriansj>heck nyacc chokes on less than 500MB of RAM the last time I looked at it.
<muurkha>yeah, in 4K you'd probably want to bootstrap a smaller system than Scheme
<muurkha>probably something without GC
<oriansj>and core memory is crazy expensive per bit
<oriansj>so lithography even at 300nm really makes it economically reasonable
<muurkha>32768 cores is an amount that it would be feasible for one person to thread by hand
<muurkha>and it's enough for a compiler for a high-level statically-typed language or for an interactive text editor
<oriansj>even at the rate of 1 core per second that would take over 9 hours of work
<muurkha>but it is certainly true that ICs drop the cost immensely
<muurkha>yeah, I'm thinking it would probably be more like a year of work
<muurkha>to build a computer by hand that way
<muurkha>because you have to spend a lot of time debugging things, they don't all work the first time
<muurkha>but one person-year is feasible. we're not talking about, like, 100 or ten million person-years
<oriansj>fair and that is enough for everything up to M2-Planet
<oriansj>after that it gets quite memory wasteful
<muurkha>yeah. you'd probably want to tweak it a lot to make it practically usable though
<muurkha>bootstrapping math: http://us.metamath.org/index.html
<oriansj>I'd have to tweak cc_* a bit too to make it not read all of the source at once
<muurkha>interestingly, by David A. Wheeler
<muurkha>yeah, I think a lot of the annoying aspects of C result from it being designed to be compilable in a single pass
<oriansj>muurkha: most of the most annoying aspects of C seem to me more a result of people forgetting the simple core that exists and trying to be clever.
<oriansj>for example, #define was a clever hack, but if they had just done CONSTANT name value then suddenly the #define madness is gone.
<muurkha>oriansj: well, the great thing about #define is that you can implement constants and inline functions in a separate pass, so the implementation doesn't have to take up precious memory space in the compiler proper
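A rough sketch of what is being compared here: the #define forms below are standard C, handled entirely by the preprocessor in a separate textual pass before the compiler proper runs, while the CONSTANT line in the comment only illustrates the simpler fixed form being proposed (treat it as illustrative rather than any specific compiler's syntax). Names and values are made up for illustration.

    /* Object-like macro: a constant, resolved by cpp before the compiler proper runs. */
    #define BUFFER_SIZE 4096
    /* Function-like macro: a poor man's inline function, also pure text substitution. */
    #define SQUARE(x) ((x) * (x))

    /* The simpler alternative being proposed would look roughly like:
     *     CONSTANT BUFFER_SIZE 4096
     * i.e. one fixed form instead of a general macro processor (illustrative only). */

    #include <stdio.h>

    int main(void)
    {
        printf("%d %d\n", BUFFER_SIZE, SQUARE(3));
        return 0;
    }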
<muurkha>they weren't running on 4K machines but they did have a 64kiB memory space for code + data, like MS-DOS "tiny" model
<muurkha>and if you compare the facilities of C in Unix with those of contemporary 16-bit operating systems like the Tandem OS or Data General RDOS (for the Nova), it's a night and day difference
<oriansj>that I believe
<muurkha>Unix in 01979 is recognizably modern: SCCS, Makefiles, vi with ctags turning your source base into hypertext, metaprogramming with the C preprocessor (including, for example, generic data structures), high-level scripting with shell pipelines, etc.
<muurkha>while the facilities on other 16-bit machines of the time were extremely primitive by comparison. lots of assembly language, no high-level scripting language, janky bletcherous build processes, etc.
<muurkha>meanwhile csh had things like filename completion
<muurkha>the bell labs team got a lot of mileage out of preprocessors and their loosely-coupled-tiny-tools approach even on bigger machines. cpp is maybe not ideal for Fortran but Ratfor goes a long way to making Fortran 66 into a high-level programming language
<oriansj>well Unix did some things correctly but it did miss other good ideas, like versioning filesystems (ITS), input-validating shells (DCL anyone?), but that is another topic that could take a bit
<muurkha>yeah! but ITS was on a 36-bit machine and DCL a 32-bit one
<muurkha>they weren't laboring under the same constraints
<muurkha>I liked programming in DCL a lot, but VMS programs weren't designed for composability the way Unix programs were
<muurkha>VMS's help files were a whole other world though
<muurkha>I think Linux reflinks allow you to sort of do versioning filesystems a la carte now?
<oriansj>RSX-11 had file versioning on the same PDP-11 that Unix started on
<muurkha>yeah, file versioning doesn't require a lot of extra software complexity
<muurkha>it just uses more disk space
<muurkha>and that only potentially
<oriansj>and DCL did evolve to be quite composable
<oriansj>and it was available on the RT-11
<muurkha>I never used RT-11 so I don't know what it was like
<muurkha>but the problem with DCL wasn't the composability of DCL, it was the composability of the things you had at hand
<oriansj>also lack of source code for those systems really limited one's ability to fix the defects
<muurkha>like, yesterday I wanted to know where people were reaching my "isinstance considered harmful" page, so I ran
<muurkha>(cat /var/log/apache2/access.log ; zcat /var/log/apache2/access.log.{2..14}*) | grep isinstance | awk '{print $11}' | sort | uniq -c | sort -n
<muurkha>now I'm not saying that this is the optimal way to be able to answer questions like that
<oriansj>but it could be done because tools like grep, awk, sort and uniq exist
<muurkha>it would be much nicer to say something like select referer, count(*) as n from hits where url like '%isinstance%' group by referer order by n asc limit 10;
<muurkha>or better still some sort of interactive exploration thing
<oriansj>but unix can be hacked together in a manner that appears rather simple and produces useful results.
<muurkha>yeah
<muurkha>despite the data I was working with being a bit lame
<oriansj>and the biggest win was source code could be obtained and people could fix their own problems (to a degree)
<muurkha>I think that was a win, yeah
<muurkha>but another big win was that it was portable
<muurkha>so when people on a VAX solved a problem, it was also solved for users of the PDP-11 (usually) and the Interdata 8/32
<muurkha>a third thing though was just that you had a high-level scripting language capable of doing things like the above
<oriansj>unix is no more portable than VMS; the biggest difference was that someone without a PDP but with Unix sources could pour in enough work to get it running on what they did have. Vs VMS, where if it wasn't a DEC product you had no chance, and if it was, you didn't have more than VAX as an option.
<oriansj>only after years of porting work did unix get the title of being "portable", because it was made that way slowly after it escaped bell labs
<oriansj>heck, look at all of the effort put into getting Xenix to run on x86 systems
<muurkha>VMS wasn't written in C, it was written in MACRO and BLISS. MACRO is explicitly not portable and BLISS is only portable in theory
<oriansj>or the years of pain in every Unix when adding a new architecture or the years of effort adding a new architecture to Linux
<muurkha>so Unix is *enormously* more portable than VMS was
<muurkha>eventually of course they did rewrite most of VMS in C so they could move to the Alpha αxp
<muurkha>but of course that wasn't until many years after Unix
<muurkha>quite aside from questions of DEC's commercial strategy
<muurkha>Unix was ported to the Interdata about the time it was released publicly
<muurkha> https://en.wikipedia.org/wiki/Version_6_Unix#Portability
<muurkha>and most of the userland had previously been ported as part of PWB
<muurkha>hmm, I think I'm misremembering that part
<oriansj>actually it looks like VMS is still 1/3 BLISS and 1/3 MACRO to this day, with C only being added for the Itanium port
<muurkha>really? is the source released?
<oriansj> https://groups.google.com/g/comp.os.vms/c/3SWQiRQA1Y4/m/npx29Nm-AwAJ
<oriansj>not that I can tell
<muurkha>mala: danny your connection is bouncing
<muurkha>Kernighan & Plauger published the Ratfor version of Software Tools in 01976, at which point I think they had been using the approach on a few different big machines for a couple of years after initial explorations on Unix
<oriansj>they effectively made MACRO into a high-level language to be compiled like C
<muurkha>speaking of cpp, one of the chapters in there is m4
<muurkha>that's pretty insane!
<muurkha>"Oh, as to the original question, looking at recent x86-64 builds, the linker object files built by the different compilers are ~55% C, ~30% BLISS, ~15% MACRO-32, <1% Assembler."
<oriansj>well C programmers were much easier to find than BLISS programmers and honestly C is a better language than BLISS in several ways that matter to kernel writers
<muurkha>my main point is that C is a lot more portable than BLISS
<muurkha>whether it's better or worse in other ways
<oriansj>so a transition from no C code to > 33% to 55% seems about right
<muurkha>it's true that C programs often embed assumptions about the machine word size, but BLISS programs tend to do so much more pervasively (in my extremely limited experience)
<muurkha>C was designed to be portable, while BLISS was designed to be a family of languages, one per architecture
<oriansj>So perhaps it is best to say kernels written in more portable languages are less work to port and the design of the kernel ultimately plays a smaller role.
<muurkha>maybe!
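A minimal sketch of the kind of word-size assumption mentioned above, using nothing beyond standard C; the variable names are invented for illustration:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        int x = 42;

        /* Non-portable habit: assumes a pointer fits in a long, which breaks
         * on LLP64 platforms such as 64-bit Windows. */
        long addr_assumed = (long)&x;

        /* Portable spelling: uintptr_t is defined to be wide enough to hold
         * a pointer wherever it is provided. */
        uintptr_t addr_ok = (uintptr_t)&x;

        printf("%ld %ju\n", addr_assumed, (uintmax_t)addr_ok);
        return 0;
    }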
<muurkha>I'm thinking that probably Unix gained a significant advantage by starting out on small, weak machines that barely had memory protection
<muurkha>porting 16-bit code to a 32-bit machine is a lot easier than vice versa, and porting a kernel whose memory protection consisted of segment registers to a machine with paging hardware also seems like it would be a lot easier than vice versa
<muurkha>I mean on Multics the fundamental filesystem access operation was mmap, and there was just no way to implement that on the PDP-7 or PDP-11
<muurkha>in the other direction though fork();execv() was gratuitously inefficient on the PDP-11, which is why we got vfork()
<muurkha>because fork() implies making a copy of the whole data segment, and then execv() throws it all away except for a few small bits
<muurkha>execve() i guess
<muurkha>and if you have even less memory protection, to the point of lacking a segment base register, your best approach to fork() is to write out a process image to disk and read it back in on process switch. ridiculously slow. spawn is much more desirable if you have enough RAM for more than one process at a time
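A small sketch of the fork()/execve() pattern under discussion, using standard POSIX calls; the program being spawned (/bin/ls) and its arguments are just placeholders:

    #include <sys/wait.h>
    #include <unistd.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        pid_t pid = fork();              /* duplicates the whole address space */
        if (pid == 0) {
            /* Child: everything just copied is about to be thrown away by
             * execve(), which is the waste vfork() (and later posix_spawn())
             * tried to avoid. */
            char *argv[] = { "ls", "-l", NULL };
            char *envp[] = { NULL };
            execve("/bin/ls", argv, envp);
            perror("execve");            /* reached only if execve() failed */
            _exit(127);
        } else if (pid > 0) {
            int status;
            waitpid(pid, &status, 0);    /* parent waits for the child */
        } else {
            perror("fork");
            return EXIT_FAILURE;
        }
        return 0;
    }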
*littlebobeep is not enthusiastic about needing an optical drive + disc to bootstrap
<muurkha>hey, it's less work than manually threading magnetic cores onto tiny wires
<oriansj>littlebobeep: completely understandable. I however am very open to suggestions on how to do a more honest bootstrap and get 100s of MB of compressed tarballs onto disk
<oriansj>as I can imagine someone taking a couple months to put stage0-posix on a hard disk or a floppy disk one sector at a time but I can't imagine someone doing the same for 10MB of tarballs let alone 100s of MB
<oriansj>muurkha: part of me wonders if, had C started out being just a little more forward-thinking in regards to types, a whole boatload of pain and suffering could have been avoided.
<bauen1>i would argue you don't need to hand write the source tars, as long as you can audit / verify their integrity before use
<davidak[m]>The text at https://bootstrappable.org/best-practices.html#distro seems outdated considering the latest achievements of GUIX.
<davidak[m]>"It is unavoidable that distributions use some binaries as part of their bootstrap chain."
<davidak[m]>Can someone update it? I would create a github issue, but there seems to be no issue tracker...
<davidak[m]>Also, it would be great to see some names and faces at https://bootstrappable.org/who.html
<Hagfish>davidak[m]: i'm not sure if GUIX doesn't require *any* binaries. there are the initial binary seeds, and there's the small issue of running them on a kernel which can build up to tcc and beyond
<Hagfish>you're right, though, that it's under-selling how close we are, and should probably be re-written sooner or later
<j-k[m]> https://guix.gnu.org/en/blog/2020/guix-further-reduces-bootstrap-seed-to-25/
<j-k[m]>has a graph of the bootstrapping, this is the latest bootstrapping post I've found
<civodul>i believe several of us here (me included) can push changes to the web site, so don't hesitate to propose patches
<civodul>j-k[m]: the latest achievements by janneke & co. don't have a blog post yet
<civodul>but you can see that at https://issues.guix.gnu.org/55227
<j-k[m]>ah very very nice
<Hagfish>"There will definately be a blog-post; I have already started to work on it. I think it's probably best to time it after core-utils has been merged into master, when "guix pull; guix system init .." actually installs a system built from 357 bytes."
<Hagfish>that blog post is going to blow people's minds!
<Hagfish>i wasn't around when Turing published "On Computable Numbers", and i wasn't on Usenet when Stallman announced the start of the GNU project, but there is now a new generation who will get to witness the revealing of this bootstrapping achievement
<theruran>Hagfish: true! but I think it will be important to make clear that the work doesn't stop there
<Hagfish>theruran: yeah, absolutely, that's why i'm comparing it to the start of the GNU project. now that people can see this golden thread working from end to end, it's the perfect time for them to find an area they'd like to enhance
<Hagfish>i guess there are two extremes that need to be avoided: making people think that all the work has already been done, and making them think that getting to this point was trivial and didn't require the sweat of heroes
<oriansj>davidak[m]: well, having at least 1 binary file is unavoidable for most (if not all) architectures; so the advice to "be clear where the binary came from and how it was produced" and "Users can reproduce the binary to verify that it has not been tampered with" still applies, but I will admit suggesting to make as few binaries as possible, and as small as possible, would be a good idea.
<unmatched-paren>oriansj: and as you have previously stated, it's technically possible to verify that the binary is correct with no software
<oriansj>unmatched-paren: it just requires special storage media and manual inspection.
<oriansj>and assumes that the system that writes it out doesn't strip out the subversion to prevent that sort of detection.
<oriansj>and that the system reading it back in doesn't reinsert that subversion.
<unmatched-paren>oh, hm. didn't think about that last point
<davidak[m]>oriansj: so hex0 has to be in binary form to start bootstrapping? and compiling it has to be done manually before?
<unmatched-paren>davidak[m]: yes
<oriansj>davidak[m]: processors generally only execute ISA instructions (which have binary encodings) and I am unaware of an architecture which, using only valid ASCII chars, would support everything required to open files, read their contents, write out the results, close the file handles and exit cleanly
<oriansj>there is also the question of how to handle the comments, which the processor would otherwise see as bytes following a valid instruction.
<oriansj>and why we have our binaries here: https://github.com/oriansj/bootstrap-seeds along with the hex0 needed to make them
<oriansj>and why we are *VERY* explicit in stating "NEVER TRUST ANYTHING IN HERE"
<oriansj>and one can manually toggle in their own hex0 binary without much effort
<oriansj>or use countless methods to build the hex0 binary from its hex0 sources
<oriansj>and all methods should produce the exact same binary output.
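For readers who have not seen the format being described: a minimal sketch in C of what a hex0-style translator does, reading stdin/stdout rather than the file arguments the real stage0 tools take. Hexadecimal digit pairs become raw bytes, and everything from '#' or ';' to the end of the line is treated as a comment:

    #include <stdio.h>

    static int hexval(int c)
    {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;                               /* not a hex digit */
    }

    int main(void)
    {
        int c, v, hold = -1;
        while ((c = getchar()) != EOF) {
            if (c == '#' || c == ';') {          /* comment: skip to end of line */
                while ((c = getchar()) != EOF && c != '\n')
                    ;
                continue;
            }
            v = hexval(c);
            if (v < 0)                           /* whitespace and anything else is ignored */
                continue;
            if (hold < 0) {
                hold = v;                        /* first nibble of a byte */
            } else {
                putchar((hold << 4) | v);        /* second nibble: emit the byte */
                hold = -1;
            }
        }
        return 0;
    }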
<davidak[m]>thanks
<muurkha>oriansj: we can see what would have happened if C started out being just a little more forward-thinking with regard to types: it's Pascal
<Hagfish>oriansj: on the subject of "only valid ASCII", i can't help remembering this incredible feat by (the inimitable) Tom7 https://www.youtube.com/watch?v=LA_DrBwkiJA
<unmatched-paren>after researching pascal for `aesop`, i think it could be a neat language, but it was too small and impractical, so all the compiler writers just extended it and now it's basically a family of dialects
<unmatched-paren>(it also annoys me how `writeln` et al are built in and have loads of special cases)
*unmatched-paren wonders if modula-3 or oberon fix these problems
<muurkha>well, they did avoid the family-of-dialects problem
<muurkha>I forget if they avoided the you-can't-write-printf problem
<unmatched-paren>gm3c apparently isn't in guix :(
<muurkha>Brian Kernighan wrote a famous paper on this actually
<unmatched-paren>neither is anything oberon-related
<unmatched-paren> https://en.wikipedia.org/wiki/Pascal_(programming_language)#Early_criticism ?
<unmatched-paren>"Why Pascal is Not My Favorite Programming Language"?
<muurkha> https://www.cs.virginia.edu/~evans/cs655/readings/bwk-on-pascal.html yeah
<unmatched-paren>muurkha: i think you _can_ write something like printf with fpc's `array of const`
<muurkha>oberon is super interesting
<unmatched-paren>also imo pascal's syntax is a bit too verbose
<unmatched-paren>`procedure` keyword, anyone?
<unmatched-paren>or the array syntax
<unmatched-paren>oh, that reminds me:
<muurkha>while that is true I regard it as being of distinctly secondary or tertiary importance
<unmatched-paren>procedures and functions are different things for some reason
<unmatched-paren>yeah, it's not a show-stopping issue, but it's annoying
<davidak[m]>would it make sense to use GUIX to build the bootstrap seed for other distributions as a cheap option for a verifiable seed until they implement full source bootstrap themselves? e.g. nixpkgs uses this to build the seed (contains coreutils, bash, busybox, gnutar, binutils, GCC ... with disabled features) https://github.com/NixOS/nixpkgs/blob/50a11f4f4301b9b4cb1f3041fca4f2e71a73d4a5/pkgs/stdenv/linux/make-bootstrap-tools.nix
<unmatched-paren>muurkha: seems like most of the criticisms in bwk's article do not apply to modern Pascals
<unmatched-paren>Borland provides `array of foo`
<unmatched-paren>instead of `array [x..y] of foo`
<unmatched-paren>funny how he gives "there is no macro processor" as a problem :P
<unmatched-paren>though i think one does exist in Borland Pascal
<unmatched-paren>{$...} comments are eaten by the preprocessor
<unmatched-paren>"Because the language is so impotent, it must be extended. But each group extends Pascal in its own direction, to make it look like whatever language they really want. Extensions for separate compilation, Fortran-like COMMON, string data types, internal static variables, initialization, octal numbers, bit operators, etc., all add to the utility of the language for one group, but destroy its portability to others.
<muurkha>unmatched-paren: they do not
<stikonas>davidak[m]: depending on your requirements, it might be easier to use live-bootstrap for your seed for other distros
<stikonas>it should be more reproducible than guix outputs
<stikonas>though if you need non-x86 binaries, then guix is your best bet for now
<littlebobeep>stikonas: what about cross-compiling ?
<stikonas[m]>Yes, you can cross compile everything from x86
<stikonas[m]>But we don't have any of that scripted
<stikonas[m]>But yes, that would work
*littlebobeep doesn't know a better path for aarch64
<stikonas[m]>Well, nothing better until the mes->tcc step is done on aarch64
<littlebobeep>Mes can be compiled on aarch64 right? Just mescc needs to support it?
<stikonas[m]>mescc might already be working on aarch64
<stikonas[m]>It's tcc that doesn't
<stikonas[m]>OK, only arm32 is supported on mes
<unmatched-paren>i think tcc has an aarch port somewhere? probably mistaken as usual though
<stikonas[m]>tcc 0.9.27 has
<stikonas[m]>mescc can't build it
<stikonas[m]>Only ancient 0.9.26
<unmatched-paren>i see
<stikonas[m]>And even that one is patched
<littlebobeep>haha one subpoint release back makes it ancient?
<unmatched-paren>you could build tcc 0.9.27 with 0.9.26 couldn't you?
<unmatched-paren>then use that to cross-compile to aarch
<stikonas[m]>Then just cross-compile once you have GCC...
<stikonas[m]>littlebobeep: tcc releases are like every 5 years
<unmatched-paren>oh, you can't cross-compile at that point?
<unmatched-paren>i see
<stikonas[m]>26 is from 2013
*unmatched-paren also just realizes that the question is about building it all on aarch and feels stupid :P
<stikonas[m]>You can cross-compile then but you can just as well wait...
<doras>Does anyone mind reviewing https://github.com/fosslinux/live-bootstrap/pull/161?
<doras>I think it's pretty noncontroversial.
<Hagfish>+7 −7
<Hagfish>perfectly balanced, as all things should be
<doras>;)
<stikonas>doras: yes, looks good
<stikonas>doras: merged
<stikonas>I'll probably leave the other PR for fossy to merge
<doras>Thanks, stikonas @stikonas:libera.chat.
<oriansj>muurkha: interesting perspective, especially when it has fewer types than C (no unsigned or long), but one would hope the fixed-string flaw could have been skipped.
<oriansj>Hagfish: yeah that is a clever hack but notice its limits.