IRC channel logs

2021-06-21.log

back to list of logs

<stikonas>fossy: what do you think about splitting /after folder?
<stikonas>I'm thinking of splitting all the build scripts, so basically what is in the repo/sysa/* and installed files
<stikonas>then we'll have a cleaner image of what we built...
<Hagfish> https://discourse.nixos.org/t/nixos-unstable-s-iso-minimal-x86-64-linux-is-100-reproducible/13723
<Hagfish>the madlads have actually done it
<Hagfish>here's the HN discussion: https://news.ycombinator.com/item?id=27573393
<drakonis>highly impressive
<drakonis>it is always strange how nixos comes up more often than guix in these kinds of conversations
<Hagfish>"Guix's work on bootstrap seed reduction has been so far mostly applied to i686/amd64 and not even other architectures they support (at least, not fully)."
<Hagfish>some of the comments there are a bit spicy
<drakonis>quite
<siraben>Melg8: https://news.ycombinator.com/item?id=27573393
<siraben>bootstrapping is mentioned https://news.ycombinator.com/item?id=27574166
<Hagfish>a nice coincidence that this google have written this month about advancing reproducibility too
<Hagfish> https://security.googleblog.com/2021/06/introducing-slsa-end-to-end-framework.html
<Hagfish>"SLSA 4 is currently the highest level, requiring two-person review of all changes and a hermetic, reproducible build process."
***platinum.libera.chat sets mode: +o ChanServ
<fossy>stikonas[m]: not opposed to it
<stikonas[m]>Well, I first need to finish tcc/meslibc changes
<stikonas[m]>Then I can try to clean up directory structure
***ChanServ sets mode: +o rekado
<rekado> https://blog.dave.tf/post/finding-bottom-turtle/
<civodul>"This feels like we’ve reached a point in our bootstrapping where we have to run atop preexisting software we don’t trust, cannot replace, and cannot avoid."
<civodul>this is the usual argument that as
<civodul>this is the usual argument that assumes the outcome is binary
<civodul>whereas i think it's all about reducing the attack surface and increasing transparency
<siraben>Ooh, is the post author here?
<stikonas>still, the blog post is quite well written. But yes, increasing transparancy is also one of the outcomes, even if you don't go all the way
<stikonas>siraben: yes, the author looks to be here
<siraben>I realize the blynn-compiler stuff has been a bit stalled, unfortunately I've been too busy to get back into it, however it might be a good exercise to rewrite some of the trivial assemblers in Haskell, if nothing else to start building useful modules
<siraben>the compiler self-hosts but doesn't have much outside of what's needed to achieve that
<siraben>(by Haskell I mean the dialect accepted by blynn-compiler
<siraben>)
<siraben>stikonas: I see
<pabs3>discussion about the turtle thing: https://news.ycombinator.com/item?id=27575849
<fossy>i like this post
<fossy>i agree with civodul for the most part, but you can read that entire turtle post as how far do you have to go to eliminate the attack surface
<siraben>fossy, pabs3: this is similar to discussions in formal verification
<siraben>at some point you have to trust something about the system, such as the CPU
<siraben>of course if the assumption is wrong then all bets are off
<fossy>i also like formal verification
*fossy <3 sel4
<siraben>people here might find this interesting https://blog.compiler.ai/2021/05/06/verification-is-tcbpb-reduction/
<siraben>fossy: nice, I've heard of seL4, it's impressive
<fossy>i cannot say i understand formal verification though :P
<siraben>fossy: I've been working on formally verifying C programs recently, https://gist.github.com/siraben/bdc2aa9b4a8f197722411b334febeaa0 as a little example
<fossy>i have thought at times about the intersection between formal verification and bootstrappability
<siraben>there's a lot in common
<fossy>(is it possible to ever construct a formally verified bootstrapped bootstrapping path) (i.e. self-verifying)?
<fossy>like, that would be a massively hard probelm
<siraben>fossy: if I had infinite time I would also verify the correctness of various parts of our bootstrap :)
<siraben>fossy: yeah, I can think of a way to formulate bootstrapping formally
<fossy>yea i see that as a worthy goal also - but if we are going back to trusting trust you have to express the proofs /somehow/, and if they are being run on a computer then they are built ontop of the massive software stack anyway :P
<fossy>in reality all of these are quite unlikely as mr turtle states
<siraben>fossy: the first significant thing that has to be done is to give a formal semantics for every intermediate language of our bootstrap, this is way before we assert the compilers/interpreters are correct
<fossy>yea right
<fossy>which is not an easy task to begin with
<siraben>but assuming you have semantics, then you could now state: "let a bootstrapped compiler from asm to C be a program written in asm that compiles C to asm and preserves the program behavior"
<siraben>fossy: yeah, see the amount of work needed to prove correctness of an imperative language to a stack machine with jumps https://xavierleroy.org/courses/EUTypes-2019/slides.pdf
<siraben>fossy: things are looking up though, I mean now we have a formal semantics for C, I wonder if CompCert (the formally verified compiler) would accept C programs we write for live-bootstrap
<siraben>It's a lot of work and maybe the benefit is unclear, I don't know enough to say
<siraben><fossy> but if we are going back to trusting trust you have to express the proofs /somehow/, and if they are being run on a computer then they are built ontop of the massive software stack anyway
<siraben>yeah exactly, haha
<fossy>if i get a chance i think i will chat about this with my formal verification friends
<fossy>very interesting stuff but i am no expert in it lmao
<siraben>please let me know what they say, sounds interesting
<fossy>for seL4 at least (from the whitepaper):
<fossy>C is not a formal language; in order to allow reasoning about a C program in thetheorem prover (we use Isabelle/HOL), it has to be transformed into mathematicallogic (HOL). This is done by a C parser written in Isabelle. The parser defines thesemantics of the C program, and gives it meaning in HOL according to this seman-tics. It is this formalisation which we prove to be a refinement of
<fossy>the mathematical(abstract) model.Note that C does not have an official mathematical semantics, and parts of theC language are notoriously subtle and not necessarily that well defined. We solvethis by restricting our use of C to a well-defined subset of the language, for whichwe have an unambiguous semantics. However, this does not guarantee that ourassumed semantics for that subset is the
<fossy>same as the compiler’s. More on thatbelow.
<fossy>jeez copy pasting PDFs screws up spaces
<stikonas>fossy: I don't think self-verifying is even theoretically possible
<fossy>no i am thinking not
<stikonas>that would go against Goedel's theorem which says no mathematical system containing arithmetic can prove its own consistency
<stikonas>and bootstrapping definitely contains arithmetic
<fossy>oh yeah
<stikonas>so I think Coq cannot be formally verified using itself
<siraben>I'm just imagining a title of some paper on formally verifying a bootstrap be like, "Trusting Trusting Trust"
<siraben>fossy: hm, when was this paper written?
<siraben>"However, this does not guarantee that our assumed semantics for that subset is the same as the compiler’s"
<siraben>fossy: I should note that when I'm formally verifying C with https://vst.cs.princeton.edu , they highly recommend to use CompCert, which is verified to preserve the C semantics
<siraben>looks like seL4 trusts the C compiler instead, which is reasonable considering how well-tested these compilers are
<siraben>I think you would need to clarify what you mean by self-verification, which is distinct from self-consistency
<siraben>but verification all the way is more ideal, of course
<siraben>Well, Coq cannot prove its own logical consistency within itself
<siraben>However, you can definitely prove a program correctly implements Coq's logic in Coq
<siraben>or, say, prove a compiler for Coq written in ASM correct
<siraben>hm, I probably need to think about this more as well
<fossy>hm i havent heard of compcert i should look at that
<stikonas>although, compcert is non-commercial only
<stikonas>so not foss...
<siraben>bootstrapping feels different from completeness in logic
<siraben>fossy: yeah, it's interesting. I haven't tried using it to compile our C files, it should work
<siraben>yeah the licensing is not (totally) free unfortunately
<siraben>stikonas: yeah funnily enough, we have the trusting trust issue in math but "upwards", you can show Peano Arithmetic to be consistent, but you need a stronger theory to show it (e.g. ZFC), but then to show ZFC consistent, you need yet another stronger theory, etc.
<stikonas>and that one has no lowest turtle...
<siraben>and you cannot use any of the less powerful logics to prove (bootstrap) any of the more powerful ones consistent, unlike how we can write a C compiler in asm
<stikonas>you also can't have completeness (there will always be statements that are not provable in a given system)
<siraben>(and yes, there are already well-known statements that cannot be proven in ZFC, so you need to augment ZFC with more axioms)
<siraben>Right.
<siraben>well-known statements such as the continuum hypothesis (is there a set with cardinality between the natural numbers and reals?) is independent of ZFC
<stikonas>exactly...
<siraben>but, as I said, we're not trying to prove Coq consistent within itself, rather, that X program correctly interprets/compiles Coq's type theory, which is entirely possible
<siraben>OCaml has been bootstrapped right?
<stikonas>siraben: one version was
<stikonas> http://gallium.inria.fr/~scherer/drafts/camlboot.pdf
<stikonas>it's really slow but oh well....
<siraben>stikonas: ah, nice
<siraben>unfortunate regarding the situation for Haskell
<fossy>oh that paper expresses my thoughts on DDC well btw
<fossy>we can use DDC to prove our systems are not suspect to trusting trust
<siraben>I'd need to do more reading on DDC and its proof
<rekado>I feel that we(TM) should upgrade what’s presented on the bootstrappable.org website
<siraben>oriansj: I'm attempting a rebase on blynn's master, https://github.com/siraben/compiler/tree/rebase-2021-06 but I'm encountering some nasty makefile conflicts
<siraben>I'll let you know if it goes successfully
<oriansj>rekado: liked the article, it is a little behind on the current state of the art but other than that it appears quite good. The author should be proud.
<stikonas>danderson: ^
<oriansj>danderson: thank you for writing the article. I enjoyed it.
<oriansj>formal verification of hex0, hex1, hex2, M0 and M1 should be relatively trivial. cc_* would be harder but I do avoid all of the fuzzy parts of C (and make explicitly different choices in a few cases)
<siraben>oriansj: how should we handle the rebase?
<siraben>I'm currently fixing the first merge conflict
<siraben>I was wondering if we should cherry-pick or merge commits instead
<siraben>Ok, rebased.
<siraben>Rewriting history feels weird in this scenario
***jackhill_ is now known as jackhill
***orionwl is now known as laanwj
<xentrac>sometimes what helps or hurts formal verification can be very counterintuitive; the most practical approach is usually to develop the program at the same time as your proof so that you can avoid areas where your proof assistant is weak
<xentrac>...or so I've heard
<xentrac>03:24 <@oriansj> So no private sector company with a brain is going to hire a former State of Michigan Programmer who was terminated.
<xentrac>this is remarkably black-or-white thinking. perhaps 50% of private-sector companies would, or perhaps 10%, or perhaps 5%, or perhaps 1%, but 0% is vanishingly unlikely
<Hagfish>perhaps he's saying that in practice, you should make your decisions as if it were 0%, but yes, that does leave some room for debate
<Hagfish> https://news.ycombinator.com/item?id=27573618
<xentrac>there's enough programmer jobs out there that probably it's closer to 50%
<Hagfish>well, i don't know what the job market is like in Michigan, but you're probably right
<xentrac>I don't think his intern is making decisions in a manner that closely approximates rationality
<Hagfish>true, but they also apparently weren't able to get a job anywhere better paid
<xentrac>now she has four years of experience
<Hagfish>i'm not sure what a former employer is allowed to say about the circumstances of an employee leaving their service
<siraben>xentrac: yeah, such a parallel development is very useful
<siraben>even better if you can derive the program directly from the spec
<xentrac>I should be clear that my experience with formal methods is limited to watching introductory Coq videos though
<oriansj>xentrac: possibly correct but also remember they were assigned to the only project I got approved for public development and distribution: https://github.com/oriansj-som/Modified-Adjusted-Gross-Income-eligibility so you'll be able to see exactly what they managed to achieve in that time frame.
<oriansj> https://github.com/toniwats
<oriansj>So the only public code they have shows 2+ days to produce a 47 line delta with them just writing a single test case for a single rule
<oriansj>The python3 code is the level of complexity involved (eg simple and trivial) https://github.com/oriansj-som/Modified-Adjusted-Gross-Income-eligibility/blob/master/rules/MAGI-Children-NewBorn.py
<oriansj>All of the rules are described in detail with examples. The assigned task was write up the rules in executable form (such as the example above) and add an automatic test case.
<xentrac>yeah, to me that sounds more like she wasn't working on the project, rather than that she was working furiously on it but just found it was beyond her capacity
<oriansj>but no code or even tests written since Jan 8, 2019
<oriansj>So yeah, hence why unless massive improvements are made termination will hit her at her next quarterly review
<xentrac>well, hopefully that will be healthy for her
<oriansj>xentrac: that I can't say but it isn't by problem to fix her life.
<xentrac>right
<xentrac>also hopefully ou'
<xentrac>you'll get a better minion
<oriansj>well that we will see. As the project is estimated to save $12M/year (even if a private comany shows up and finishes it and sells support for $2M/year)
<oriansj>and there are only 104 more rules to implement.
<xentrac>cool!
<oriansj>if only some business would be wise enough to send a Freedom of Information Request to get the Rules Document. And then they would be able in a short time provide a drop in replacement and pocket $2M/year in support costs for possibly 1-2 support personnel.
<oriansj>Oh and this tool would be a great product for the other 49 states too...
<xentrac>sounds like a potentially risky business strategy, banking on State of Michigan middle managemers to give you information that reduces their headcount...
<xentrac>and then buy your product even though that reduces their budget by US$10M/year
<xentrac>I mean, I'm not saying it can't work, but you'd need to have a lot of confidence in your understanding of the internal workings of the bureaucracy to bet a lot of money on it
<siraben>oriansj: let me know what you think is the best approach to incorporate blynn's new commits to blynn-compiler
<siraben>xentrac: ok, I'm not a researcher or anything but I've gone through the first free volumes of Software Foundations and used Coq to verify some non-trivial results like Church-Rosser
<siraben>Still a noob at proof engineering (proving in the large) though
<xentrac>siraben: you're about a hundred levels above me then ;)
<siraben>haha, like mathematics there's levels all the way up :P but I try
<danderson>oriansj: thanks! I'm curious what things I missed in the state of the art. Is stage0<>mes integration further along than I thought?
<danderson>and yes, an argument I made in the post (and subsequently on HN) is that a principled bootstrap is worth doing even if it's not (currently) tractable to take it all the way down to the hardware.
<stikonas>danderson: I guess "Currently, Mes is the bottom turtle."
<stikonas>we can build Mes now
<stikonas>kernel is the big missing piece now
<danderson>right - I was going off Guix blog posts, and the impression from the Mes website that further work was still in progress
<danderson>ah! Using stage0? Or some other bootstrap chain?
<stikonas>stage0-posix
<stikonas>but yes, stage0
<stikonas>danderson: see https://github.com/fosslinux/live-bootstrap/blob/master/parts.rst
<danderson>aha, fantastic. I was going off the stage0 code on savannah, so completely missed the stage0-posix stuff
<stikonas>and it even doesn't use any pre-generated files
<stikonas>such as configure script, bison parsers, etc...
<danderson>lovely
<stikonas>so starts a bit like guix, but later diverges quite a bit to bootstrap GNU autotools and Bison
<stikonas>and also goes straight from TCC to GCC 4.0.4
<stikonas>still, it's very nicely written, especially knowing that you only based it on Guix blog posts and didn't know some newer information
<danderson>Thanks for the references! I have more to catch up on.
<stikonas>danderson: I can also find some a bit out of date dependency graph
<stikonas>for live-bootstrap
<stikonas>source https://github.com/oriansj/talk-notes/blob/master/live-bootstrap.dot and rendered https://stikonas.eu/files/bootstrap/live-bootstrap.svg
<danderson>hm, so stage0-posix is named that way because it assumes a posix kernel already running, is that right?
<stikonas>it's a bit out of date
<stikonas>danderson: yes
<stikonas>it used to be called mescc-tools-seed
<stikonas>but was renamed to stage0-posix because it closely mirrors stage0 repository
<stikonas>that is for knight vm
<danderson>Got it.
<danderson>I've been looking at SiFive's risc-v boards, and with the right jumper settings they can start executing user code almost immediately, with no imtermediate firmware. That seems like an interesting target for a bare metal stage0.
<stikonas>but I guess you already more or less answered in your post how to use it. We first build some kernel on baremetal with stage0->M2-Planet
<stikonas>although, at the moment simple POSIX kernel is a missing piece
<danderson>although I suspect that stage0 would realistically go through tccboot to boot up a linux kernel and then jump over to stage0-posix
<stikonas>bauen1 was looking a bit at it but I haven't seen him here for some time
<stikonas>danderson: it doesn't have to be linux kernel
<stikonas>can be something much simpler
<stikonas>basically only processes and a few syscalls are needed
<stikonas>oriansj remembers them better
<danderson>oh, for sure. It's a question of what to prioritize: simplicity of the bootstrap code, or reusing existing source that can let us skip steps
<stikonas>but early in the bootstrap it's mostly dealing with read/write to files in VFS and some stuff like waitpid
<stikonas>fossy is also trying to insert tccboot somewhere in the live-bootstrap
<stikonas>but it is tricky
<stikonas>tccboot is not easy to build
<danderson>yeah, tccboot looks like an interesting tech demo, but not easy to rely on
<stikonas>so first we might insert building a kernel once we have gcc
<danderson>yup, makes sense.
<stikonas>anyway, stage0-posix kind of came up from pragmatic approach
<stikonas>it's easy to run and test
<stikonas>and is not hardware specific
<danderson>Is there a list of kernel features that stage0-posix requires to function?
<danderson>yup, for sure. Getting to a hardware-abstracted bootstrap as quickly as possible is hugely desirable
<stikonas>well, it depends on the step. the later steps you look at , the more they need
<danderson>Presumably, the first kernel would need enough to get up to gcc + compiling a better kernel, right?
<stikonas>up to M2-Planet it's probably just exec file open/read/write stuff and waitpid
<stikonas>oriansj: would be good if you can confirm this
<stikonas>then mes needs a bit more stuff
<stikonas>or at least up to tcc
<stikonas>tcc is built much earlier
<stikonas>later we use bash, so probably need more stuff
<stikonas>pipes, etc...
<stikonas>by the way, M2-Planet builds non-released version of mes, it's a bit simplified
<stikonas> https://github.com/oriansj/mes-m2
<stikonas>it actually breaks GCC support
<stikonas>but that mes-m2 can then build tcc
<stikonas>(and also meslibc)
<stikonas>and live-bootstrap is also reproducible :), all binaries are checksumed
<xentrac>danderson: niice, I didn't know that. Unleashed?
<stikonas>riscv is I guess an interesting target
<stikonas>although, live-bootstrap will probably struggle at later stages
<stikonas>mostly GCC 4.0.4 and Binutils 2.14
<xentrac>I like riscv a lot. qemu is apparently still the fastest risc-v implementation available outside China though?
<danderson>xentrac: SiFive Unmatched and HiFive1 rev B
<xentrac>neat!
<stikonas>luckily, musl libc in live-bootstrap already supports riscv
<xentrac>stikonas: it's hard to build those on 64-bit, or what?
<stikonas>xentrac: they predate risc-v
<xentrac>oh, they don't have assembly code for risc-v :)
<danderson>Both of them, according to the software manual, have a boot selection DIP switch that you can set to immediately execute code from SPI flash
<stikonas>so you'll have to backport support from newer gcc
<stikonas>there will probaby be other problems too, but those are the hardest
<danderson>according to the manual, in that boot mode, the CPU executes exactly 6 instructions from ROM before jumping to user code.
<danderson>and those 6 instructions are "read the boot jumpers and jump to a mode-specific address"
<danderson>on the Unmatched, that boot mode is a little bit harsh, in that your code starts running with no access to DRAM, since you skipped the firmware that initializes the memory controller :)
<danderson>But I _think_ the memory controller interface is documented, so it should be possible to reimplement DRAM bringup
<danderson>and the CPU codes have 8k of SRAM built in. In normal use it's meant for realtime applications to have deterministic memory access time, but it gives you a lot of breathing room to get DRAM up and running.
<stikonas>well, that's what U-boot does on some arm boards
<danderson>CPU cores*
<stikonas>I have rockpro64 which also starts with RAM uninitialized. It only has access to SRAM
<danderson>stikonas: yeah, in normal operation the CPUs jump to U-Boot SPL, which initializes DRAM and switches the clock speed out of the slow reset mode
<xentrac>danderson: that sounds right. and 8K is a very generous fixed point with which to move the earth
<danderson>Depending on how strict you want to be about not depending on firmware, you could jump through u-boot SPL and end up with a much more usable machine to bootstrap on
<danderson>iiuc, the SPL part is burned into mask ROM, so you could choose to view it as part of the hardware.
<stikonas>yeah, it's always unclear where hardware stops and software starts
<danderson>yeah. I think there's a decent argument for mask ROM being hardware. It's just leveraging the general-purpose transistors to do a bit of initialization that it could do as a bespoke hardware state machine, if it wanted to.
<stikonas>especially since modern hardware is not hand manufactured but programmed...
<danderson>I think that's the hair that GPLv3 splits as well: if the firmware is non-replaceable, it's just a weirdly-shaped piece of hardware, and so is out of scope for replacing.
*xentrac starts sewing core rope ROM
<danderson>xentrac: you kid, but I've been tempted, just for the heck of it
<stikonas>well, they used core memory for apollo...
<danderson>build a computer out of discrete transistors, core rope ROM, magnetic core RAM, and so forth
<danderson>run at a blazing amount of seconds per instruction
<xentrac>danderson: no, you can go a lot faster than that
<danderson>Sure, I was being facetious. Kilohertz should be easy. Getting to MHz might be a challenge though
<xentrac>incidentally I think you can get JLCPCB to build a computer out of discrete transistors for you for about US$256
<xentrac>the details are in text/pick-and-place.md in Dernocua
<xentrac> http://canonical.org/~kragen/dernocua.git/
<xentrac>the Pineapple One apparently runs reliably at 500 kHz
<xentrac>I think the way to go for this kind of thing is automated fabrication (which is what JLCPCB does) with manual verification
<danderson>yup, though that's using logic gate ICs
<danderson>there's also TraNOR, which is fully made out of discrete transistors, and can run at 7MHz
<xentrac>yeah, using logic gate ICs my estimate is under US$32
<xentrac>nice!
<danderson>I say fully - it does use SRAM and EEPROM ICs
<danderson>rather than build fully transistorized RAM and ROM :)
<xentrac>yeah, I was gonna say, the vast majority of its transistors are in the ROM
<xentrac>square-loop ferrite logic can run up to a few MHz too
<fossy><danderson> I've been looking at SiFive's risc-v boards, and with the right jumper settings they can start executing user code almost immediately, with no imtermediate firmware. That seems like an interesting target for a bare metal stage0.
<fossy>Omg what
<fossy>that would be a perfect target if it can be run firmwareless
<stikonas>updating mes libc is a bit painful :(. All hashes change until you reach nex libc...
<stikonas>*next libc
<danderson>yeah, see MSEL dip switches in the manual. One setting makes it jump straight to SPI flash instead of going through firmware
<danderson>in that mode, only thing that runs before your code is 6 instructions to read the dip switch setting and compute the jump target
<stikonas>but you still need to load instructions somehow into CPU? What method can you use there
<danderson>SPI flash is easy to bit-bang into. Make a programming jig out of a couple switches and transistors, and load the hex0 bytes in by hand
<danderson>you'd need a modified hex0 that also brings up some of the GPIO pins to use as a parallele I/O bus for future inputs
<fossy>yeah spi
<stikonas>yeah, ok... well SPI is quite simple bus
<xentrac>amusingly I abused sigrok's SPI decoder yesterday to decode PS/2 keyboard data
<xentrac>the sigrok PS/2 decoder apparently doesn't decode the last byte in your capture, and I was using an Arduino Duemilanove as my "logic analyzer", and it only has enough RAM for 20 milliseconds, so most of my captures only had one byte in them
<xentrac>don't forget to debounce your switches
<xentrac>(hmm, could you use SPI to manually load a 512-byte boot sector into an SD card?)
<danderson>You should be able to, yeah. You need to fiddle with one of the other pins to select the right communication mode, but one of the modes is 4-bit SPI iirc
<xentrac>the standard SD card mode is a 4-bit mode, but they also support a really-serial SPI mode
<xentrac>although it might be more pleasant to use the 4-bit mode if you're setting the bits with DIP switches :)
<danderson>yeah for manual entry, 4-bit mode seems preferable
<danderson>if you have a fancier input device driving stuff, single-bit mode might be easier to hook up
<xentrac>right