IRC channel logs

2021-11-11.log


<oriansj>riv: we also use checksums to verify downloaded tarballs.
<oriansj>but yeah sha256sum is only used in stage0-posix to verify the outputs were correct.
<oriansj>also stikonas merged
<oriansj>and updated in stage0-posix
<stikonas>well, checksums of downloaded tarballs only use external GNU sha256sum
<stikonas>it's not (yet) checked inside bootstrap using bootstrapped tools
<stikonas>I think I might implement more complicated assignment operators next
<stikonas>e.g. +=
<stikonas>that should be fairly simple
<stikonas>oriansj: I'm a bit confused why in token_list https://github.com/oriansj/M2-Planet/blob/6ebe45f369d6bf678a9dd8c6313d94a16d5ca94d/cc_reader.c#L287 both "current->next" and "current->prev" are set to "token". Maybe you remember?
<stikonas>it looks like we should be constructing doubly linked list
<stikonas>hmm, maybe I should look at it with gdb...
<oriansj>stikonas: I do remember the exact reason.
<oriansj>Notice that it builds the list by reading a,b,c and building the list like: c->b->a and then reverses the list to become a->b->c
<oriansj>so ->prev is correct; but ->next needs to be reversed but a single O(n) reverse is faster than building the list correctly the first time.
<stikonas[m]>Oh, I see
<oriansj>as appending the list to the new element is quite fast and making that new element the head of the list
<stikonas[m]>OK, that's very helpful
<oriansj>perhaps I need to add a comment so that when someone else hits it in the future, the reason can be learned just by reading
<stikonas[m]>Good idea
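The build-backwards-then-reverse approach oriansj describes can be sketched as follows. This is a minimal illustration, not the actual cc_reader.c code: the `struct node` type and the `push` and `reverse` names are hypothetical, and the real token_list struct carries more fields.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical node type, used only for this sketch. */
struct node {
    struct node *next;
    struct node *prev;
    const char *s;
};

/* Prepend in O(1): the new element becomes the head and the old list
 * hangs off it. Both links point at the old head, as in the chat. */
struct node *push(struct node *head, const char *s)
{
    struct node *n;
    assert(s != NULL);
    n = calloc(1, sizeof(struct node));
    if (n == NULL) {
        fputs("out of memory\n", stderr);
        exit(EXIT_FAILURE);
    }
    n->s = s;
    n->next = head;
    n->prev = head;
    return n;
}

/* Single O(n) pass that flips only the next links. The prev links were
 * already right for the final order, so they are left untouched. */
struct node *reverse(struct node *head)
{
    struct node *root = NULL;
    while (head != NULL) {
        struct node *next = head->next;
        head->next = root;
        root = head;
        head = next;
    }
    return root;
}
```

Pushing a, b, c yields the list c, b, a; after `reverse` the head is a, every `next` link runs forward, and each `prev` link points at the element read just before it.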
<muurkha>happy Armistice Day!
***DiffieHellman_ is now known as Tejs
***Tejs is now known as DiffieHellman
***DiffieHellman is now known as tman
***tman is now known as DiffieHellman
***jackhill is now known as KM4MBG
***KM4MBG is now known as jackhill
<oriansj>took 18hours of fuzzing but a new segfault has been found.
<oriansj>a patch should be up shortly
<stikonas>and then we'll have to fuzz the new assignment operators once I'm done with them
<oriansj>well yes
<gbrlwck>muurkha: and i thought it was single's day
<gbrlwck>oriansj: what exactly do you mean by "fuzzing"?
<gbrlwck>Hagfish: thanks for the link! now i finally know how M$ (re-)defines security vulnerability :)
<muurkha>gbrlwck: well, I guess a lot of women found themselves single as a result of the Great War
<gbrlwck>lol
<gbrlwck>the Great War aka the prequel
<stikonas>gbrlwck: I guess oriansj just pushes random input to M2-Planet in "fuzzing"
<gbrlwck>stikonas: as in "random bytes"? and the expected results are any but a segfault?
<stikonas>yes, we don't want segfaults
<stikonas>everything should handle pointers nicely and error out on nullptr
<stikonas>rather than trying to access them
<gbrlwck>so i guess it's not random bytes but rather random C code?
<stikonas>you have to ask oriansj for details...
<stikonas>in any case "what is random C code..."
<stikonas>it might be random bytes but restricted to standard characters (alphanumerics, and some symbols)
***attila_lendvai_ is now known as attila_lendvai
<gbrlwck>(where) do we have M1 written in M0 (for riscv64)?
<gbrlwck>nvm, it's in stage0-posix/riscv64/Development
<gbrlwck>no, it's not...
<stikonas>gbrlwck: there is no M1 written in M0
<stikonas>M1 is written in C and is compiled using M2-Planet but the resulting M1.M1 file is not easily human readable
<stikonas>since it's a result of compilation process
<stikonas>it does a similar thing as M0 but has a few extra goodies
<oriansj>gbrlwck: what I mean by fuzzing is I am using a standard fuzzing tool (AFL to be precise) to try to find inputs which will result in a crash rather than a meaningful exit message or a successful compile.
<oriansj>also M1 is C code built by M2-Planet into M1 output; which if you remember can be compiled by M0 as the major difference between M0 and M1 is M0 is native architecture only and M1 supports ALL of the architectures.
<oriansj>and M0 is written in host specific hex2 whereas M1 is written in the M2-Planet C subset to enable enhancements required to quickly add support for new target architectures.
<oriansj>it is also why changes in M1 for an architecture generally don't happen after a stage0-posix port completes; as they would also need to be reflected in M0 and could consume a considerable amount of effort as hex2 is not an easy language to program in.
<oriansj>(same of course could be said for hex2)
<stikonas>so I've got assignment operators working, just need to write some test
<stikonas>oriansj: https://github.com/oriansj/M2-Planet/pull/31
<stikonas>they are basically implemented in preprocessing step
<Hagfish>that's a great contribution. really high quality code. i can almost understand it :P
<Hagfish>(C isn't really my thing, but it does look very readable)
<Hagfish>i do wonder if this collection of characters is a bit inscrutable, though
<Hagfish><=>|&!^%
<Hagfish>i can't think of anything clearer
<Hagfish>assigning it to a variable/constant wouldn't add much (just a layer of indirection)
<Hagfish>there are different collections of these operator symbols in different places, and it's not obvious which belong where
<stikonas>well, that <=>|&!^% is mostly how tokenization is dealt with, i.e. all those characters are put together when tokenizing
<Hagfish>and the "+" and "-" cases are dealt with separately, right?
<stikonas>somewhat forceful approach but it does tokenize it
<stikonas>yes, those are dealt with separately in a later if
<stikonas>because I needed more control there
<Hagfish>yup, that makes sense
<stikonas>initially - was also in that group
<Hagfish>i think i just wish that the language itself was more "regular", but that would make it less nice to actually code with
<stikonas>if you look carefully into those if's, I've also sneaked in a change to tokenizer to recognize -- and ++
<Hagfish>ooh, sneaky :)
<Hagfish>yeah, that's good
<stikonas>well, the good thing about C is that it's close to assembly
<stikonas>well -- and ++ are not working yet
<stikonas>more stuff needs to be done in the parser
<Hagfish>not working, but they are recognised as tokens?
<stikonas>yes
<Hagfish>cool
<stikonas>well, M2-Planet will complain right now if you use them
<Hagfish>better than a segfault, right?
<stikonas>I don't think it was a segfault before
<stikonas>just different error
<Hagfish>hmm, okay
<stikonas>but compilation is a two-stage process
<stikonas>we first tokenize the file and group characters into logical tokens (such as keywords, operators, etc...)
<stikonas>and then parser analyzes tokens and spits out assembly
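The tokenizing stage described above can be sketched roughly like this. It is an illustration under assumptions, not M2-Planet's reader: `next_token`, `in_group` and `MAX_TOKEN` are hypothetical names, only the `<=>|&!^%` group and the separate handling of + and - are taken from the discussion, and comments, strings and character literals are ignored entirely.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_TOKEN 32

/* Characters that are simply glued together into one operator token. */
static int in_group(char c)
{
    return c != '\0' && strchr("<=>|&!^%", c) != NULL;
}

static int is_word(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Copy the next token of src into out and return a pointer just past it,
 * or NULL at end of input. Over-long tokens are split in this sketch. */
const char *next_token(const char *src, char out[MAX_TOKEN])
{
    int n = 0;
    assert(src != NULL && out != NULL);
    while (isspace((unsigned char)*src))
        src++;
    if (*src == '\0')
        return NULL;

    if (is_word(*src)) {
        while (is_word(*src) && n < MAX_TOKEN - 1)
            out[n++] = *src++;
    } else if (in_group(*src)) {
        while (in_group(*src) && n < MAX_TOKEN - 1)
            out[n++] = *src++;
    } else if (*src == '+' || *src == '-') {
        out[n++] = *src++;
        /* ++, --, += and -= are the two-character forms handled here */
        if (*src == out[0] || *src == '=')
            out[n++] = *src++;
    } else {
        out[n++] = *src++;      /* any other character stands alone */
    }
    out[n] = '\0';
    return src;
}
```

Calling `next_token` in a loop splits `a=b+c;` into six tokens without needing any whitespace. Because the + branch greedily takes a second +, `a++1` comes out as `a`, `++`, `1`, while `a+ +1` yields two separate `+` tokens.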
<Hagfish>that sounds familiar
<stikonas>actually assembler (M0 and M1) also does it in two stages, first everything is tokenized
<stikonas>and then macro words are replaced with hexadecimal equivalents (and some other stuff like immediates are encoded)
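The two assembler stages mentioned here, splitting into words and then replacing macro words with hex, might look something like the sketch below. Everything in it is an assumption for illustration: the `struct macro` table, the `expand` and `assemble_line` names, and the choice to pass unknown words through unchanged are not taken from M0 or M1, and immediates are not encoded at all.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical macro table entry: a word and its hex replacement. */
struct macro {
    const char *name;
    const char *hex;
};

/* Return the hex text for word, or word itself when it is not a macro. */
const char *expand(const struct macro *table, size_t count, const char *word)
{
    size_t i;
    assert(table != NULL && word != NULL);
    for (i = 0; i < count; i++) {
        if (strcmp(table[i].name, word) == 0)
            return table[i].hex;
    }
    return word;
}

/* Stage one splits line on blanks; stage two appends each expanded word
 * to out. Returns 0 on success, -1 if a word or the output is too long. */
int assemble_line(const struct macro *table, size_t count,
                  const char *line, char *out, size_t cap)
{
    char word[64];
    size_t used = 0;
    if (cap == 0)
        return -1;
    out[0] = '\0';
    while (*line != '\0') {
        size_t len = strcspn(line, " \t\n");
        if (len == 0) {         /* sitting on a blank: skip it */
            line++;
            continue;
        }
        if (len >= sizeof word)
            return -1;
        memcpy(word, line, len);
        word[len] = '\0';
        line += len;

        const char *hex = expand(table, count, word);
        size_t hlen = strlen(hex);
        if (used + hlen + 1 > cap)
            return -1;
        memcpy(out + used, hex, hlen + 1);
        used += hlen;
    }
    return 0;
}
```

With an illustrative table mapping `NOP` to `90` and `RET` to `C3` (the x86 encodings), the line `NOP NOP RET` is rewritten to `9090C3`. The lookup is plain string matching, the same building block that shows up again in the cc_* compilers.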
<Hagfish>it feels like in assembly, everything is sort of a special case
<Hagfish>well, someone had to design the assembly language, and they had a reason for it
<Hagfish>but it's a bit more "arbitrary" than, say, a minimal Turing machine
<Hagfish>the arbitrariness is a mixture of developer helpfulness, and limitations of the hardware, i guess
<Hagfish>and, as you say, the C is close to the assembly
<stikonas>and another interesting observation I noticed
<stikonas>so the cc_* compiler is maybe 5 to 7 times larger than M0 (depending on the architecture) but almost all functions from M0 are used in cc_*
<Hagfish>hmm
<Hagfish>maybe someone has already managed to sneak in a backdoor :D
<stikonas>I didn't see any :D
<stikonas>when I was working on risc-v stuff
<Hagfish>that's a relief :)
<stikonas>well, that's one good thing about multi-arch support in stage0-posix
<stikonas>different arches are often done by different people
<stikonas>even though it mostly follows the same algorithm
<Hagfish>the fact that bootstrapping works across multiple arches is a weird kind of "proof of work" :P
<stikonas>well, so far only stage0-posix works across multiple arches
<stikonas>anything further doesn't
<Hagfish>oh, that's a pity
<oriansj>Hagfish: well if there is a backdoor, it is entirely in human auditable source or the tiny root binaries you can trivially audit or replace.
<oriansj>But string matching is kinda universally useful
<oriansj>hence why it appears in M0, cc_*, etc
<oriansj>but yeah the reader would be considerably simpler if we could assume whitespace between C tokens. (aka require a=b+c; to be written as a = b + c ;)
<stikonas>well, a lot of C requirements are actually due to how tokenization works
<stikonas>e.g. stuff like a++1 doesn't work and needs a space in between: a+ +1
<oriansj>I was thinking more in terms of how much simpler an S-expression tokenizer is
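For comparison, a minimal S-expression tokenizer needs only three cases: whitespace, a parenthesis, or a run of anything else. The sketch below is an assumption-laden illustration (the `sexp_next` name is hypothetical, and strings, quotes and comments are left out), not code from any of the projects discussed.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Copy the next token of src into out (capacity cap) and return a pointer
 * just past it, or NULL at end of input. A token is "(", ")" or a run of
 * characters that are neither blanks nor parentheses. */
const char *sexp_next(const char *src, char *out, size_t cap)
{
    size_t n = 0;
    assert(src != NULL && out != NULL && cap >= 2);
    while (isspace((unsigned char)*src))
        src++;
    if (*src == '\0')
        return NULL;

    if (*src == '(' || *src == ')') {
        out[n++] = *src++;
    } else {
        while (*src != '\0' && *src != '(' && *src != ')' &&
               !isspace((unsigned char)*src) && n < cap - 1)
            out[n++] = *src++;
    }
    out[n] = '\0';
    return src;
}
```

There is no operator table and no lookahead: an input such as `(set! a (+ b c))` splits on parentheses and blanks alone, which is the kind of simplification hinted at above.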