<oriansj>riv: we also use checksums to verify downloaded tarballs as well.
<oriansj>but yeah sha256sum is only used in stage0-posix to verify the outputs were correct.
<stikonas>well, checksums of downloaded tarballs only use external GNU sha256sum
<stikonas>it's not (yet) checked inside bootstrap using bootstrapped tools
<stikonas>I think I might implement more complicated assignment operators next
<stikonas>it looks like we should be constructing doubly linked list
<stikonas>hmm, maybe I should look at it with gdb...
<oriansj>stikonas: I do remember the exact reason.
<oriansj>Notice that it builds the list by reading a,b,c and building the list like: c->b->a and then reverses the list to become a->b->c
<oriansj>so ->prev is correct; but ->next needs to be reversed but a single O(n) reverse is faster than building the list correctly the first time.
<oriansj>as appending the list to the new element is quite fast and making that new element the head of the list
<oriansj>perhaps I needed to add a comment so when someone else in the future hits it, it can be known by reading
***DiffieHellman_ is now known as Tejs
***Tejs is now known as DiffieHellman
***DiffieHellman is now known as tman
***tman is now known as DiffieHellman
***jackhill is now known as KM4MBG
***KM4MBG is now known as jackhill
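[Editor's note] The list construction oriansj describes above (prepend each token as it is read, then do one O(n) reversal) can be sketched as follows. This is a minimal illustration in Python, not M2-Planet's actual C code; the names `Token`, `read_tokens` and `reverse_list` are hypothetical.

```python
class Token:
    """One list node; field names are illustrative, not M2-Planet's."""
    def __init__(self, s):
        self.s = s
        self.next = None
        self.prev = None


def read_tokens(words):
    """Prepend each new token to the list: O(1) per token.

    After reading a, b, c the ->next chain is c->b->a, while ->prev
    already points at the token that was read just before.
    """
    head = None
    for w in words:
        node = Token(w)
        node.prev = head  # already correct: the previously read token
        node.next = head  # backwards for now; fixed by reverse_list
        head = node
    return head


def reverse_list(head):
    """Reverse only the ->next pointers in a single O(n) pass."""
    root = None
    while head is not None:
        following = head.next
        head.next = root
        root = head
        head = following
    return root


tokens = reverse_list(read_tokens(["a", "b", "c"]))
```

After the reversal, `tokens` is the node for `a`, its `next` chain runs a->b->c, and every `prev` points at the preceding token without having been touched again.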
<oriansj>took 18 hours of fuzzing but a new segfault has been found.
<stikonas>and then we'll have to fuzz the new assignment operators once I'm done with them
<gbrlwck>muurkha: and i thought it was single's day
<gbrlwck>oriansj: what exactly do you mean by "fuzzing"?
<gbrlwck>Hagfish: thanks for the link! now i finally know how M$ (re-)defines security vulnerability :)
<muurkha>gbrlwck: well, I guess a lot of women found themselves single as a result of the Great War
<stikonas>gbrlwck: I guess oriansj just pushes random input to M2-Planet in "fuzzing"
<gbrlwck>stikonas: as in "random bytes"? and the expected results are any but a segfault?
<stikonas>everything should handle pointers nicely and error out on nullptr
<gbrlwck>so i guess it's not random bytes but rather random C code?
<stikonas>it might be random bytes but restricted to standard characters (alphanumerics, and some symbols)
***attila_lendvai_ is now known as attila_lendvai
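[Editor's note] The naive form of fuzzing that stikonas and gbrlwck guess at here (random text from a restricted character set, where any outcome except a crash is acceptable) can be sketched as below. It is not how AFL works, since AFL mutates inputs guided by coverage feedback; this is plain random generation. Everything in it is an assumption made for illustration: the alphabet, the function names, and the choice to feed the input on stdin. The target command is whatever the caller supplies. It relies on POSIX behaviour, where Python's `subprocess` reports a child killed by signal N as return code -N.

```python
import random
import signal
import string
import subprocess
import sys

# Assumed alphabet: alphanumerics plus some C-like punctuation.
ALPHABET = string.ascii_letters + string.digits + " \n(){}[];=+-*/<>|&!^%\"'"


def random_source(rng, max_len=200):
    """Return a random string of at most max_len characters from ALPHABET."""
    length = rng.randint(0, max_len)
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def killed_by_signal(cmd, text, timeout=5):
    """Run cmd with text on stdin.

    Returns the signal number if the process was killed by a signal
    (for example SIGSEGV), otherwise None. A non-zero exit status is
    not a crash: it is the "meaningful exit message" case.
    """
    try:
        proc = subprocess.run(
            cmd,
            input=text.encode(),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None  # hangs are ignored in this sketch
    return -proc.returncode if proc.returncode < 0 else None


def fuzz(cmd, rounds, seed=0):
    """Try `rounds` random inputs; return (signal, input) for each crash."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(rounds):
        text = random_source(rng)
        sig = killed_by_signal(cmd, text)
        if sig is not None:
            crashes.append((sig, text))
    return crashes
```

Keeping the seed fixed makes a run reproducible, so an input that crashed once can be regenerated and examined in a debugger.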
<gbrlwck>(where) do we have M1 written in M0 (for riscv64)?
<gbrlwck>nvm, it's in stage0-posix/riscv64/Development
<stikonas>M1 is written in C and is compiled using M2-Planet but the resulting M1.M1 file is not easily human readable
<stikonas>since it's the result of a compilation process
<stikonas>it does a similar thing as M0 but has a few extra goodies
<oriansj>gbrlwck: what I mean by fuzzing is I am using a standard fuzzing tool (AFL to be precise) to try to find inputs which will result in a crash rather than a meaningful exit message or a successful compile.
<oriansj>also M1 is C code built by M2-Planet into M1 output; which if you remember can be compiled by M0 as the major difference between M0 and M1 is M0 is native architecture only and M1 supports ALL of the architectures.
<oriansj>and M0 is written in host specific hex2 whereas M1 is written in the M2-Planet C subset to enable enhancements required to quickly add support for new target architectures.
<oriansj>it is also why changes in M1 for an architecture generally don't happen after a stage0-posix port completes; as they would also need to be reflected in M0 and could consume a considerable amount of effort as hex2 is not an easy language to program in.
<oriansj>(same of course could be said for hex2)
<stikonas>so I've got assignment operators working, just need to write some tests
<stikonas>they are basically implemented in the preprocessing step
<Hagfish>that's a great contribution. really high quality code. i can almost understand it :P
<Hagfish>(C isn't really my thing, but it does look very readable)
<Hagfish>i do wonder if this collection of characters is a bit inscrutable, though
<Hagfish>assigning it to a variable/constant wouldn't add much (just a layer of indirection)
<Hagfish>there are different collections of these operator symbols in different places, and it's not obvious which belong where
<stikonas>well, that <=>|&!^% is mostly how tokenization is dealt with, i.e. all those characters are put together when tokenizing
<Hagfish>and the "+" and "-" cases are dealt with separately, right?
<stikonas>somewhat forceful approach but it does tokenize it
<stikonas>yes, those are dealt with separately in a later if
<Hagfish>i think i just wish that the language itself was more "regular", but that would make it less nice to actually code with
<stikonas>if you look carefully into those if's, I've also sneaked in a change to the tokenizer to recognize -- and ++
<stikonas>well, the good thing about C is that it's close to assembly
<stikonas>more stuff needs to be done in the parser
<Hagfish>not working, but they are recognised as tokens?
<stikonas>well, M2-Planet will complain right now if you use them
<stikonas>we first tokenize the file and group characters into groups of logical tokens (such as keywords, operators, etc...)
<stikonas>and then the parser analyzes tokens and spits out assembly
<stikonas>actually the assembler (M0 and M1) also does it in two stages, first everything is tokenized
<stikonas>and then macro words are replaced with hexadecimal equivalents (and some other stuff like immediates are encoded)
<Hagfish>it feels like in assembly, everything is sort of a special case
<Hagfish>well, someone had to design the assembly language, and they had a reason for it
<Hagfish>but it's a bit more "arbitrary" than, say, a minimal Turing machine
<Hagfish>the arbitrariness is a mixture of developer helpfulness, and limitations of the hardware, i guess
<Hagfish>and, as you say, the C is close to the assembly
<stikonas>and another interesting observation I noticed
<stikonas>so the cc_* compiler is maybe 5 to 7 times larger than M0 (depending on the architecture) but almost all functions from M0 are used in cc_*
<Hagfish>maybe someone has already managed to sneak in a backdoor :D
<stikonas>well, that's one good thing about multi-arch support in stage0-posix
<stikonas>different arches are often done by different people
<stikonas>even though it mostly follows the same algorithm
<Hagfish>the fact that bootstrapping works across multiple arches is a weird kind of "proof of work" :P
<stikonas>well, so far only stage0-posix works across multiple arches
<oriansj>Hagfish: well if there is a backdoor, it is entirely in human auditable source or the tiny root binaries you can trivially audit or replace.
<oriansj>But string matching is kinda universally useful
<oriansj>hence why it appears in M0, cc_*, etc
<oriansj>but yeah the reader would be considerably simpler if we could assume whitespace between C tokens. (aka require a=b+c; to be written as a = b + c ;)
<stikonas>well, a lot of C requirements are actually due to how tokenization works
<stikonas>e.g. stuff like a++1 doesn't work and needs a space in between: a+ +1
<oriansj>I was thinking more in terms of how much simpler an S-expression tokenizer is
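[Editor's note] The tokenizing approach discussed above can be sketched roughly as follows: runs of the characters `<=>|&!^%` are grouped into one token, `+` and `-` get a separate branch that also recognizes `++` and `--`, and everything else is a word or a single character. This is a simplified illustration in Python, not the M2-Planet tokenizer. The function names are hypothetical, and treating `+=` and `-=` as two-character tokens is an assumption.

```python
OPERATOR_CHARS = "<=>|&!^%"


def is_word_char(c):
    return c.isalnum() or c == "_"


def tokenize(src):
    """Split C-like text into tokens without requiring whitespace."""
    tokens = []
    i = 0
    while i < len(src):
        c = src[i]
        if c.isspace():
            i += 1
        elif is_word_char(c):
            j = i
            while j < len(src) and is_word_char(src[j]):
                j += 1
            tokens.append(src[i:j])
            i = j
        elif c in OPERATOR_CHARS:
            # Greedy: all adjacent operator characters become one token.
            j = i
            while j < len(src) and src[j] in OPERATOR_CHARS:
                j += 1
            tokens.append(src[i:j])
            i = j
        elif c in "+-":
            # Separate case: ++ and -- (and, by assumption, += and -=).
            if i + 1 < len(src) and src[i + 1] in (c, "="):
                tokens.append(src[i:i + 2])
                i += 2
            else:
                tokens.append(c)
                i += 1
        else:
            tokens.append(c)  # punctuation such as ; ( ) { }
            i += 1
    return tokens


def tokenize_spaced(src):
    """The simpler reader oriansj mentions: assume whitespace between tokens."""
    return src.split()


print(tokenize("a=b+c;"))
print(tokenize_spaced("a = b + c ;"))
print(tokenize("a++1"))
print(tokenize("a+ +1"))
```

Both of the first two calls produce the same six tokens, which shows how much of the reader exists only to cope with missing whitespace. The last two calls show stikonas's example: `a++1` yields a `++` token followed by `1`, while `a+ +1` yields two separate `+` tokens.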