IRC channel logs

<oriansj>riv: we also use checksums to verify downloaded tarballs as well.

<oriansj>but yeah sha256sum is only used in stage0-posix to verify the outputs were correct.

<oriansj>also stikonas merged

<oriansj>and updated in stage0-posix

<stikonas>well, checksums of downloaded tarballs only use external GNU sha256sum

<stikonas>it's not (yet) checked inside bootstrap using bootstrapped tools

<stikonas>I think I might implement more complicated assignment operators next

<stikonas>e.g. +=

<stikonas>that should be fairly simple

<stikonas>oriansj: I'm a bit confused why in token_list https://github.com/oriansj/M2-Planet/blob/6ebe45f369d6bf678a9dd8c6313d94a16d5ca94d/cc_reader.c#L287 both "current->next" and "current->prev" are set to "token". Maybe you remember?

<stikonas>it looks like we should be constructing doubly linked list

<stikonas>hmm, maybe I should look at it with gdb...

<oriansj>stikonas: I do remember the exact reason.

<oriansj>Notice that it builds the list by reading a, b, c and linking them as c->b->a, and then reverses the list to become a->b->c

<oriansj>so ->prev is correct, but ->next needs to be reversed; a single O(n) reverse is faster than building the list correctly the first time.

<stikonas[m]>Oh, I see

<oriansj>as attaching the existing list to a new element and making that element the new head of the list is quite fast
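
A minimal C sketch of the build-backwards-then-reverse approach oriansj describes. The struct layout and the helper names `prepend_token` and `reverse_list` are hypothetical, chosen for illustration rather than copied from cc_reader.c:

```c
#include <stdlib.h>

/* Hypothetical token node; only the fields discussed in the chat. */
struct token_list
{
	struct token_list* next;
	struct token_list* prev;
	char* s;
};

/* Hypothetical helper: push a new token in front of the current head, O(1).
 * Both links point at the old head, so after reading a, b, c the list
 * runs c->b->a, while each ->prev already points at the preceding token. */
struct token_list* prepend_token(struct token_list* head, char* s)
{
	struct token_list* current = calloc(1, sizeof(struct token_list));
	if(NULL == current) exit(EXIT_FAILURE);
	current->s = s;
	current->next = head;
	current->prev = head;
	return current;
}

/* Hypothetical helper: one O(n) pass that flips every ->next so the list
 * reads a->b->c. ->prev is left untouched because it is already correct. */
struct token_list* reverse_list(struct token_list* head)
{
	struct token_list* root = NULL;
	struct token_list* next;
	while(NULL != head)
	{
		next = head->next;
		head->next = root;
		root = head;
		head = next;
	}
	return root;
}
```

Each insertion costs O(1) and the final reversal O(n), so the whole list is built in linear time without having to track a tail pointer.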

<stikonas[m]>OK, that's very helpful

<oriansj>perhaps I should add a comment, so that when someone else hits this in the future, the reason is clear from reading the code

<stikonas[m]>Good idea

<muurkha>happy Armistice Day!

***DiffieHellman_ is now known as Tejs

***Tejs is now known as DiffieHellman

***DiffieHellman is now known as tman

***tman is now known as DiffieHellman

***jackhill is now known as KM4MBG

***KM4MBG is now known as jackhill

<oriansj>took 18 hours of fuzzing, but a new segfault has been found.

<oriansj>a patch should be up shortly

<stikonas>and then we'll have to fuzz the new assignment operators once I'm done with them

<oriansj>well yes

<gbrlwck>muurkha: and i thought it was single's day

<gbrlwck>oriansj: what exactly do you mean by "fuzzing"?

<gbrlwck>Hagfish: thanks for the link! now i finally know how M$ (re-)defines security vulnerability :)

<muurkha>gbrlwck: well, I guess a lot of women found themselves single as a result of the Great War

<gbrlwck>lol

<gbrlwck>the Great War aka the prequel

<stikonas>gbrlwck: I guess oriansj just pushes random input to M2-Planet in "fuzzing"

<gbrlwck>stikonas: as in "random bytes"? and the expected results are any but a segfault?

<stikonas>yes, we don't want segfaults

<stikonas>everything should handle pointers nicely and error out on nullptr

<stikonas>rather than trying to access them
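
A sketch of the "error out instead of dereferencing" rule, assuming a small `require`-style helper. `require`, `token_text` and the reduced struct are hypothetical names for illustration, not M2-Planet's actual code:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical token node, reduced to what this example needs. */
struct token_list
{
	struct token_list* next;
	char* s;
};

/* Hypothetical helper: stop with a readable message instead of
 * letting an invalid pointer reach a dereference. */
void require(int condition, const char* message)
{
	if(!condition)
	{
		fputs(message, stderr);
		exit(EXIT_FAILURE);
	}
}

/* Hypothetical accessor: validate the token before reading ->s,
 * e.g. when the input ends in the middle of a statement. */
char* token_text(struct token_list* t)
{
	require(NULL != t, "error: unexpected end of input\n");
	require(NULL != t->s, "error: token without text\n");
	return t->s;
}
```

Code that goes through `token_text()` instead of reading `t->s` directly turns a truncated input into an error message and a non-zero exit, which is exactly the outcome a fuzzer should see instead of a segfault.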

<gbrlwck>so i guess it's not random bytes but rather random C code?

<stikonas>you have to ask oriansj for details...

<stikonas>in any case "what is random C code..."

<stikonas>it might be random bytes, but restricted to standard characters (alphanumerics and some symbols)

***attila_lendvai_ is now known as attila_lendvai

<gbrlwck>(where) do we have M1 written in M0 (for riscv64)?

<gbrlwck>nvm, it's in stage0-posix/riscv64/Development

<gbrlwck>no, it's not...

<stikonas>gbrlwck: there is no M1 written in M0

<stikonas>M1 is written in C and is compiled using M2-Planet but the resulting M1.M1 file is not easily human readable

<stikonas>since it's a result of compilation process

<stikonas>it does a similar thing as M0 but has a few extra goodies

<oriansj>gbrlwck: what I mean by fuzzing is I am using a standard fuzzing tool (AFL to be precise) to try to find inputs which will result in a crash rather than a meaningful exit message or a successful compile.

<oriansj>also M1 is C code built by M2-Planet into M1 output; which if you remember can be compiled by M0 as the major difference between M0 and M1 is M0 is native architecture only and M1 supports ALL of the architectures.

<oriansj>and M0 is written in host-specific hex2, whereas M1 is written in the M2-Planet C subset to enable the enhancements required to quickly add support for new target architectures.

<oriansj>it is also why changes in M1 for an architecture generally don't happen after a stage0-posix port completes; as they would also need to be reflected in M0 and could consume a considerable amount of effort as hex2 is not an easy language to program in.

<oriansj>(same of course could be said for hex2)

<stikonas>so I've got assignment operators working, just need to write some tests

<stikonas>oriansj: https://github.com/oriansj/M2-Planet/pull/31

<stikonas>they are basically implemented in a preprocessing step
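
One way a preprocessing-step rewrite of compound assignment could look, assuming the left-hand side is a single side-effect-free token and that the tokenizer only produces valid C operators. This is a hypothetical sketch (`is_compound_assignment` and `rewrite_compound` are illustrative names), not the code from the pull request:

```c
#include <string.h>

/* Room for the longest stripped operator ("<<" or ">>") plus NUL. */
#define MAX_OP 4

/* Hypothetical check: true for tokens such as "+=", "%=", "<<=".
 * Comparisons that also end in '=' are excluded explicitly. */
int is_compound_assignment(const char* tok)
{
	size_t n = strlen(tok);
	if(n < 2 || '=' != tok[n - 1]) return 0;
	if(0 == strcmp(tok, "==") || 0 == strcmp(tok, "!=")) return 0;
	if(0 == strcmp(tok, "<=") || 0 == strcmp(tok, ">=")) return 0;
	return NULL != strchr("+-*/%&|^<>", tok[0]);
}

/* Hypothetical rewrite of the three tokens `lhs OP= rhs` into the five
 * tokens `lhs = lhs OP rhs`; binop receives the operator without '='.
 * Returns the number of tokens written to out, or 0 if op is not a
 * compound assignment. Only safe when lhs has no side effects. */
int rewrite_compound(const char* lhs, const char* op, const char* rhs,
                     const char* out[5], char binop[MAX_OP])
{
	size_t n;
	if(!is_compound_assignment(op)) return 0;
	n = strlen(op) - 1;           /* drop the trailing '=' */
	if(n >= MAX_OP) return 0;
	memcpy(binop, op, n);
	binop[n] = '\0';
	out[0] = lhs;
	out[1] = "=";
	out[2] = lhs;
	out[3] = binop;
	out[4] = rhs;
	return 5;
}
```

The rewrite evaluates the left-hand side twice, which is why the single-token assumption matters: something like `a[i++] += 1` cannot be expanded this way without changing its meaning.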

<Hagfish>that's a great contribution. really high quality code. i can almost understand it :P

<Hagfish>(C isn't really my thing, but it does look very readable)

<Hagfish>i do wonder if this collection of characters is a bit inscrutable, though

<Hagfish><=>|&!^%

<Hagfish>i can't think of anything clearer

<Hagfish>assigning it to a variable/constant wouldn't add much (just a layer of indirection)

<Hagfish>there are different collections of these operator symbols in different places, and it's not obvious which belong where

<stikonas>well, that <=>|&!^% is mostly how tokenization is dealt with, i.e. all those characters are put together when tokenizing

<Hagfish>and the "+" and "-" cases are dealt with separately, right?

<stikonas>somewhat forceful approach but it does tokenize it

<stikonas>yes, those are dealt with separately in a later if

<stikonas>because I needed more control there

<Hagfish>yup, that makes sense

<stikonas>initially - was also in that group

<Hagfish>i think i just wish that the language itself was more "regular", but that would make it less nice to actually code with

<stikonas>if you look carefully at those ifs, I've also sneaked in a change to the tokenizer to recognize -- and ++
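
A hypothetical C sketch of the tokenizer behaviour described above: characters from `<=>|&!^%` are glued together greedily, while `+` and `-` get their own branch so that only `++`, `--`, `+=` and `-=` (plus `->`, an assumption added here) become two-character tokens. `scan_operator` and `in_group` are illustrative names, not functions from M2-Planet:

```c
#include <string.h>

/* Characters that are glued together into one operator token. */
#define OPERATOR_GROUP "<=>|&!^%"

/* Hypothetical helper: is c a (non-NUL) member of OPERATOR_GROUP?
 * The NUL check matters because strchr also "finds" the terminator. */
static int in_group(char c)
{
	return '\0' != c && NULL != strchr(OPERATOR_GROUP, c);
}

/* Hypothetical operator scanner. Copies the operator at the start of src
 * into out (room for at least 8 bytes) and returns the number of characters
 * consumed, or 0 (with out set to "") if src does not start with one. */
int scan_operator(const char* src, char* out)
{
	int n = 0;
	if(in_group(src[0]))
	{
		/* Greedy grouping: take every consecutive group character. */
		while(n < 7 && in_group(src[n]))
		{
			out[n] = src[n];
			n = n + 1;
		}
	}
	else if('+' == src[0] || '-' == src[0])
	{
		/* Finer control for '+' and '-': only "++", "--", "+=", "-="
		 * (and, assumed here, "->") become two-character tokens. */
		out[0] = src[0];
		n = 1;
		if(src[1] == src[0] || '=' == src[1] || ('-' == src[0] && '>' == src[1]))
		{
			out[1] = src[1];
			n = 2;
		}
	}
	out[n] = '\0';
	return n;
}
```

The greedy grouping is what makes this kind of approach "somewhat forceful": in this sketch `a=!b` yields the single token `=!`, so it is left to the parser to reject it.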

<Hagfish>ooh, sneaky :)

<Hagfish>yeah, that's good

<stikonas>well, the good thing about C is that it's close to assembly

<stikonas>well -- and ++ are not working yet

<stikonas>more stuff needs to be done in the parser

<Hagfish>not working, but they are recognised as tokens?

<stikonas>yes

<Hagfish>cool

<stikonas>well, M2-Planet will complain right now if you use them

<Hagfish>better than a segfault, right?

<stikonas>I don't think it was a segfault before

<stikonas>just different error

<Hagfish>hmm, okay

<stikonas>but compilation is a two-stage process

<stikonas>we first tokenize the file, grouping characters into logical tokens (such as keywords, operators, etc.)

<stikonas>and then parser analyzes tokens and spits out assembly
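
A toy illustration of the two stages, assuming single-character identifiers and only `+` and `*`: stage one turns the source into tokens, and stage two is a recursive-descent parser that emits instructions. The `PUSH`/`ADD`/`MUL` mnemonics are hypothetical stack-machine output, not M2-Planet's real assembly:

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Stage 1 output: one character per token (toy assumption). */
static char tokens[64];
static int pos;

/* Stage 1: drop whitespace and store each remaining character as a token. */
static void tokenize(const char* src)
{
	int n = 0;
	for(; '\0' != *src && n < 63; src++)
	{
		if(!isspace((unsigned char)*src)) tokens[n++] = *src;
	}
	tokens[n] = '\0';
}

/* Stage 2 output buffer for the generated pseudo-assembly. */
static char out[512];

static void emit(const char* line)
{
	strncat(out, line, sizeof(out) - strlen(out) - 1);
}

/* primary := identifier (a single letter or digit in this toy) */
static void primary(void)
{
	char line[16];
	if(!isalnum((unsigned char)tokens[pos]))
	{
		emit("ERROR\n");  /* report the problem instead of reading past the end */
		return;
	}
	snprintf(line, sizeof(line), "PUSH %c\n", tokens[pos]);
	emit(line);
	pos = pos + 1;
}

/* term := primary ('*' primary)* */
static void term(void)
{
	primary();
	while('*' == tokens[pos])
	{
		pos = pos + 1;
		primary();
		emit("MUL\n");
	}
}

/* expression := term ('+' term)* */
static void expression(void)
{
	term();
	while('+' == tokens[pos])
	{
		pos = pos + 1;
		term();
		emit("ADD\n");
	}
}

/* Run both stages and return the generated text. */
const char* compile(const char* src)
{
	out[0] = '\0';
	pos = 0;
	tokenize(src);
	expression();
	return out;
}
```

For example, `compile("a + b * c")` produces `PUSH a`, `PUSH b`, `PUSH c`, `MUL`, `ADD`: the tokenizer never needs to know about precedence, because the parser's grammar handles it.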

<Hagfish>that sounds familiar

<stikonas>actually the assembler (M0 and M1) also works in two stages: first everything is tokenized

<stikonas>and then macro words are replaced with hexadecimal equivalents (and some other stuff like immediates are encoded)
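
A sketch of those two assembler stages, assuming a macro table already built from `DEFINE name hex` lines: each line is split on whitespace, then every defined name is swapped for its hex text. The table contents (the x86 encodings of NOP and RET) and the names `define`, `lookup` and `expand_line` are illustrative only, and the sketch skips the extra work (labels, immediates) a real M1 does:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical macro table entry, as produced by "DEFINE name hex". */
struct define
{
	const char* name;
	const char* hex;
};

/* Return the hex text for a macro name, or NULL if it is not defined. */
static const char* lookup(const struct define* table, int n, const char* word)
{
	int i;
	for(i = 0; i < n; i = i + 1)
	{
		if(0 == strcmp(table[i].name, word)) return table[i].hex;
	}
	return NULL;
}

/* Stage 1: split `line` into whitespace-separated tokens (lines longer
 * than 255 characters are truncated in this sketch).
 * Stage 2: replace each defined macro name with its hex and copy any
 * other token through unchanged, separated by single spaces. */
void expand_line(const struct define* table, int n, const char* line,
                 char* out, size_t outsize)
{
	char buf[256];
	char* word;
	const char* hex;
	out[0] = '\0';
	snprintf(buf, sizeof(buf), "%s", line);
	for(word = strtok(buf, " \t"); NULL != word; word = strtok(NULL, " \t"))
	{
		hex = lookup(table, n, word);
		if('\0' != out[0]) strncat(out, " ", outsize - strlen(out) - 1);
		strncat(out, NULL != hex ? hex : word, outsize - strlen(out) - 1);
	}
}
```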

<Hagfish>it feels like in assembly, everything is sort of a special case

<Hagfish>well, someone had to design the assembly language, and they had a reason for it

<Hagfish>but it's a bit more "arbitrary" than, say, a minimal Turing machine

<Hagfish>the arbitrariness is a mixture of developer helpfulness, and limitations of the hardware, i guess

<Hagfish>and, as you say, the C is close to the assembly

<stikonas>and another interesting observation I noticed

<stikonas>so the cc_* compiler is maybe 5 to 7 times larger than M0 (depending on the architecture), but almost all functions from M0 are also used in cc_*

<Hagfish>hmm

<Hagfish>maybe someone has already managed to sneak in a backdoor :D

<stikonas>I didn't see any :D

<stikonas>when I was working on risc-v stuff

<Hagfish>that's a relief :)

<stikonas>well, that's one good thing about multi-arch support in stage0-posix

<stikonas>different arches are often done by different people

<stikonas>even though it mostly follows the same algorithm

<Hagfish>the fact that bootstrapping works across multiple arches is a weird kind of "proof of work" :P

<stikonas>well, so far only stage0-posix works across multiple arches

<stikonas>anything further doesn't

<Hagfish>oh, that's a pity

<oriansj>Hagfish: well if there is a backdoor, it is entirely in human auditable source or the tiny root binaries you can trivially audit or replace.

<oriansj>But string matching is kinda universally useful

<oriansj>hence why it appears in M0, cc_*, etc
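
String matching is the kind of helper both an assembler (looking up macro names) and a compiler (recognising keywords) need. A minimal sketch of such a `match` function, modelled on the idea rather than copied from M0 or cc_*:

```c
#include <stddef.h>

/* Sketch of a string-equality helper: returns 1 if both strings are
 * identical (or both NULL), 0 otherwise. NULL is handled explicitly
 * so a missing token cannot cause a crash. */
int match(const char* a, const char* b)
{
	int i = 0;
	if(NULL == a && NULL == b) return 1;
	if(NULL == a || NULL == b) return 0;
	while(a[i] == b[i])
	{
		if('\0' == a[i]) return 1;  /* both strings ended together */
		i = i + 1;
	}
	return 0;
}
```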

<oriansj>but yeah, the reader would be considerably simpler if we could assume whitespace between C tokens (i.e. require a=b+c; to be written as a = b + c ;)

<stikonas>well, a lot of C requirements are actually due to how tokenization works

<stikonas>e.g. stuff like a++1 doesn't work and needs a space in between: a+ +1

<oriansj>I was thinking more in terms of how much simpler an S-expression tokenizer is
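
To make the comparison concrete, a minimal S-expression tokenizer sketch in C (`sexpr_tokenize` and `put` are illustrative names). It needs only three rules and no operator table; `a+b` stays a single atom because only whitespace and parentheses delimit tokens, whereas a C tokenizer has to split it into three:

```c
#include <ctype.h>
#include <string.h>

/* Append one character to out if there is room, keeping it NUL-terminated. */
static void put(char* out, size_t outsize, size_t* len, char c)
{
	if(*len + 1 < outsize)
	{
		out[*len] = c;
		*len = *len + 1;
		out[*len] = '\0';
	}
}

/* Minimal S-expression tokenizer: '(' and ')' are single-character tokens,
 * whitespace ends the current atom, and any other character extends it.
 * Writes the tokens to out separated by single spaces; returns the count. */
int sexpr_tokenize(const char* src, char* out, size_t outsize)
{
	int count = 0;
	int in_atom = 0;
	size_t len = 0;
	out[0] = '\0';
	for(; '\0' != *src; src++)
	{
		char c = *src;
		if('(' == c || ')' == c)
		{
			if(0 != len) put(out, outsize, &len, ' ');
			put(out, outsize, &len, c);
			count = count + 1;
			in_atom = 0;
		}
		else if(isspace((unsigned char)c))
		{
			in_atom = 0;
		}
		else
		{
			if(!in_atom)
			{
				/* start of a new atom */
				if(0 != len) put(out, outsize, &len, ' ');
				count = count + 1;
				in_atom = 1;
			}
			put(out, outsize, &len, c);
		}
	}
	return count;
}
```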

2021-11-11.log