IRC channel logs

<stikonas>hmm, I think hex1_x86 is now working...

<stikonas>Although somewhat slowly

<stikonas>probably some alignment is messed up but will do for now

<Hagfish>that sounds great

<stikonas>oriansj: https://github.com/oriansj/stage0-posix/pull/29

<stikonas>maybe I should have started with kaem...

<stikonas>but now with some experience I can probably deal with kaem, that one shouldn't have problems due to risc-v encodings

<xentrac>yaay

<stikonas>xentrac: well, it's not good enough for risc-v native hex1...

<xentrac>it's progress!

<Hagfish>the syntax is very alien to me (which makes it all the more impressive), but the layout and commenting/naming makes it quite reasonable to follow

<Hagfish>i was a little uncertain whether "# Get number of the args" should be "# Get the number of args", but that could be because i don't understand what the code is doing there

<stikonas>that comment is copy/pasted...

<Hagfish>i'm sure that other people won't be as confused as me by it :)

<stikonas>Hagfish: well, the syntax is basically: the leftmost column is machine code (those hex numbers), the next column is assembly instructions, and finally there are human-readable comments
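
Here is a minimal Python sketch of how a hex0 reader can treat that layout. It assumes that `#` and `;` each start a comment running to the end of the line, and that every remaining hex digit is paired into a byte, high nibble first. `assemble_hex0` is a hypothetical name, not the stage0-posix implementation, and the x86 bytes in the example are only illustrative.

```python
def assemble_hex0(source: str) -> bytes:
    """Turn hex0-style text into raw bytes (illustrative sketch).

    Assumption: '#' and ';' start a comment that runs to the end of the line,
    and the remaining hex digits are consumed two at a time.
    """
    out = bytearray()
    pending = None  # high nibble waiting for its partner
    for line in source.splitlines():
        # Drop the assembly column and the human-readable comments.
        for marker in "#;":
            cut = line.find(marker)
            if cut != -1:
                line = line[:cut]
        for ch in line:
            if ch in "0123456789abcdefABCDEF":
                nibble = int(ch, 16)
                if pending is None:
                    pending = nibble
                else:
                    out.append((pending << 4) | nibble)
                    pending = None
            # Whitespace and any other character are ignored in this sketch.
    return bytes(out)


example = "31C0 ; xor eax, eax  # clear eax\nB8 01000000 ; mov eax, 1  # load 1\n"
print(assemble_hex0(example).hex())  # 31c0b801000000
```

Note that the assembly column must sit behind a comment marker, otherwise letters such as the `a`, `e`, `c` in `xor eax` would be read as hex digits.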

<Hagfish>yeah, that's probably quite conventional

<Hagfish>but it feels like, to write that sort of code, you have to be pretty fluent in machine code

<stikonas>Hagfish: not really... I literally knew nothing about RISC-V or any other machine code a week ago

<Hagfish>woah

<Hagfish>seriously, you'd never written a line of machine code before?

<stikonas>now I've read a few pages of RISC-V ISA documentation (out of 100 pages)...

<stikonas>no, and even fairly little assembly experience

<stikonas>only some trivial things

<Hagfish>that's some impressive adaptability

<stikonas>maybe. Or maybe you overestimate how difficult these things are

<Hagfish>that's possible, yes

<stikonas>they certainly sound scary

<stikonas>and writing assembly is not that different from a certain subset of C...

<Hagfish>i keep thinking that this bootstrapping project really needs a dozen more experts, each with decades of experience in their specialisms, but maybe there are other ways to attack the problem

<stikonas>basically you should not use functions and use goto everywhere

<Hagfish>i suppose that's not so strange once you get used to it, but i think some people would find it easier than others

<xentrac>I don't think assembly is hard. assembly is mostly simple. machine code is mostly simple, although amd64 has decades of accumulated fungus

<stikonas>yeah, and risc-v doesn't have that. It has other annoying things though as we all discussed

<xentrac>what's hard is *debugging* a large assembly program

<stikonas>yeah, that's true...

<xentrac>also I mean writing things in assembly takes longer

<stikonas>well, having a simple C prototype might help

<xentrac>a lot of things are 100 lines of Python, 400 lines of C, or 2000 lines of assembly

<xentrac>not all things! a picture is worth a thousand words, but hardly any 1000-word essays can be adequately replaced by a picture

<stikonas>hex1.c is certainly not that much shorter than assembly version: https://github.com/oriansj/stage0-posix/blob/master/High%20Level%20Prototypes/hex1.c

<xentrac>similarly it's quite common to have 2000 lines of assembly that would be 1500 lines of C or 1000 lines of Python

<stikonas>yeah, it's close to the latter here

<xentrac>one of the first algorithms in Knuth is a linear-time topological sort

<xentrac>I translated it from MIX assembly language to Python and it... was the same size
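
For comparison, here is a short Python sketch of a linear-time topological sort built on the predecessor-count-and-queue idea. This is essentially Kahn's algorithm, which Knuth's Algorithm T closely resembles. It is not a transcription of the MIX code, and the function name is hypothetical.

```python
from collections import deque


def topological_sort(n, relations):
    """Order items 0..n-1 so that every (a, b) pair puts a before b.

    Linear time: each item and each relation is handled a constant number
    of times. Raises ValueError if the relations contain a cycle.
    """
    count = [0] * n                      # unprocessed predecessors per item
    successors = [[] for _ in range(n)]  # adjacency lists
    for a, b in relations:
        successors[a].append(b)
        count[b] += 1

    ready = deque(i for i in range(n) if count[i] == 0)
    order = []
    while ready:
        item = ready.popleft()
        order.append(item)
        for nxt in successors[item]:
            count[nxt] -= 1
            if count[nxt] == 0:
                ready.append(nxt)

    if len(order) != n:
        raise ValueError("relations contain a cycle")
    return order


print(topological_sort(4, [(0, 1), (1, 2), (0, 3), (3, 2)]))  # [0, 1, 3, 2]
```

Even in Python this stays close to the size of the assembly version, because the algorithm is just a few counters and a queue.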

<xentrac>but there are other times where you're like

<xentrac> for y in range(min_y, max_y if max_y is not None else layout.height):

<xentrac> stream.writelines(layout.scan(y, w if w is not None else layout.width))

<xentrac>or

<xentrac> left = (('Chapters' | String('')) - hr -

<xentrac> (chapno | vr | [t | ~String(' . ') for t in titles]))

<stikonas>yeah, that would be nasty in assembly

<stikonas>fortunately, none of the bootstrap code is this complicated

<xentrac>I think in assembly you'd take a different approach

<stikonas>I don't think we ever use kaem's " escaping in kaem-minimal... that's probably an opportunity to reduce its size...

<oriansj>stikonas[m]: you are probably quite right about "raw strings" in kaem-minimal but I did design it before live-bootstrap existed and didn't want to shoe-horn it in later as hex0 programming is a huge pain in the butt.
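
To make the trade-off concrete, here is a hypothetical Python sketch of a kaem-style line tokenizer. It is not kaem's actual code. Under its assumptions, whitespace separates tokens, `#` starts a comment, and the `"..."` raw-string branch is optional. Dropping that one branch is the kind of removal being discussed.

```python
def tokenize(line, allow_raw_strings=True):
    """Split one kaem-style command line into tokens (illustrative sketch).

    Assumptions: spaces/tabs separate tokens, '#' outside a raw string starts
    a comment, and "..." keeps its contents verbatim as a single token.
    """
    tokens = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch in " \t":
            i += 1
        elif ch == "#":
            break  # the rest of the line is a comment
        elif ch == '"' and allow_raw_strings:
            end = line.find('"', i + 1)
            if end == -1:
                raise ValueError("unterminated raw string")
            tokens.append(line[i + 1:end])
            i = end + 1
        else:
            start = i
            while i < len(line) and line[i] not in " \t#":
                i += 1
            tokens.append(line[start:i])
    return tokens


print(tokenize('./hex0 kaem.hex0 "my file" # build'))  # ['./hex0', 'kaem.hex0', 'my file']
```

If no command in the initial script ever needs a quoted argument, the `allow_raw_strings` branch is dead weight, which in a hand-written hex0 seed translates directly into saved bytes.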

<oriansj>which is why figuring out the correct RISC-V hex2 syntax is probably the best way to figure out the correct way to implement hex1

<oriansj>and stikonas[m] your pull request to stage0-posix has been merged.

<oriansj>also one stupid trick for writing hex0 programs: write them in M1 first. (it enables using blood-elf to make the binaries debuggable, and thanks to 'raw hex strings' you can convert one line at a time once it works, then just pull out the ' characters and you have your hex0 program)
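
Here is a sketch of that last step in Python. It assumes a line whose code part contains only single-quoted raw hex strings and whose comment starts at the first `#` or `;`. Removing the quotes from the code part leaves bare hex0 digits. `m1_line_to_hex0` is a hypothetical helper.

```python
def m1_line_to_hex0(line: str) -> str:
    """Turn one M1 line made only of 'raw hex strings' into hex0 (sketch)."""
    # Split off the comment (assumed to start at the first '#' or ';') so that
    # apostrophes inside comments are left alone.
    cuts = [i for i in (line.find("#"), line.find(";")) if i != -1]
    cut = min(cuts) if cuts else len(line)
    code, comment = line[:cut], line[cut:]
    # Removing the quotes turns each 'raw hex string' into bare hex digits.
    return code.replace("'", "") + comment


print(m1_line_to_hex0("'31C0'  # xor eax, eax; don't forget"))
```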

<siraben>Melg8[m]: how can I use https://github.com/melg8/cit/blob/feature/BootstrapNix/bootstrap_nix/bootstrap_seeds/default.nix ?

<siraben>Is it possible to get a Nix shell with the bootstrapped GCC?

<siraben>I get `error: unsupported Git input attribute 'name'` when I run `nix-build`

<stikonas>so removing unnecessary features from tokenization in kaem-optional-seed can save at least 29 bytes

<stikonas>actually a bit more since I have not added jump addresses to that number

<stikonas>it might even be enough to push hex0 + kaem-optional-seed to sub 1KB size

<stikonas>ok, it can go down from 737 bytes to at least 686

<stikonas>so at least a 51-byte saving

<stikonas>(on x86)

<oriansj>stikonas: very nice. There are also other savings possible for the bootstrap-seeds kaem-minimal (but not the stage0-posix kaem-minimal) by disabling verbose mode and turning off the error message on failure (which means we can remove the fputc and file_print functions entirely). There are also optimizations with malloc. And if we don't need the \ behavior in the initial kaem.run we can take that out too.

<oriansj>The null values at the end can also be omitted (but you might want to keep the comments with their addresses)

<oriansj>several of those tweaks will not work in stage0-posix because they are needed but in bootstrap-seeds, they probably will work just fine.

<stikonas[m]>Yeah, we can have kaem-optional-seed building what is now currently kaem-minimal...

<stikonas>oriansj: the \ behaviour is not needed in stage0-posix either, I already counted that in the 51 bytes...

<stikonas>I didn't convert M1 to hex0 yet though...

<stikonas>which is always the annoying part...

<stikonas>in this case it's mostly removing stuff, but addresses will change...

<oriansj>stikonas: addresses can't change from M1 to hex0, unless you use a different ELF_header.hex2 file

<oriansj>also Phase-5 and later do use \ for prettier lines

<stikonas>oh yes, that's true... missed that in phase 5...

<stikonas>oriansj: doesn't the jump amount change?

<stikonas>if you remove lines

<stikonas>M1 still has labels rather than hex jumps

<oriansj>stikonas: when you remove lines in the hex0, the addresses change exactly as much as the same changes in M1 do
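
Here is a simplified two-pass Python sketch of why that holds. A label's address depends only on how many bytes precede it, so deleting a byte shifts a relative offset by the same amount whether the jump was written against a label (as in M1/hex2) or computed by hand (as in hex0). The item kinds (`"bytes"`, `"label"`, `"rel8"`) are invented for this illustration, and range checks are omitted.

```python
def resolve(program):
    """Two-pass label resolution in the spirit of hex2 (simplified sketch).

    program is a list of (kind, value) items:
      ("bytes", b"...")  raw machine code
      ("label", "name")  defines name at the current address
      ("rel8", "name")   1-byte signed offset from the end of this byte to name
    """
    # Pass 1: assign an address to every label.
    labels, addr = {}, 0
    for kind, value in program:
        if kind == "label":
            labels[value] = addr
        elif kind == "bytes":
            addr += len(value)
        elif kind == "rel8":
            addr += 1
    # Pass 2: emit bytes, filling in the relative offsets.
    out = bytearray()
    for kind, value in program:
        if kind == "bytes":
            out += value
        elif kind == "rel8":
            offset = labels[value] - (len(out) + 1)
            out.append(offset & 0xFF)  # two's complement, no range check
    return bytes(out)


# x86: jmp short done; nop; nop; done: ret
prog = [("bytes", b"\xEB"), ("rel8", "done"), ("bytes", b"\x90\x90"),
        ("label", "done"), ("bytes", b"\xC3")]
print(resolve(prog).hex())  # eb029090c3
```

Dropping one `nop` changes the offset from `02` to `01`. A hex0 author has to make that edit by hand, while M1 plus hex2 recomputes it automatically, but the resulting bytes are identical.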

<stikonas>although, if one does two stage kaem-minimal approach, then maybe it's simpler to just hardcode 3 commands needed to build kaem-minimal...

<oriansj>as the instructions are the exact same between the two, M1 is just easier to write

<oriansj>stikonas: definitely mentioned before but I never had the time to see if that was true

<stikonas>yeah, I think I mentioned that...

<stikonas>but yes, it's kind of a different program then

<stikonas>so would have to be written from scratch

<stikonas>might be easier to experiment with a new arch... e.g. riscv...

<oriansj>well in theory it would be 3 array tables pointing to a block of strings and the current call block being called 3 times with those arrays

<oriansj>more pointers to keep track of, but it should definitely be fewer instructions.
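
Here is a Python sketch of that table-driven shape. Each command is an argv array, and a single loop plays the role of the call block invoked once per entry. The file names in `COMMANDS` are placeholders. A real seed would do the equivalent of fork/execve/waitpid in machine code rather than call `subprocess.run`.

```python
import subprocess

# Hypothetical command table: each entry is an argv list, analogous to an
# array of pointers into one block of strings. File names are placeholders.
COMMANDS = [
    ["./hex0", "hex0.hex0", "hex0-rebuilt"],                 # rebuild hex0
    ["./hex0-rebuilt", "kaem-minimal.hex0", "kaem-minimal"],  # build kaem-minimal
    ["./kaem-minimal"],                                       # then run it
]


def run_all(commands, run=subprocess.run):
    """Run each argv in order, stopping at the first failure (sketch)."""
    for argv in commands:
        result = run(argv)
        if result.returncode != 0:
            raise RuntimeError(f"{argv[0]} failed with status {result.returncode}")
```

The `run` parameter exists only so the loop can be exercised without real binaries. The design point is that adding a fourth command means adding data, not code.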

<stikonas>actually, maybe just 2 commands

<stikonas>rebuild hex0 and then directly build kaem-minimal

<stikonas>well, ok, also need to run it

<stikonas>so 3...

<oriansj>usually it is 0, 1 or infinity

***janneke_ is now known as janneke

<stikonas>oriansj: since for riscv byte boundaries are meaningless maybe the following would work (at the expense of fairly different hex implementation):

<stikonas>instead of processing two hex characters at a time (what toggle does)

<stikonas>we process 6 fields separately (7, 5, 3, 5, 5, 7 bits, which can be represented by e.g. 2, 2, 1, 2, 2, 2 hex numbers, or we can do some other representation)

<stikonas>and that should work with all opcodes (maybe with the exception of FENCE)

<stikonas>hex1, hex2 will have to do somewhat different calculation to combine it into a single byte, but I think it wouldn't be too hard

<stikonas>actually into 4 bytes
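
Here is a Python sketch of that combining step for an R-type instruction. The six fields (widths 7, 5, 3, 5, 5, 7 bits counted from the least significant end) are shifted into one 32-bit word, which is then stored little-endian as 4 bytes. The helper names are hypothetical. The field layout follows the RISC-V base ISA.

```python
def pack_fields(funct7, rs2, rs1, funct3, rd, opcode):
    """Pack the six R-type fields into one 32-bit RISC-V instruction word.

    Widths from the least significant end: opcode 7, rd 5, funct3 3,
    rs1 5, rs2 5, funct7 7 bits (32 in total).
    """
    fields = [(opcode, 7), (rd, 5), (funct3, 3), (rs1, 5), (rs2, 5), (funct7, 7)]
    word, shift = 0, 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        word |= value << shift
        shift += width
    return word


def to_bytes_le(word):
    """RISC-V stores instruction words little-endian: lowest byte first."""
    return word.to_bytes(4, "little")


# add a0, a0, a1  (a0 = x10, a1 = x11, opcode 0x33, funct3 0, funct7 0)
word = pack_fields(0, 11, 10, 0, 10, 0x33)
print(hex(word), to_bytes_le(word).hex())  # 0xb50533 3305b500
```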

<stikonas>well, maybe should try writing some prototype in C...

<stikonas>at least from the point of view of M1, it should be fairly trivial then

<stikonas>hmm, although, that doesn't completely answer the question of immediate mangling in S and B-type opcodes
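
For reference, here is a Python sketch of that immediate mangling. It shows where the bits of an S-type immediate and a B-type (the old "SB") branch offset land in the instruction word, following the RISC-V base ISA layout. Only the immediate bits are returned, so they can be ORed into an otherwise assembled word. The helper names are hypothetical, and range checks are omitted.

```python
def s_imm(imm):
    """Scatter a 12-bit signed S-type immediate: imm[11:5] -> 31:25, imm[4:0] -> 11:7."""
    imm &= 0xFFF  # 12-bit two's complement
    return ((imm >> 5) << 25) | ((imm & 0x1F) << 7)


def b_imm(offset):
    """Scatter a 13-bit signed, even B-type branch offset.

    imm[12] -> bit 31, imm[10:5] -> 30:25, imm[4:1] -> 11:8, imm[11] -> bit 7.
    """
    if offset % 2:
        raise ValueError("branch offsets must be even")
    imm = offset & 0x1FFF  # 13-bit two's complement
    return (((imm >> 12) & 0x1) << 31) \
        | (((imm >> 5) & 0x3F) << 25) \
        | (((imm >> 1) & 0xF) << 8) \
        | (((imm >> 11) & 0x1) << 7)


# beq x0, x0, -4: opcode 0x63, funct3 0, rs1 = rs2 = x0
print(hex(b_imm(-4) | 0x63))  # 0xfe000ee3
```

Unlike the register fields, the B-type offset cannot be written as one contiguous group of hex digits, which is why it needs special handling.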

<xentrac>stikonas: that sounds like a reasonable plan

<stikonas>there are still some uncertainties, but I think it should be possible to implement those

<stikonas>possibly we need those arch-specific prefixes to specify how to encode some immediates...

<stikonas>as not all can be inferred from the position

<xentrac>yeah

<stikonas>maybe just have 7 different prefixes specifying which of the 7 encodings we use in that position

<stikonas>but at least M1 and anything higher won't need any changes

<stikonas>hex1, hex2, and hex2.c are the only things needing modification

<stikonas>first two are the hardest to write though...

<oriansj>well hex2 supports binary input (just 0 and 1) but the problem is that little endian architectures break up immediates that span the byte boundaries

<stikonas>oh yes...

<oriansj>So it is best to go word oriented with RISC-V
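
Here is a small Python sketch of the problem oriansj describes. For an R-type word, it computes which of the four little-endian bytes (in file order) each field lands in. `rd`, `rs1` and `rs2` each straddle two bytes, so a byte-at-a-time format forces the author to split those fields by hand. A word-oriented reader can assemble the fields first and only emit bytes at the end. The field table is the standard R-type layout. The helper is hypothetical.

```python
# R-type fields as (name, lowest bit, width), per the RISC-V base ISA.
FIELDS = [("opcode", 0, 7), ("rd", 7, 5), ("funct3", 12, 3),
          ("rs1", 15, 5), ("rs2", 20, 5), ("funct7", 25, 7)]


def bytes_touched(lsb, width):
    """Indices (file order, little-endian) of the bytes holding bits lsb..lsb+width-1."""
    return list(range(lsb // 8, (lsb + width - 1) // 8 + 1))


for name, lsb, width in FIELDS:
    spots = bytes_touched(lsb, width)
    note = "  <- split across a byte boundary" if len(spots) > 1 else ""
    print(f"{name:7} bytes {spots}{note}")
```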

<stikonas>so read the whole instruction?

<stikonas>well, that's what I was thinking too...

<stikonas>then we can have 7 different prefixes for each immediate encoding

<oriansj>and convert !@$~ to mangle that word in RISC-V specific ways (or AArch64 specific ways if we want to backport that advance)

<stikonas>yes...

<stikonas>well, it's probably simplest, .hex2 object files will grow by maybe 40% but at least it looks simple

<oriansj>well, we only need to support the immediates in SB-Format and UJ-Format instructions

<oriansj>(in hex2)

<stikonas>and I?

<stikonas>well, it's not mangled
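
Here is a Python sketch of the J-type (the old "UJ") offset scatter, with an I-type immediate alongside for contrast. Bit positions follow the RISC-V base ISA. The helper names are hypothetical, and range checks are omitted. As stikonas notes, the I-type immediate is one contiguous field in bits 31:20, so it only needs a shift.

```python
def j_imm(offset):
    """Scatter a 21-bit signed, even J-type (jal) offset.

    imm[20] -> bit 31, imm[10:1] -> 30:21, imm[11] -> bit 20, imm[19:12] -> 19:12.
    """
    if offset % 2:
        raise ValueError("jump offsets must be even")
    imm = offset & 0x1FFFFF  # 21-bit two's complement
    return (((imm >> 20) & 0x1) << 31) \
        | (((imm >> 1) & 0x3FF) << 21) \
        | (((imm >> 11) & 0x1) << 20) \
        | (((imm >> 12) & 0xFF) << 12)


def i_imm(imm):
    """I-type immediates are not mangled: imm[11:0] simply sits in bits 31:20."""
    return (imm & 0xFFF) << 20


# jal x0, 8 (i.e. "j 8"): opcode 0x6F, rd = x0
print(hex(j_imm(8) | 0x6F))  # 0x80006f
```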