<stikonas>probably some alignment is messed up but will do for now
<stikonas>maybe I should have started with kaem...
<stikonas>but now with some experience I can probably deal with kaem, that one shouldn't have problems due to risc-v encodings
<stikonas>xentrac: well, it's not good enough for risc-v native hex1...
<Hagfish>the syntax is very alien to me (which makes it all the more impressive), but the layout and commenting/naming makes it quite reasonable to follow
<Hagfish>i was a little uncertain whether "# Get number of the args" should be "# Get the number of args", but that could be because i don't understand what the code is doing there
<Hagfish>i'm sure that other people won't be as confused as me by it :)
<stikonas>Hagfish: well, syntax is basically: leftmost is machine code (those hex numbers), then the next column is assembly instructions and finally human readable comments
<Hagfish>yeah, that's probably quite conventional
<Hagfish>but it feels like, to write that sort of code, you have to be pretty fluent in machine code
<stikonas>Hagfish: not really... I literally knew nothing about RISC-V or any other machine code a week ago
<Hagfish>seriously, you'd never written a line of machine code before?
<stikonas>now I've read a few pages of RISC-V ISA documentation (out of 100 pages)...
<stikonas>no, and even fairly little assembly experience
<stikonas>maybe. Or maybe you overestimate how difficult these things are
<stikonas>and writing assembly is not that different from certain subset of C...
<Hagfish>i keep thinking that this bootstrapping project really needs a dozen more experts, each with decades of experience in their specialisms, but maybe there are other ways to attack the problem
<stikonas>basically you should not use functions and use goto everywhere
<Hagfish>i suppose that's not so strange once you get used to it, but i think some people would find it easier than others
<xentrac>I don't think assembly is hard. assembly is mostly simple.
machine code is mostly simple, although amd64 has decades of accumulated fungus
<stikonas>yeah, and risc-v doesn't have that. It has other annoying things though as we all discussed
<xentrac>what's hard is *debugging* a large assembly program
<xentrac>also I mean writing things in assembly takes longer
<stikonas>well, having a simple C prototype might help
<xentrac>a lot of things are 100 lines of Python, 400 lines of C, or 2000 lines of assembly
<xentrac>not all things! a picture is worth a thousand words, but hardly any 1000-word essays can be adequately replaced by a picture
<xentrac>similarly it's quite common to have 2000 lines of assembly that would be 1500 lines of C or 1000 lines of Python
<xentrac>one of the first algorithms in Knuth is a linear-time topological sort
<xentrac>I translated it from MIX assembly language to Python and it... was the same size
<xentrac>but there are other times where you're like
<xentrac> for y in range(min_y, max_y if max_y is not None else layout.height):
<xentrac> stream.writelines(layout.scan(y, w if w is not None else layout.width))
<xentrac> left = (('Chapters' | String('')) - hr -
<xentrac> (chapno | vr | [t | ~String(' . ') for t in titles])
<stikonas>fortunately, none of the bootstrap code is this complicated
<xentrac>I think in assembly you'd take a different approach
<stikonas>I don't think we ever use kaem " escaping in kaem-minimal... that's probably some opportunity to reduce its size...
<oriansj>stikonas[m]: you are probably quite right about "raw strings" in kaem-minimal but I did design it before live-bootstrap existed and didn't want to shoe-horn it in later as hex0 programming is a huge pain in the butt.
<oriansj>which is why figuring out the correct RISC-V hex2 syntax is probably the best way to figure out the correct way to implement hex1
<oriansj>and stikonas[m] your pull request to stage0-posix has been merged.
<oriansj>also one stupid trick for writing hex0 programs: write them in M1 first.
(it enables using blood-elf to make the binaries debuggable and thanks to 'raw hex strings' you can convert one line at a time once it works and then just pull out the '/'' and you have your hex0 program)
<siraben>Is it possible to get a Nix shell with the bootstrapped GCC?
<siraben>I get `error: unsupported Git input attribute 'name'` when I run `nix-build`
<stikonas>so removing unnecessary features from tokenization in kaem-optional-seed can save at least 29 bytes
<stikonas>actually a bit more since I have not added jump addresses to that number
<stikonas>it might even be enough to push hex0 + kaem-optional-seed to sub 1KB size
<stikonas>ok, it can go down from 737 bytes to at least 686
<oriansj>stikonas: very nice. There are also other savings possible for the bootstrap-seeds kaem-minimal (but not the stage0-posix kaem-minimal) with the disabling of verbose mode and turning off the error message of failure. (which means we can remove the fputc and file_print functions entirely) There are also optimizations with malloc. And if we don't need the \ behavior in the initial kaem.run we can take that out too.
<oriansj>The null values at the end can also be omitted (but you might want to keep the comments of their addresses)
<oriansj>several of those tweaks will not work in stage0-posix because they are needed but in bootstrap-seeds, they probably will work just fine.
<stikonas[m]>Yeah, we can have kaem-optional-seed building what is now currently kaem-minimal...
<stikonas>oriansj: \ behaviour is not needed in stage0-posix too, I already counted that in 51 byte...
<stikonas>I didn't convert M1 to hex0 yet though...
<stikonas>in this case it's mostly removing stuff, but addresses will change...
<oriansj>stikonas: addresses can't change from M1 to hex0, unless you use a different ELF_header.hex2 file
<oriansj>also Phase-5 and later do use \ for prettier lines
<stikonas>oh yes, that's true... missed that in phase 5...
<stikonas>M1 still has labels rather than hex jumps
<oriansj>stikonas: when you remove lines in the hex0 the addresses change exactly as much as the same changes in M1 do
<stikonas>although, if one does two stage kaem-minimal approach, then maybe it's simpler to just hardcode 3 commands needed to build kaem-minimal...
<oriansj>as the instructions are the exact same between the two, M1 is just easier to write
<oriansj>stikonas: definitely mentioned before but I never had the time to see if that was true
<stikonas>but yes, it's kind of different program then
<stikonas>so would have to be written from scratch
<stikonas>might be easier to experiment with a new arch... e.g. riscv...
<oriansj>well in theory it would be 3 array tables pointing to a block of strings and the current call block being called 3 times with those arrays
<oriansj>more pointers to keep track of but should definitely be less instructions.
<stikonas>rebuild hex0 and then directly build kaem-minimal
***janneke_ is now known as janneke
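[Editor's aside, not part of the log] For readers unfamiliar with hex0: it is plain hex text with comments, turned into bytes two hex digits at a time. The sketch below is a Python approximation written for this note, not the stage0 source. The name `hex0_decode` is made up for illustration, and the rules it assumes are that `#` and `;` start a comment running to end of line and that any other non-hex character is skipped. It also shows why deleting a line shifts every later address by that line's byte count, which is the bookkeeping that M1 labels avoid.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def hex0_decode(source: str) -> bytes:
    """Turn hex0-style text into bytes (illustrative sketch, assumed rules)."""
    out = bytearray()
    high = None          # first nibble of the pair, or None if none is pending
    in_comment = False
    for ch in source:
        if in_comment:
            # Comments run to the end of the line.
            if ch == "\n":
                in_comment = False
            continue
        if ch in "#;":
            in_comment = True
            continue
        if ch not in HEX_DIGITS:
            continue     # whitespace and anything else is ignored
        nibble = int(ch, 16)
        if high is None:
            high = nibble            # "toggle": remember the first digit
        else:
            out.append((high << 4) | nibble)
            high = None
    return bytes(out)


program = "7F 45 4C 46 # ELF magic\n01 02 ; two more bytes\n03\n"
shorter = "7F 45 4C 46 # ELF magic\n03\n"   # same text with the middle line removed

full = hex0_decode(program)
cut = hex0_decode(shorter)
# The byte 0x03 sat at offset 6; after removing a 2-byte line it sits at offset 4.
print(full.index(3), cut.index(3))
```

Any hand-written jump target that pointed past the removed line would need the same adjustment, which is why converting from M1 only after the program works is attractive.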
<stikonas>oriansj: since for riscv byte boundaries are meaningless maybe the following would work (at the expense of fairly different hex implementation):
<stikonas>instead of processing two hex characters at a time (what toggle does)
<stikonas>we process 6 fields separately (7, 5, 3, 5, 5, 7 bits, which can be represented by e.g. 2, 2, 1, 2, 2, 2 hex numbers, or we can do some other representation)
<stikonas>and that should work with all opcodes (maybe with the exception of FENCE)
<stikonas>hex1, hex2 will have to do somewhat different calculation to combine it into a single byte, but I think it wouldn't be too hard
<stikonas>well, maybe should try writing some prototype in C...
<stikonas>at least from the point of view of M1, it should be fairly trivial then
<stikonas>hmm, although, that doesn't completely answer the question of immediate mangling in S and B-type opcodes
<xentrac>stikonas: that sounds like a reasonable plan
<stikonas>there are still some uncertainties but I think it should be possible to implement those
<stikonas>possibly need those arch specific prefixes to specify how to encode some immediates...
<stikonas>as not all can be inferred from the position
<stikonas>maybe just have 7 different prefixes specifying which of the 7 encodings we use in that position
<stikonas>but at least M1 and anything higher won't need any changes
<stikonas>hex1, hex2, and hex2.c are the only things needing modification
<stikonas>first two are the hardest to write though...
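[Editor's aside, not part of the log] A small Python sketch of the field-by-field idea, using the standard RISC-V R-type layout, where the widths from the most significant bit down are 7, 5, 5, 3, 5, 7 (funct7, rs2, rs1, funct3, rd, opcode). The function names are hypothetical, and this is only an illustration of why byte boundaries do not line up with fields, not a proposal for the hex1/hex2 syntax.

```python
def pack_r_type(funct7, rs2, rs1, funct3, rd, opcode):
    """Concatenate the six R-type fields into one 32-bit word."""
    fields = [(funct7, 7), (rs2, 5), (rs1, 5), (funct3, 3), (rd, 5), (opcode, 7)]
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value   # append the field below what we have
    return word


def little_endian_bytes(word):
    """The four bytes as they would appear in the output file."""
    return word.to_bytes(4, "little")


# add a0, a1, a2  ->  rd = x10, rs1 = x11, rs2 = x12, opcode 0x33
word = pack_r_type(funct7=0, rs2=12, rs1=11, funct3=0, rd=10, opcode=0x33)
print(hex(word))                         # 0xc58533
print(little_endian_bytes(word).hex())   # 3385c500
```

In this example rs1 occupies bits 15 to 19, so its lowest bit lands in the second output byte and the rest in the third. A tool that emits one byte per pair of hex digits cannot express that field on its own, which is the motivation for treating the whole word as the unit.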
<oriansj>well hex2 supports binary input (just 0 and 1) but the problem is that little endian architectures break up immediates that span the byte boundaries
<oriansj>So it is best to go word oriented with RISC-V
<stikonas>then we can have 7 different prefixes for each immediate encoding
<oriansj>and convert !@$~ to mangle that word in RISC-V specific ways (or AArch64 specific ways if we want to backport that advance)
<stikonas>well, it's probably simplest, .hex2 object files will grow by maybe 40% but at least it looks simple
<oriansj>well we need only support the immediates in SB-Format and UJ-Format instructions
<oriansj>well we have 4 characters to use so one for each format
<stikonas>I was thinking being a bit more verbose but maybe that's fine
<oriansj>also we only need relative for jump and branch instructions
<oriansj>M1 will deal with R-Format, I-Format, S-Format and U-Format
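[Editor's aside, not part of the log] To make "mangle that word" concrete, here is a Python sketch of the immediate scattering that the RISC-V spec defines for B-type (SB-Format) and J-type (UJ-Format) instructions. The function names are invented for this note, and nothing here says which of the `!@$~` characters would select which encoding. That mapping is not settled in the discussion above.

```python
def mangle_b(word, offset):
    """OR a branch offset into a B-type word whose immediate bits are zero."""
    if offset % 2 or not -4096 <= offset < 4096:
        raise ValueError("B-type offset must be even and fit in 13 bits")
    word |= ((offset >> 12) & 0x1) << 31    # imm[12]
    word |= ((offset >> 5) & 0x3F) << 25    # imm[10:5]
    word |= ((offset >> 1) & 0xF) << 8      # imm[4:1]
    word |= ((offset >> 11) & 0x1) << 7     # imm[11]
    return word


def mangle_j(word, offset):
    """OR a jump offset into a J-type word whose immediate bits are zero."""
    if offset % 2 or not -(1 << 20) <= offset < (1 << 20):
        raise ValueError("J-type offset must be even and fit in 21 bits")
    word |= ((offset >> 20) & 0x1) << 31    # imm[20]
    word |= ((offset >> 1) & 0x3FF) << 21   # imm[10:1]
    word |= ((offset >> 11) & 0x1) << 20    # imm[11]
    word |= ((offset >> 12) & 0xFF) << 12   # imm[19:12]
    return word


BEQ_X0_X0 = 0x00000063   # beq x0, x0, <offset> with the offset bits cleared
JAL_X0 = 0x0000006F      # jal x0, <offset> with the offset bits cleared

print(hex(mangle_b(BEQ_X0_X0, 8)))                       # 0x463
print(hex(mangle_j(JAL_X0, -4)))                         # 0xffdff06f
print(mangle_j(JAL_X0, 8).to_bytes(4, "little").hex())   # 6f008000
```

Python's `>>` and `&` treat negative integers as two's complement, so backward offsets need no special casing. The last line shows the little-endian output: the offset bits end up spread over the upper bytes in an order that cannot be written as a simple run of hex digits, which is why the word-oriented approach is suggested.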