IRC channel logs

2021-07-25.log

<stikonas>hmm, I think hex1_x86 is now working...
<stikonas>Although somewhat slowly
<stikonas>probably some alignment is messed up but will do for now
<Hagfish>that sounds great
<stikonas>oriansj: https://github.com/oriansj/stage0-posix/pull/29
<stikonas>maybe I should have started with kaem...
<stikonas>but now with some experience I can probably deal with kaem, that one shouldn't have problems due to risc-v encodings
<xentrac>yaay
<stikonas>xentrac: well, it's not good enough for risc-v native hex1...
<xentrac>it's progress!
<Hagfish>the syntax is very alien to me (which makes it all the more impressive), but the layout and commenting/naming makes it quite reasonable to follow
<Hagfish>i was a little uncertain whether "# Get number of the args" should be "# Get the number of args", but that could be because i don't understand what the code is doing there
<stikonas>that comment is copy/pasted...
<Hagfish>i'm sure that other people won't be as confused as me by it :)
<stikonas>Hagfish: well, syntax is basically: leftmost is machine code (those hex numbers), then the next column is assembly instructions and finally human readable comments
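The layout described above (hex bytes on the left, assembly and remarks after a comment marker) can be illustrated with a minimal sketch of a hex0-style decoder. This is not the stage0 implementation; the function name `hex0_to_bytes` and the sample text are hypothetical. It assumes only that `#` and `;` start a comment running to the end of the line and that every other non-hex character is ignored.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def hex0_to_bytes(source: str) -> bytes:
    """Turn hex0-style text into raw bytes (illustrative sketch only)."""
    out = bytearray()
    high_nibble = None  # first hex digit of the byte currently being read
    for line in source.splitlines():
        for ch in line:
            if ch in "#;":
                break  # the rest of the line is a comment
            if ch not in HEX_DIGITS:
                continue  # whitespace and anything else is skipped
            nibble = int(ch, 16)
            if high_nibble is None:
                high_nibble = nibble
            else:
                out.append((high_nibble << 4) | nibble)
                high_nibble = None
    return bytes(out)


# Hypothetical sample line: machine code, then assembly as a comment.
SAMPLE = """
# Do nothing
13 00 00 00    ; addi x0, x0, 0   (RISC-V nop, little-endian bytes)
"""

print(hex0_to_bytes(SAMPLE).hex(" "))  # 13 00 00 00
```

Only the left-hand column reaches the output; the assembly and the remarks exist purely for the human reader.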
<Hagfish>yeah, that's probably quite conventional
<Hagfish>but it feels like, to write that sort of code, you have to be pretty fluent in machine code
<stikonas>Hagfish: not really... I literally knew nothing about RISC-V or any other machine code a week ago
<Hagfish>woah
<Hagfish>seriously, you'd never written a line of machine code before?
<stikonas>now I've read a few pages of RISC-V ISA documentation (out of 100 pages)...
<stikonas>no, and even fairly little assembly experience
<stikonas>only some trivial things
<Hagfish>that's some impressive adaptability
<stikonas>maybe. Or maybe you overestimate how difficult these things are
<Hagfish>that's possible, yes
<stikonas>they certainly sound scary
<stikonas>and writing assembly is not that different from certain subset of C...
<Hagfish>i keep thinking that this bootstrapping project really needs a dozen more experts, each with decades of experience in their specialisms, but maybe there are other ways to attack the problem
<stikonas>basically you should not use functions and use goto everywhere
<Hagfish>i suppose that's not so strange once you get used to it, but i think some people would find it easier than others
<xentrac>I don't think assembly is hard. assembly is mostly simple. machine code is mostly simple, although amd64 has decades of accumulated fungus
<stikonas>yeah, and risc-v doesn't have that. It has other annoying things though as we all discussed
<xentrac>what's hard is *debugging* a large assembly program
<stikonas>yeah, that's true...
<xentrac>also I mean writing things in assembly takes longer
<stikonas>well, having a simple C prototype might help
<xentrac>a lot of things are 100 lines of Python, 400 lines of C, or 2000 lines of assembly
<xentrac>not all things! a picture is worth a thousand words, but hardly any 1000-word essays can be adequately replaced by a picture
<stikonas>hex1.c is certainly not that much shorter than assembly version: https://github.com/oriansj/stage0-posix/blob/master/High%20Level%20Prototypes/hex1.c
<xentrac>similarly it's quite common to have 2000 lines of assembly that would be 1500 lines of C or 1000 lines of Python
<stikonas>yeah, it's close to latter here
<xentrac>one of the first algorithms in Knuth is a linear-time topological sort
<xentrac>I translated it from MIX assembly language to Python and it... was the same size
<xentrac>but there are other times where you're like
<xentrac> for y in range(min_y, max_y if max_y is not None else layout.height):
<xentrac>     stream.writelines(layout.scan(y, w if w is not None else layout.width))
<xentrac>or
<xentrac> left = (('Chapters' | String('')) - hr -
<xentrac> (chapno | vr | [t | ~String(' . ') for t in titles]))
<stikonas>yeah, that would be nasty in assembly
<stikonas>fortunately, none of the bootstrap code is this complicated
<xentrac>I think in assembly you'd take a different approach
<stikonas>I don't think we ever use kaem " escaping in kaem-minimal... that's probably some opportunity to reduce its size...
<oriansj>stikonas[m]: you are probably quite right about "raw strings" in kaem-minimal but I did design it before live-bootstrap existed and didn't want to shoe-horn it in later as hex0 programming is a huge pain in the butt.
<oriansj>which is why figuring out the correct RISC-V hex2 syntax is probably the best way to figure out the correct way to implement hex1
<oriansj>and stikonas[m] your pull request to stage0-posix has been merged.
<oriansj>also one stupid trick for writing hex0 programs: write them in M1 first. (it enables using blood-elf to make the binaries debuggable and thanks to 'raw hex strings' you can convert one line at a time once it works and then just pull out the '/'' and you have your hex0 program)
<siraben>Melg8[m]: how can I use https://github.com/melg8/cit/blob/feature/BootstrapNix/bootstrap_nix/bootstrap_seeds/default.nix ?
<siraben>Is it possible to get a Nix shell with the bootstrapped GCC?
<siraben>I get `error: unsupported Git input attribute 'name'` when I run `nix-build`
<stikonas>so removing unnecessary features from tokenization in kaem-optional-seed can save at least 29 bytes
<stikonas>actually a bit more since I have not added jump addresses to that number
<stikonas>it might even be enough to push hex0 + kaem-optional-seed to sub 1KB size
<stikonas>ok, it can go down from 737 bytes to at least 686
<stikonas>so at least 51 byte saving
<stikonas>(on x86)
<oriansj>stikonas: very nice. There are also other savings possible for the bootstrap-seeds kaem-minimal (but not the stage0-posix kaem-minimal) with the disabling of verbose mode and turning off the error message of failure. (which means we can remove the fputc and file_print functions entirely) There are also optimizations with malloc. And if we don't need the \ behavior in the initial kaem.run we can take that out too.
<oriansj>The null values at the end can also be omitted (but you might want to keep the comments of their addresses)
<oriansj>several of those tweaks will not work in stage0-posix because they are needed but in bootstrap-seeds, they probably will work just fine.
<stikonas[m]>Yeah, we can have kaem-optional-seed building what is now currently kaem-minimal...
<stikonas>oriansj: \ behaviour is not needed in stage0-posix too, I already counted that in 51 byte...
<stikonas>I didn't convert M1 to hex0 yet though...
<stikonas>which is always the annoying part...
<stikonas>in this case it's mostly removing stuff, but addresses will change...
<oriansj>stikonas: addresses can't change from M1 to hex0, unless you use a different ELF_header.hex2 file
<oriansj>also Phase-5 and later do use \ for prettier lines
<stikonas>oh yes, that's true... missed that in phase 5...
<stikonas>oriansj: doesn't jump amount change?
<stikonas>if you remove lines
<stikonas>M1 still has labels rather than hex jumps
<oriansj>stikonas: when you remove lines in the hex0 the addresses change exactly as much as the same changes in M1 do
<stikonas>although, if one does two stage kaem-minimal approach, then maybe it's simpler to just hardcode 3 commands needed to build kaem-minimal...
<oriansj>as the instructions are the exact same between the two, M1 is just easier to write
<oriansj>stikonas: definitely mentioned before but I never had the time to see if that was true
<stikonas>yeah, I think I mentioned that...
<stikonas>but yes, it's kind of different program then
<stikonas>so would have to be written from scratch
<stikonas>might be easier to experiment with a new arch... e.g. riscv...
<oriansj>well in theory it would be 3 array tables pointing to a block of strings and the current call block being called 3 times with those arrays
<oriansj>more pointers to keep track of but should definitely be less instructions.
<stikonas>actually, maybe just 2 commands
<stikonas>rebuild hex0 and then directly build kaem-minimal
<stikonas>well, ok, also need to run it
<stikonas>so 3...
<oriansj>usually it is 0, 1 or infinity
***janneke_ is now known as janneke
<stikonas>oriansj: since for riscv byte boundaries are meaningless maybe the following would work (at the expense of fairly different hex implementation):
<stikonas>instead of processing two hex characters at a time (what toggle does)
<stikonas>we process 6 fields separately (7, 5, 3, 5, 5, 7 bits, which can be represented by e.g. 2, 2, 1, 2, 2, 2 hex numbers, or we can do some other representation)
<stikonas>and that should work with all opcodes (maybe with the exception of FENCE)
<stikonas>hex1, hex2 will have to do somewhat different calculation to combine it into a single byte, but I think it wouldn't be too hard
<stikonas>actually into 4 bytes
<stikonas>well, maybe should try writing some prototype in C...
<stikonas>at least from the point of view of M1, it should be fairly trivial then
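The field-by-field idea above can be sketched as follows. This is a rough Python prototype rather than the C one mentioned in the log, and the names `FIELD_WIDTHS`, `pack_fields` and `to_little_endian` are hypothetical. It assumes the six widths (7, 5, 3, 5, 5, 7) are listed from the least significant bit upwards, which matches the RISC-V R-type layout: opcode, rd, funct3, rs1, rs2, funct7.

```python
import struct

# Field widths from bit 0 upwards: opcode, rd, funct3, rs1, rs2, funct7.
FIELD_WIDTHS = (7, 5, 3, 5, 5, 7)


def pack_fields(fields):
    """Combine six separately written fields into one 32-bit word."""
    if len(fields) != len(FIELD_WIDTHS):
        raise ValueError("expected exactly six fields")
    word = 0
    shift = 0
    for value, width in zip(fields, FIELD_WIDTHS):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        word |= value << shift
        shift += width
    return word


def to_little_endian(word):
    """Emit the word as the 4 bytes that end up in the binary."""
    return struct.pack("<I", word)


# add x3, x1, x2: opcode 0x33, rd 3, funct3 0, rs1 1, rs2 2, funct7 0
word = pack_fields((0x33, 3, 0, 1, 2, 0))
print(f"{word:08x}")                   # 002081b3
print(to_little_endian(word).hex(" "))  # b3 81 20 00
```

Note that rd already straddles the first byte boundary (bits 7 to 11), which is why the result has to be assembled as a whole 4-byte word rather than byte by byte.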
<stikonas>hmm, although, that doesn't completely answer the question of immediate mangling in S and B-type opcodes
<xentrac>stikonas: that sounds like a reasonable plan
<stikonas>there are still some uncertainties but I think it should be possible to implement those
<stikonas>possibly need those arch-specific prefixes to specify how to encode some immediates...
<stikonas>as not all can be inferred from the position
<xentrac>yeah
<stikonas>maybe just have 7 different prefixes specifying which of the 7 encodings we use in that position
<stikonas>but at least M1 and anything higher won't need any changes
<stikonas>hex1, hex2, and hex2.c are the only things needing modification
<stikonas>first two are the hardest to write though...
<oriansj>well hex2 supports binary input (just 0 and 1) but the problem is that little endian architectures break up immediates that span the byte boundaries
<stikonas>oh yes...
<oriansj>So it is best to go word oriented with RISC-V
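To see why a word-oriented approach helps, here is a small sketch of how the S and B formats scatter an immediate across a 32-bit word, following the bit positions in the RISC-V ISA manual. The helper names `b_type_imm_bits` and `s_type_imm_bits` are hypothetical, and this says nothing about how hex2 itself will eventually do it.

```python
import struct


def b_type_imm_bits(offset):
    """Scatter a branch offset into the B-type immediate bit positions."""
    if offset % 2 or not -4096 <= offset < 4096:
        raise ValueError("B-type offset must be even and fit in 13 bits")
    imm = offset & 0x1FFF
    return (
        (((imm >> 12) & 0x1) << 31)    # imm[12]   -> bit 31
        | (((imm >> 5) & 0x3F) << 25)  # imm[10:5] -> bits 30:25
        | (((imm >> 1) & 0xF) << 8)    # imm[4:1]  -> bits 11:8
        | (((imm >> 11) & 0x1) << 7)   # imm[11]   -> bit 7
    )


def s_type_imm_bits(value):
    """Scatter a store offset into the S-type immediate bit positions."""
    if not -2048 <= value < 2048:
        raise ValueError("S-type immediate must fit in 12 bits")
    imm = value & 0xFFF
    return (((imm >> 5) & 0x7F) << 25) | ((imm & 0x1F) << 7)


BEQ_X0_X0 = 0x00000063  # beq x0, x0, <offset> with the immediate left as 0
SW_X2_X1 = 0x0020A023   # sw x2, <offset>(x1) with the immediate left as 0

for base, bits in ((BEQ_X0_X0, b_type_imm_bits(-4)),
                   (SW_X2_X1, s_type_imm_bits(8))):
    word = base | bits
    print(f"{word:08x}", struct.pack("<I", word).hex(" "))
# fe000ee3 e3 0e 00 fe
# 0020a423 23 a4 20 00
```

In the branch example the immediate lands in bytes 0, 1 and 3 of the little-endian output, so there is no contiguous run of bytes that could simply be overwritten with the offset.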
<stikonas>so read the whole instruction?
<stikonas>well, that's what I was thinking too...
<stikonas>then we can have 7 different prefixes for each immediate encoding
<oriansj>and convert !@$~ to mangle that word in RISC-V specific ways (or AArch64 specific ways if we want to backport that advance)
<stikonas>yes...
<stikonas>well, it's probably simplest, .hex2 object files will grow by maybe 40% but at least it looks simple
<oriansj>well we need only support the immediates in SB-Format and UJ-Format instructions
<oriansj>(in hex2)
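For comparison, a sketch of the UJ-format (jump) immediate next to the unmangled U-format one, again using the bit positions from the RISC-V ISA manual. The helper names `j_type_imm_bits` and `u_type_imm_bits` are hypothetical, and which hex2 prefix character would map to which format is left open here, as it is in the discussion.

```python
def j_type_imm_bits(offset):
    """Scatter a jump offset into the J-type (UJ) immediate bit positions."""
    if offset % 2 or not -(1 << 20) <= offset < (1 << 20):
        raise ValueError("J-type offset must be even and fit in 21 bits")
    imm = offset & 0x1FFFFF
    return (
        (((imm >> 20) & 0x1) << 31)     # imm[20]    -> bit 31
        | (((imm >> 1) & 0x3FF) << 21)  # imm[10:1]  -> bits 30:21
        | (((imm >> 11) & 0x1) << 20)   # imm[11]    -> bit 20
        | (((imm >> 12) & 0xFF) << 12)  # imm[19:12] -> bits 19:12
    )


def u_type_imm_bits(upper20):
    """U-type keeps its 20-bit immediate contiguous in bits 31:12."""
    if not 0 <= upper20 < (1 << 20):
        raise ValueError("U-type immediate must fit in 20 bits")
    return upper20 << 12


JAL_X0 = 0x0000006F  # jal x0, <offset> with the immediate left as 0
LUI_X1 = 0x000000B7  # lui x1, <imm>    with the immediate left as 0

print(f"{JAL_X0 | j_type_imm_bits(8):08x}")        # 0080006f
print(f"{JAL_X0 | j_type_imm_bits(-4):08x}")       # ffdff06f
print(f"{LUI_X1 | u_type_imm_bits(0x12345):08x}")  # 123450b7
```

The jump offset is reordered within the word, while the U-format value is only shifted, which is consistent with treating the jump and branch formats as the ones that need special handling.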
<stikonas>and I?
<stikonas>well, it's not mangled
<oriansj>well we have 4 characters to use so one for each format
<stikonas>oh ok...
<stikonas>I was thinking being a bit more verbose but maybe that's fine
<oriansj>% and & will remain unchanged
<stikonas>ok, then & will deal with I
<oriansj>also we only need relative for jump and branch instructions
<xentrac>also auipc
<stikonas>auipc is U type
<oriansj>M1 will deal with R-Format, I-Format, S-Format and S-Format
<oriansj>correction U-Format