IRC channel logs

2021-07-24.log


<stikonas>hmm, encoding riscv jumps does not actually take THAT long...
<stikonas>probably 3-4 minutes per jump to do all calculations and even produce nice intermediate comments
<oriansj>just a thought but figuring out the simplest RISC-V hex2 subset might be easier if you tried adding RISC-V support to M2-Planet (and M2libc) then you'll be able to see what subset could be most easily used. Rather than rigidly following the hex0->hex1->hex2 plan
<stikonas[m]>Possibly... But writing hex1_x86 is a useful exercise to understand some things for somebody with no riscv experience and basically no non-trivial assembly programming before
<oriansj>stikonas: you are absolutely right.
<stikonas>we'll see soon enough if it works
<oriansj>hex0, hex1, hex2, M0 and cc_* are a wonderful progression for learning assembly language programming.
<stikonas>I think I have ~10 instructions left to encode
<stikonas>then we can put that file into RV64/Development/
<oriansj>I've been considering a minor change in hex2 that might make AArch64 and RISC-V slightly more efficient
<stikonas>?
<stikonas>I'm also adding more comments to the .hex0 file in RISC-V since immediates are mangled
<oriansj>2 classes of hex2: byte and word
<oriansj>but I still am figuring out the correct way to do it
<stikonas>WIP: https://stikonas.eu/files/bootstrap/hex1_x86.hex0.txt
<oriansj>stikonas: nice
<stikonas>those comments should help with those J- and B- type instructions
<stikonas>it's not too hard to encode them by hand if you have that bit template
<stikonas>and since I had it, I thought, why not put it in the comments
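The bit template for J- and B-type instructions can be sketched in Python. This is a minimal sketch that follows the standard RISC-V base instruction formats; the names `encode_j_type`, `encode_b_type` and `hex0_bytes` are hypothetical and are not part of any stage0 or mescc-tools program. `hex0_bytes` shows the word in the little-endian byte order a .hex0 file would contain.

```python
def encode_j_type(offset, rd, opcode=0x6F):
    """Encode a J-type instruction (JAL by default) as a 32-bit word."""
    if offset % 2 or not -(1 << 20) <= offset < (1 << 20):
        raise ValueError("offset must be even and fit in 21 signed bits")
    imm = offset & 0x1FFFFF
    return (
        ((imm >> 20) & 0x1) << 31      # imm[20]
        | ((imm >> 1) & 0x3FF) << 21   # imm[10:1]
        | ((imm >> 11) & 0x1) << 20    # imm[11]
        | ((imm >> 12) & 0xFF) << 12   # imm[19:12]
        | (rd & 0x1F) << 7             # destination register
        | (opcode & 0x7F)              # 7-bit opcode
    )


def encode_b_type(offset, rs1, rs2, funct3, opcode=0x63):
    """Encode a B-type (conditional branch) instruction as a 32-bit word."""
    if offset % 2 or not -(1 << 12) <= offset < (1 << 12):
        raise ValueError("offset must be even and fit in 13 signed bits")
    imm = offset & 0x1FFF
    return (
        ((imm >> 12) & 0x1) << 31      # imm[12]
        | ((imm >> 5) & 0x3F) << 25    # imm[10:5]
        | (rs2 & 0x1F) << 20
        | (rs1 & 0x1F) << 15
        | (funct3 & 0x7) << 12
        | ((imm >> 1) & 0xF) << 8      # imm[4:1]
        | ((imm >> 11) & 0x1) << 7     # imm[11]
        | (opcode & 0x7F)
    )


def hex0_bytes(word):
    """Return the word as little-endian hex digits, as typed into a hex0 file."""
    return word.to_bytes(4, "little").hex().upper()
```

For example, `hex0_bytes(encode_j_type(8, rd=1))` gives the four bytes for `jal ra, 8`, and `encode_b_type(8, rs1=10, rs2=11, funct3=0)` gives the word for `beq a0, a1, 8`.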
<stikonas>yeah, I saw Aarch64 also has some hex weirdness...
<oriansj>also fossy you might want to cancel your mescc-tools draft pull request as we have created mescc-tools-extra
<stikonas>can you not close it without fossy?
<stikonas>I don't think fossy would mind given that we went slightly different direction
<oriansj>I got the idea from: https://github.com/laanwj/guix-mescc-tools/commit/1634c35635836943a4164fedf575c6ffe38f72ee
<oriansj>I don't like the syntax choices but it is looking more and more that RISC-V is very byte hostile.
<oriansj>also I would want to restrict that behavior heavily to just the most byte hostile architectures.
<oriansj>So when one does hex1 and hex2 for RISC-V in assembly they might end up having to do something with that word based syntax to make it work.
<oriansj>which is probably going to make that hex1 and hex2 much more complex than any other architecture's hex1 and hex2
<stikonas>yeah, I also thought that we'll have to make hex1 and hex2 significantly more complex...
<stikonas>although, need to think this through a bit to find lowest complexity...
<stikonas>at least these kinds of bit manipulations where immediates are swapped shouldn't be too hard to implement
<stikonas>but we also need to take into account 7 bit opcodes...
<stikonas>is that link a work in progress for riscv in mes?
<stikonas>(well, not mes itself but tools with the intention of later mes port)
<oriansj>stikonas: as far as I can tell yes
<oriansj>but it brings a problem for M1
<oriansj>as M1 deals with immediate encoding as well as the conversion from human names to hex2
<oriansj>in x86 it is what converts add_eax %0x1234 into 05 3412
<oriansj>in RISC-V it would need to output the syntax that hex2 uses for the immediate field encodings
<oriansj>So if the syntax is .J 34120000 then it would have to output that for %0x1234 or require a custom syntax there too
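The immediate conversion described for M1 amounts to writing a value out in little-endian byte order. A minimal sketch, assuming a fixed field width in bytes; the helper name `le_hex` is hypothetical and this is not M1's actual code:

```python
def le_hex(value, width):
    """Return `value` as `width` little-endian bytes, written as hex digits."""
    mask = (1 << (8 * width)) - 1
    # Masking lets negative immediates wrap to their two's complement form.
    return (value & mask).to_bytes(width, "little").hex().upper()
```

With a 4-byte field, `le_hex(0x1234, 4)` yields the `34120000` used in the `.J 34120000` example; a 2-byte field yields `3412`.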
<stikonas>.J meaning jump type opcode?
<oriansj>stikonas: honestly I was thinking of repurposing !@#$ for RISC-V instead of the example syntax linked
<oriansj>as individual bytes and halfs don't mean anything in RISC-V
<oriansj>and ~ is already defined as architecture specific
<stikonas>hmm, alternatively, can we have some kind of preprocessor?
<stikonas>so we reorder crazy risc instruction boundaries into something that M1 can easily output
<stikonas>hmm, although, that will probably make address calculations too hard
<oriansj>M1 doesn't have to care about addresses at all, it is just a macro string processor
<stikonas>well, I'll have to take a better look at M1 first before I can give more qualified input to this problem
<oriansj>with a few architecture specific flagged behaviors (little/big endian) etc
<oriansj>also I've been sticking to single character prefix special behaviors to simplify implementation in assembly. (:label instead of the more common label:)
<stikonas>well, yes, although we have enough spare characters
<stikonas>or even letters...
<stikonas>basically anything other than a-f A-F 0-9 : ; #
<oriansj>I'd rather drop special immediate encodings than fill up the space if possible
<oriansj>and !@$%~ 5 hex2 offset encodings and !@$%~ 5 M1 immediate encodings should be enough to do something useful
<oriansj>if not, the architecture set would have to be absolute garbage
<oriansj>we could even extend it to % and & if we expected something stupid like %0x12345678 '00000000'
<oriansj>but I'd prefer something clean if possible.
<oriansj>and something like !00FF0000 @1000450 $0200006 00000000 might be needed to encode fixed position registers as separate defines if one wanted to clean up the M1 syntax for RISC-V instead of doing the DEFINE ADD_EAX 05 that we did in x86
<oriansj>(with the first 3 just changing the contents of the shift register before being xor'd with the zeros to produce the one word of output)
<stikonas>yeah, I think that x86 approach would be infeasible
<stikonas>but yes, the first approach will need more work in hex1/2
<oriansj>and we probably could reserve . to mean just xor what follows into the shift register. So that the output of M1 for the various !6 @7 and $0x123 statements just become .06000000, .00012000 and .0070340
<oriansj>(these are just arbitrary examples)
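The "xor into the shift register" idea can be sketched as follows. This is one possible reading and not the actual hex2 design: it assumes every token is exactly 8 hex digits, that a token starting with `.` is xor'd into a pending 4-byte register, and that a bare word is xor'd with that register, emitted, and the register cleared. The function name `assemble_words` is hypothetical.

```python
def assemble_words(tokens):
    """Combine '.XXXXXXXX' field tokens with the next bare word by xor."""
    pending = bytes(4)          # the shift register, initially all zeros
    out = bytearray()
    for token in tokens:
        is_field = token.startswith(".")
        data = bytes.fromhex(token[1:] if is_field else token)
        if len(data) != 4:
            raise ValueError(f"expected 8 hex digits in {token!r}")
        mixed = bytes(a ^ b for a, b in zip(pending, data))
        if is_field:
            pending = mixed     # accumulate the field, emit nothing yet
        else:
            out += mixed        # emit one finished word
            pending = bytes(4)  # and reset the register
    return bytes(out)
```

Under these assumptions, `assemble_words([".80000000", ".00000100", ".00003000", "33000000"])` produces the little-endian bytes of `add x1, x2, x3`, with the three register fields supplied separately from the base opcode word.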
<oriansj>hex2 would only need to know about SB-Format and UJ-Format
<oriansj>which we could do as ~ and ^~
<oriansj>unless there is something important I am missing
<oriansj>M1 would need to support I-Format, S-Format and U-Format
<stikonas>hmm, will we have any problem with pseudoinstructions?
<stikonas>some of that seems non-trivial too
<stikonas>yeah, the formats are not too hard
<oriansj>and R-Format would just be a standard DEFINE in M1 (unless we want to do something funky like DEFINE R1_12 .00001200 so that hex2 deals with putting register number 0x12 in register spot 1)
<oriansj>also we don't deal with pseudoinstructions because they are not really implemented and instead we will deal with the actual instructions that the processor will actually be using.
<oriansj>So it would be 32x3 DEFINEs for all register encoding details needed for RISC-V but would look a little odd => R0_12 R1_0 R2_31 ADD but it would work
<oriansj>and R type instructions would use R0_*, R1_* and R2_* but I type would only use R1_* and R2_*; S and SB would use R0_* and R1_* and U and UJ would use R2_*
<oriansj>and hope to god the programmer can remember that
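The 32x3 register DEFINEs could be generated rather than typed. The log does not say which bit position each register spot maps to, so the mapping below is an assumption made only for illustration: spot 0 is rs2 (bit 20), spot 1 is rs1 (bit 15) and spot 2 is rd (bit 7), with register numbers written in decimal and values in little-endian byte order. The names `register_defines` and `ASSUMED_SPOTS` are hypothetical.

```python
# Assumed mapping of register spot -> bit offset of the 5-bit field.
ASSUMED_SPOTS = {0: 20, 1: 15, 2: 7}


def register_defines(spot_shifts):
    """Return M1-style DEFINE lines for registers 0..31 in every spot."""
    lines = []
    for spot, shift in sorted(spot_shifts.items()):
        for reg in range(32):
            # Place the register number in its field, then write the
            # 32-bit word as little-endian hex digits.
            word = (reg << shift).to_bytes(4, "little").hex().upper()
            lines.append(f"DEFINE R{spot}_{reg} .{word}")
    return lines
```

Calling `register_defines(ASSUMED_SPOTS)` returns 96 lines; changing the dictionary is enough to try a different spot assignment.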
*stikonas is trying to calculate jumps for final 7 instructions (loading :table, etc...)
<oriansj>it is easiest if you put the (0xaddress) as a comment next to the :label
<oriansj>that way you would only need to count the number of bytes between labels
<stikonas>well, I had a table with number of instructions
<stikonas>but in this case I'm trying to understand alignment rules
<stikonas>it's the :table symbol at the end of program
<stikonas>maybe anything will work as long as I'm consistent in all 7 instructions and leave enough space for alignment
<oriansj>stikonas: well assuming RISC-V is stupidly forcing alignment on the assembler writer. Just assume align to 8bytes. (it'll probably work)
<stikonas>well, I was thinking of 4 bytes but we'll see...
<stikonas>if 4 don't work, I'll change to 8
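Trying 4-byte and then 8-byte alignment for :table comes down to rounding its address up to the next multiple of the alignment. A minimal sketch, with hypothetical helper names `align_up` and `padding_needed`, assuming the alignment is a power of two:

```python
def align_up(address, alignment):
    """Round `address` up to the next multiple of a power-of-two alignment."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (address + alignment - 1) & ~(alignment - 1)


def padding_needed(address, alignment):
    """Number of padding bytes to insert before an aligned label."""
    return align_up(address, alignment) - address
```

The same padding count has to be reflected in every instruction that computes the address of the label, which matches the point about staying consistent across all 7 instructions.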
<oriansj>if they were reasonable, align to byte would be the only restriction (allows dense encoding in the future) but I guess their goal of simplifying hardware implementation continues to cause us frustration.
<stikonas>hmm, segfault :(
<stikonas>probably need to redo :table address calculations...
<oriansj>gdb si will probably help a lot
<stikonas>I did use gdb for assembly prototype, but now there is no debugging info...
<stikonas>well, I'll first retry some things with those :table jumps
<stikonas>oriansj: ok, I think I've got it working
<stikonas>was adjusting that jump amount, then spotted that I typed one instruction in reverse
<stikonas>so that fixed it
<stikonas>not sure if adjusting the jump amount was necessary...
<stikonas>hmm, or did I run something else...
<stikonas>hmm
<stikonas>argh, yes, it was the wrong binary, sorry for the noise, going back to investigation
***ChanServ sets mode: +o janneke_
<mihi>Some time ago I also had some thoughts how you could shoehorn RISC-V style encodings into hex2 (without actually hardcoding them in the hex2 binary). But as I won't do the actual work, it does not matter how I would have done it :)
<mihi>I would have added not one but two shift registers, an output shift register (gets XORed against every byte output) and an input shift register. Have a prefix char that shifts the next output (byte or address) into the ISR.
<mihi>Then you would need the operations ISR &= const, ISR rotate_left const (0..63), OSR ^= ISR, and maybe OSR += ISR (two of them if both endiannesses).
<mihi>In case M1 cannot easily emit variables multiple times, probably another accumulator register that can be copied to/from ISR
<stikonas>hmm, debugging hex code is annoying... although I made some progress...
<stikonas>from segfault after fixing some issues went to running but not doing anything, then after another fix segfault again :D, although, this time I can see it reading something successfully
<ekaitz>stikonas: I debugged hex0 using gdb and disassembling everything so I could check the address of the jumps instead of counting instructions by hand
<ekaitz>I'm an engineer so I don't know how to count more than 10
<xentrac>you might find radare2 useful for this kind of thing
<xentrac>I mean I haven't used it myself, but youtube demos make it look like it would be pretty convenient
<ekaitz>xentrac: isn't that more for reverse engineering?
<stikonas>ekaitz: well, I'm thinking about running two gdb sessions side by side
<ekaitz>why two?
<stikonas>one running hex0, the other running stuff compiled by as, so I can get line numbers
<stikonas>hex0 has no debug info
<ekaitz>and why do you need it?
<stikonas>well, easier than address jumps...
<stikonas>I want to find the instruction that segfaults
<stikonas>anyway, I'm making some progress
<stikonas>fixed a few issues
<ekaitz>you can use breakpoints and so on, even without the debug info
<ekaitz>in the end, you are in assembly, there's no need to point to a source file or anything, you are on it!
<ekaitz>layout asm is more than enough for this i'd say
<xentrac>ekaitz: if you're disassembling things and trying to figure out where jumps go and stepping through machine code in gdb, you're pretty close to reverse engineering
<stikonas>well, the thing is I don't need to figure out how things have to work, I already have a working assembly code, I just needed to find where hex0 code deviates from it
<stikonas>I could look at just hex0 code and think whether register values make sense, but it's a bit easier to compare with known good. Not saying the other way can't be done
<stikonas>but ok, layout asm is useful...