IRC channel logs

<stikonas>hmm, encoding riscv jumps does not actually take THAT long...

<stikonas>probably 3-4 minutes per jump to do all calculations and even produce nice intermediate comments

<oriansj>just a thought but figuring out the simplest RISC-V hex2 subset might be easier if you tried adding RISC-V support to M2-Planet (and M2libc) then you'll be able to see what subset could be most easily used. Rather than rigidly following the hex0->hex1->hex2 plan

<stikonas[m]>Possibly... But writing hex1_x86 is a useful exercise to understand some things for somebody with no riscv experience and basically no non-trivial assembly programming before

<oriansj>stikonas: you are absolutely right.

<stikonas>we'll see soon enough if it works

<oriansj>hex0, hex1, hex2, M0 and cc_* are a wonderful progression for learning assembly language programming.

<stikonas>I think I have ~10 instructions left to encode

<stikonas>then we can put that file into RV64/Development/

<oriansj>I've been considering a minor change in hex2 that might make AArch64 and RISC-V slightly more efficient

<stikonas>?

<stikonas>I've also been adding more comments to the .hex0 file in RISC-V since immediates are mangled

<oriansj>2 classes of hex2: byte and word

<oriansj>but I still am figuring out the correct way to do it

<stikonas>WIP: https://stikonas.eu/files/bootstrap/hex1_x86.hex0.txt

<oriansj>stikonas: nice

<stikonas>those comments should help with those J- and B- type instructions

<stikonas>it's not too hard to encode them by hand if you have that bit template

<stikonas>and since I had it, I thought, why not put it in the comments
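
The bit template mentioned above can be sketched in Python. This is only an illustration of the standard RISC-V J-type and B-type immediate layouts, not the contents of the linked .hex0 file; the function names are made up for this sketch.

```python
def j_type_imm(offset):
    """Scatter an even byte offset into the J-type immediate bits 31..12."""
    if offset % 2 or not -(1 << 20) <= offset < (1 << 20):
        raise ValueError("J-type offset must be even and fit in 21 signed bits")
    imm = offset & 0x1FFFFF
    return (((imm >> 20) & 0x1) << 31       # imm[20]
            | ((imm >> 1) & 0x3FF) << 21    # imm[10:1]
            | ((imm >> 11) & 0x1) << 20     # imm[11]
            | ((imm >> 12) & 0xFF) << 12)   # imm[19:12]


def b_type_imm(offset):
    """Scatter an even byte offset into the B-type immediate bits."""
    if offset % 2 or not -(1 << 12) <= offset < (1 << 12):
        raise ValueError("B-type offset must be even and fit in 13 signed bits")
    imm = offset & 0x1FFF
    return (((imm >> 12) & 0x1) << 31       # imm[12]
            | ((imm >> 5) & 0x3F) << 25     # imm[10:5]
            | ((imm >> 1) & 0xF) << 8       # imm[4:1]
            | ((imm >> 11) & 0x1) << 7)     # imm[11]


def jal(rd, offset):
    """Encode JAL rd, offset (opcode 0x6F)."""
    return j_type_imm(offset) | (rd << 7) | 0x6F


def hex0_bytes(word):
    """Render a 32-bit instruction as the little-endian bytes typed into hex0."""
    return word.to_bytes(4, "little").hex(" ").upper()


print(hex0_bytes(jal(0, 8)))    # jal x0, +8
print(hex0_bytes(jal(1, -4)))   # jal x1, -4
```

The little-endian byte order in `hex0_bytes` is why the hand-typed hex looks scrambled compared with the instruction word.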

<stikonas>yeah, I saw AArch64 also has some hex weirdness...

<oriansj>also fossy you might want to cancel your mescc-tools draft pull request as we have created mescc-tools-extra

<stikonas>can you not close it without fossy?

<stikonas>I don't think fossy would mind given that we went slightly different direction

<oriansj>I got the idea from: https://github.com/laanwj/guix-mescc-tools/commit/1634c35635836943a4164fedf575c6ffe38f72ee

<oriansj>I don't like the syntax choices but it is looking more and more that RISC-V is very byte hostile.

<oriansj>also I would want to restrict that behavior heavily to just the most byte hostile architectures.

<oriansj>So when one does hex1 and hex2 for RISC-V in assembly they might end up having to do something with that word based syntax to make it work.

<oriansj>which is probably going to make that hex1 and hex2 much more complex than any other architecture's hex1 and hex2

<stikonas>yeah, I also thought that we'll have to make hex1 and hex2 significantly more complex...

<stikonas>although, need to think this through a bit to find lowest complexity...

<stikonas>at least these kinds of bit manipulations where immediates are swapped shouldn't be too hard to implement

<stikonas>but we also need to take into account 7 bit opcodes...

<stikonas>is that link a work in progress for riscv in mes?

<stikonas>(well, not mes itself but tools with the intention of later mes port)

<oriansj>stikonas: as far as I can tell yes

<oriansj>but it brings a problem for M1

<oriansj>as M1 deals with immediate encoding as well as the conversion from human names to hex2

<oriansj>in x86 it is what converts add_eax %0x1234 into 05 34120000

<oriansj>in RISC-V it would need to output the syntax that hex2 uses for the immediate field encodings

<oriansj>So if the syntax is .J 34120000 then it would have to output that for %0x1234 or require a custom syntax there too
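
A minimal Python sketch of the x86 immediate expansion described above. It assumes `!`, `@` and `%` stand for 8-, 16- and 32-bit little-endian immediates, which is what the `34120000` example suggests for `%`; the function name is hypothetical and this is not M1's actual code.

```python
# Assumed prefix widths in bytes; only '%' is directly implied by the log.
WIDTHS = {"!": 1, "@": 2, "%": 4}


def m1_immediate(token):
    """Expand a token such as '%0x1234' into little-endian hex digits."""
    size = WIDTHS[token[0]]
    value = int(token[1:], 0) & ((1 << (8 * size)) - 1)   # wrap negatives
    return value.to_bytes(size, "little").hex().upper()


print(m1_immediate("%0x1234"))   # 32-bit immediate
print(m1_immediate("@0x1234"))   # 16-bit immediate
```

For x86 this byte-wise reversal is all that is needed, whereas a RISC-V immediate has to be split across non-contiguous bit fields, which is the problem being discussed.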

<stikonas>.J meaning jump type opcode?

<oriansj>stikonas: honestly I was thinking of repurposing !@#$ for RISC-V instead of the example syntax linked

<oriansj>as individual bytes and halfs don't mean anything in RISC-V

<oriansj>and ~ is already defined as architecture specific

<stikonas>hmm, alternatively, can we have some kind of preprocessor?

<stikonas>so we reorder crazy risc instruction boundaries into something that M1 can easily output

<stikonas>hmm, although, that will probably make address calculations too hard

<oriansj>M1 doesn't have to care about addresses at all, it is just a macro string processor

<stikonas>well, I'll have to take a better look at M1 first before I can give more qualified input to this problem

<oriansj>with a few architecture specific flagged behaviors (little/big endian) etc

<oriansj>also I've been sticking to single character prefix special behaviors to simplify implementation in assembly. (:label instead of the more common label:)

<stikonas>well, yes, although we have enough spare characters

<stikonas>or even letters...

<stikonas>basically anything other than a-f A-F 0-9 : ; #

<oriansj>I'd rather drop special immediate encodings than fill up the space if possible

<oriansj>and !@$%~ 5 hex2 offset encodings and !@$%~ 5 M1 immediate encodings should be enough to do something useful

<oriansj>if not, the architecture set would have to be absolute garbage

<oriansj>we could even extend it to % and & if we expected something stupid like %0x12345678 '00000000'

<oriansj>but I'd prefer something clean if possible.

<oriansj>and something like !00FF0000 @1000450 $0200006 00000000 might be needed to encode fixed position registers as separate defines if one wanted to clean up the M1 syntax for RISC-V instead of doing the DEFINE ADD_EAX 05 that we did in x86

<oriansj>(with the first 3 just changing the contents of the shift register before being xor'd with the zeros to produce the one word of output)

<stikonas>yeah, I think that x86 approach would be infeasible

<stikonas>but yes, the first approach will need more work in hex1/2

<oriansj>and we probably could reserve . to mean just xor what follows into the shift register. So that the output of M1 for the various !6 @7 and $0x123 statements just becomes .06000000, .00012000 and .0070340

<oriansj>(these are just arbitrary examples)
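
One way to read the `.` proposal is sketched below in Python. Everything here is an assumption made for illustration: tokens are whitespace separated, a `.` token XORs its hex value into a pending word, and the next plain hex word is XORed with that pending value, emitted, and the register cleared. Byte order and labels are ignored.

```python
def assemble_words(source):
    """Return the output words for a stream of '.' and plain hex tokens."""
    pending = 0      # the "shift register" that '.' tokens XOR into
    words = []
    for token in source.split():
        if token.startswith("."):
            pending ^= int(token[1:], 16)
        else:
            # A plain word consumes whatever has been accumulated so far.
            words.append(f"{int(token, 16) ^ pending:08X}")
            pending = 0
    return words


print(assemble_words(".06000000 .00012000 00000000"))
```

With this reading, M1 only has to emit one `.` token per field and never needs to know how the fields interleave inside the final word.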

<oriansj>hex2 would only need to know about SB-Format and UJ-Format

<oriansj>which we could do as ~ and ^~

<oriansj>unless there is something important I am missing

<oriansj>M1 would need to support I-Format, S-Format and U-Format

<stikonas>hmm, will we have any problem with pseudoinstructions?

<stikonas>some of that seems non-trivial too

<stikonas>yeah, the formats are not too hard

<oriansj>and R-Format would just be a standard DEFINE in M1 (unless we want to do something funky like DEFINE R1_12 .00001200 so that hex2 deals with putting register number 0x12 in register spot 1)

<oriansj>also we don't deal with pseudoinstructions because they are not really implemented and instead we will deal with the actual instructions that the processor will actually be using.

<oriansj>So it would be 32x3 DEFINEs for all register encoding details needed for RISC-V but would look a little odd => R0_12 R1_0 R2_31 ADD but it would work

<oriansj>and R type instructions would use R0_*, R1_* and R2_*, but I-Format would only use R1_* and R2_*; S and SB would use R0_* and R1_*, and U and UJ would use R2_*

<oriansj>and hope to god the programmer can remember that
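
The 32x3 DEFINE table could be generated rather than typed by hand. The Python sketch below assumes the slot usage listed above implies R0 is rs2 (bits 24:20), R1 is rs1 (bits 19:15) and R2 is rd (bits 11:7), that register numbers in the names are decimal, and that values are written as plain 32-bit hex words. None of those choices were settled in the discussion.

```python
# Assumed mapping of slot name to bit position of the register field.
FIELD_SHIFT = {"R0": 20, "R1": 15, "R2": 7}   # rs2, rs1, rd


def register_defines():
    """Build the 96 DEFINE lines, one per (slot, register) pair."""
    return [f"DEFINE {slot}_{reg} .{reg << shift:08X}"
            for slot, shift in FIELD_SHIFT.items()
            for reg in range(32)]


def encode(line, table):
    """XOR together the values of all names on one M1-style line."""
    word = 0
    for name in line.split():
        word ^= table[name]
    return word


# Name -> value table built from the generated DEFINEs plus one opcode.
table = {d.split()[1]: int(d.split()[2][1:], 16) for d in register_defines()}
table["ADD"] = 0x00000033   # R-type ADD: opcode 0x33, funct3 and funct7 zero

print(f"{encode('R0_12 R1_0 R2_31 ADD', table):08X}")   # add x31, x0, x12
```

Because every field occupies disjoint bits, XOR and OR give the same result here, which is what makes the "xor into the shift register" idea sufficient.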

*stikonas is trying to calculate jumps for final 7 instructions (loading :table, etc...)

<oriansj>it is easiest if you put the (0xaddress) as a comment next to the :label

<oriansj>that way you would only need to count the number of bytes between labels
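
The byte counting can be automated with a small Python sketch like the one below. It assumes a hex2-style listing in which `#` and `;` start comments and each `:label` sits on its own line; the base address and the sample listing are made up.

```python
def label_addresses(lines, base=0):
    """Map each :label to the address of the byte that follows it."""
    address = base
    labels = {}
    for line in lines:
        code = line.split("#")[0].split(";")[0].strip()   # drop comments
        if code.startswith(":"):
            labels[code[1:]] = address
            continue
        digits = "".join(code.split())
        address += len(digits) // 2                       # two hex digits per byte
    return labels


listing = [
    ":start",
    "6F 00 80 00   # jal x0, +8",
    "13 00 00 00   ; nop",
    ":table",
    "41 42",
]
labels = label_addresses(listing, base=0x100)
print({name: hex(addr) for name, addr in labels.items()})
print(labels["table"] - labels["start"])   # byte distance between the labels
```

The printed addresses are the values one would paste next to each :label as a comment.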

<stikonas>well, I had a table with number of instructions

<stikonas>but in this case I'm trying to understand alignment rules

<stikonas>it's the :table symbol at the end of program

<stikonas>maybe anything will work as long as I'm consistent in all 7 instructions and leave enough space for alignment

<oriansj>stikonas: well assuming RISC-V is stupidly forcing alignment on the assembler writer. Just assume align to 8bytes. (it'll probably work)

<stikonas>well, I was thinking of 4 bytes but we'll see...

<stikonas>if 4 doesn't work, I'll change to 8
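
The padding needed in front of :table for either choice is a one-line calculation. A Python sketch follows; the example address is made up.

```python
def padding_for(address, alignment):
    """Number of filler bytes needed so that address becomes aligned."""
    return -address % alignment


end_of_code = 0x1A5   # hypothetical address of the first byte after the code
for alignment in (4, 8):
    pad = padding_for(end_of_code, alignment)
    print(alignment, pad, hex(end_of_code + pad))
```

Whatever alignment is chosen, every instruction that computes the address of :table has to use the padded address.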

<oriansj>if they were reasonable, align to byte would be the only restriction (allows dense encoding in the future) but I guess their goal of simplifying hardware implementation continues to cause us frustration.

<stikonas>hmm, segfault :(

<stikonas>probably need to redo :table address calculations...

<oriansj>gdb si will probably help a lot

<stikonas>I did use gdb for assembly prototype, but now there is no debugging info...

<stikonas>well, I'll first retry some things with those :table jumps

<stikonas>oriansj: ok, I think I've got it working

<stikonas>was adjusting that jump amount, then spotted that I typed one instruction in reverse

<stikonas>so that fixed it

<stikonas>not sure if adjusting the jump amount was necessary...

<stikonas>hmm, or did I run something else...

<stikonas>hmm

<stikonas>argh, yes, it was the wrong binary, sorry for the noise, going back to investigation

***ChanServ sets mode: +o janneke_

<mihi>Some time ago I also had some thoughts how you could shoehorn RISC-V style encodings into hex2 (without actually hardcoding them in the hex2 binary). But as I won't do the actual work, it does not matter how I would have done it :)

<mihi>I would have added not one but two shift registers, an output shift register (gets XORed against every byte output) and an input shift register. Have a prefix char that shifts the next output (byte or address) into the ISR.

<mihi>Then you would need the operations ISR &= const, ISR rotate_left const (0..63), OSR ^= ISR, and maybe OSR += ISR (two of them if both endiannesses).

<mihi>In case M1 cannot easily emit variables multiple times, probably another accumulator register that can be copied to/from ISR
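
A rough Python model of the two-register idea, applied to the J-type immediate. This is one interpretation of the description above, not an agreed design: registers are taken as 64 bits wide, the accumulator is reloaded into the ISR before each step, and the OSR is applied to a whole 32-bit word rather than to individual bytes. The step table is an assumption derived from the standard J-type layout.

```python
MASK64 = (1 << 64) - 1


def rotl64(value, count):
    """Rotate a 64-bit value left by count bits (0..63)."""
    count %= 64
    return ((value << count) | (value >> (64 - count))) & MASK64


# (mask, rotate) pairs: select a field of the offset, then move it into place.
J_TYPE_STEPS = [
    (0x100000, 11),   # imm[20]    -> bit 31
    (0x0007FE, 20),   # imm[10:1]  -> bits 30..21
    (0x000800, 9),    # imm[11]    -> bit 20
    (0x0FF000, 0),    # imm[19:12] stays in place
]


def scatter(offset, steps):
    """Run ISR &= mask, ISR rotate_left n, OSR ^= ISR for each step."""
    accumulator = offset & MASK64
    osr = 0
    for mask, rotate in steps:
        isr = accumulator        # copy accumulator into the ISR
        isr &= mask
        isr = rotl64(isr, rotate)
        osr ^= isr
    return osr


def emit_word(base, osr):
    """XOR the OSR into the base instruction word and keep 32 bits."""
    return (base ^ osr) & 0xFFFFFFFF


print(f"{emit_word(0x000000EF, scatter(-4, J_TYPE_STEPS)):08X}")   # jal x1, -4
```

The appeal of this scheme is that the instruction formats live in a table of masks and rotations in the input file rather than in the hex2 binary.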

<stikonas>hmm, debugging hex code is annoying... although I made some progress...

<stikonas>from segfault after fixing some issues went to running but not doing anything, then after another fix segfault again :D, although, this time I can see it reading something successfully

<ekaitz>stikonas: I debugged hex0 using gdb and disassembling everything so I could check the address of the jumps instead of counting instructions by hand

<ekaitz>I'm an engineer so I don't know how to count more than 10

<xentrac>you might find radare2 useful for this kind of thing

<xentrac>I mean I haven't used it myself, but youtube demos make it look like it would be pretty convenient

<ekaitz>xentrac: isn't that more for reverse engineering?

<stikonas>ekaitz: well, I'm thinking about running two gdb side by side

<ekaitz>why two?

<stikonas>one running hex0, the other running stuff compiled by as, so I can get line numbers

<stikonas>hex0 has no debug info

<ekaitz>and why do you need it?

<stikonas>well, easier than address jumps...

<stikonas>I want to find instruction that segfaults

<stikonas>anyway, I'm making some progress

<stikonas>fixed a few issues

<ekaitz>you can use breakpoints and so on, even without the debug info

<ekaitz>in the end, you are in assembly, there's no need to point to a source file or anything, you are on it!

<ekaitz>layout asm is more than enough for this i'd say

<xentrac>ekaitz: if you're disassembling things and trying to figure out where jumps go and stepping through machine code in gdb, you're pretty close to reverse engineering

<stikonas>well, the thing is I don't need to figure out how things have to work, I already have a working assembly code, I just needed to find where hex0 code deviates from it

<stikonas>I could look at just hex0 code and think whether register values make sense, but it's a bit easier to compare with known good. Not saying the other way can't be done

<stikonas>but ok, layout asm is useful...

2021-07-24.log