<stikonas>hmm, encoding riscv jumps does not actually take THAT long...
<stikonas>probably 3-4 minutes per jump to do all calculations and even produce nice intermediate comments
<oriansj>just a thought, but figuring out the simplest RISC-V hex2 subset might be easier if you tried adding RISC-V support to M2-Planet (and M2libc); then you'll be able to see what subset could be most easily used. Rather than rigidly following the hex0->hex1->hex2 plan
<stikonas[m]>Possibly... But writing hex1_x86 is a useful exercise to understand some things for somebody with no riscv experience. And basically no non-trivial assembly programming before
<oriansj>hex0, hex1, hex2, M0 and cc_* are a wonderful progression for learning assembly language programming.
<stikonas>I think I have ~10 instructions left to encode
<stikonas>then we can put that file into RV64/Development/
<oriansj>I've been considering a minor change in hex2 that might make AArch64 and RISC-V slightly more efficient
<stikonas>I've also been adding more comments to the .hex0 file in RISC-V since immediates are mangled
<oriansj>but I still am figuring out the correct way to do it
<stikonas>those comments should help with those J- and B-type instructions
<stikonas>it's not too hard to encode them by hand if you have that bit template
<stikonas>and since I had it, I thought why not put it in the comments
<stikonas>yeah, I saw AArch64 also has some hex weirdness...
<oriansj>also fossy you might want to cancel your mescc-tools draft pull request as we have created mescc-tools-extra
<stikonas>I don't think fossy would mind given that we went in a slightly different direction
<oriansj>I don't like the syntax choices but it is looking more and more that RISC-V is very byte hostile.
<oriansj>also I would want to restrict that behavior heavily to just the most byte hostile architectures.
<oriansj>So when one does hex1 and hex2 for RISC-V in assembly they might end up having to do something with that word based syntax to make it work.
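The hand encoding of J- and B-type instructions mentioned above can be sketched in a few lines. This is only an illustration, not the bit template stikonas used: the helper names (`encode_j`, `encode_b`, `to_hex0`) are hypothetical, and the field positions are the ones from the RISC-V base ISA (J-type immediate as imm[20|10:1|11|19:12], B-type as imm[12|10:5] plus imm[4:1|11]).

```python
# Sketch: pack RISC-V J-type and B-type instructions the way one would by hand.
# Helper names are hypothetical; field positions follow the RISC-V base ISA.

def encode_j(imm, rd, opcode=0x6F):
    """J-type (e.g. JAL): imm[20|10:1|11|19:12] occupy bits 31..12."""
    assert imm % 2 == 0 and -(1 << 20) <= imm < (1 << 20)
    imm &= 0x1FFFFF                        # two's complement, 21 bits
    return (((imm >> 20) & 0x1) << 31      # imm[20]
            | ((imm >> 1) & 0x3FF) << 21   # imm[10:1]
            | ((imm >> 11) & 0x1) << 20    # imm[11]
            | ((imm >> 12) & 0xFF) << 12   # imm[19:12]
            | rd << 7
            | opcode)

def encode_b(imm, rs1, rs2, funct3=0, opcode=0x63):
    """B-type (e.g. BEQ): imm[12|10:5] in bits 31..25, imm[4:1|11] in bits 11..7."""
    assert imm % 2 == 0 and -(1 << 12) <= imm < (1 << 12)
    imm &= 0x1FFF                          # two's complement, 13 bits
    return (((imm >> 12) & 0x1) << 31      # imm[12]
            | ((imm >> 5) & 0x3F) << 25    # imm[10:5]
            | rs2 << 20
            | rs1 << 15
            | funct3 << 12
            | ((imm >> 1) & 0xF) << 8      # imm[4:1]
            | ((imm >> 11) & 0x1) << 7     # imm[11]
            | opcode)

def to_hex0(word):
    """Render a 32-bit instruction as little-endian hex bytes, as typed in hex0."""
    return " ".join(f"{b:02X}" for b in word.to_bytes(4, "little"))

print(to_hex0(encode_j(8, 1)))        # jal ra, 8       -> EF 00 80 00
print(to_hex0(encode_b(8, 10, 11)))   # beq a0, a1, 8   -> 63 04 B5 00
```

Both offsets are relative to the address of the jump or branch instruction itself, so a backwards jump to the previous instruction is `encode_j(-4, 0)`.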
<oriansj>which is probably going to make that hex1 and hex2 much more complex than any other architecture's hex1 and hex2
<stikonas>yeah, I also thought that we'll have to make hex1 and hex2 significantly more complex...
<stikonas>although, need to think this through a bit to find lowest complexity...
<stikonas>at least these kinds of bit manipulations where immediates are swapped shouldn't be too hard to implement
<stikonas>but we also need to take into account 7 bit opcodes...
<stikonas>is that link a work in progress for riscv in mes?
<stikonas>(well, not mes itself but tools with the intention of a later mes port)
<oriansj>as M1 deals with immediate encoding as well as the conversion from human names to hex2
<oriansj>in x86 it is what converts add_eax %0x1234 into 05 3412
<oriansj>in RISC-V it would need to output the syntax that hex2 uses for the immediate field encodings
<oriansj>So if the syntax is .J 34120000 then it would have to output that for %0x1234 or require a custom syntax there too
<oriansj>stikonas: honestly I was thinking of repurposing !@#$ for RISC-V instead of the example syntax linked
<oriansj>as individual bytes and halfs don't mean anything in RISC-V
<oriansj>and ~ is already defined as architecture specific
<stikonas>hmm, alternatively, can we have some kind of preprocessor?
<stikonas>so we reorder crazy risc instruction boundaries into something that M1 can easily output
<stikonas>hmm, although, that will probably make address calculations too hard
<oriansj>M1 doesn't have to care about addresses at all, it is just a macro string processor
<stikonas>well, I'll have to take a better look at M1 first before I can give more qualified input to this problem
<oriansj>with a few architecture specific flagged behaviors (little/big endian) etc
<oriansj>also I've been sticking to single character prefix special behaviors to simplify implementation in assembly. (:label instead of the more common label:)
<stikonas>well, yes, although we have enough spare characters
<stikonas>basically anything other than a-f A-F 0-9 : ; #
<oriansj>I'd rather drop special immediate encodings than fill up the space if possible
<oriansj>and !@$%~ 5 hex2 offset encodings and !@$%~ 5 M1 immediate encodings should be enough to do something useful
<oriansj>if not, the architecture set would have to be absolute garbage
<oriansj>we could even extend it to % and & if we expected something stupid like %0x12345678 '00000000'
<oriansj>but I'd prefer something clean if possible.
<oriansj>and something like !00FF0000 @1000450 $0200006 00000000 might be needed to encode fixed position registers as separate defines if one wanted to clean up the M1 syntax for RISC-V instead of doing the DEFINE ADD_EAX 05 that we did in x86
<oriansj>(with the first 3 just changing the contents of the shift register before being xor'd with the zeros to produce the one word of output)
<stikonas>yeah, I think that x86 approach would be infeasible
<stikonas>but yes, the first approach will need more work in hex1/2
<oriansj>and we probably could reserve . to mean just xor what follows into the shift register. So that the output of M1 for the various !6 @7 and $0x123 statements just become .06000000, .00012000 and .0070340
<oriansj>hex2 would only need to know about SB-Format and UJ-Format
<oriansj>unless there is something important I am missing
<oriansj>M1 would need to support I-Format, S-Format and U-Format
<stikonas>hmm, will we have any problem with pseudoinstructions?
<oriansj>and R-Format would just be a standard DEFINE in M1 (unless we want to do something funky like DEFINE R1_12 .00001200 so that hex2 deals with putting register number 0x12 in register spot 1)
<oriansj>also we don't deal with pseudoinstructions because they are not really implemented and instead we will deal with the actual instructions that the processor will actually be using.
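The "." prefix idea discussed above (XOR each prefixed word into a shift register, then XOR the register into the next plain word) can be sketched as follows. This is a guess at the semantics under discussion, not an existing hex2 feature: the function name `assemble_words` and the exact token rules are assumptions, and every token is taken to be one 8-digit hex word written in output byte order.

```python
# Sketch of the proposed "." prefix: ".XXXXXXXX" words are XORed into a
# pending register; the next unprefixed word is XORed with that register,
# emitted, and the register is cleared. Hypothetical semantics.

def assemble_words(tokens):
    pending = 0
    output = []
    for token in tokens:
        if token.startswith("."):
            pending ^= int(token[1:], 16)    # accumulate a field
        else:
            output.append(f"{int(token, 16) ^ pending:08X}")
            pending = 0                      # register is consumed per word
    return output

# jal ra, 8 built from parts, all written as little-endian byte strings:
#   .80000000  rd = ra (x1) in bits 11..7
#   .00008000  J-type immediate for +8
#   6F000000   bare JAL opcode
print(assemble_words([".80000000", ".00008000", "6F000000"]))  # ['EF008000']
```

Because XOR works bit by bit, the fields can be supplied in any order as long as they do not overlap, which is what would let M1 emit registers and immediates as separate defines.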
<oriansj>So it would be 32x3 DEFINEs for all register encoding details needed for RISC-V but would look a little odd => R0_12 R1_0 R2_31 ADD but it would work
<oriansj>and R type instructions would use R0_*, R1_* and R2_* but I would only use R1_* and R2_*; S and SB would use R0_* and R1_* and U and UJ would use R2_*
<oriansj>and hope to god the programmer can remember that
*stikonas is trying to calculate jumps for final 7 instructions (loading :table, etc...)
<oriansj>it is easiest if you put the (0xaddress) as a comment next to the :label
<oriansj>that way you would only need to count the number of bytes between labels
<stikonas>well, I had a table with number of instructions
<stikonas>but in this case I'm trying to understand alignment rules
<stikonas>it's the :table symbol at the end of the program
<stikonas>maybe anything will work as long as I'm consistent in all 7 instructions and leave enough space for alignment
<oriansj>stikonas: well assuming RISC-V is stupidly forcing alignment on the assembler writer. Just assume align to 8 bytes. (it'll probably work)
<stikonas>well, I was thinking of 4 bytes but we'll see...
<oriansj>if they were reasonable, align to byte would be the only restriction (allows dense encoding in the future) but I guess their "simplify hardware implementation" goal continues to cause us frustration.
<stikonas>probably need to redo :table address calculations...
<stikonas>I did use gdb for the assembly prototype, but now there is no debugging info...
<stikonas>well, I'll first retry some things with those :table jumps
<stikonas>oriansj: ok, I think I've got it working
<stikonas>was adjusting that jump amount, then spotted that I typed one instruction in reverse
<stikonas>not sure if adjusting the jump amount was necessary...
<stikonas>argh, yes, it was the wrong binary, sorry for the noise, going back to investigation
***ChanServ sets mode: +o janneke_
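The bookkeeping described above (note each label's address, count bytes between labels, pad :table to an alignment boundary) comes down to two small calculations. A minimal sketch, with made-up addresses and hypothetical helper names; whether :table really needs 4- or 8-byte alignment is left open in the discussion.

```python
# Sketch: address arithmetic for hand-assembled jumps. All addresses below
# are invented for illustration.

def align_up(address, alignment=4):
    """Round address up to the next multiple of alignment (unchanged if aligned)."""
    return address + (-address) % alignment

def relative_offset(instruction_address, label_address):
    """PC-relative offset: RISC-V jumps and branches count from the
    address of the jump or branch instruction itself."""
    return label_address - instruction_address

code_end = 0x6D                      # hypothetical first free byte after the code
table = align_up(code_end, 4)        # 0x70, so 3 padding bytes are needed
print(hex(table), table - code_end)

# A jump at 0x40 that targets the (aligned) table:
print(relative_offset(0x40, table))  # 48 bytes forward
```

Aligning to 8 bytes instead only changes the second argument; an address that is 8-byte aligned is also 4-byte aligned, which is why the larger value is the safe guess.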
<mihi>Some time ago I also had some thoughts on how you could shoehorn RISC-V style encodings into hex2 (without actually hardcoding them in the hex2 binary). But as I won't do the actual work, it does not matter how I would have done it :)
<mihi>I would have added not one but two shift registers, an output shift register (gets XORed against every byte output) and an input shift register. Have a prefix char that shifts the next output (byte or address) into the ISR.
<mihi>Then you would need the operations ISR &= const, ISR rotate_left const (0..63), OSR ^= ISR, and maybe OSR += ISR (two of them if both endiannesses).
<mihi>In case M1 cannot easily emit variables multiple times, probably another accumulator register that can be copied to/from ISR
<stikonas>hmm, debugging hex code is annoying... although I made some progress...
<stikonas>from segfault after fixing some issues went to running but not doing anything, then after another fix segfault again :D, although, this time I can see it reading something successfully
<ekaitz>stikonas: I debugged hex0 using gdb and disassembling everything so I could check the address of the jumps instead of counting instructions by hand
<ekaitz>I'm an engineer so I don't know how to count more than 10
<xentrac>you might find radare2 useful for this kind of thing
<xentrac>I mean I haven't used it myself, but youtube demos make it look like it would be pretty convenient
<ekaitz>xentrac: isn't that more for reverse engineering?
<stikonas>ekaitz: well, I'm thinking about running two gdb sessions side by side
<stikonas>one running hex0, the other running stuff compiled by as, so I can get line numbers
<stikonas>I want to find the instruction that segfaults
<ekaitz>you can use breakpoints and so on, even without the debug info
<ekaitz>in the end, you are in assembly, there's no need to point to a source file or anything, you are on it!
<ekaitz>layout asm is more than enough for this i'd say
<xentrac>ekaitz: if you're disassembling things and trying to figure out where jumps go and stepping through machine code in gdb, you're pretty close to reverse engineering
<stikonas>well, the thing is I don't need to figure out how things have to work, I already have a working assembly code, I just needed to find where hex0 code deviates from it
<stikonas>I could look at just hex0 code and think whether register values make sense, but it's a bit easier to compare with known good. Not saying the other way can't be done