IRC channel logs

2021-09-18.log

<oriansj>stikonas: indeed; less so than before but still a good bit because there are just not enough registers in x86
<stikonas>well, should be easier with amd64
<stikonas>it has more registers if I remember correctly
<stikonas>16?
<stikonas>for risc-v I'm actually struggling to use them all...
<stikonas>that's why I use some of the registers as global variables
<stikonas>on risc-v loading it from memory requires quite a few instructions... AUIPC followed by ADDI to get address of label and then LD
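The AUIPC + ADDI pair mentioned above splits a 32-bit PC-relative offset into a 20-bit upper part and a 12-bit lower part. Because ADDI sign-extends its 12-bit immediate, the upper part has to be rounded by 0x800 first. A minimal C sketch of that arithmetic follows; the function names `riscv_hi20`, `riscv_lo12` and `riscv_rebuild` are hypothetical and not taken from stage0 or M2-Planet.

```c
#include <stdint.h>

/* Upper 20 bits for AUIPC. Adding 0x800 compensates for the
 * sign extension of the low 12 bits done by ADDI (or LD). */
uint32_t riscv_hi20(int32_t offset)
{
    return ((uint32_t)offset + 0x800u) >> 12;
}

/* Low 12 bits for ADDI, interpreted as a signed value in [-2048, 2047]. */
int32_t riscv_lo12(int32_t offset)
{
    int32_t lo = (int32_t)((uint32_t)offset & 0xFFFu);
    if (lo >= 0x800)
        lo -= 0x1000;
    return lo;
}

/* What the hardware effectively computes: (hi20 << 12) + sext(lo12).
 * Unsigned arithmetic is used so the wrap-around is well defined. */
int32_t riscv_rebuild(uint32_t hi20, int32_t lo12)
{
    return (int32_t)((hi20 << 12) + (uint32_t)lo12);
}
```

For example, an offset of 0xFFF yields an upper part of 1 and a lower part of -1, since 0x1000 - 1 = 0xFFF.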
<xentrac>you do, yes
<xentrac>Berkeley RISC-I and -II and SPARC did this nifty register window thing where most of your registers are mapped into a processor-internal circular buffer
<xentrac>on SPARC you have 8 "global" registers which act like normal registers, 8 "out" registers, 8 "in" registers which are your caller's "out" registers, and 8 "local" registers which aren't shared with callers or callees
<xentrac>if the circular buffer fills up, a software fault handler copies a windowful of registers to RAM
<xentrac>pretty nifty idea: in the usual case doing a subroutine call and allocating a stack frame is just a single cycle, and deallocating it and returning is another single cycle
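The overlapping in/out windows described above can be modelled roughly in C. This is only a sketch under stated assumptions: the window count of 8 is arbitrary, the global registers and the overflow/underflow trap that spills to RAM are left out, and the window pointer moves upward on a call for readability (real SPARC decrements it). All names (`WindowedRegs`, `window_save`, and so on) are hypothetical.

```c
#include <stdint.h>

enum { NWINDOWS = 8, REGS_PER_WINDOW = 16 };

typedef struct {
    uint32_t file[NWINDOWS * REGS_PER_WINDOW]; /* circular register file */
    unsigned cwp;                              /* current window pointer */
} WindowedRegs;

/* "in" registers: the first 8 slots of the current window. */
uint32_t *in_reg(WindowedRegs *r, unsigned n)
{
    return &r->file[r->cwp * REGS_PER_WINDOW + n];
}

/* "local" registers: the next 8 slots, private to this window. */
uint32_t *local_reg(WindowedRegs *r, unsigned n)
{
    return &r->file[r->cwp * REGS_PER_WINDOW + 8 + n];
}

/* "out" registers: physically the "in" registers of the next window. */
uint32_t *out_reg(WindowedRegs *r, unsigned n)
{
    return &r->file[((r->cwp + 1) % NWINDOWS) * REGS_PER_WINDOW + n];
}

/* Call: rotate to the next window, so the caller's outs become our ins. */
void window_save(WindowedRegs *r)
{
    r->cwp = (r->cwp + 1) % NWINDOWS;
}

/* Return: rotate back to the caller's window. */
void window_restore(WindowedRegs *r)
{
    r->cwp = (r->cwp + NWINDOWS - 1) % NWINDOWS;
}
```

In this model a value the caller writes with `out_reg` is visible to the callee through `in_reg` after `window_save`, without any copy, which is the effect that makes the call cheap.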
<xentrac>it imposed similar overhead on loading stuff from memory, though instead of AUIPC you would use SETHI, which set the high 22 bits of a register (so it wasn't PIC by default)
<xentrac>RISC-V abandoned the Berkeley RISC register window mechanism in favor of a more MIPS-like approach
<stikonas>there is definitely more than 2h of work in cc_* after tokenization is working
<stikonas>might finish loops today but still need to do conditionals, local variables and expressions (which is the biggest part out of those)
<oriansj>well register windows don't present much advantage over pushing onto an L1 page and honestly the fixed register behavior of SPARC and the earlier Berkeley RISC chips wasn't as flexible as AMD's 29K series
<oriansj>stikonas: really? it's just calling match with strings and calling emit with strings for everything but local and argument offset calculations. kinda tedious but copy paste should make quick work of it.
<stikonas>yeah, copy paste does help
<stikonas>late evening doesn't help though
<stikonas>it will be done this weekend anyway
<stikonas>so far things seem to be working...
<stikonas>that I implemented
<oriansj>fair, exhaustion can easily make this much, much harder.
<oriansj>also if you haven't worked on the C version of cc_* yet, it is easy to miss the easier path of implementation
<stikonas>indeed, I didn't work on C version. And I don't even have GAS version, it's straight M1. But at least implementation order seems to be working well for me
<stikonas>I am able to test most of the stuff immediately
<stikonas>reverse instruction order also mixes up some statement strings...
<stikonas>but it's only minor annoyance...
<stikonas>e.g. "JUMP %FOR_" followed by ":FOR_THEN_" becomes "$FOR_" followed by "JAL\n:FOR_THEN_"
<oriansj>yep, those are the most annoying ordering details to work out in the M1 output for cc_* and M2-Planet
<oriansj>best to have those sorted and tested to produce working output before trying to implement in assembly
<oriansj>as changing the order in assembly is much more messy than doing it in C
<stikonas>ok, conditionals are also working
<stikonas>probably enough for today
<xentrac>true that SPARC and Berkeley RISC weren't as *flexible* as the Am29K
<xentrac>but they were a lot *faster* and *smaller*
<oriansj>xentrac: a lot "faster" and "smaller" is hard to believe when compared against 3 clock cycles and a 32-bit instruction.
<oriansj>although delayed branches were such a bad idea in retrospect
<oriansj>also the 128 local + 64 global registers were a bit excessive even for optimizing compilers today.
<oriansj>minor correction: 3 clock cycles and 3 32-bit instructions, as even the documentation says "The function-call overhead in the 29K family consists of a small number of single-cycle instructions;"
<stikonas>oriansj: is there any particular reason why M2-Planet and cc_* prototypes have different constants in collect_local
<stikonas>e.g. cc_aarch64 has a->depth = 64 in "main" section but "32" in M2-Planet
<stikonas>I guess M2-Planet numbers are the correct ones?
<oriansj>actually they are both correct because cc_aarch64 uses the stack but M2-Planet uses something more efficient.
<stikonas>oh I see
<stikonas>oh ok, so I need those larger numbers...
<oriansj>all pushes on the AArch64 stack are 128 bits but we are only pushing 64-bit registers, so we just use a different register and instruction sequence to get a more efficient result.
<stikonas>well, need to figure out what's correct for riscv but probably similar to aarch64
<oriansj>So you just need to know what calling convention you wish to implement
<stikonas>well, I guess I'll use stack too in cc_riscv64
<stikonas>and leave more efficient one for M2-Planet
<stikonas>stack should be simpler
<oriansj>stack is simplest for a C state machine but passing arguments in registers is simplest when writing assembly by hand.
<oriansj>also you need to be careful in the preservation order so that something like foo(bar(), 1); doesn't break your code
<oriansj>look carefully at the C versions because they'll help you avoid it.
<stikonas>yeah, I'm always looking at C versions
<stikonas>both cc_* and a bit at M2-Planet
<oriansj>the cc_* is much closer to what you'll be doing, as wrongish but working is good enough to get an M2-Planet that can self-host
<stikonas>yeah, that's my thinking too...
<stikonas>just get bare minimum that is simplest
<stikonas>since cc_* only has to be able to build one single program
<oriansj>the big differences are about dealing with correct output behavior which need to be subtly checked.
<oriansj>structs and the matching of strings are the only things that absolutely have to be right in cc_*
<oriansj>being wasteful in the call stack is fine
<stikonas>crashing somewhere is not fine though... Time to debug and find where I messed up...
<stikonas>oriansj: https://github.com/oriansj/M2-Planet/blob/master/cc_core.c#L830 should be &postfix-expr ?
<stikonas>instead of postfix-expr
<stikonas>or am I misreading the code
<stikonas>(if yes, I can fix it later myself)
<oriansj>well line 830 is a comment
<oriansj>now if you are referring to common_recursion receiving a function pointer
<oriansj>we are just passing the function to be directly called. So passing the &label is exactly what is needed for the f(); call inside of it.
<stikonas>yes, I meant the comment
<stikonas>well, it looked to me like those are handled in primary_expr function here: https://github.com/oriansj/M2-Planet/blob/master/cc_core.c#L1082
<stikonas>which handles & sizeof ! and ~
<oriansj>well the comments unfortunately are out of date and in some cases straight wrong. As they were originally to help me work out how to build the state machine to match the C precedence order
<stikonas>ok, I'll update them a bit...
<stikonas>I was just trying to see if I am misreading something or not
<oriansj>basically the precedence is embedded in the call order.
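To illustrate the idea of precedence living in the call order, here is a small recursive-descent evaluator in C. It is only a sketch and is not the M2-Planet code: every name in it (`Parser`, `parse_operand_after_op`, `parse_additive`, `eval`, and so on) is hypothetical, it evaluates integers instead of emitting M1, and it handles only `+`, `-`, `*`, unary `-` and parentheses. The helper that takes a function pointer loosely mirrors the role discussed above for common_recursion: it skips the operator and directly calls whichever tighter-binding handler it was given.

```c
#include <ctype.h>

typedef struct { const char *s; } Parser;
typedef long (*ParseFn)(Parser *);

static void skip_ws(Parser *p)
{
    while (*p->s == ' ')
        p->s++;
}

/* Skip the operator character, then call the handler for the
 * tighter-binding level that was passed in as a function pointer. */
static long parse_operand_after_op(Parser *p, ParseFn tighter)
{
    p->s++;
    return tighter(p);
}

long parse_additive(Parser *p);

/* Tightest level: integer literals and parenthesised expressions. */
long parse_primary(Parser *p)
{
    long v = 0;
    skip_ws(p);
    if (*p->s == '(') {
        p->s++;
        v = parse_additive(p);
        skip_ws(p);
        if (*p->s == ')')
            p->s++;
        return v;
    }
    while (isdigit((unsigned char)*p->s))
        v = v * 10 + (*p->s++ - '0');
    return v;
}

/* Unary minus binds tighter than any binary operator. */
long parse_unary(Parser *p)
{
    skip_ws(p);
    if (*p->s == '-')
        return -parse_operand_after_op(p, parse_unary);
    return parse_primary(p);
}

/* '*' calls parse_unary for its operands, so it binds tighter than '+'. */
long parse_multiplicative(Parser *p)
{
    long v = parse_unary(p);
    for (;;) {
        skip_ws(p);
        if (*p->s == '*')
            v *= parse_operand_after_op(p, parse_unary);
        else
            return v;
    }
}

/* Loosest level: '+' and '-' call parse_multiplicative for operands. */
long parse_additive(Parser *p)
{
    long v = parse_multiplicative(p);
    for (;;) {
        skip_ws(p);
        if (*p->s == '+')
            v += parse_operand_after_op(p, parse_multiplicative);
        else if (*p->s == '-')
            v -= parse_operand_after_op(p, parse_multiplicative);
        else
            return v;
    }
}

long eval(const char *text)
{
    Parser p = { text };
    return parse_additive(&p);
}
```

No precedence table appears anywhere: `1 + 2 * 3` groups as `1 + (2 * 3)` only because `parse_additive` asks `parse_multiplicative` for each operand, which in turn asks `parse_unary`.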