IRC channel logs
2021-09-18.log
<oriansj>stikonas: indeed; less so than before but still a good bit because there are just not enough registers in x86
<stikonas>it has more registers if I remember correctly
<stikonas>for risc-v I'm actually struggling to use them all...
<stikonas>that's why I use some of the registers as global variables
<stikonas>on risc-v loading it from memory requires quite a few instructions... AUIPC followed by ADDI to get address of label and then LD
<xentrac>Berkeley RISC-I and -II and SPARC did this nifty register window thing where most of your registers are mapped into a processor-internal circular buffer
<xentrac>on SPARC you have 8 "global" registers which act like normal registers, 8 "out" registers, 8 "in" registers which are your caller's "out" registers, and 8 "local" registers which aren't shared with callers or callees
<xentrac>if the circular buffer fills up, a software fault handler copies a windowful of registers to RAM
<xentrac>pretty nifty idea: in the usual case doing a subroutine call and allocating a stack frame is just a single cycle, and deallocating it and returning is another single cycle
<xentrac>it imposed similar overhead on loading stuff from memory, though instead of AUIPC you would use SETHI, which set the high 22 bits of a register (so it wasn't PIC by default)
<xentrac>RISC-V abandoned the Berkeley RISC register window mechanism in favor of a more MIPS-like approach
<stikonas>there is definitely more than 2h of work in cc_* after tokenization is working
<stikonas>might finish loops today but still need to do conditionals, local variables and expressions (which is the biggest part out of those)
<oriansj>well register windows don't present much advantage over pushing onto an L1 page and honestly the fixed register behavior of SPARC and the easier Berkeley RISC chips wasn't as flexible as AMD's 29K series
<oriansj>stikonas: really? it is just calling match with strings and calling emit with strings for everything but local and argument offset calculations. kinda tedious but copy paste should make quick work of it.
<oriansj>fair, exhaustion can easily make this much much harder.
<oriansj>also if you haven't worked on the C version of cc_* yet, it is easy to miss the easier path of implementation
<stikonas>indeed, I didn't work on C version. And I don't even have GAS version, it's straight M1. But at least implementation order seems to be working well for me
<stikonas>I am able to test most of the stuff immediately
<stikonas>reverse instruction order also mixes up some statement strings...
<stikonas>e.g. "JUMP %FOR_" followed by ":FOR_THEN_" becomes "$FOR_" followed by "JAL\n:FOR_THEN_"
<oriansj>yep, the most annoying ordering details to work out in M1 output for cc_* and M2-Planet
<oriansj>best to have those sorted and tested to produce working output before trying to implement in assembly
<oriansj>as changing the order in assembly is much more messy than doing it in C
<xentrac>true that SPARC and Berkeley RISC weren't as *flexible* as the Am29K
<xentrac>but they were a lot *faster* and *smaller*
<oriansj>xentrac: a lot "faster" and "smaller" is hard to believe when compared against 3 clock cycles and a 32bit instruction.
<oriansj>although delayed branches were such a bad idea in retrospect
<oriansj>also the 128 local + 64 global registers were a bit excessive even for optimizing compilers today.
<oriansj>minor correction 3 clock cycles and 3 32bit instructions. as even the documentation says "The
<oriansj>function-call overhead in the 29K family consists of a small number of single-cycle instructions;"
<stikonas>oriansj: is there any particular reason why M2-Planet and cc_* prototypes have different constants in collect_local
<stikonas>e.g. cc_aarch64 has a->depth = 64 in "main" section but "32" in M2-Planet
<stikonas>I guess M2-Planet numbers are the correct ones?
<oriansj>actually they are both correct because cc_aarch64 uses the stack but M2-Planet uses something more efficient.
<stikonas>oh ok, so I need those larger numbers...
<oriansj>all pushes on the AArch64 stack are 128bits but we are only pushing 64bit registers, so we just use a different register and instruction sequence to get a more efficient result.
<stikonas>well, need to figure out what's correct for riscv but probably similar to aarch64
<oriansj>So you just need to know what calling convention you wish to implement
<stikonas>well, I guess I'll use stack too in cc_riscv64
<stikonas>and leave more efficient one for M2-Planet
<oriansj>stack is simplest for a C state machine but passing arguments in registers is simplest when writing assembly by hand.
<oriansj>also you need to be careful in the preservation order so that something like foo(bar(), 1); doesn't break your code
<oriansj>look carefully at the C versions because they'll help you avoid it.
<oriansj>the cc_* is much closer to what you'll be doing as wrongish but working is good enough to get a M2-Planet that can self-host
<stikonas>since cc_* only has to be able to build one single program
<oriansj>the big differences are about dealing with correct output behavior which needs to be subtly checked.
<oriansj>structs and the matching of strings are the only things that absolutely have to be right in cc_*
<oriansj>being wasteful in the call stack is fine
<stikonas>crashing somewhere is not fine though... Time to debug and find where I messed up...
<oriansj>now if you are referring to common_recursion receiving a function pointer
<oriansj>we are just passing the function to be directly called. So passing the &label is exactly what is needed for the f(); call inside of it.
<oriansj>well the comments unfortunately are out of date and in some cases straight wrong. As they were originally to help me work out how to build the state machine to match the C precedence order
<stikonas>I was just trying to see if I am misreading something or not
<oriansj>basically the precedence is embedded in the call order.
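
The closing remark that precedence is embedded in the call order can be illustrated with a small recursive-descent evaluator. This is only a sketch under stated assumptions: every name in it (cursor, primary, multiplicative, additive, operand_after_operator, evaluate) is hypothetical and not taken from M2-Planet or cc_*, and it evaluates integers rather than emitting M1. The helper that takes a function pointer loosely mirrors the idea of passing the function to be called directly, as described for common_recursion above.

```c
#include <ctype.h>

/* Hypothetical names throughout; this is not code from M2-Planet or cc_*. */
static const char *cursor; /* next unread character of the input */

static int additive(void);

static void skip_spaces(void)
{
    while (*cursor == ' ') cursor++;
}

/* Step over the one-character operator at the cursor, then call f to parse
   the right-hand operand. f is an ordinary function pointer, called directly. */
static int operand_after_operator(int (*f)(void))
{
    cursor++;
    return f();
}

/* Tightest binding: a decimal number or a parenthesized expression. */
static int primary(void)
{
    int value = 0;
    skip_spaces();
    if (*cursor == '(') {
        cursor++;
        value = additive();
        skip_spaces();
        if (*cursor == ')') cursor++;
        return value;
    }
    while (isdigit((unsigned char)*cursor)) {
        value = value * 10 + (*cursor - '0');
        cursor++;
    }
    return value;
}

/* '*' binds tighter than '+' and '-' only because additive() asks
   multiplicative() for its operands, never the other way around. */
static int multiplicative(void)
{
    int left = primary();
    for (;;) {
        skip_spaces();
        if (*cursor == '*') left *= operand_after_operator(primary);
        else return left;
    }
}

/* Loosest binding: left-associative '+' and '-'. */
static int additive(void)
{
    int left = multiplicative();
    for (;;) {
        skip_spaces();
        if (*cursor == '+') left += operand_after_operator(multiplicative);
        else if (*cursor == '-') left -= operand_after_operator(multiplicative);
        else return left;
    }
}

/* Entry point: evaluate a string such as "2 + 3 * 4". */
int evaluate(const char *text)
{
    cursor = text;
    return additive();
}
```

There is no precedence table anywhere in the sketch: swapping which function calls which would change how "2 + 3 * 4" groups, which is the sense in which the call order carries the precedence.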