IRC channel logs

2022-10-13.log

<oriansj>doing NAND2Tetris makes me want to design my own instruction set architecture and CPU
<oriansj>then port stage0 to it
<oriansj>anyone know of any good tools for that sort of work?
<muurkha>without really having done it, I'd guess: yosys, nextpnr, APIO, and Lattice FPGAs
<oriansj>I guess I could figure out the high level details and encoding without deciding which bits actually encode what.
<aggi>gigatron ttl got some documentation too
<muurkha>maybe, but I think it's easy to end up with a field with 5 or 9 possible values that way
<oriansj>muurkha: well yes, especially if it is 3 or more bits long
<muurkha>I mean, I think it's helpful to develop which bits encode which things in parallel
<oriansj>muurkha: ah very fair
<oriansj>well less dense instructions should certainly make for simpler decode logic
<oriansj>although I don't see a real benefit to allowing instructions like R0, R1 = R2 + R3
<oriansj>4 registers would be suboptimal for M2-Planet, 16 registers would map nicely to hex but 64 registers would be optimal for advanced optimizing compilers.
<oriansj>then I can do [opcode 8][register 6][register 6][immediate 18] and [opcode][register 6][register 6][xop 12][register 6]
<oriansj>which would leave lots of bits for future expansion and ensure the most common displacements would fit in a 256KB cache
<oriansj>wow I must be tired to have missed that. [opcode 10][register 6][register 6][immediate 18] and [opcode 10][register 6][register 6][xop 12][register 6] for 40 bit instructions (not dense at all) but 2^10 should give plenty of 2OPI instructions and the xop should more than cover a boatload of 3OP instructions without needing more than 1 or 2 of the 1024 values in the primary opcode space
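The two 40-bit layouts oriansj describes can be sketched as a pack/unpack pair. This is only an illustration: the field widths come from the message above, but the bit order (opcode in the most significant bits) and every function name here are assumptions, not part of stage0 or any existing tool. The sketch also checks that both layouts add up to 40 bits and that an 18-bit immediate spans 2^18 = 262144 bytes (256 KiB) when read as an unsigned byte offset.

```python
# Field widths taken from the chat; the ordering of fields within the
# word (opcode first, most significant) is an assumption.
OPCODE_BITS = 10
REG_BITS = 6
IMM_BITS = 18
XOP_BITS = 12
WORD_BITS = 40

# Both layouts must fill exactly 40 bits.
assert OPCODE_BITS + 2 * REG_BITS + IMM_BITS == WORD_BITS
assert OPCODE_BITS + 3 * REG_BITS + XOP_BITS == WORD_BITS
# An unsigned 18-bit byte offset reaches 256 KiB.
assert (1 << IMM_BITS) == 256 * 1024


def _pack(fields):
    """Pack (value, width) pairs into one integer, most significant field first."""
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def _unpack(word, widths):
    """Split a word into fields of the given widths, most significant field first."""
    values = []
    for width in reversed(widths):
        values.append(word & ((1 << width) - 1))
        word >>= width
    return tuple(reversed(values))


def encode_2opi(opcode, ra, rb, imm):
    """[opcode 10][register 6][register 6][immediate 18]"""
    return _pack([(opcode, OPCODE_BITS), (ra, REG_BITS),
                  (rb, REG_BITS), (imm, IMM_BITS)])


def decode_2opi(word):
    return _unpack(word, [OPCODE_BITS, REG_BITS, REG_BITS, IMM_BITS])


def encode_3op(opcode, ra, rb, xop, rc):
    """[opcode 10][register 6][register 6][xop 12][register 6]"""
    return _pack([(opcode, OPCODE_BITS), (ra, REG_BITS), (rb, REG_BITS),
                  (xop, XOP_BITS), (rc, REG_BITS)])


def decode_3op(word):
    return _unpack(word, [OPCODE_BITS, REG_BITS, REG_BITS, XOP_BITS, REG_BITS])
```

Because every field has a fixed position, decoding is just shifts and masks, which matches the earlier point that less dense instructions keep the decode logic simple.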
<muurkha>lots of registers do impose costs on interrupt handling, context switches, and sometimes even subroutine calls
<muurkha>depending on your calling convention
<oriansj>muurkha: well if one can use the registers as either integer or floating point; it ends up reducing the number of wasted registers in any particular code block
<muurkha>maybe, maybe not
<muurkha>remember that doing that means that you have fewer integer registers (or fewer floating point registers) for the same size operand bitfield
<muurkha>because some of the registers you could conceivably be addressing for integer instructions are being used for floating point (sometimes; doesn't matter in a compiler or a kernel generally)
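muurkha's trade-off can be put in numbers with a small sketch. It assumes the 6-bit register field from the proposal above and compares a shared register file with separate integer and floating-point files; the function name and the idea of counting registers "in use" for floats are illustrative assumptions only.

```python
def integer_registers_left(field_bits, unified, fp_in_use):
    """Integer registers still addressable when fp_in_use registers hold floats."""
    per_file = 1 << field_bits  # registers one operand field can name
    if not 0 <= fp_in_use <= per_file:
        raise ValueError("fp_in_use must fit in one register file")
    if unified:
        # Shared file: each register holding a float is lost to integer code.
        return per_file - fp_in_use
    # Separate files: the opcode selects the file, so floats cost nothing here.
    return per_file


print(integer_registers_left(6, unified=True, fp_in_use=16))
print(integer_registers_left(6, unified=False, fp_in_use=16))
```

With a 6-bit field and 16 floats live, the shared file leaves 48 integer registers while separate files leave all 64, at the price of a second register file to save on context switches.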
<muurkha>check out the italicized sections in the "F" chapter of the RISC-V unprivileged instruction spec
<oriansj>true and in superscalar implementations, the register file will probably be duplicated anyway internally to reduce the number of read/write ports.
<muurkha>aliasing between integer and floating-point uses also causes substantial trouble for high-performance implementations
<muurkha>...or so I've heard, as you know by now, I've never designed a high-performance chip
<oriansj>well it incurs a single clock cycle delay between execute writing to a register and when the next execute cluster can read that value.
***Server sets mode: +cnt
<muurkha>oriansj: not sure that's the issue
<stikonas>argh, cc_amd64 does not support dereferencing pointers...
<stikonas>might need to drop to inline assembly
<stikonas>oh, maybe I don't need to do that...
***robin__ is now known as robin
***robin_ is now known as robin