IRC channel logs

2020-06-22.log


<OriansJ`>xentrac: well Wirth's designs are generally quite balanced in terms of performance to minimalism
<xentrac>yeah, although he typically goes perhaps a bit far in the minimalism direction
<xentrac>I don't remember how many logic blocks Wirth's RISC takes up on a Xilinx FPGA
<OriansJ`>xentrac: I find his hardware design makes much more sense if you imagine it never running hand written assembly
<xentrac>which aspects of the hardware design?
<OriansJ`>assuming we are discussing: https://people.inf.ethz.ch/wirth/FPGA-relatedWork/RISC-Arch.pdf
<OriansJ`>no add-carry, no subtract-borrow in hardware
<OriansJ`>No Multiply high or modulus or remainder
<OriansJ`>no add immediate or subtract immediate forms
<OriansJ`>no call register nor jump register instructions
<OriansJ`>oops read the paragraph wrong
<OriansJ`>load/stores are either 32bit or 8bit and nothing else
<darius>xentrac, really 4 memory instructions
<darius>load/store word/byte
<OriansJ`>ironically with ~label being 24bit offset and @immediate being 16bit immediates, it maps nearly perfectly to M1 and hex2
<OriansJ`>I could probably port hex0->M2-Planet to it in less than a month
<OriansJ`>So if we wanted, it certainly could very easily become another root port for stage0
<OriansJ`>just need a guix package for its simulator and a solid excuse to work on it
<OriansJ`>(such as someone doing it in TTL who would later share the schematic under an FSF approved license)
<darius>OriansJ`, wirth's pdf lists for the ISA add/sub all of those except for call through register and the multiply/divide instructions you mentioned. i haven't looked at his implementation.
<OriansJ`>darius: you might be correct as I did a quick rush through read/issue search
<xentrac>yeah, true, 4
<OriansJ`>although I think the 110v branch instructions might actually use a register value
<xentrac>it does actually have add-carry and subtract-borrow in hardware; that's add' and sub'
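A sketch of the add-with-carry idea mentioned above: chaining two 32-bit adds through a carry bit to get a 64-bit add. This is illustrative Python, not Wirth's actual ADD'/SUB' semantics or mnemonics.

```python
MASK32 = 0xFFFFFFFF

def add_with_carry(a, b, carry_in=0):
    """32-bit add that also reports the carry-out bit."""
    total = a + b + carry_in
    return total & MASK32, total >> 32  # (result, carry_out)

def add64(a_lo, a_hi, b_lo, b_hi):
    """64-bit add built from two 32-bit adds, chaining the carry
    from the low words into the high words."""
    lo, carry = add_with_carry(a_lo, b_lo)
    hi, _ = add_with_carry(a_hi, b_hi, carry)
    return lo, hi
```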
<xentrac>multiply high and modulo do seem like serious omissions (if you're going to have multiply and divide anyway)
<darius>OriansJ`, oh right
<OriansJ`>xentrac: only if one is writing in assembly
<OriansJ`>they are not serious omissions if you are writing everything in pascal or C
<OriansJ`>also it does not specify if bit or little endian for either bits nor bytes
<OriansJ`>^bit^big^ endian
<OriansJ`>of course it is a big change from the NS32000, that Wirth previously used.
<OriansJ`>which easily was a $200K design if it was in TTL, rather than the $500 CMOS chip it initially retailed as
<xentrac>oh, I see what you mean
<xentrac>although Pascal and C have mod and % operators
<xentrac>also, multiply high is pretty important for long long support in C
<xentrac>but that's a GCC extension
<OriansJ`>you can get the same value by bitshifting and other fun; which "might" end up being faster depending upon your basis of hardware implementation (FPGA vs custom asic)
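The "bitshifting and other fun" workaround can be sketched like this: computing the high 32 bits of an unsigned 32x32 product using only values that fit in 32 bits, by the schoolbook split into 16-bit halves. A minimal illustration, not code from any of the ISAs discussed.

```python
def mulh_u32(a, b):
    """High 32 bits of an unsigned 32x32 multiply, using only
    intermediate values that fit in 32 bits (16-bit half split)."""
    a_lo, a_hi = a & 0xFFFF, a >> 16
    b_lo, b_hi = b & 0xFFFF, b >> 16
    # The low 16 bits of a_lo*b_lo can never carry into bit 32,
    # so only its top half feeds the cross terms.
    cross = a_hi * b_lo + ((a_lo * b_lo) >> 16)
    cross2 = a_lo * b_hi + (cross & 0xFFFF)
    return a_hi * b_hi + (cross >> 16) + (cross2 >> 16)
```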
<OriansJ`>if one were to do a custom asic, hardware multiply high would certainly be the superior choice. Although Wirth consumed all 256 potential encodings for the first opcode byte
<xentrac>probably nowadays yeah
<OriansJ`>which means one would have to do something funky in 00v0 to squeeze it in the 12bits of wasted space
<darius>fwiw wirth's longer document says "The DIV instruction deposits the remainder in an auxiliary register H." but seems to mention it nowhere else.
<darius>might be a similar story wrt multiply high
<OriansJ`>darius: perhaps a MIPS idea steal that he later realized was a very bad idea
<xentrac>you could do the same thing with multiply, as the 8086 does
<xentrac>jinx
<darius>it goes back long before mips
<xentrac>weren't we seeing that the PDP-8 EAE did that in the late 1960s?
<xentrac>with the MQ register
<xentrac>yesterday
<OriansJ`>darius: MIPS had a famous performance bottleneck caused by a specialized register for division/multiplication
<darius>i didn't review pdp8 but i've seen mq in very old designs
<darius>mm hm
<OriansJ`>but then again the Only architecture that did Multiplication right out of the gate was DEC alpha
<OriansJ`>but they completely did byte instructions wrong
<OriansJ`>so bad they had to add "The byte extension"
<OriansJ`>as Alpha originally only loaded 64bit values and you had to do masking and bit shifting to get anything smaller
<OriansJ`>if I remember correctly
<xentrac>sounds legit
<OriansJ`>*correction only 32 and 64bit values*
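The masking-and-shifting that pre-BWX Alpha code had to do can be sketched as follows: pulling a byte out of, and inserting one back into, a 64-bit little-endian word. Illustrative Python, not actual Alpha instruction sequences.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def load_byte(word64, index):
    """Extract byte `index` (0 = least significant) from a 64-bit
    little-endian word by shifting and masking."""
    return (word64 >> (8 * index)) & 0xFF

def store_byte(word64, index, value):
    """Insert a byte into a 64-bit word: clear the slot, then OR in."""
    shift = 8 * index
    return (word64 & ~(0xFF << shift) & MASK64) | ((value & 0xFF) << shift)
```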
<xentrac>how did their multiply work?
<OriansJ`>They had 3op multiply and 3op multiply high
<xentrac>three-operand?
<OriansJ`>Rc := Ra op Rb; eg R0 = R0 x R1
<xentrac>three-operation?
<OriansJ`>or R11 = R2 - R4
<OriansJ`>yes
<OriansJ`>err operands
<OriansJ`>So 3 different registers were explicitly specified
<OriansJ`>2 source and 1 destination
<xentrac>makes sense
<OriansJ`>so mul would just give you the bottom half and mulh would give you the top half
<xentrac>sounds convenient but inefficient
<OriansJ`>with a single clock delay because multiplication was pipelined
<xentrac>I bet you hate the x18's multiply :)
<OriansJ`>So 3 clocks for MUL but 4 clocks for both MUL and MULH
<OriansJ`>I do dislike how x86 does multiplication and division; which AMD64 never fixed
<OriansJ`>if AMD just reallocated a single 1byte opcode from x86 for 3op instructions; they could have had a 4byte encoding with support for 2^12 instructions
<xentrac>until FMA there weren't any 3-operand instructions at all, were there?
<OriansJ`>xentrac: hope
<OriansJ`>^nope^
<OriansJ`>and if they allocated the first 4bits as specifiers, then the next 8bits of the XOP could be used for 256 instructions.
<OriansJ`>Say 0000 for integer, 0001 for floating point and 0010 for SIMD integer and 0011 for SIMD floating point
<OriansJ`>which would have actually ended up making AMD64 more dense than x86
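The encoding being proposed above (one escape byte, a 4-bit class specifier, 8 bits of opcode, three 4-bit register fields; 4 + 8 = 12 bits, hence 2^12 instructions) can be sketched as a bit-packing function. The field layout and the escape-byte value here are hypothetical, not any real XOP or VEX format.

```python
def encode_3op(cls, opcode, rd, rs1, rs2):
    """Pack a hypothetical 32-bit 3-operand instruction:
    escape (8) | class (4) | opcode (8) | rd (4) | rs1 (4) | rs2 (4).
    Illustrative only; the escape byte 0x8F is a placeholder for
    a one-byte opcode reallocated from x86."""
    assert cls < 16 and opcode < 256
    assert rd < 16 and rs1 < 16 and rs2 < 16
    ESCAPE = 0x8F
    return ((ESCAPE << 24) | (cls << 20) | (opcode << 12)
            | (rd << 8) | (rs1 << 4) | rs2)
```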
<xentrac>its bloated immediates count heavily against it
<xentrac>if you're going for density
<OriansJ`>but they probably tried to minimize the differences to enable minimal engineering efforts on the software side
<OriansJ`>also they could have enabled 8, 16 and 32bit immediates for all instructions that supported immediates
<OriansJ`>Toss in flexible register push/pop and now you could designate R15 for your argument stack pointer and RSP simply becomes argument stack only; throw a poison page in between and stack overflows become impossible
<OriansJ`>^Return stack for RSP^
<xentrac>yeah, they weren't very strongly influenced by Forth chips, though, I think
<OriansJ`>I think it was AMD's lack of programmers which ultimately handicapped AMD64
<xentrac>really? I don't think AMD's lack of programmers is the reason my cellphone is aarch64
<OriansJ`>xentrac: well that has more to do with who has AMD64 licenses
<xentrac>that's certainly a factor, yeah
<xentrac>but I think power consumption probably mattered more
<xentrac>(not entirely disconnected...)
<OriansJ`>as PowerPC has shown with P.A. Semi's PA6T
<OriansJ`>once we got past 130nm Architecture stopped being the deciding factor for Performance/Watt
<xentrac>not sure about that; there's a big, power-hungry micro-op decoding area on amd64 (and modern i386) chips that's just absent on ARMs
<xentrac>aarch64 even dropped Thumb!
<OriansJ`>and a Pentium Pro done in 40nm would consume less power than any commercially available Aarch64 chip but its performance would also be worse
<xentrac>Intel tried for a while to promote Android on amd64, but I think they finally gave up on that
<OriansJ`>xentrac: margins were too low
<xentrac>I mean you could argue that Intel's designers aren't the sharpest hammers in the bag; certainly that was the opinion of their Itanic co-designers at HP
<OriansJ`>Intel is too used to big fat margins on chips sold; phones are heading towards razor thin margins
<xentrac>Intel has been competing in some low-margin markets since the 80s, though not CPUs; you'd think they'd have some of that in their DNA still
<xentrac>but maybe not
<OriansJ`>xentrac: if it was, they would have followed AMD and ditched their FAB to the highest bidder back when it would have made them bank
<OriansJ`>But the margin on paper of running off their own fab looked too good for them to give up
<xentrac>you're suggesting they should have gone fabless? I think it's far from settled that vertical integration is an across-the-board lose
<xentrac>certainly there are possible future scenarios where AMD can no longer compete with Intel precisely because Intel is vertically integrated
<OriansJ`>yes as it would have forced an industry wide arms race on process and design; allowing much more competition to explode and benefit us all
<xentrac>do you mean, like, decades ago?
<OriansJ`>yes
<OriansJ`>although that would probably result in another design becoming super dominant and the cycle repeating
<xentrac>right now Intel is still in the running to make it to 5nm
<xentrac>though pretty far behind ;)
<OriansJ`>xentrac: 5nm is now just marketing for density not transistor size
<xentrac>it wouldn't be surprising if Intel made it to 3nm and TSMC didn't
<OriansJ`>it has been that way since 22nm
<xentrac>what do you mean?
<OriansJ`>as Intel's 10nm transistors are the same size as their 22nm transistors
<OriansJ`>but they are better at packing them in
<xentrac>higher resolution will do that
<OriansJ`>or simply more gate layers
<OriansJ`>or rotate the transistors to shingle them
<OriansJ`> https://en.wikipedia.org/wiki/10_nm_process for reference
***terpri__ is now known as terpri
*xentrac rotates his transistors
***nikita_ is now known as nikita`
***ChanServ sets mode: +o rekado_
***rekado_ is now known as rekado
<OriansJ`>xentrac: well that is a rather hard engineering problem