IRC channel logs

2020-06-22.log


<OriansJ`>xentrac: well Wirth's designs are generally quite balanced in terms of performance to minimalism
<xentrac>yeah, although he typically goes perhaps a bit far in the minimalism direction
<xentrac>I don't remember how many logic blocks Wirth's RISC takes up on a Xilinx FPGA
<OriansJ`>xentrac: I find his hardware design makes much more sense if you imagine it never running hand written assembly
<xentrac>which aspects of the hardware design?
<OriansJ`>assuming we are discussing: https://people.inf.ethz.ch/wirth/FPGA-relatedWork/RISC-Arch.pdf
<OriansJ`>no add-carry, no subtract-borrow in hardware
<OriansJ`>No Multiply high or modulus or remainder
<OriansJ`>no add immediate or subtract immediate forms
<OriansJ`>no call register nor jump register instructions
<OriansJ`>oops read the paragraph wrong
<OriansJ`>load/stores are either 32bit or 8bit and nothing else
<darius>xentrac, really 4 memory instructions
<darius>load/store word/byte
<OriansJ`>ironically with ~label being 24bit offset and @immediate being 16bit immediates, it maps nearly perfectly to M1 and hex2
<OriansJ`>I could probably port hex0->M2-Planet to it in less than a month
<OriansJ`>So if we wanted, it certainly could very easily become another root port for stage0
<OriansJ`>just need a guix package for its simulator and a solid excuse to work on it
<OriansJ`>(such as someone doing it in TTL who would later share the schematic under an FSF approved license)
<darius>OriansJ`, wirth's pdf lists for the ISA add/sub all of those except for call through register and the multiply/divide instructions you mentioned. i haven't looked at his implementation.
<OriansJ`>darius: you might be correct as I did a quick rush through read/issue search
<xentrac>yeah, true, 4
<OriansJ`>although I think the 110v branch instructions might actually use a register value
<xentrac>it does actually have add-carry and subtract-borrow in hardware; that's add' and sub'
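A sketch of the add-with-carry idea mentioned above: chaining two 32-bit adds through a carry bit to get a 64-bit add. This is illustrative Python, not Wirth's actual ADD'/SUB' semantics or mnemonics.

```python
MASK32 = 0xFFFFFFFF

def add_with_carry(a, b, carry_in=0):
    """32-bit add that also reports the carry-out bit."""
    total = a + b + carry_in
    return total & MASK32, total >> 32  # (result, carry_out)

def add64(a_lo, a_hi, b_lo, b_hi):
    """64-bit add built from two 32-bit adds, chaining the carry
    from the low words into the high words."""
    lo, carry = add_with_carry(a_lo, b_lo)
    hi, _ = add_with_carry(a_hi, b_hi, carry)
    return lo, hi
```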
<xentrac>multiply high and modulo do seem like serious omissions (if you're going to have multiply and divide anyway)
<darius>OriansJ`, oh right
<OriansJ`>xentrac: only if one is writing in assembly
<OriansJ`>they are not serious omissions if you are writing everything in pascal or C
<OriansJ`>also it does not specify if bit or little endian for either bits nor bytes
<OriansJ`>^bit^big^ endian
<OriansJ`>of course it is a big change from the NS32000, that Wirth previously used.
<OriansJ`>which easily was a $200K design if it was in TTL, rather than the $500 CMOS chip it initially retailed as
<xentrac>oh, I see what you mean
<xentrac>although Pascal and C have mod and % operators
<xentrac>also, multiply high is pretty important for long long support in C
<xentrac>but that's a GCC extension
<OriansJ`>you can get the same value by bitshifting and other fun; which "might" end up being faster depending upon your basis of hardware implementation (FPGA vs custom asic)
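The "bitshifting and other fun" workaround can be sketched like this: computing the high 32 bits of an unsigned 32x32 product using only values that fit in 32 bits, by the schoolbook split into 16-bit halves. A minimal illustration, not code from any of the ISAs discussed.

```python
def mulh_u32(a, b):
    """High 32 bits of an unsigned 32x32 multiply, using only
    intermediate values that fit in 32 bits (16-bit half split)."""
    a_lo, a_hi = a & 0xFFFF, a >> 16
    b_lo, b_hi = b & 0xFFFF, b >> 16
    # The low 16 bits of a_lo*b_lo can never carry into bit 32,
    # so only its top half feeds the cross terms.
    cross = a_hi * b_lo + ((a_lo * b_lo) >> 16)
    cross2 = a_lo * b_hi + (cross & 0xFFFF)
    return a_hi * b_hi + (cross >> 16) + (cross2 >> 16)
```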
<OriansJ`>if one were to do a custom asic, hardware multiply high would certainly be the superior choice. Although Wirth consumed all 256 potential encodings for the first opcode byte
<xentrac>probably nowadays yeah
<OriansJ`>which means one would have to do something funky in 00v0 to squeeze it in the 12bits of wasted space
<darius>fwiw wirth's longer document says "The DIV instruction deposits the remainder in an auxiliary register H." but seems to mention it nowhere else.
<darius>might be a similar story wrt multiply high
<OriansJ`>darius: perhaps a MIPS idea steal that he later realized was a very bad idea
<xentrac>you could do the same thing with multiply, as the 8086 does
<xentrac>jinx
<darius>it goes back long before mips
<xentrac>weren't we seeing that the PDP-8 EAE did that in the late 1960s?
<xentrac>with the MQ register
<xentrac>yesterday
<OriansJ`>darius: MIPS had a famous performance bottleneck caused by a specialized register for division/multiplication
<darius>i didn't review pdp8 but i've seen mq in very old designs
<darius>mm hm
<OriansJ`>but then again the Only architecture that did Multiplication right out of the gate was DEC alpha
<OriansJ`>but they completely did byte instructions wrong
<OriansJ`>so bad they had to add "The byte extension"
<OriansJ`>as Alpha originally only loaded 64bit values and you had to do masking and bit shifting to get anything smaller
<OriansJ`>if I remember correctly
<xentrac>sounds legit
<OriansJ`>*correction only 32 and 64bit values*
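The masking-and-shifting that pre-BWX Alpha code had to do can be sketched as follows: pulling a byte out of, and inserting one back into, a 64-bit little-endian word. Illustrative Python, not actual Alpha instruction sequences.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def load_byte(word64, index):
    """Extract byte `index` (0 = least significant) from a 64-bit
    little-endian word by shifting and masking."""
    return (word64 >> (8 * index)) & 0xFF

def store_byte(word64, index, value):
    """Insert a byte into a 64-bit word: clear the slot, then OR in."""
    shift = 8 * index
    return (word64 & ~(0xFF << shift) & MASK64) | ((value & 0xFF) << shift)
```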
<xentrac>how did their multiply work?
<OriansJ`>They had 3op multiply and 3op multiply high
<xentrac>three-operand?
<OriansJ`>Rc := Ra op Rb; eg R0 = R0 x R1
<xentrac>three-operation?
<OriansJ`>or R11 = R2 - R4
<OriansJ`>yes
<OriansJ`>err operands
<OriansJ`>So 3 different registers were explicitly specified
<OriansJ`>2 source and 1 destination
<xentrac>makes sense
<OriansJ`>so mul would just give you the bottom half and mulh would give you the top half
<xentrac>sounds convenient but inefficient
<OriansJ`>with a single clock delay because multiplication was pipelined
<xentrac>I bet you hate the x18's multiply :)
<OriansJ`>So 3 clocks for MUL but 4 clocks for both MUL and MULH
<OriansJ`>I do dislike how x86 does multiplication and division; which AMD64 never fixed
<OriansJ`>if AMD just reallocated a single 1byte opcode from x86 for 3op instructions; they could have had a 4byte encoding with support for 2^12 instructions
<xentrac>until FMA there weren't any 3-operand instructions at all, were there?
<OriansJ`>xentrac: hope
<OriansJ`>^nope^
<OriansJ`>and if they allocated the first 4bits as specifiers, then the next 8bits of the XOP could be used for 256 instructions.
<OriansJ`>Say 0000 for integer, 0001 for floating point and 0010 for SIMD integer and 0011 for SIMD floating point
<OriansJ`>which would have actually ended up making AMD64 more dense than x86
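The encoding being proposed above (one escape byte, a 4-bit class specifier, 8 bits of opcode, three 4-bit register fields; 4 + 8 = 12 bits, hence 2^12 instructions) can be sketched as a bit-packing function. The field layout and the escape-byte value here are hypothetical, not any real XOP or VEX format.

```python
def encode_3op(cls, opcode, rd, rs1, rs2):
    """Pack a hypothetical 32-bit 3-operand instruction:
    escape (8) | class (4) | opcode (8) | rd (4) | rs1 (4) | rs2 (4).
    Illustrative only; the escape byte 0x8F is a placeholder for
    a one-byte opcode reallocated from x86."""
    assert cls < 16 and opcode < 256
    assert rd < 16 and rs1 < 16 and rs2 < 16
    ESCAPE = 0x8F
    return ((ESCAPE << 24) | (cls << 20) | (opcode << 12)
            | (rd << 8) | (rs1 << 4) | rs2)
```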
<xentrac>its bloated immediates count heavily against it
<xentrac>if you're going for density
<OriansJ`>but they probably tried to minimize the differences to enable minimal engineering efforts on the software side
<OriansJ`>also they could have enabled 8, 16 and 32bit immediates for all instructions that supported immediates
<OriansJ`>Toss in flexible register push/pop and now you could designate R15 for your argument stack pointer and RSP simply becomes argument stack only; throw a poison page in between and stack overflows become impossible
<OriansJ`>^Return stack for RSP^
<xentrac>yeah, they weren't very strongly influenced by Forth chips, though, I think
<OriansJ`>I think it was AMD's lack of programmers which ultimately handicapped AMD64
<xentrac>really? I don't think AMD's lack of programmers is the reason my cellphone is aarch64
<OriansJ`>xentrac: well that has more to do with who has AMD64 licenses
<xentrac>that's certainly a factor, yeah
<xentrac>but I think power consumption probably mattered more
<xentrac>(not entirely disconnected...)
<OriansJ`>as PowerPC has shown with P.A. Semi's PA6T
<OriansJ`>once we got past 130nm Architecture stopped being the deciding factor for Performance/Watt
<xentrac>not sure about that; there's a big, power-hungry micro-op decoding area on amd64 (and modern i386) chips that's just absent on ARMs
<xentrac>aarch64 even dropped Thumb!
<OriansJ`>and a Pentium Pro done in 40nm would consume less power than any commercially available Aarch64 chip but its performance would also be worse
<xentrac>Intel tried for a while to promote Android on amd64, but I think they finally gave up on that
<OriansJ`>xentrac: margins were too low
<xentrac>I mean you could argue that Intel's designers aren't the sharpest hammers in the bag; certainly that was the opinion of their Itanic co-designers at HP
<OriansJ`>Intel is too used to big fat margins on chips sold; phones are heading towards razor thin margins
<xentrac>Intel has been competing in some low-margin markets since the 80s, though not CPUs; you'd think they'd have some of that in their DNA still
<xentrac>but maybe not
<OriansJ`>xentrac: if it was, they would have followed AMD and ditched their FAB to the highest bidder back when it would have made them bank
<OriansJ`>But the margin on paper of running off their own fab looked too good for them to give up
<xentrac>you're suggesting they should have gone fabless? I think it's far from settled that vertical integration is an across-the-board lose
<xentrac>certainly there are possible future scenarios where AMD can no longer compete with Intel precisely because Intel is vertically integrated
<OriansJ`>yes as it would have forced an industry wide arms race on process and design; allowing much more competition to explode and benefit us all
<xentrac>do you mean, like, decades ago?
<OriansJ`>yes
<OriansJ`>although that would probably result in another design becoming super dominant and the cycle repeating
<xentrac>right now Intel is still in the running to make it to 5nm
<xentrac>though pretty far behind ;)
<OriansJ`>xentrac: 5nm is now just marketing for density not transistor size
<xentrac>it wouldn't be surprising if Intel made it to 3nm and TSMC didn't
<OriansJ`>it has been that way since 22nm
<xentrac>what do you mean?
<OriansJ`>as Intel's 10nm transistors are the same size as their 22nm transistors
<OriansJ`>but they are better at packing them in
<xentrac>higher resolution will do that
<OriansJ`>or simply more gate layers
<OriansJ`>or rotate the transistors to shingle them
<OriansJ`> https://en.wikipedia.org/wiki/10_nm_process for reference
***terpri__ is now known as terpri
*xentrac rotates his transistors
***nikita_ is now known as nikita`
***ChanServ sets mode: +o rekado_
***rekado_ is now known as rekado
<OriansJ`>xentrac: well that is a rather hard engineering problem