IRC channel logs

2022-06-13.log

<oriansj>little endian word, opcode is the last 6bits; which when written to memory is in the first 8bits
<oriansj> https://danielmangum.com/static/risc_v_inst_intro_1.png
<muurkha>it's not the last 6 bits, it's the low-order 6 bits, which makes it 6 bits of the first byte, yes
<muurkha>the least significant byte
<oriansj>muurkha: forgive my lack of clarification. when I say the last 6bits, I am speaking in reference to the word from the perspective of big endian bit ordering. Which does map directly to what you have expressed in regards to low-order bits and on little-endian architectures would map to the high byte of the word in memory (or first byte in big endian perspective)
<muurkha>yes, I agree that it would be 6 bits of the last byte in a big-endian encoding, but RISC-V is defined to be little-endian. if you wanted to do the same thing with a big-endian instruction encoding you'd have to put the instruction length data in the high byte (or syllable or whatever)
<muurkha>otherwise you'd have to guess where to find it :)
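A minimal sketch of the point above about finding the instruction length data in the least significant byte, which on a little-endian machine is the first byte in memory. It assumes the standard RISC-V length rule (low two bits other than 11 mean a 16-bit instruction; low two bits 11 with bits 4:2 other than 111 mean a 32-bit one). The helper name `rv_insn_length` is hypothetical, and longer encodings are deliberately not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: length in bytes of the instruction starting at p,
   or 0 for the longer (>32-bit) encodings this sketch does not handle. */
size_t rv_insn_length(const uint8_t *p)
{
    assert(p != NULL);
    uint8_t first = p[0];        /* least significant byte comes first in memory */
    if ((first & 0x03) != 0x03)
        return 2;                /* 16-bit compressed instruction */
    if ((first & 0x1c) != 0x1c)
        return 4;                /* ordinary 32-bit instruction */
    return 0;                    /* prefix reserved for longer encodings */
}
```

Only the first byte is ever inspected, which is why a big-endian encoding would have to move the length bits to the high byte to get the same property.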
<oriansj>indeed
<muurkha>one of the things that makes the PDP-8 (and, say, the original ARM and the Berkeley RISC) so simple is that they don't have variable-length instructions
<muurkha>the approach used in Chuck Moore's MISC cores is interesting: he packs multiple 5-bit instructions into an instruction word (20 bits on the MuP21 and F21, 18 bits on the F18A processors in the GA144)
<oriansj>well there definitely is decode logic savings there but RISC-II showed one could support multiple encoding sizes efficiently
<muurkha>one of the instructions is "literal", which prevents the rest of the word from being decoded as instructions, instead pushing it on the stack
<muurkha>I've never looked at the RISC-II, maybe I should
<oriansj>it supported both 16bit and 32bit instructions
<oriansj>but internally expanded all 16bit instructions to 32bits
<muurkha>like Thumb and RV32C/RV64C
<muurkha>the MuP21 was 7000 transistors, less than twice the size of the (Intersil 6100) PDP-8
<oriansj>well Thumb can expand to more than just 1 instruction
<muurkha>oh really? I didn't know that!
<muurkha>btw if you think the RISC-V instruction encoding is ridiculous, wait until you hear about the MuP21 encoding
<muurkha>instead of concatenating the four instructions in the word one after the other, their bits are interleaved
<muurkha>and the odd-numbered instructions are complemented
<muurkha>why? presumably because that saved real estate in the instruction register, you only have to shift the register by one bit in between instructions instead of 5 bits
<oriansj>muurkha: and I thought Itanic had bad instruction ideas
<muurkha>well, it worked out reasonably well, I think
<muurkha>interleaving the instructions and XORing with 0x55555 (or 0xAAAAA? I forget) is only two or three lines of code in the assembler, and it has the result of making all the instructions run faster and at lower power, and making the chip cheaper (I guess)
<muurkha>the main difficulty is that nobody ever targeted a C compiler to the chip, I think because Jeff Fox had this irrational hate for C
<muurkha>and he would have been the obvious person to do it
<oriansj>well C is a very easy language to hate
<muurkha>unlike the GA144 processors, the MuP21 had an external memory bus with a 1-mebiword address space (2.5 mebibytes), so a C compiler would have been a very reasonable way to program it
<muurkha>the MuP21 came out in 01994 at "80 MIPS", probably close to 40 Dhrystone MIPS: http://www.ultratechnology.com/p21.html
<oriansj>yeah I was reading: http://www.ultratechnology.com/mup21.html and i can't help but feel it was VLIW but with too clever by half instruction encoding
<muurkha>no, it's the opposite extreme from VLIW
<muurkha>VLIW is about unlocking lots of instruction-level parallelism by making the parallelism explicit in the instruction stream, like horizontal microcode
<muurkha>MISC, by contrast, is about minimizing the hardware complexity of the processor and the path length per instruction, even at the expense of needing multiple machine instructions in multiple cycles to do what conventional processors do in a single instruction, like a more extreme RISC
<muurkha>in the code I've looked at (which may not be ideal because I wrote most of it myself and am no expert on Forth) you need about two MISC instructions per RISC or CISC instruction
<muurkha>you can get a higher clock rate because you don't need to wait for operand MUXes to provide the right operands from the register file to the ALU; the TOS and NOS registers are hard-wired to the ALU inputs
<muurkha>but the tradeoff is that sometimes what's in them isn't what you actually want to operate on, so you often need another instruction or two to get the right values into those registers
<muurkha>this is also, in my experience, more cognitive effort when you're writing machine code, although Forth fans disagree
<oriansj>well if one thinks like a C stack machine and not as an assembly programmer, it is easier
<muurkha>yeah
<muurkha>it's easier to write a reasonable compiler for because generally you don't need register allocation for expression evaluation
<muurkha>although on the MuP21 I think the operand stack was only 10 words deep, so overflow is possible
<muurkha>hmm, no, 4 return-stack items and 6 data-stack items. Moore expanded these to 8 and 10 on the F18A and maybe even on the F21, not sure
<muurkha>so maybe extra compiler complexity instead of reduced compiler complexity
<muurkha>according to http://www.ultratechnology.com/mfp21.htm: "The P21 has on chip stacks a total of 13 registers: 6 data stack cell registers..."
<oriansj>wow and even AT&T did a better FORTH chip than that
<muurkha>perhaps if you think it's better your metric of what makes good Forth is opposed to Moore's :)
<oriansj>AT&T Hobbit
<muurkha>the one they were going to use in the Newton
<oriansj>and BeBox
<muurkha>yeah
<muurkha>I don't think the Hobbit was really meant as a Forth processor, but as a C processor
<oriansj>true but it was a 1:1 map to forth in terms of assembly
<muurkha>kind of
<muurkha>the thing about having an infinitely deep data stack (or a 64-deep one like the Hobbit) is that it allows you to write really bad Forth
<muurkha>stuff that takes an inordinate amount of mental effort to get working
<oriansj>as a really bad FORTH programmer: fair
<oriansj>but perhaps I prefer architectures where bad programmers can work and learn to become better.
<oriansj>make mistakes more forgivable and a gentle transition path to better code
<oriansj>I guess that is why I like Knight; you can do it the wrong way if you want or even try something crazy
<muurkha>well, I think that's what the stack limit does: your bad code crashes and you realize that it's because you have too much stuff on the stack
<muurkha>so you put it in variables instead and your code becomes much more comprehensible
<muurkha>that way you can avoid spending a lot of time debugging things with super complicated stack manipulations that very cleverly avoid ever putting anything in memory
<muurkha>it doesn't give you enough rope to shoot yourself in the foot, as they say
<muurkha>it's pretty much the same thing that happens with 8 or 16 named registers, just manifesting slightly differently
<oriansj>well doing void foo(struct list* a) { if(NULL == a) return; foo(a->next); puts(a->text); } is simple but will segfault, whereas reversing the list twice and just iterating would, I guess, be the more correct way...
<muurkha>why will it segfault?
<oriansj>stack overflow
<oriansj>if your list is long enough
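A rough sketch of the "reverse the list twice and just iterate" alternative, which uses constant stack space however long the list is. The `struct list` layout is an assumption (the log only shows the `next` and `text` members), and the names `reverse` and `foo_iterative` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed layout: only `next` and `text` appear in the log. */
struct list {
    struct list *next;
    const char *text;
};

/* Reverse the list in place and return the new head. */
struct list *reverse(struct list *head)
{
    struct list *prev = NULL;
    while (head != NULL) {
        struct list *next = head->next;
        head->next = prev;
        prev = head;
        head = next;
    }
    return prev;
}

/* Print the texts last-to-first, as the recursive foo does, without
   recursion; the second reversal restores the original order. */
struct list *foo_iterative(struct list *a)
{
    struct list *r = reverse(a);
    for (struct list *p = r; p != NULL; p = p->next)
        puts(p->text);
    struct list *restored = reverse(r);
    assert(restored == a);   /* reversing twice gives back the original head */
    return restored;
}
```

The cost is that the list is temporarily mutated, so this version is not safe if anything else reads the list while it is being printed.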
<muurkha>oh, you mean if you don't store your return addresses in memory?
<muurkha>I think you need to store your return addresses in memory
<oriansj>well that code stores the return address on the stack
<muurkha>same as on RISC-V or ARM
<muurkha>yeah, you can't use the MuP21 return stack as the C call stack, it's not deep enough
<muurkha>it's more like a link register, as in a JAL instruction
<oriansj>ugh
<oriansj>if it was an option and not the only option it would be fine
<muurkha>this is not a difference between ARM and MuP21
<muurkha>the ARM's "return stack" is only one level deep
<muurkha>so storing your return address in memory (for non-leaf subroutines) is the only option on ARM
<muurkha>(x86 too, it's just sneakier about it)
<oriansj>well link registers are the more *riscy* return option but perhaps I find them suboptimal
<muurkha>they do require a bit of extra prologue and epilogue code for non-leaf subroutines
<oriansj>indeed
<muurkha>they're surely higher performance than implicitly hitting the dcache, though
<oriansj>as M2-Planet spent a good bit of time documenting
<muurkha>the MuP21 approach lets you use the return stack for a loop counter or other local variable or two, as well
<oriansj>or explicitly hitting in knight's pushr R15 R0
<oriansj>call R15 @foo
<muurkha>by making the user-visible architectural storages stacks instead of registers you need fewer of them: data, return, PC, and A on the P21
<muurkha>in a totally different direction, I was thinking about how the PDP-8's goofy addressing scheme is sort of like Smalltalk
<oriansj>but without the missing source code problem
<muurkha>missing source code problem?
<oriansj>smalltalks tend to "lose" source code, especially in regards to complex features
<oriansj>much like we saw in Bison in regards to features that are ugly to implement but can be used to produce a cleaner version of themselves
<muurkha>oh, I understand what you mean
<muurkha>yeah, they build from an image, not from source code
<muurkha>I meant that each page of memory is sort of like a Smalltalk object though
<muurkha>you have one form of instruction to access "instance variables" (in the current page) and another form of instruction to access other things (by using the indirect bit)
<muurkha>on the PDP-8
<oriansj>I can see that
<muurkha>I was thinking it would be better if you had a class pointer as well as an instance pointer so that you can share the same code between different instances
<muurkha>and a stack pointer
<oriansj>I say, no special purpose pointers (besides PC maybe)
<muurkha>well, that's the MISC/RISC direction
<muurkha>I'm talking about the PDP/8 direction, which has lots of irregularities
<muurkha>in order to save transistors
<muurkha>if you restrict instances to have sizes that are a small range of powers of 2, you can index off the instance base pointers with bitwise OR instead of with addition, saving you a word-width adder
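A small sketch of why the OR trick works. It assumes, beyond what is said above, that each instance is also aligned to its own size; with that, the low bits of the base are zero, so OR and addition give the same address and no carry chain is needed. The name `field_addr` and the 16-bit word width are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: address of word `offset` inside an instance of `size`
   words whose base address is aligned to `size` (a power of two). */
uint16_t field_addr(uint16_t base, uint16_t size, uint16_t offset)
{
    assert(size != 0 && (size & (size - 1)) == 0); /* size is a power of two */
    assert((base & (size - 1)) == 0);              /* assumed: base aligned to size */
    assert(offset < size);                         /* offset fits in the zero bits */
    return (uint16_t)(base | offset);              /* same result as base + offset */
}
```

For example, `field_addr(0x40, 8, 5)` gives 0x45, the same as 0x40 + 5.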
<muurkha>if you want to avoid having a link register or self-modifying code you probably also need a stack pointer for subroutine calls
<oriansj>I was thinking more RCA 1802
<muurkha>I've never looked at its instruction set
<oriansj>it has no dedicated PC
<oriansj>and any register can be set to be the PC at anytime
<muurkha>I see, so you could have a hardware stack in the same way, by having a "set stack pointer" instruction that determines which of various registers is the current stack pointer
<oriansj>or as was done in knight, encode which register to use as the stack pointer in the instruction itself
<muurkha>sure, same as the RISC-V or 68000
<oriansj>and i can imagine using multiple stack registers but I can't imagine a *good idea* where one would want to change the PC register often
<muurkha>I mean, RISC-V doesn't store the return address on the stack, but you can use any register as the stack pointer
<oriansj>and without any push or pop instructions
<muurkha>yeah
<muurkha>typically on RISC-V you subtract a constant from the stack pointer and then do a bunch of indexed stores to set up your stack frame
<muurkha>also, for that matter, that's what GCC does for i386 and amd64 generally
<muurkha>you could reserve different PCs for different coroutines
<muurkha>or different stack levels: register 0 for the PC for leaf subroutines, register 1 for subroutines that call only leaf subroutines, and register 2 for all other subroutines
<muurkha>that way the return from level-0 and level-1 subroutines is just a SEP instruction, and calling them is two instructions
<muurkha>I have this intuition that dynamically most calls and returns are to leaf subroutines, and even more are to leaf subroutines or subroutines that call only leaf subroutines
<muurkha>btw, oriansj, do you know about the history of the PDP-X? not jcowan's PDP-8/X
<oriansj>muurkha: you mean the one that became the DG Nova?
<muurkha>yes, and also the Alto and the PDP-11
<muurkha>and thus gave us WIMP and Unix
<oriansj>I am pretty sure the PDP-11 and PDP-X were competitors and the PDP-X lost out
<muurkha>well, the PDP-X itself was canceled, which is why the team left DEC to start Data General
<muurkha>the PDP-11 project itself started a couple of years later as Desk Calculator
<oriansj>and looking at its description, it was rather ambitious as a register-memory instruction architecture
<muurkha>what's a register-memory instruction architecture?
<oriansj>think vax instructions with 1 operand always in memory
<oriansj>2 operand only btw
<oriansj>so add r0 [address] and add [address] r5 but never add r0 r5
<muurkha>hmm, what's ambitious about that?
<muurkha>I mean the PDP-8 could be described that way too, it's just that it had a single accumulator instead of several
<oriansj>fitting it in the manufacturing cost budget
<oriansj>VAX sort of complexity in a pdp-8 budget is no simple feat
<muurkha>I don't think having multiple general-purpose registers amounts to VAX sort of complexity
<oriansj> http://simh.trailing-edge.com/docs/pdpx.pdf
<muurkha>I mean the PDP-11 had 8 GPRs and the Nova had 4
<muurkha>interesting!
<muurkha>it sounds like it was *less* ambitious than its children though. as you'd expect!
<oriansj>the multiple register sets were certainly new
<muurkha>well, it was a difference from the PDP-8, but not, for example, the IBM 360 model 20
<muurkha>I don't think it had multiple register *files*, though, just multiple accumulators (what we'd call "registers" today)
<muurkha>no, I'm wrong
<muurkha>it did
<oriansj>the NOVA was massively simplified relative to the PDP-X and it still cost more than a PDP-8
<muurkha>it did have some simplifications
<muurkha>there were previous processors that had multiple register files, too, like the TX-0 on which Sketchpad was written and the PPU of the CDC 6600. the later Z-80 did too
<oriansj>a lot of simplifications but yeah
<muurkha>the TMS 9900 that was in the TI-99/4A home computer a bit later actually had an unlimited number of register files because they were in RAM
<oriansj>super unreliable chips though
<muurkha>what, the TMS 9900?
<muurkha>I hadn't ever heard that
<muurkha>it sounds like the Nova really was sort of VLIWish. I mean the PDP-8 is a little bit that way but the Nova seems to have been much more so
<muurkha>the addressing modes on the Nova sound almost exactly the same as the addressing modes on the PDP-X
<oriansj>muurkha: yeah, I've seen lots of issues with reliability on those chips from people trying to make custom computers with them
<muurkha>interesting, where did they get them?
<oriansj>couldn't say but if I had to guess, I would say third party distributors
<muurkha>I agree with Supnik's assertion that both the Nova and the PDP-11 were major advances over the PDP-X
<muurkha>but I also think it's reasonable to see the PDP-11 and especially the Nova as being more fully developed versions of the PDP-X; there's a pretty clear family resemblance from my point of view
<oriansj>the reliability issues often had to do with the very touchy RAM requirements, which if not exactly right lead to impossible-to-trace bugs
<muurkha>though Supnik disagrees, and I guess he knows a thing or two I don't :)
<oriansj>did you see the Jason Scott Supnik interview yet?
<muurkha>no, that sounds cool
<muurkha>the Smalltalk-80 virtual machine also had four addressing modes, like the Nova (on the hardware of a clone of which it was implemented in microcode) and the PDP-X
<oriansj> https://archive.org/details/GETLAMP-Supnik
<oriansj>most of it never made it into Get LAMP
<oriansj>but it is an interview that really should have been much longer
<muurkha>the four Smalltalk-80 addressing modes were: local variables; instance variables; global variables; and constant-pool entries
<muurkha>global variables and constant-pool entries roughly correspond to PDP-8 zero-page and current-page addressing
<muurkha>except you also had to use current-page addressing for local variables and instance variables
<muurkha>so I was thinking that two more registers for those would be handy. apparently the PDP-X and Nova designers had the same thought
<muurkha>very plausibly I was unconsciously recycling the Nova idea as laundered through Smalltalk
<muurkha>I think Supnik is talking about 01977 or 01978 in the early part of this interview, "5 years before the IBM PC"
<muurkha>when PDP-11 access was widespread within DEC
<muurkha>ah, yep. "I only joined DEC in the summer of 1977".
<oriansj>thinking deeper on the 1802 SEP instruction and if it was extended to multiprocessing; it probably should be a Kernel level or above only instruction as otherwise polymorphic code becomes the least of your concerns
<muurkha>hmm? no, I think calling subroutines and returning from them is a reasonable thing for user code to do
<muurkha>as is yielding to coroutines
<muurkha>but what does "if it was extended to multiprocessing" mean? adding a compare-and-swap instruction to the instruction set?
<oriansj>muurkha: imagine you are a kernel and you don't know which register is the instruction pointer
<oriansj>store the current process state correctly.
<muurkha>well, you do need to have a register you can save and restore that tells you which accumulator is the PC at the moment, yes
<muurkha>oh, you meant multitasking, not multiple CPUs sharing memory
<oriansj>sorry if I used the incorrect term but I am glad you realized the implied meaning
<muurkha>sometimes it takes me a while but I did eventually figure it out
<muurkha>hey, I had no idea Supnik managed Alpha
<oriansj>muurkha: he was also the one that killed Prism
<oriansj>^Prism^DEC Prism^
<muurkha>Prism?
<muurkha>I'm amused to hear that he thought of encoding strings in RADIX-50 as being "an ugly trick"
<muurkha>the Infocom Z-machine did something similar but slightly less extreme IIRC
<oriansj> http://simh.trailing-edge.com/semi/uprism.html
<oriansj>the precursor to DEC Alpha
<muurkha>yeah, a 5-bit encoding: https://www.inform-fiction.org/zmachine/standards/z1point0/sect03.html
<muurkha>so 32 characters rather than the 40 of RADIX50
<muurkha>but similarly three characters per 16-bit word
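A sketch of how both schemes fit three characters into a 16-bit word. RADIX-50 treats the three characters as base-40 digits (40^3 = 64000, just under 65536); the Z-machine packs three 5-bit Z-characters into 15 bits and uses the top bit to mark the last word of a string. The character table below is the PDP-11 ordering as recalled, with position 29 (shown as `%`) unused in some variants, so treat it as an assumption; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed PDP-11 ordering: space, A-Z, $, ., (unused, shown as %), 0-9. */
static const char RAD50[] = " ABCDEFGHIJKLMNOPQRSTUVWXYZ$.%0123456789";

/* Pack exactly three RADIX-50 characters as base-40 digits. */
uint16_t rad50_pack(const char s[3])
{
    uint16_t word = 0;
    for (int i = 0; i < 3; i++) {
        /* strchr would match the terminator, so reject NUL explicitly */
        const char *p = s[i] ? strchr(RAD50, s[i]) : NULL;
        assert(p != NULL);                 /* character must be in the table */
        word = (uint16_t)(word * 40 + (p - RAD50));
    }
    return word;
}

/* Pack three 5-bit Z-characters; bit 15 marks the final word of a string. */
uint16_t zchars_pack(unsigned a, unsigned b, unsigned c, int last)
{
    assert(a < 32 && b < 32 && c < 32);
    return (uint16_t)((last ? 0x8000u : 0u) | (a << 10) | (b << 5) | c);
}
```

With this table `rad50_pack("ABC")` is (1*40 + 2)*40 + 3 = 1683, and the largest possible value, for "999", is 63999.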
<oriansj>very clever given the memory limits
<muurkha>Prism sounds very painful
<oriansj>64 truly general purpose registers (floating point/integer) and EPICODE which was much more powerful than the later PALCODE of DEC Alpha
<muurkha>sounds like context switching would have been very slow
<muurkha>it'd be interesting to find out how its performance compared with MIPS
<muurkha>probably Supnik made the right call
<muurkha>RADIX50 didn't have shift states, so ZSCII actually had a larger character repertoire
<muurkha>including, importantly for Zork, lower case
<oriansj>right business call, probably. Right engineering call, harder to say
<muurkha>yeah, DEC wasn't yet on the ropes the way it would be ten years and a farrago of patent lawsuits later
<muurkha>so maybe developing their own high-performance RISC chip in 01988 instead of 01993 would have been worth the cost
<muurkha>last night I was reading Kevin Carson's "The Homebrew Industrial Revolution" (CC-BY-SA) in which he documents, among other things, the nightmarish patent landscape around electricity in the late 01800s that led to the GE/Westinghouse duopoly
<oriansj>also, DEC did refuse to sell their Alpha CPUs to Apple which only ended up creating PowerPC as a competitor
<muurkha>did they? I didn't know that
<muurkha>retaining Dave Cutler instead of losing him to Microsloth probably would have helped DEC significantly
<oriansj>yeah Ken Olsen the CEO of DEC was the one who refused to make business deals with a man who would cheat so brazenly on his wife: https://www.goodreads.com/book/show/341220.DEC_Is_Dead_Long_Live_DEC
<muurkha>hm?
<muurkha>you may be seeing something on that page I don't
<oriansj>oh, no it is just a link to the book where I found that tidbit
<muurkha>ah
<muurkha>I think that in general in business you have to be prepared for the risk that you're buying and selling to people who are not entirely honest and who may rip you off
<oriansj>be willing to sell the rope needed to hang you sort of model...
<oriansj>John Sculley, Apple's CEO of those days, met with Kenneth Olsen in June of the same year and offered him to use the new processor of DEC in future Macs. Olsen refused the offer
<oriansj> http://alasir.com/articles/alpha_history/dec_collapse.shtml
<oriansj>with a slightly different reason given than the book
<oriansj>but the meeting did occur and the sale was ultimately refused.
<oriansj>but one could argue the culture of DEC was ultimately what killed it as a company. It just didn't have the money gene needed to become only a component company instead of a computer company
<muurkha>CDC ended up making disk drives
<muurkha>Unisys is best known nowadays as a patent troll
<muurkha>though, speaking of stack machines, they do still sell ClearPath MCP mainframes
<oriansj>IBM is now most successful reselling Red Hat support
<muurkha>I think IBM Global Services may be their biggest division
<oriansj>well those Red Hat Government contracts certainly are a great cash machine for them
<oriansj>assuming they don't get too greedy and kill that golden goose...
<muurkha>it may happen due to events outside their control
<muurkha>no goose is immortal
<oriansj>but a smart person knows that proper geese herding involves breeding; thus providing an eternal line
<muurkha> http://alasir.com/articles/alpha_history/prism_to_alpha.html claims PRISM had Cray-style (RV64V-style) vector registers, although it says they were "64-bit vector registers", which is way too small
<muurkha>it says there was a 7-bit vector length register, so presumably the vector registers were actually 128-entry
<GeDaMo> https://bootstrapping.miraheze.org/wiki/Main_Page "This wiki has no edits or logs made within the last 45 days, therefore it is marked as inactive. If you would like to prevent this wiki from being closed, please start showing signs of activity here."
<muurkha>thanks for the reminder!
<muurkha>I don't think I have an account
***genr8eofl__ is now known as genr8eofl
***Noisytoot_ is now known as Noisytoot