IRC channel logs

2021-01-25.log


<stikonas>fossy: so the remaining problem is probably caused by L prefix before strings
<stikonas>when operating on chars
<stikonas>on char strings
<stikonas>it only keeps first character
<stikonas>fossy: I can try to catch all unnecessary L prefixes in yacc and patch them out...
<fossy>L prefix is borked in yacc?
<stikonas>well, maybe in mes libc?
<stikonas>not sure where...
<stikonas>even without yacc, I can reproduce that issue
<stikonas>fossy: https://paste.debian.net/1182631/ prints only t
<stikonas>if you don't see any nicer solution, I can try to go over them one by one...
<stikonas>hmm, I think fprintf just refuses to work on wchars...
<stikonas>fossy: I think we'll have to rename wchar_t -> char
<stikonas>or maybe easier to write a function to convert wide string to string
<xentrac>stikonas: according to https://www.gnu.org/software/libc/manual/html_node/Extended-Char-Intro.html it is totally valid and standards-compliant to define wchar_t as char
<xentrac>in other bootstrapping news I ran across https://insights.sei.cmu.edu/sei_blog/2019/10/how-to-build-a-trustworthy-freelibre-linux-capable-64-bit-risc-v-computer.html about bringing up Rocket Chip on an FPGA board in order to defeat Karger-Thompson attacks
<xentrac>and also OpenTitan: https://www.lowrisc.org/
<xentrac>the “first transparent silicon root of trust”
<xentrac>so things are really starting to come together
<xentrac>those are RISC-V, which is confusingly totally unrelated to Wirth's RISC5
<stikonas>xentrac: well, it's totally valid, but I think mes libc chokes on something...
<stikonas>and a bit hard to tell what exactly
<pabs3>this reminds me of Precursor https://www.bunniestudios.com/blog/?cat=71
<stikonas>somehow tokens that yacc reads end up only 1 byte in size...
<stikonas>rest is truncated
<xentrac>pabs3: yes, https://www.bunniestudios.com/blog/?p=5706 doesn't talk about what CPU core he wants to use; does he mention it anywhere?
<xentrac>ah yes
<xentrac>"The “Core Complex” currently consists of one RISC-V core, implemented using Charles Papon’s VexRiscV"
<xentrac>in https://www.bunniestudios.com/blog/?p=5971
<pabs3>yep, there is a block diagram on that page
<xentrac>RISC-V is astoundingly less complex than modern CPUs. the full architecture manual is about 300 pages, about the same as a Z80
<xentrac>I mean, than more popular modern CPUs
<xentrac>and a significant chunk of that is floating-point stuff that they've configured out of their "RV32IMAC" (RISC-V 32-bit with integer, multiply, atomic, and compressed instructions)
<xentrac>for better or worse, lowRISC's OpenTitan is derived from Google's Titan chip
<OriansJ>xentrac: but bootstrapping RISC-V is going to be a bitch
<OriansJ>The instruction encoding looks like a massive grad student orgy of questionable optimizations.
<OriansJ>because you know immediates should be compressed and broken into 5 different pieces inside of a 32bit word.
<OriansJ>I think we will have to duplicate the AArch64 hack and just do load/skips with immediates. It'll triple the instruction size to 12 bytes to load an 8bit number but hey at least RISC-V has such a cool and novel optimization for supporting immediates.
<OriansJ>Instead of just dropping a byte for encoding, supporting multiples of 16bits for instructions and having 48bit instructions to support 32bit immediates.
<siraben>I died at "massive grad student orgy"
<xentrac>yeah, the immediate encoding is pretty hilarious
<xentrac>it turns out that what's efficient to support in hardware is not always what's easy to generate code for in software (or decode in a software emulator)
<xentrac>the instruction set manual almost apologizes for what it describes as a "scrambled" encoding
<xentrac>OTOH it only has six instruction word layouts, which is more than Wirth-the-RISC or x18, but less than anything else
<xentrac>well, almost. 8080 and 6502 do beat it on that axis :)
<xentrac>but it's equivalent to, say, Z80, and simpler than 8086
<siraben>Z80 assembler in Scheme, anyone? https://github.com/siraben/zkeme80/blob/master/src/assembler.scm
<siraben>Runs with Guile
<siraben>other Z80 assemblers were so limited in their macro systems I gave up and wrote my own assembler
<xentrac>I don't think the optimizations are actually questionable; VexRiscv fits in just over 500 flip-flops and just over 500 LUTs, which I think is actually smaller than the J1A: https://github.com/jamesbowman/swapforth/tree/master/j1a
<xentrac>yeah, that's a good idea, siraben
<xentrac>that reminds me, you'd probably enjoy http://research.microsoft.com/en-us/um/people/nick/coqasm.pdf, siraben
<xentrac>if you haven't seen it previously
<siraben>Every time I try to read a Coq paper I feel so dumb hehe
<xentrac>OriansJ: I think the "grad student" in question was probably David Patterson, the co-author of Hennessy and Patterson, and the optimizations were validated by a series of full-custom chips in the 02011-02014 time frame
<siraben>Oh doesn't seem too bad, nice
<xentrac>yeah, I have the same problem in general ;)
<siraben>one day we'll get bootstrapped Coq, that'll be cool
<xentrac>well, Metamath is moving aggressively toward such a goal: https://arxiv.org/abs/1907.01283
<siraben>Right. What's the basis behind Metamath's logic anyway? Isn't it purely based on string-rewriting?
<xentrac>I think it's tree-rewriting, but I don't really know
<xentrac>I'm pretty ignorant about logic
<siraben>I'm taking logic this semester, interesting class
<xentrac>the crucial thing, though, is that the *verifier* is extremely simple, just verifying that the steps in the (potentially voluminous) proof, produced by whatever means, are valid
<siraben>Yes
<OriansJ>xentrac: well there is a big difference between optimizing for benchmarks and having a clean design.
<siraben>OriansJ: RISC-V would be the former?
<OriansJ>and Patterson clearly optimized for synthetic benchmarks
<siraben>or rather, x86 is 100% guilty of this
<xentrac>it's a compromise
<xentrac>why do you think the benchmarks the team optimized for were synthetic?
<siraben>What about ARM, IIRC it has a pretty uniform encoding right?
<xentrac>ARM1 yeah
<xentrac>although it still had things like LDM and STM IIRC
<OriansJ>siraben: arm7l and earlier had a simple immediate encoding option (thumb and thumb2 not so much)
<xentrac>and having condition code flags at all kind of makes ARM a lot more complicated than RISC-V
<OriansJ>and honestly if they just did their encoding for their instructions in big endian, I would have been singing its praises when bootstrapping.
<xentrac>(even in the versions where you don't have universal instruction predication)
<OriansJ>the 4bit condition code Isn't a bootstrapping problem at all
<OriansJ>where they put it is awkward
<OriansJ>but big endian bytes mixed with little endian words is gonna be ugly
<OriansJ>but I guess we can blame MOS 6502 for that
<xentrac>but arm7tdmi, for example, has 16 different instruction word layouts
<OriansJ>as ARM opted for little endian words because of it
<xentrac>but a lot of those instruction formats were added after the initial ARM design
<xentrac>it's true, though, that only one of those 16 chopped up its immediates into apparently random chunks scattered through the instruction word, the way RISC-V does
<OriansJ>xentrac: well studies done on human written assembly indicated 12bits of immediate are sufficient for 80% of all immediates and 16bits for 95%
<OriansJ>I could deal with a 12bit immediate in M1 just fine
<xentrac>(this being the immediate form of the ldhr/strh/ldrsb/ldrsh instructions, which has an 8-bit offset broken into two pieces with some opcode bits between them)
<OriansJ>4bit shift and 8bit immediate also is easy in M1
<xentrac>yeah, I thought RISC-V's choice to separate loadhi/loadlo into a 20-bit high and 12-bit low was pretty surprising, but the RISC-V manual explained it in something like those terms
<OriansJ>'4' !9 => boom we are happy
<xentrac>M1 as in M1-macro?
<OriansJ>xentrac: yes, I am expressing bootstrapping encodings for immediates
<OriansJ>which are reasonable
<xentrac>yeah, definitely RISC-V's bit-swizzling is an extra pain there
<OriansJ>coldfire actually looks like they learned better
<xentrac>but it will pay dividends when you're wiring up discrete logic chips I suspect
<OriansJ> https://en.wikipedia.org/wiki/Freescale_ColdFire
<xentrac>what do you mean by "big-endian bytes"?
<OriansJ>01100111 => is encoded as either 67 or 76
<OriansJ>0x67 is big endian
<xentrac>what about 0xe6?
<xentrac>I'd say 0x76 is nibble-swapped
<OriansJ>xentrac: you are correct
<OriansJ>I didn't do the 76 right
<OriansJ>mostly because little bit endian reads wrong to me
<xentrac>generally speaking, though, in any computer since the 1960s, the bits in a byte are arranged along a separate dimension from bytes in memory
<xentrac>so the only time it comes up is when you're transmitting bits one at a time, like over SPI or RS-232
<xentrac>(bit-serial computers of the 1950s and early 1960s were invariably little-endian, because the alternative was to run an order of magnitude slower)
<OriansJ>xentrac: the sins we have committed in the name of performance in synthetic benchmarks
<OriansJ>I guess they are easy to miss for those who don't have to write the assembler to support that garbage
<OriansJ>Once you think about how to write an assembler in the very assembly language you wish to support. These sort of optimizations vanish like a fart in the wind.
<OriansJ>Shit all pre-C code assemblers when having to write an 8bit immediate did; this is 8bits long store8 reg, [address] => done; need to write a 32bit immediate? store32 reg, [address] => done.
<OriansJ>The only logic you might need is will this fit in 8, 16 or 32bits? and then picking the opcode that can handle that shit.
<xentrac>hardware doesn't work that way, though
<OriansJ>Now I'm gonna juggle these 8 balls in the air while I gargle this hardware "Architect's" nut sack to gain 0.0001% in a benchmark showing you can load immediates into a register faster and then load another immediate without looking at it.
<xentrac>haha
<xentrac>yeah, that's not the deal
<OriansJ>or did you miss the Transmeta benchmark "hack"
<OriansJ>which simply optimized out the benchmark entirely and set records doing it
<xentrac>in synchronous hardware, which is almost all digital hardware, your clock rate is determined by the longest number of propagation delays to quiescence
<xentrac>it doesn't matter if the logic path that's two propagation delays longer is only used in 0.1% of all instructions
<xentrac>you pay the cost on every clock cycle
<OriansJ>xentrac: or you break it in half
<OriansJ>1 clock or 2 clock makes no difference to the programmer.
<xentrac>yeah, you may be able to do that, but that means you incur the cost of an extra pipeline stage
<OriansJ>xentrac: we have seen 40+ stage pipelines already
<xentrac>we're talking about RISC here, so the chip is designed to execute one instruction per clock cycle
<xentrac>yeah, you can do 40+ stage pipelines in something like a TPU, but they're useless for general-purpose processors
<OriansJ>did you not remember the Intel NetWorst which broke execute into 2 clocks then simply double clock that bit to make up for it?
<xentrac>that definitely was not what the RISC-V team was hoping to emulate ;)
<xentrac>the bizarre immediate encoding was prompted specifically by the desire to shorten the critical path for immediate-operand sign-extension logic, which apparently is a significant aspect of the critical path
<xentrac>with CMOS, the propagation delays get longer when fanout goes up
<xentrac>it's true that with a sufficiently complex design you can work around things like that with extra pipeline stages, complex speculation logic, and so on, but for many simple designs, that scrambled immediate encoding probably improves overall processor throughput by something like 10%
<OriansJ>xentrac: Implementation problems can be addressed without forcing the rest of the world put up with your crap.
<xentrac>and I don't mean on artificial benchmarks, but on all code, because it means you can clock the design 10% faster
<xentrac>yes, but there's a cost to addressing implementation problems like that
<xentrac>in the majority of ways RISC-V is an eminently boring design
<xentrac>in a good way, like C or Golang
<xentrac>there are a few places, like the immediate encoding, where they did bizarre things
<OriansJ>xentrac: execution trace cache solves the immediate internal encoding vs external immediate encoding rather quick
<xentrac>but most of those turn out to have been carefully considered bizarre things that were worth the cost
<xentrac>that's assuming you're doing a Pentium-style decoding into micro-ops in the front-end instruction decode in the first place, though
<OriansJ>xentrac: They are punting costs into the software side.
<xentrac>and such a front end would be larger than an entire RISC-V core!
<OriansJ>and that just makes bootstrapping even harder.
<xentrac>well, no; it makes the software part of bootstrapping harder, but it makes the hardware side easier
<OriansJ>xentrac: I'll believe it when I see a RISC-V in individual logic gates on a printed circuit board.
<xentrac>the question is on which side it amounts to a larger amount of complexity
<OriansJ>xentrac: When I do Knight in Hardware, I'll experiment and let you know
<OriansJ>but until then we just have worthless speculation
<OriansJ>hell I've seen FORTH CPUs simpler than RISC-V without overly complicated immediate encodings.
<xentrac>I mean on the software side we're talking about something like `(imm & 0xfe0) << 20 | (imm & 0x1f) << 7`
<OriansJ>So my feeling (and that is all it is at this point) is that it might make things faster but it sure as shit doesn't make things simpler.
<xentrac>well, as I pointed out above, I think VexRiscv is actually fewer LUTs in an FPGA than the J1A
<xentrac>but I could be wrong about that
<OriansJ>xentrac: as can I but it is late and I have work early tomorrow.
<OriansJ>good night
<xentrac>Robert Baruch hasn't finished doing it yet: https://hackaday.com/2020/11/09/the-logic-chip-risc-v-project-reboots/
<xentrac>goodnight!
<xentrac>but "a RISC-V in individual logic gates on a printed circuit board" does seem to be his objective
<xentrac>LMARV-1 being the project name
<xentrac>I think that as long as there's a minimal acceptable speed, making things faster in one place allows you to make them simpler somewhere else
<fossy>aww, cute, tcc is segfaulting /s
<fossy>chokes very very hard on .S files
***janneke_ is now known as janneke
<stikonas>gio: in the readme https://gitlab.com/giomasce/nbs/ you might want to replace boostrapping -> bootstrapping
<stikonas>fossy: I think my PR is now ready https://github.com/fosslinux/live-bootstrap/pull/19
<stikonas>yacc is now fixed and working
<stikonas>and shouldn't need any more changes
<stikonas>lex might possibly need some changes to build flex but we can patch it later. At the moment lex itself builds fine
<bauen1><another brain dump>
<bauen1>so to make reviewing easier (or rather to make review "unnecessary"), there should probably be a utility to hash data early in the bootstrap, this way you can remember the hash of some <data> and when you bootstrap again (e.g. after developing some changes) you check / run stage0 by hand, until you have your hash utility, then you use that to verify that the source still matches what you've used
<bauen1>last time
<stikonas>well, as an additional check I guess it's alright but it in no way guarantees that backdoor is not there
<bauen1>stikonas: how so ?
<stikonas>if hash is compromised?
<bauen1>stikonas: you still verify stage0 (i.e. hex monitor) up to your hash utility, you need to do this every time and it's therefore important to keep this short ; so unless you've misremembered your hash or messed up already i don't see a way to insert a backdoor
<stikonas>oh in that case yes
<stikonas>if you can verify up to hash
<bauen1>after this is done you can continue the bootstrap _without review_, up to the point where you need to apply your patches, which you review, at the end you can again build a merkle hash tree of your updated bootstrap and remember that
<bauen1>now there's still a problem, if you later figure out that you've included a bug or backdoor that makes it necessary to "revert" to an earlier version of the bootstrap, this could be solved by including the previous hash when calculating the merkle tree, effectively forming a sort of blockchain
<bauen1>if you then have a copy of the merkle tree (which is comparably small, and can be verified by remembering the root) you can rebuild any version of the bootstrap without review
<bauen1>if someone inserts a backdoor and destroys your copy of the merkle tree you're kind of out of luck and i wish you fun with reviewing
<bauen1>this is kind of like reinventing how git works, but simpler
<bauen1>arguably you could also "prune" the merkle tree if it should ever become too big, at the cost of losing cheap access to some earlier versions
<stikonas>yeah, I guess that might help with reviewing later...
<bauen1>it's still a problem that the entire source code you need to review before you have something you can "properly" develop with is absolutely massive
<stikonas>and you need to print everything for reviewing :D
<stikonas>you can't review on the screen
<bauen1>stikonas: yes you can, even for the first bootstrap
<bauen1>stikonas: verify stage0 by hand (i.e. paper + dip switches), all until you can use the screen to display code and continue your way, always computing the hash of components, so you have it easier later
<bauen1>stikonas: however you can't just use a linux desktop to review the kernel code, make a hash of it and then use that for bootstrap, the hash could be backdoored
<bauen1>actually let me dig up this discussion i had about printing information on paper and how much tons that results in
<stikonas>well, I guess you can always use some other screen if you enter code manually until you get screen...
<bauen1>ah yes, some shady math resulted in 1tb of binary data -> ~ 25 000 tons of paper
***ChanServ sets mode: +o rekado_
***rekado_ is now known as rekado
<ullbeking>good evening all
<ullbeking>just wanted to check in, say hi
<ullbeking>haven't forgotten about this!
<ullbeking>i have a job interview on friday!
<stikonas>ullbeking: good luck!
<ullbeking>thank you stikonas !