IRC channel logs

2021-09-11.log


<stikonas>well, definitely not USB keyboard :D
<stikonas>that one needs quite a bit of code to get working, initial negotiation and then periodic polling, decoding signal
<xentrac>depends on your hardware
<xentrac>but usually not!
<xentrac>if you had to bitbang a serial interface to your paper-tape reader, the first-stage bootloader might be quite a bit more complicated too
<Hagfish>it's kind of cool to think about what could be implemented in 4-16 instructions, that's intriguing
<xentrac>like OUT $f00, $1; LOOP: IN $f01; JZ LOOP; ST (I0+); CMP $-1; JNE LOOP; JMP $0
<xentrac>if you have a blocking IN instruction it could be
<xentrac>like OUT $f00, $1; LOOP: IN $f01; ST (I0+); CMP $-1; JNE LOOP; JMP $0
<xentrac>which is 6 instructions
<Hagfish>wow, thank you
<xentrac>I mean that's a nonexistent assembly syntax for an imaginary machine I just invented in my head
<xentrac>with a single accumulator and a postincrementing index register addressing mode
<oriansj>xentrac: I find it quite odd people never consider the rather simple option of just building dedicated hardware when bootstrapping.
<oriansj>what is to stop one from simply implementing a tape reader/writer that follows the Knight interface standard?
<oriansj>or at least one close enough that going from POSIX to bare metal requires only trivial changes in the early stages.
<oriansj>wire that shit directly to the RAM if one wants.
<oriansj>write this value to this memory address and wait until this memory address becomes zero; your value is now at this memory address.
<oriansj>spend a little more time and you get the 3 tape drives (ROM loader tape0, Input tape1 and Output tape2)
<oriansj>then you can write your kernel in the C subset that M2-Planet supports (or expand M2-Planet to the level required to support your kernel)
<oriansj>as all of those steps can be reduced down to single input and single output
<oriansj>stikonas: bootstrap-seeds merged.
<xentrac>yeah, that's what I outlined above in fake assembly
<xentrac>stikonas[m]: ↑
<oriansj>and stage0-posix updated bootstrap-seeds and merged.
<xentrac>wonderful!
<theruran>xentrac: this is what I was thinking of: "A pure vau-calculus is also described (even more lightly) in Appendix C of the Kernel Report ("De-trivializing the theory of fexprs")."
<xentrac>hmm
<xentrac>wasn't Appendix C the appendix that describes the history of the letter vau?
<theruran>no?
<xentrac>"Appendix C: The letter vau", pp. 375-379?
<xentrac>maybe I'm looking at a different version of the dissertation?
<xentrac>btw I wrote up a simple term-rewriting system today
<theruran>in the Kernel Language Revised Report
<xentrac>oh hmm
<theruran> https://www.irccloud.com/pastebin/SAE2ND1L/
<xentrac>theruran: hmm, could be interesting
<xentrac>I just sketched out a term-rewriting interpreter that I think could maybe reach usability in a kilobyte or so of machine code. I don't think it can beat hex0 though :)
<xentrac>in http://canonical.org/~kragen/dernocua.git in file text/term-rewriting-micro-interpreter.md
<xentrac>theruran: you might be interested
<oriansj>the question is when (not if) someone is going to try to find the absolute minimal number of bytes required to implement hex0.
<oriansj>as there is considerable space remaining in potential size optimization (if not clarity)
<stikonas>well, not all ISAs are good for minimal hex0
<stikonas>probably want some low-bit-width one, definitely not a 64-bit one...
<stikonas>hmm, unless we do extra work, cc_* will be the last program to run on the RV32I subset of RISC-V
<stikonas>M2-Planet uses multiplication, etc in a few places
<stikonas>that's probably fine though
<stikonas>if somebody really has RV32I only hardware, they can implement software multiplication in M2-Planet instead...
<xentrac>absolute minimal number of bytes is probably not ideal for auditability
<xentrac>probably if someone does want auditable hardware they wouldn't include a multiplier. that's a lot of transistors to save a pretty small amount of code
<stikonas>yes, I think most of those small risc-v cpus don't have multiplication
<stikonas>but what I mean is, if somebody needs to work with such hardware, at that time they can just patch M2-Planet and write that multiplication function in simple C...
<stikonas>hmm, actually cc_* also have a few divisions/multiplications
<xentrac>in Ur-Scheme I didn't implement multiplication or division
<xentrac>just (define (2* x) (+ x x)) and similarly 4*
<stikonas[m]>Well, multiplication by any constant integer is easy to do with bit shifts
<stikonas[m]>But two arbitrary variables need more code
<xentrac>Yeah, but I avoided having to do that, or implement bit shifts; I got away with just 2* and 4*
<xentrac>oh I'm wrong, I also implemented 10* and 8*
<xentrac>OTOH I didn't have to implement even the i386 ModR/M byte for Ur-Scheme, much less the noticeably uglier RISC-V instruction formats, because I sort of fobbed that off on the assembler
<xentrac>and I *did* implement quotient and remainder
<xentrac>and I used right shifts too, though only in assembly
<oriansj>stikonas: if you notice, ARMv7l doesn't have division or modulo; so in M2-Planet we just created a software routine and simply called it with the arguments in R0 and R1 and the result returned in R0
<xentrac>I think reducing hex0 much further would probably involve refactoring its responsibilities so it doesn't have to open files and chmod things. your idea about just using memory buffers and invoking hex0 as a subroutine seems clearly correct for bootstrapping before a kernel
<stikonas>oh, so it's already done in M2-Planet...
<oriansj>So RISC-V can very trivially do the same thing for multiplication, division and modulo
<stikonas>well, maybe at some point I'll do it for cc_* too
<stikonas>although first I think I'll just write with hw division
<stikonas>and then it can be added on top
<oriansj>stikonas: actually I suggest doing: https://github.com/oriansj/stage0/tree/master/stage2/High_level_prototypes/cc_amd64 for RISC-V before you start doing it in assembly (to work out the basics first)
<stikonas>hmm, I was thinking of doing cc_amd64 in riscv first
<stikonas>and then maybe do C prototype when I port cc_amd64->cc_riscv64
<stikonas>btw, those prototypes need extra linker flags to build with recent gcc
<stikonas>need to add -Wl,--allow-multiple-definition to CFLAGS
<stikonas>well, it's actually LDFLAG but anyway...
<xentrac>if it can be postponed until after you have a C compiler, you could have one division subroutine for all platforms instead of one per platform. though as Jessica Clarke pointed out, ultimately the instruction set that matters for trusting trust is the one you have auditable hardware for, and I think her idea of doing an unjumbled version of RISC-V to make the binary code more auditable is
<xentrac>probably worthwhile
<xentrac>even if it does require a few more transistors and slow down the clock
<stikonas>jumbling doesn't make binary code that much more complicated
<oriansj>stikonas: it is cc_* in C for the architecture in question to work out the details ahead of time: as you can see here: https://github.com/oriansj/stage0/blob/master/stage2/High_level_prototypes/cc_armv7l/cc_core.c#L486
<oriansj>where the details of division are sorted out and the order details specific to the assembly
<oriansj>for example in ARMv7l it was label then instruction rather than instruction then label in the output for x86
<xentrac>it makes binary code much harder to read, but only somewhat harder to generate from a compiler
<oriansj>xentrac: I think stikonas demonstrated that although the jumbling is a pain, it isn't actually an issue in bootstrapping once we figured out the .hex .hex word solution.
<stikonas>and once you have hex1, it's only immediates that are a bit jumbled
<stikonas>and the most common ones (the I-type immediates used in ADDI and LD) are not jumbled
<oriansj>and once you implement that in M0 and M1 it doesn't actually matter up the chain.
<stikonas>yeah, I can finally write final code directly now after M0
<stikonas>(will probably just skip GAS version)
<xentrac>oriansj: the .hex .hex word solution helps with generating the binary code, but not with auditing the binary seed. but I guess using B-type and J-type instructions sparingly helps with that
<stikonas>xentrac: well, it kind of helps with auditing too
<oriansj>the only thing that needs to be accounted for in the immediate encoding is the need for two instructions rather than one in certain cases, in which case we can do this: https://github.com/oriansj/M2-Planet/blob/master/cc_core.c#L535
<stikonas>because we have prototypes
<stikonas>so you can take a look what word decomposes into
<stikonas>and check if it adds up to what you expect
<oriansj>xentrac: we can also do it with --binary instead of hex if it makes it easier to audit
<xentrac>that's true!
<stikonas>in any case auditing early binaries shouldn't take longer than writing them
<xentrac>the new hex0 has 5 J-type and 11 B-type instructions if I'm counting correctly
<stikonas>something like that
<xentrac>stikonas: yeah, but you may have to audit them more than once
<stikonas>I counted 13 branches and 6 jumps
<stikonas>in riscv version
<stikonas>but some of them are duplicate
<xentrac>I probably missed some
<stikonas>some of them identical e.g. both are jump 2 instructions forward
<stikonas>well, you can get rid of some of them if you want at the expense of more complicated code
<stikonas>e.g. hexifying part of the hex0 can be done with branchless programming
<xentrac>not sure whether that would make it more complicated or not
<stikonas>exactly...
<xentrac>maybe not
<stikonas>it's a more complicated algorithm
<stikonas>but you avoid risc-v jumbling
<xentrac>right
<xentrac>if you dehexify something, are you exorcising it? sanctifying it?
<stikonas>uncursing?
<xentrac>uncursing :)
<xentrac>btw, I mentioned your work on this as an inspiration in text/term-rewriting-micro-interpreter.md in http://canonical.org/~kragen/dernocua.git
<xentrac>which is still kind of a sketch
<stikonas>well, you can't have a 100-byte program on POSIX...
<xentrac> https://www.muppetlabs.com/~breadbox/software/tiny/teensy.html is 45 bytes
<stikonas>elf header is 120 bytes
<xentrac>you'd think
<stikonas>and you need at least 2 instructions to exit, so I would think 128 bytes
<xentrac>I don't think POSIX is so much the obstacle as ELF, but even ELF turns out to be more flexible than that
<stikonas>hmm, that's probably some older non-elf format?
<stikonas>a.out I guess
<xentrac>nope
<xentrac>it's ELF
<oriansj>xentrac: well there are probably headerless executable formats for POSIX that we can use as nothing in the lower stages demands ELF
<xentrac>last time I talked to Brian Raiter about it, it even ran on current Linux
<xentrac>POSIX doesn't really define ABIs in general; that wouldn't be PO
<oriansj>perhaps a .COM format of sorts that is supported by BSDs and Linux
<stikonas>anyway, elf header is closer to metadata than code
<xentrac>I don't think there is such a thing. well, except for .COM itself, which is supported by various DOS and Windows emulation stuff
<stikonas>well, that 45 byte "program" kind of uses part of elf header as "code"
<oriansj>xentrac: what sort of effort required would it take to add a new executable format to the BSDs and Linux?
<xentrac>yes, it packs the code into an unused part of the ELF header
<oriansj>like 4bytes Magic header: dump rest into memory at a fixed address, RX permissions only
<stikonas>I doubt it makes sense to introduce new executable format...
<oriansj>stikonas: generally agree here but I am curious
<stikonas>elf header is much easier to inspect than rest of the code
<xentrac>oriansj: if you can put an interpreter in the root directory, you can get by with a 5-byte magic header: #!/x
<stikonas>most of early programs use almost identical headers
<xentrac>well #!/x\n
<xentrac>(and if you can't put things in the root directory, loading a kernel module is going to be a much more significant difficulty)
<oriansj>xentrac: no other binaries besides kaem-optional and hex0
<xentrac>not even the kernel?
<stikonas>you have kernel
<stikonas>otherwise it makes no sense to talk about headers
<xentrac>yeah, that's what I was thinking
<xentrac>but the kernel is... noticeably larger than kaem-optional
<stikonas>you just have code at CPU entry address without kernel
<oriansj>well stage0-posix is the smallest set of userspace bootstrap pieces, stage0 is the bare metal stuff and it is the exact same steps with much more complex hardware requirements.
<oriansj>hence why I am figuring out the minimal bootstrap filesystem that could be used to bootstrap a proper POSIX kernel and go the rest of the way to Linux+GCC+Guile (needed for Guix to do the rest)
<xentrac>if you're willing to depend on kernel modules, you could just load a Scheme interpreter into the kernel
<stikonas>even fossy's linux kernel is compiled without kernel modules...
<oriansj>xentrac: and then just directly run MesCC+slow-utils and be done
<xentrac>exactly
<xentrac>(or other arbitrary kernel code; you can compile a Scheme interpreter in statically just as easily)
<stikonas>well, putting interpreters, etc into kernels is a bit of cheating
<oriansj>stikonas: not a bit, entirely cheating.
<stikonas>kernel functionality that is used should just be bits that abstract out hardware, i.e. reading from files rather than some tape, etc...
<oriansj>it is kinda like saying the smallest hello world is a precompiled binary named a
<xentrac>echo '#!/bin/echo' > hello; chmod 755 hello
<xentrac>anyway, stikonas, in non-cheating land, I think your idea of invoking hex0 as a subroutine to transcode bytes from one memory buffer to another is an excellent one
<stikonas>it's not really my idea...
<stikonas>that's just how bare metal programs run...
<oriansj>there are a great many details in bare-metal bootstrapping still needing to be worked out.
<stikonas>there are no processes there
<xentrac>some bare metal programs interact with I/O devices
<oriansj>with a couple tape drives everything from hex0 to M2-Planet has been sorted out, but after that we need a filesystem and a kernel
<xentrac>we don't really need a kernel
<xentrac>just a sort of monitor thing that lets us load one program after another
<oriansj>xentrac: the step after M2-Planet is MesCC and yes it absolutely needs a POSIX kernel
<stikonas>well, there are some intermediate tools before mescc but yes, if we count just C compilers...
<xentrac>you said it only needs exit, execve, fork, waitpid, brk, open, close, read, write, lseek, chmod, fchmod, access, chdir, fchdir, mkdir, mknod, getcwd, umask, uname, unlink, ioctl, stat and fsync
<xentrac>IIRC
<oriansj>mescc-tools and mescc-tools-extras
<oriansj>then after MesCC, we have TCC and its requirements
<oriansj>but then we can build a Linux or BSD possibly
<xentrac>and IIRC it uses fork+exit+waitpid in a stereotyped way that doesn't actually require multiple concurrent processes
<oriansj>xentrac: well we did try to make it into a solvable problem ^_^
<xentrac>and ioctl is just isatty
<oriansj>but it needs to be something that M2-Planet can compile
<stikonas>yeah, ioctl can be easily patched out
<xentrac>not sure what tcc requires. not much I'm guessing
<oriansj>xentrac: well expect the list of requirements to grow as we discover what assumptions of ours were wrong.
<xentrac>also I think probably chmod, umask, and access there are not actually providing functionality; they're just leaping hurdles POSIX puts in your way
<stikonas>well, tcc probably is not too bad
<stikonas>after all somebody had tccboot project
<xentrac>heh, good point
<oriansj>compilers need less than interpreters
<xentrac>tcc evidently doesn't require a kernel to generate runnable code successfully
<stikonas>yeah, most of those syscalls are either POSIX hurdles or dealing with file system/processes which is irrelevant for baremetal
<oriansj>So MesCC and Bash are the two with the most of the requirements
<xentrac>hmm, does djgpp still exist? does it include bash?
<stikonas>yeah, bash definitely needs kernel
<xentrac>because djgpp runs without a kernel
<xentrac>anyway so I don't think you need a kernel, just a filesystem and some way of returning control to a monitor
<stikonas>well, returning control is simple
<stikonas>that's exactly the same as you return from functions inside your program
<stikonas>e.g. store return address in some register
<oriansj>xentrac: well kaem-optional is the monitor and a kernel that just provides a handful of syscalls like a runtime library in a separate memory space
<stikonas>or in some predetermined memory location
<xentrac> https://en.wikipedia.org/wiki/DJGPP makes it sound like it's not actually dead; it includes GCC 9.3.0
<stikonas>yeah, without kernel need to be more careful with memory partitioning
<stikonas>and manage registers better (kernel zeroes them before program is started)
<oriansj>kernels seem like a better option when one isn't short on RAM and has a working MMU in hardware
<oriansj>if not Amiga EXEC https://en.wikipedia.org/wiki/Exec_(Amiga) shared functionality library at fixed memory address is a better idea
<stikonas>we don't even need multitasking...
<xentrac>multitasking is easier than memory protection
<oriansj>stikonas: true but the idea is a kernel just being a library at a fixed memory address that one uses
<xentrac>but even memory protection is not rocket science on RISC-V
<oriansj>well MMUs are not free in terms of transistors but they are probably worth it
<xentrac>they definitely make debugging a lot less painful :)
<xentrac>I've been noodling for a few years about master-slave processors as an alternative to MMUs
<xentrac>run your "kernel" on a master processor and your user processes on a slave processor; the master has private RAM and also a link to the slave's RAM, so it can read and write it as it wishes, and can also pause the slave or reset it
<xentrac>this is a simpler design than the traditional MMU approach, because the master and slave processors are identical; the only difference is that the slave's reset and pause pins are connected to I/O pins on the master, and the slave doesn't have a link to the master's RAM. and it provides stronger isolation with no risk of things like rowhammer or spectre. the downside is that you need twice as
<xentrac>many CPUs, and context switches involve rebooting the slave with new code
<xentrac>in terms of verifiability, verifying that the slave CPU is identical to the master might be a lot easier than auditing an MMU
<xentrac>other potential advantages to such a system include potentially higher performance with multiple slaves, since each slave has full-speed access to its own memory, and better scalability of power usage, since a slave that doesn't have a task to run can be turned off, but these are probably irrelevant to bootstrapping builds
<xentrac>(SMP systems with N processors can potentially also have N times the memory bandwidth, but this usually involves some kind of crossbar switch in between the CPUs and RAM, so larger "SMP" systems resort to NUMA)
<oriansj>well if one requires a PID register (say only use the bottom 8bits when bootstrapping), then one needs only 256 pointers worth of Memory and a finite state machine to implement a proper MMU. Which is probably much cheaper in terms of implementation parts than multiple CPU cores
<oriansj>I don't disagree that SMP has many potential advantages but I don't think they add much in terms of bootstrapping.
<oriansj>which is why, in the original transistor-constrained history, it was the winning solution that everyone adopted.
<xentrac>yeah, an MMU is definitely less transistors than a second CPU! but they're harder to verify
<oriansj>xentrac: well you would have to verify BOTH CPUs even if they are identical in design.
<xentrac>To do change detection of two CPU die photos you can print one in red and one in transparency in blue, then lay it on top of the red one
<xentrac>well, cyan, not blue
<xentrac>then spotting differences between them becomes trivial and obvious
<xentrac>also, if you have a third-party supplier, the supplier doesn't know which one you're going to use as master and which one as slave
<oriansj>xentrac: assuming the CPUs are single dies yes but not so much if they are wire wrapped CPUs
<xentrac>agreed, wirewrap is not auditable
<xentrac>by any means, not just with die photos
<oriansj>xentrac: well a good plan needs also to make sense if someone in their garage were to have to make the hardware themselves
<xentrac>did you see Sam Zeloof's recent update on making chips in his garage?
<oriansj>yep
<oriansj>clever little trick to up the transistor count
<oriansj>but I doubt a proper 32bit CPU can be done in that current transistor limit
<xentrac>agreed, though I think he's nudging up against the 3500 transistors of a 6502
<oriansj>So PDP-1 style CPU building is probably a solid idea
<oriansj>but yeah in the future assuming progress, a simple 32bit CPU will certainly be possible
<oriansj>and will probably benefit from your master/slave suggestion.
<xentrac>and the MuP21 was a 21-bit processor in 7000 transistors, though a lot of those were in the NTSC generation hardware
<xentrac>heh, if you think the RISC-V instruction encoding is bad, you should check out the MuP21's!
<oriansj>well Berkeley's RISC II was 39K transistors with a good chunk of them allocated to Registers, which could probably be shaved off
<oriansj>say go from 138 registers down to just 16
<xentrac>yeah :)
<xentrac>that was already a reduction from RISC-I
<xentrac>it wouldn't be surprising if a minimal RISC-V was smaller than RISC-II, too; a lot of the architectural decisions in RISC-V had to do with simplifying minimal implementations
<oriansj>so assuming 6 transistors per bit in the register, it would be an 18304-transistor reduction
<xentrac>nice
<xentrac>I think the reason for the large register files of RISC-I and RISC-II was the sliding-window mechanism we know from SPARC
<xentrac>which in a sense is a substitute for a data cache
<xentrac>but a thing to keep in mind is that, with semiconductor memories, your RAM accounts for a lot more transistors than a simple CPU like this
<xentrac>I mean that's why weird memory technology like magnetic drums, acoustic delay lines, Williams tubes, and magnetic cores were crucial to computers from 01945 to 01975
<xentrac>because it was totally impractical to build RAM out of vacuum-tube flip-flops
<xentrac>to run something like a BASIC interpreter from RAM, you need about 4 KiB. you might be able to do it in 2 KiB with a stack-machine instruction set, but 4 KiB is the usual minimum. if that's DRAM it's 32768 transistors and 32768 capacitors
<xentrac>(plus the column and row drivers, but that's probably only another 1000-4000 transistors, and the proportion goes down as you go to larger arrays)
<oriansj>yeah, fortunately duplication works well when making DRAM/SRAM chips
<oriansj>as I don't want to deal with hand making core memory
<oriansj>but core memory modules can be bought on Ebay at cost
<xentrac>right. so my thought is to take the same approach to making CPUs, though not to the same extent as the GA144
<xentrac>oh, that's interesting! I didn't know that!
<xentrac>I assumed they were sort of rare artifacts by now
<oriansj>at about $50 per KB; so about $50K for 1MB
<xentrac>(since the total number of computers that ever used core is probably somewhere in the neighborhood of half a million, and most of them have been scrapped by now)
<xentrac>so for US$200 you could have 4 KiB
<oriansj>plus new core memory is being made for legacy systems
<xentrac>about the same price as JLCPCB would charge you for assembling a DRAM out of discrete transistors and capacitors with a pick-and-place machine
<xentrac>seriously? who's making new core memory?
<oriansj>mostly military contracts if I remember correctly
<oriansj>core memory survives nuclear blasts far better than CMOS DRAM/SRAM cells
<oriansj>ironically it was uranium runoff contamination in the manufacture of IC packaging that resulted in the discovery of radiation causing bit flips in ICs for Intel
<xentrac>interesting!
<oriansj>1978 paper "A new physical Mechanism for Soft Errors in Dynamic Memories"
<xentrac>still, I imagine SRAM has much higher immunity to that kind of thing, even if it's CMOS rather than TTL or ECL
<xentrac>actually maybe *especially* if it's CMOS
<oriansj>there are many tricks for RAD hardening of systems and the military has the money to buy the best available
<oriansj>NASA gets discounts on Radiation hardened chips as a minor side benefit
<oriansj>anyway. Back to the previous topic of garage build Systems as the final step of a trusted bootstrap. It'll be a good while before the transistor count gets high enough for CPU on Chip and thus not betting on it at this stage is a reasonable plan until such time that evidence suggests changes.
<oriansj>As people seem more than happy with userspace bootstrapping being ported to more architectures, rather than wanting to dip into bare-metal bootstrapping work.
<xentrac>sounds reasonable. I don't think it'll be the final step, though
<xentrac>there are probably fewer hardware engineers than software engineers, and because hardware can't be freely copied, hardware engineers don't have the free-software traditions
<oriansj>xentrac: I hope we can bring a libre hardware design tradition to the hardware engineers with things like print your own CPU at home sort of fun
<oriansj>along with libresilicon providing better standards for the industry
<oriansj>so that foundries can provide a universal standard product to compete on price alone
<xentrac>libre hardware design requires matter compilers or matter replicators
<Gooberpatrol66>can't you make FPGA soft processors?
<xentrac>yes
<xentrac>Bunnie has written about the difficulties that approach would pose to would-be attackers who seek to compromise the hardware
<xentrac>(FPGA soft processors)
<stikonas>Gooberpatrol66: there are already some FPGA soft processors, but not sure about open/libre silicon FPGA processors
<xentrac>stikonas: there are lots of freely licensed soft cores, but I don't think there are any freely licensed FPGAs
<stikonas>probably...
<oriansj>I don't believe anyone has made libre FPGA hardware (let alone for sale); the closest I have seen is the iCE FPGA, where only the bitstream format has been reverse-engineered
<stikonas>also FPGA toolchains...
<oriansj>stikonas: icestorm
<xentrac>there are CPLDs for which the mask work rights have expired
<oriansj>So we can do free hardware designs for FPGAs but that doesn't prevent attacks that compromise the bitstream generation process.
<xentrac>but a CPU softcore would probably require a lot of CPLDs
<Gooberpatrol66>i've heard of this https://symbiflow.github.io/
<oriansj>as one needs trusted CPUs to run a trusted bootstrap to build a trusted bitstream for a softcore CPU
<xentrac>not necessarily. I mean you can verify a CPLD bitstream by hand
<oriansj>assuming we want to address Nexus Intruder style attacks (hardware compromising software to compromise hardware cycles)
<stikonas>aren't bistreams quite big?
<oriansj>depends entirely on what is being built
<xentrac>also depends on the device
<oriansj>a single bit processor could be small enough to hand audit with time/money to do so
<stikonas>but I mean verifying a bitstream by hand is probably harder than verifying a gcc binary
<xentrac>Altera introduced the Max5000 in 01988, and in the US, mask works are only protected for 10 years
<xentrac>and the maximum duration contemplated for mask-work protection in TRIPS is 15 years
<xentrac>so any CPLD or FPGA produced before 02006 is fair game to copy
<xentrac>stikonas: a CPLD bitstream would be a lot easier to verify than a transistor design or even a netlist of gates, much less a physical PCB full of chips. but of course you also have to verify the CPLD itself
<oriansj>so we have options to potentially explore but nothing solid until someone is willing to put in a good bit of work.
<xentrac>the sum-of-products expression of a PAL, PLA, PLD, or CPLD is a lot easier to understand than an arbitrary expression of NAND gates or whatever
<xentrac>/GAL
<oriansj>FPGAs probably will be like the user space bootstrapping work. It hides a big potential risk (like the kernel) but is easy enough to get into that more people would be willing to work in it than working in individual gates
<stikonas>oh ok, CPLD are simpler than FPGAs...
<xentrac>CPLDs are definitely less capable than FPGAs
<stikonas>never heard of them before...
<xentrac>I think bitstreams for them are also easier to analyze. CPLDs also have the enormous advantage that their designs are public
<stikonas>I've only used FPGAs before
<xentrac>FPGAs are actually a little older than CPLDs; the XC500 is from 01985. but PLDs in general (including FPGAs, PALs/GALs (01978), PLA, and CPLDs) are from 01971
<xentrac>I had thought FPGAs came after CPLDs
<stikonas>well, I just never encountered CPLDs before but we use some FPGAs at work
<xentrac>I think Lattice has mostly squeezed CPLDs out of the market with their cheap FPGAs
<xentrac>Digi-Key has 151 CPLD models in stock https://www.digikey.com/en/products/filter/embedded-cplds-complex-programmable-logic-devices/695?s=N4IgjCBcoLQJxVAYygFwE4FcCmAaEA9lANrggC6AvvjAEyIgqQY75GSkCsFl1IAbAwCWAEyggYYAAz18AB1TiQ%2BAI6oAnuIj4Nc7OJEBnFLyA
<xentrac>but 508 FPGAs https://www.digikey.com/en/products/filter/embedded-fpgas-field-programmable-gate-array/696?s=N4IgjCBcoLQJxVAYygFwE4FcCmAaEA9lANrggC6AvvjAEyIgqQY75GSkCsFl1IAbAwCWAEyggYYAAz18AB1TiQ%2BAI6oAnuIj4Nc7OJEBnFLyA
<xentrac>but you can see that the cheapest FPGAs are like US$1.90 while the cheapest CPLDs are like US$1.50
<oriansj>add to notes for when we actually start doing that work (unless someone else beats us to it)
<oriansj>a full breakdown of time spent in qemu for every step in the x86 stage0-posix bootstrap on a raspberryPI https://paste.debian.net/1211381/
<oriansj>hex2 (written in hex1) is absolutely the biggest reason for the slow run time
<xentrac>makes sense
<oriansj>in fact it is the only bit that takes more than 30 seconds
<oriansj>the second slowest is cc_x86 building M2-Planet in 0:16.99
<oriansj>and then third is M2-Planet self-hosting in 0:12.10
<oriansj>and ./hex2-0 hold M2 in 6:31.76 and ./hex2-0 hold hex2-1 in 4:05.36 are 5/9ths of the entire time
<xentrac>how much RAM does it need?
<oriansj>need or currently uses?
<oriansj>it needs about 16KB but currently uses 8MB
<oriansj>as M2-Planet shoves the entire input source code and the entire output into memory before doing one big dump
<oriansj>blood-elf is even more wasteful right now
<xentrac>makes things easier
<oriansj>M1 deduplicates the tokens, so that it has an O(1) for the application of defines
<oriansj>So M1 only has O(n) unique tokens stored in memory (including immediates)
<oriansj>as %1 %1 %1, would only appear once in M1
<xentrac>right
<oriansj>I could probably break out the 3 functions that would need update into a separate file and produce a minimal memory version without much trouble; which should fit in 256KB
<oriansj>would have to add more optimizations into M2-Planet (or a separate optimizer) to shrink things down smaller than that
<oriansj>as the current M2-Planet+M2libc binary is 204024 bytes in size (and you need 3 4KB input token buffers to build mes-m2)
<stikonas>oriansj: well, hex2 is probably stuck in mprotect syscalls
<stikonas>you can backport my fix from risc-v
<oriansj>stikonas: actually I was confirming that because M0 has that same work as your RISC-V
<stikonas>actually that's how I noticed the problem
<stikonas>I noticed that M0 was fast
<stikonas>while hex2 was slow
<stikonas>but I think everything with section table would be fast
<stikonas>so I think writing to heap before blood-elf will be slow on qemu
<stikonas>by switching to stack I was seeing speedup from 8s to 0.5s or so
<stikonas>and I didn't even fix everything, only fputc/fgetc, I still left scratch in heap
<xentrac>I just pushed an update to http://canonical.org/~kragen/dernocua.git with the notes about term rewriting in text/term-rewriting-micro-interpreter.md
<oriansj>odd, looks like GDB's behavior has changed in regards to disassembling memory
<oriansj>symbol table is valid according to readelf and objdump -d has no problem with the file
<oriansj>so why does GDB complain that there is no symbol table and refuse to show the assembly instructions???
<oriansj>oh, I now need to also do layout asm along with layout regs
<oriansj>I'm dumb