IRC channel logs

2021-06-11.log


<oriansj>stikonas: well, compromised hardware that subverts operating systems, compilers, and future hardware designs was the Nexus Intruder Program against the Soviet Union during the Cold War. That presents some unfortunate properties, since computers are required to manufacture the masks used in the lithography process.
<xentrac>also there is a lot of snowden stuff that suggests that kind of thing is still going on
<xentrac>also of course
<xentrac>crypto ag
<oriansj>And humans can only manufacture lithography masks by hand down to the 3 micrometer process, using Rubylith
<oriansj>after that everything was done in software
<xentrac>i think that's more an issue of layout complexity and improved software than of process incompatibilities
<Hagfish>i'm wondering if the nearest equivalent to DDC for compromised hardware is running the system you want inside a virtual machine on the hardware
<Hagfish>the compromised hardware would need to be clever enough to spot the virtual machine and pass on a payload to it to make it target the actual calculation you were trying to subvert
<Hagfish>of course, for many types of virtualisation, the hardware can see the software on the guest machine with hardly any indirection at all
<Hagfish>but you could write a custom emulator (and custom operating system?) that the malicious hardware wouldn't know to expect
<Hagfish>i'm sort of hoping that after a sufficient number of levels of indirection, you eventually hit an information-theoretic limit on how much "hidden" code the malicious hardware would have to be hiding to drill through all the layers
<Hagfish>another approach would be something like homomorphic encryption and/or splitting the process across multiple machines, so that no one machine has enough information to know how to exploit the actual operation you are trying to do
<xentrac>yeah, FHE is hypothetically an extremely powerful primitive, but in this case I don't think you actually need oblivious computation
<xentrac>because no secrets are involved
<xentrac>it's sufficient that no one machine can execute any code incorrectly without being detected
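The point above (no secrets, just detection of incorrect execution) can be sketched as a cross-check: run the same deterministic job on several machines and compare output digests. This is only a rough illustration. The `honest` and `subverted` functions are hypothetical stand-ins for real machines, and the sketch assumes the machines were built independently enough that they do not all share the same backdoor.

```python
import hashlib
from collections import Counter

def digest(output: bytes) -> str:
    """SHA-256 hex digest of a job's output."""
    return hashlib.sha256(output).hexdigest()

def cross_check(job: bytes, machines: dict):
    """Run `job` on every machine and compare the output digests.

    Returns (majority_digest, suspects), where suspects are the machines
    whose digest differs from the most common one.
    """
    results = {name: digest(run(job)) for name, run in machines.items()}
    majority, _ = Counter(results.values()).most_common(1)[0]
    suspects = sorted(name for name, d in results.items() if d != majority)
    return majority, suspects

# Hypothetical machines: plain functions standing in for real hardware.
def honest(job: bytes) -> bytes:
    return bytes(sorted(job))

def subverted(job: bytes) -> bytes:
    out = bytearray(sorted(job))
    out[0] ^= 1              # flips one bit of the correct result
    return bytes(out)

machines = {"a": honest, "b": honest, "c": subverted}
agreed, suspects = cross_check(b"bootstrap", machines)
print(suspects)
```

A disagreement only says that somebody is wrong; the majority vote is a convenience and proves nothing if most of the machines are compromised in the same way.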
<oriansj>xentrac: I don't think Homomorphic encryption would directly help but it does provide a mechanism for untrusted computers to not know what they are computing. Assuming there isn't some secret shortcut baked into the encryption.
<oriansj>hopefully the field of Verifiable computing might save us the trouble of figuring out that solution.
<oriansj>but personally I think it is more fun to explore designing our own hardware, even if it isn't a huge advantage in any terms besides more control and no hardware secrets.
<xentrac>at some point FHE might help, but it isn't immediately obvious to me how
<xentrac>I think "a mechanism for untrusted computers to not know what they are computing" sells it short, though; I don't think exactly how is really on topic, but I'd be glad to explain if you'd like
<oriansj>xentrac: the how of Fully Homomorphic encryption? nope but some alternate possible solutions to making hardware compromises easier to detect or even more complex to implement would be helpful.
<xentrac>(not how to do FHE, but how that description sells it short)
<xentrac>making hardware compromises easier to detect, well, right now we have the problem that you need an electron microscope and the willingness to destroy your hardware in order to detect them
<xentrac>I think the solution to that is matter compilers, but maybe 02021 is too early for that to be a useful answer
<oriansj>circuits on glass or transparent plastic might be useful
<oriansj>although 50MHz might be too slow for certain operations
<xentrac>50MHz is a lot better than the Pineapple One's 0.5 MHz
<xentrac>matter compilers will be a real paradigm shift though. transistor density increases not only because of Dennard scaling, which is already dead, but because every square millimeter of silicon patterned and etched and doped as finely as we can manage is fucking expensive
<xentrac>so for example the Graviton2 has 30 billion transistors using a "7nm" process. if they made it using a 32 nm process from 02009 it would be just as fast but it would be 25 times as big and 25 times as expensive
<xentrac>but a lot easier to cool
<oriansj>doubt it would have the same clock speed though... and probably consume a great deal more power
<xentrac>no, it would consume almost the same amount of power, and it could be clocked almost exactly as fast; Dennard scaling came to an end before that
<xentrac>you might have to redesign it with a larger number of distinct clock domains, though
<xentrac>by contrast, a human brain has tens of trillions of dendritic spines amplifying signals to about 86 billion neurons
<xentrac>and it can do things the Graviton2 can't yet, despite running 30 million times slower
<xentrac>the reason the Graviton2 and other modern chips are made so small is that it reduces their cost
<xentrac>because we're still manufacturing them with engineering processes that are merely unbelievably polished versions of the engineering processes that designed WWII radars
<xentrac>well, I should say "manufactured WWII radars"
<xentrac>part families, numerical analysis, precision, tolerances, simplicity, bearings, lubrication, purity, capital investment, traceable metrology, elimination of variability, and so on
<xentrac>yields!
<xentrac>none of that has anything to do with how human brains, mahogany trees, or lichens are manufactured. except yields, I guess. nobody wants their children to die
<xentrac>suppose you have a matter compiler that can only produce transistors with a 10μm gate length, like chips in 01971, so your logic gates have 20 ns response time like a CD4000, or worse, like 50 ns, but you can produce as many of them as you want.
<xentrac>and let's say these transistors and their associated wiring are 30μm × 30μm × 30μm.
<xentrac>what would you do with them?
<xentrac>consider compiling a cubic meter of sand or granite into a cubic meter of this lame computronium
<xentrac>this gives you 37 trillion transistors, 1000 times as many as the Graviton2, but only capable of running at maybe 1MHz
<xentrac>and — this is the crucial part — for the price of a cubic meter of sand plus a few kilowatt-hours of energy
<xentrac>(actually I just checked Intersil's datasheet and the CD4000's propagation delay is 60ns)
<xentrac>the problem becomes one of how to get any use at all out of the unlimited supply of free, slow transistors, rather than how to jam as many of them as possible into extremely expensive real estate and get them to run as fast as possible
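The transistor count quoted above can be checked with a few lines of arithmetic. The 30 μm cell edge, the one cubic meter volume, and the 30 billion transistor figure for the Graviton2 all come from the log; the function names are just for illustration.

```python
CELL_EDGE_M = 30e-6            # 30 μm edge per transistor plus wiring (from the log)
GRAVITON2_TRANSISTORS = 30e9   # 30 billion (from the log)

def cells_per_volume(volume_m3: float, edge_m: float = CELL_EDGE_M) -> float:
    """Number of cubic cells of the given edge length that fit in the volume."""
    return volume_m3 / edge_m ** 3

def ratio_to_graviton2(cells: float) -> float:
    """How many Graviton2s' worth of transistors that is."""
    return cells / GRAVITON2_TRANSISTORS

cells = cells_per_volume(1.0)
print(f"{cells:.2e} transistors, about {ratio_to_graviton2(cells):.0f}x the Graviton2")
```

This gives roughly 3.7 × 10^13 cells, i.e. the 37 trillion in the log, and a ratio a little over 1200, which matches the "1000 times" figure as an order-of-magnitude statement.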
<oriansj>xentrac: not exactly helpful for the root of trust problem we have but I can understand the interest in such a technology. Especially when you consider the grey-goo explosion in productive capacity properties it contains.
<xentrac>it's not clear how to solve the root-of-trust problem in that environment at all; the explosion in productive capacity means that everything around you could be a fake designed to trick you, like the TikTok "everything is cake" memes
<xentrac>but at least it's not centralized like TSMC? :)
<NieDzejkob>welp, managed to crash miniforth before I could bootstrap to the point where I save stuff to disk
***terpri is now known as robin
<xentrac>:(
<NieDzejkob>I got analog screenshots on my phone though, so it's just a matter of retyping about 2kb tomorrow
<xentrac>fantastic!
<xentrac>yeah, having some kind of interrupt-into-monitor functionality is really helpful for that kind of thing
<xentrac>(and ideally invoking it on certain interrupts)
<NieDzejkob>I wrote a nice assembler and used it to make some primitives - branches and comparisons
<xentrac>nice!
<xentrac>is this on an old PC from the attic?
<NieDzejkob>then I wrote IF, THEN, ELSE, BEGIN, AGAIN, UNTIL... and when I tried testing the looping words I accidentally underflowed the stack
<NieDzejkob>yeah, it's got like, an 800 MHz VIA processor and 128 megs of RAM
<xentrac>ah, so it can do 386 protected mode, but it's initially running an IBM BIOS?
<xentrac>in 16-bit mode
<NieDzejkob>: foo begin dup u. 1 - 0= until ; 5 foo <- spot the mistake
<xentrac>I mean, not an actual IBM BIOS, but like a clone from Phoenix or AMI or something
<NieDzejkob>yeah, that's right
<NieDzejkob>no uefi
<xentrac>but also no openfirmware
<xentrac>and no page faults
<NieDzejkob>ooh, that bios that's forth inside. no.
<xentrac>the initial monitor doesn't have to be super fast
<NieDzejkob>I'm pretty sure it supports paging, I used to run linux on that thing
<xentrac>right, but it's not enabled initially
<xentrac>so you can't get interrupts out of a stack underflow
<NieDzejkob>hell, I looked up the southbridge manual and it has hardware video decoding
<xentrac>maybe you could have a machine-code subroutine that checks the stack pointer, aborts to the forth prompt with an error message on stack underflow, and otherwise pops the stack and leaves it in ax or something?
<xentrac>in a forth designed for speed you might not want that to be your only way ever of popping the operand stack, but maybe for interactive bringup it would be a worthwhile tradeoff
<xentrac>are you using si or something for the operand stack?
<NieDzejkob>it actually stopped after only a few lines of output, probably because it popped off some important code and overwrote it
<xentrac>yeah
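For anyone tracing the `foo` definition quoted earlier (`begin dup u. 1 - 0= until`), here is a toy Python model of the loop with a checked pop along the lines xentrac suggests. It is not miniforth: the stack is a Python list, and the "fixed" variant (an extra `dup` before `0=` and a final `drop`) is one plausible reading of the mistake, not something confirmed in the log.

```python
class StackUnderflow(Exception):
    pass

def run_foo(start, fixed, max_iters=100):
    """Model `: foo begin dup u. 1 - 0= until ;` applied to `start`.

    Returns (printed_values, error), where error is None or a message.
    """
    stack, printed = [start], []

    def pop():                            # checked pop: fail instead of reading garbage
        if not stack:
            raise StackUnderflow("stack underflow")
        return stack.pop()

    def dup():
        top = pop()
        stack.extend((top, top))

    try:
        for _ in range(max_iters):        # BEGIN
            dup()                         # DUP
            printed.append(pop())         # U.
            stack.append(pop() - 1)       # 1 -
            if fixed:
                dup()                     # hypothetical fix: keep a copy of the counter
            stack.append(pop() == 0)      # 0=  (consumes the value, leaves a flag)
            if pop():                     # UNTIL (consumes the flag)
                break
        if fixed:
            pop()                         # DROP the finished counter
    except StackUnderflow as exc:
        return printed, str(exc)
    return printed, None

print(run_foo(5, fixed=False))
print(run_foo(5, fixed=True))
```

In this model the original word prints 5, then `0=` eats the counter, `until` eats the flag, and the next `dup` finds an empty stack. On the real machine there is no such check, so execution carries on with whatever sits next to the stack in memory, which fits the "popped off some important code" description.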
<NieDzejkob>sp is parameter stack, di is return stack
<xentrac>interesting
<NieDzejkob>though, re: error message, you're overestimating the free space I have :P
<NieDzejkob>I actually finished my blogpost about the internals yesterday, perhaps you'll find it interesting: https://niedzejkob.p4.team/bootstrap/miniforth
<NieDzejkob>anyway, I don't think fitting in an underflow check would have a big ROI
<NieDzejkob>I didn't really even need loops or conditions to write stuff to disk, I guess you could say I flew too close to the sun :P
<xentrac>haha
<xentrac>this probably won't be the last time you crash with a stack overflow
<xentrac>or underflow, but it's the same problem
<xentrac>btw you can probably save yourself a bunch of code space by keeping the top of the operand stack in ax instead of in memory, though maybe you've already done that and just didn't put that in this post
<xentrac>oh, you are, you're just using bx instead of ax
<NieDzejkob>I did say that, but perhaps in a not very visible place
<xentrac>i just misread
<NieDzejkob>or you're still reading and you've just seen one of the intermediate snippets
<xentrac>do you know about INT $1B?
<NieDzejkob>yeah, I don't think this will be my last crash, my strategy is to just make unscheduled reboots painless
<NieDzejkob>ah, ctrl+break. yeah, I do
<NieDzejkob>at least infinite loops that don't break the stack could be recoverable that way
<NieDzejkob>hmm, I could just point the handler at ReadLine and put an sti there, I think?
<xentrac>i think that's right. you might need to twiddle sp a little
<xentrac>also do you know about the single-byte int 3 instruction debug.com uses for single-stepping?
<NieDzejkob>yup
<NieDzejkob>twiddle sp? I think I can leave that to the user
<xentrac>yeah, possibly
<xentrac>might be handy to initialize ram with 0xcc to provoke an interrupt upon a jump into hyperspace
<NieDzejkob>hyperspace :D
<xentrac>so that you get a prompt before the machine state gets mangled too badly
<xentrac>you know, NieDzejkob, these interrupts just gave me an evil thought
<NieDzejkob>yeah?
<xentrac>a call instruction to a literal address is 3 bytes, but int $3 is only 1 byte, so if you have a subroutine you call 10 times, you can save 20 bytes of code by putting its address at the interrupt 3 vector
<xentrac>except you also have to put its address there of course, and it has to return with an iret instead of a ret (still 1 byte)
<xentrac>even the other interrupts like 0x20, 0x21, 0x22, and so on can be invoked with only 2 bytes instead of 3
<NieDzejkob>the crossover point seems to be 5 calls
<xentrac>also, if you put your operand stack at si instead of sp, you can still pop from it with a 1-byte instruction (although as I said maybe you want to check for underflow), but pushing then requires 4 bytes; vice versa if you use di instead
<NieDzejkob>my most called routine, PutChar, is only called 4 times
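The byte accounting behind the interrupt trick is easy to put into a small calculator. The 3-byte call, 1-byte `int3`, and 2-byte `int n` sizes are from the log. The setup cost is NOT stated anywhere in the conversation: `ASSUMED_SETUP_BYTES = 10` is a guess, picked because it reproduces the 5-call crossover mentioned above.

```python
CALL_BYTES = 3            # call to a literal address (from the log)
INT3_BYTES = 1            # single-byte int 3 (from the log)
INT_N_BYTES = 2           # other interrupts, e.g. int 0x20 (from the log)
ASSUMED_SETUP_BYTES = 10  # ASSUMPTION: cost of storing the handler address in the vector

def net_savings(calls, setup_bytes=ASSUMED_SETUP_BYTES, int_bytes=INT3_BYTES):
    """Bytes saved by replacing `calls` call instructions with an interrupt."""
    return calls * (CALL_BYTES - int_bytes) - setup_bytes

def break_even_calls(setup_bytes=ASSUMED_SETUP_BYTES, int_bytes=INT3_BYTES):
    """Smallest number of calls at which the trick stops costing bytes."""
    per_call = CALL_BYTES - int_bytes
    return -(-setup_bytes // per_call)    # ceiling division

print(net_savings(10, setup_bytes=0))     # the 20 bytes quoted, before setup cost
print(break_even_calls())                 # crossover under the assumed setup cost
print(net_savings(4))                     # a routine called only 4 times
```

Under that assumed setup cost, a routine called 4 times comes out 2 bytes worse, and a 2-byte `int n` would need 10 calls just to break even.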
<xentrac>but maybe you could use the int3 trick to do pushing (or popping) and make them each just 1 byte
<NieDzejkob>I don't get why you'd want to change it to not use sp for operand stack
<xentrac>so you can use sp for the return stack
<NieDzejkob>but the return stack is used, like, twice
<NieDzejkob>literally only DOCOL, EXIT, >R and R>
<xentrac>yeah, it would only make sense if you were remodeling it into a subroutine-threaded forth, which gets rid of NEXT
<xentrac>but might not pay off
<xentrac>so maybe it's a dumb idea, I don't know
<xentrac>it must be frustrating to have put so much effort into writing such an excellent explanation of miniforth as the one you linked above and then have me only dimly understand it and throw out a bunch of ideas about it half of which are obviously wrong :)
<xentrac>eliminating NEXT might not save as much space as I'd naively hope since it doesn't eliminate the following dictionary entry
<oriansj>NieDzejkob: I think I have an old IBM 5150 which might be of some use to you
<NieDzejkob>xentrac: no, it's actually a pretty interesting idea
<NieDzejkob>I don't think it will actually save bytes, but I'll have to think more about it when I'm more awake
<xentrac>if it was possible to get the per-word overhead low enough to eliminate the decompressor, that might help
<NieDzejkob>oriansj: a 5150 sounds like a great thing to have. I'm not sure I would actually have a good use for it, though
<xentrac>glad the ideas are enjoyable!
<NieDzejkob>I mean, the compression has a net positive effect
<NieDzejkob>just realized: pushing and popping with lodsw and int3 won't let you control the register into which you pop
<xentrac>right, of course the compression has a net positive effect! but that's in part because NEXT is 3 bytes instead of 1
<xentrac>right, it would always be ax with lodsw or stosw
<NieDzejkob>the compressor costs 35 bytes; if next were 1 byte, the savings of the compression would be very close to neutral
<xentrac>another trick I've sometimes found useful for squeezing down code is fallthrough (multiple entry points, kind of like you were doing with db 0x3c), which pretty much requires segregating the words' names from their code
<xentrac>so for example C! could be placed immediately before DROP, but instead of ending with NEXT, execution would just continue on into DROP, thus saving a NEXT and a duplicated POP BX
<xentrac>(or LODSW or whatever)
<siraben[m]>NieDzejkob: nice post!
<xentrac>NieDzejkob: my server died, I may have missed it if you said anything
<NieDzejkob>haven't said anything. Aren't there channel logs, though?
<siraben>NieDzejkob: there are public logs, check /topic
***terpri_ is now known as robin
<oriansj>rekado_: please apply this patch https://paste.debian.net/1200900/ to https://git.savannah.gnu.org/cgit/guix/bootstrappable.git/ as I do not have access needed to update
<oriansj>civodul would you be able to help with this?
<stikonas>oriansj: hmm, I was not able to compile meslibc from mes-m2 included in stage0-posix...
<stikonas>strangely it worked when building mes that is currently used in live-bootstrap
<stikonas>argh, I'm probably on the wrong version of nyacc
<stikonas>mixed up 1.02.0 and 1.00.2
<oriansj>stikonas: mes-m2 now embeds a version of nyacc that works with the mescc included
<stikonas>oh I see...
<oriansj>mes-m2 is designed to be a one stop shop for getting a running copy of mescc
<oriansj>as I snapshot a version of mescc that works with mes-m2 and a version of nyacc that works with both mes-m2 and the snapshot of mescc
<stikonas>well, I was incorrectly overriding GUILE_LOAD_PATH to point to incorrect nyacc
<stikonas>although, it worked till quite late, most files compiled, until qsort.c...
<stikonas>anyway, I'll try embedded copy
<oriansj>it should also include the mes libc files as well which will be needed for bootstrapping TCC (assuming I didn't miss anything)
<NieDzejkob>welp, forgot to preserve SI across my int13 routine
<stikonas>TCC is another thing I should convert from submodule to tar...
<stikonas>hmm, no, I'm still getting some strange error
<stikonas>Backtrace:
<stikonas> [c] (map f h . t)
<stikonas>unhandled exception: unbound-variable: (abort)
<stikonas>strange...
<civodul>oriansj: done!
<stikonas>something goes badly when building qsort.c
<oriansj>thank you civodul
<oriansj>stikonas: perhaps a mes libc change that I missed?
<stikonas>no idea, qsort.c files are identical... so I don't understand why it fails
<oriansj>stikonas: well what is the difference in the M1 output?
<oriansj>as M2-Planet+MesCC both output M1
<oriansj>so we need only look at the M1 files to see what changed but if the M1 files are the same, the DEFINE files might be different and be responsible for the change.
<stikonas>I don't think it outputs any M1 yet
<oriansj>stikonas: not even with -c ?
<stikonas>no, we build all files with -c
<stikonas>it just starts printing that unbound variable error
<stikonas>and I don't see any qsort.o or qsort.s
<stikonas>this is what I'm running https://github.com/stikonas/live-bootstrap/tree/mes-m2
<oriansj>getting the same error too stikonas when i do: ./scripts/mescc --no-auto-compile -e main ${bindir}/mescc.scm -- -c -D HAVE_CONFIG_H=1 -I include -I include/linux/x86 lib/stdlib/qsort.c in mes-m2
<oriansj>janneke: I could use your help