IRC channel logs

2020-08-18.log


<OriansJ`>Hagfish: well it is certainly interesting to see someone try to bootstrap such ancient software. (and shows the pain of such an approach)
<Hagfish> https://xkcd.com/2347/
<Hagfish>it do be like that sometimes
<OriansJ`>Hagfish: more like all of the time; especially when you look at guix build graphs
<OriansJ`>although I am tempted to use a ripoff of that image as the stage0 logo but with labels of various programs in the stack
<OriansJ`>probably with this dependency chain at the bottom: https://github.com/oriansj/talk-notes/blob/master/Current%20bootstrap%20map.pdf
<OriansJ`>with the one super shaky piece at the bottom, with a slightly larger piece on top and so on
<OriansJ`>with a single arrow, going "we are here, don't fuck it up"
***luizhenrique is now known as LHLaurini
<bauen1>the website is down
<bauen1>not sure who to ping \o/
<bauen1>OriansJ`: but either it's you or you know who to ping
<janneke>bauen1: thanks
<rain1>seeing that bootstrap map is very exciting. So it is just this part in the middle
<bauen1>by the way has anyone tried to implement a backdoor or rather a bug in e.g. stage0 that can propagate all the way into e.g. mes or tinycc or gcc ?
<rain1>i don't think so but imagine such a backdoor: It would mean evil-stage0 is 100x bigger than normal stage0
<rain1>at least that is what i'd expect
<bauen1>rain1: true, i was also thinking about maybe abusing unicode once again in the stage1 source code to trick the casual reader into compiling malicious code
<bauen1>the backdoor itself probably wouldn't be very huge, it would probably amount to modifying the write_byte method to check for a specific sequence and, if it is found, write its own code instead
<bauen1>perhaps 32 - 128 bytes
<bauen1>writing a backdoor that can also propagate to the other compilers is what would probably blow the size
<bauen1>actually, hex0, hex1, hex2, m0, cc_amd64 all use very similar routines, so maybe not impossible
<bauen1>OriansJ`: i've noticed that a 64-bit mov `48C7C7 00000000 ; LOADI32_RDI %0 # All is well` is used in hex0 and hex1, but everything later just uses a sign-extending move `BF 00000000` ; perhaps a few bytes could be saved ?
<bauen1>this is for amd64
<bauen1>wait, BF is not a sign-extending move but just a simple move
<bauen1>point still stands
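The size difference between the two encodings quoted above can be checked with a minimal sketch. The helper names are hypothetical and nothing here comes from stage0 itself. It assumes the standard amd64 encodings: `48 C7 C7 imm32` is `mov rdi, imm32` with sign extension, and `BF imm32` is `mov edi, imm32`, which zeroes the upper half of rdi. The two agree only for immediates in the range 0 to 0x7FFFFFFF.

```python
import struct

def mov_rdi_imm32_rex(imm):
    """Encode 48 C7 C7 imm32: mov rdi, imm32 (sign-extended to 64 bits)."""
    return bytes([0x48, 0xC7, 0xC7]) + struct.pack("<i", imm)

def mov_edi_imm32(imm):
    """Encode BF imm32: mov edi, imm32 (upper 32 bits of rdi are zeroed)."""
    return bytes([0xBF]) + struct.pack("<I", imm)

long_form = mov_rdi_imm32_rex(0)
short_form = mov_edi_imm32(0)

print(long_form.hex(), len(long_form))
print(short_form.hex(), len(short_form))
print("bytes saved per use:", len(long_form) - len(short_form))
```

For an exit status of 0 both forms leave the same value in rdi, so the shorter one would save two bytes per use.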
***ChanServ sets mode: +o rekado_
<bauen1>also, am i right with the assumption that `fputc` from M0 or cc_amd64 is passed down to M2 ?
<OriansJ`>bauen1: rekado is the one responsible for the website if I remember correctly
<OriansJ`>rain1: yeah the middle part is the absolute worst
<OriansJ`>bauen1: I've tried to think of ways which stage0 could be backdoor'd and then intentionally add things to make it harder.
<bauen1>lol
<OriansJ`>for example: https://github.com/oriansj/stage0/blob/master/High_level_prototypes/sin.c
<OriansJ`>will instantly flag any and all unicode characters
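The idea of flagging every non-ASCII byte can be illustrated with a short sketch. This is not the logic of sin.c, only an assumed minimal equivalent: the function name `find_non_ascii` and the sample line are made up for illustration.

```python
def find_non_ascii(data):
    """Return (line, column, byte) for every byte outside 7-bit ASCII."""
    hits = []
    line, col = 1, 0
    for b in data:
        col += 1
        if b > 0x7F:
            hits.append((line, col, b))
        if b == 0x0A:  # newline: start counting the next line
            line += 1
            col = 0
    return hits

# A Cyrillic "е" (U+0435) hidden inside an otherwise ASCII comment.
sample = "mov rdi, 0 ; all is w\u0435ll\n".encode("utf-8")
for line, col, byte in find_non_ascii(sample):
    print(f"line {line}, column {col}: byte 0x{byte:02X}")
```

A homoglyph that looks identical on screen still shows up here, because its UTF-8 encoding uses bytes above 0x7F.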
<bauen1>my current plan would be to hook the write and exit syscall, then do some byte counting in the write syscall to modify the ELF header to add additional space that is loaded into memory, then a bit of "pattern matching" to hook the syscalls, and finally at the exit hook write the additional backdoor code
<OriansJ`>this will find all deviations in hex0 binaries: https://github.com/oriansj/stage0/blob/master/stage1/dehex.hex0
<bauen1>having all the code hand coded makes things a bit annoying since adding or removing bytes will break things
<OriansJ`>and stage0 itself can run on bare metal, so no kernel to hook either
<OriansJ`>fputc and fgetc are actually implemented in M0 https://github.com/oriansj/stage0/blob/master/stage3/M2-Planet_x86.c#L43 and just simply wrapped in inline assembly for M2-Planet
<bauen1>i'm mostly looking at the amd64 implementation, but that is helpful
<OriansJ`>The only place you could quietly hide the ELF tampering is in M2-Planet, but cc_x86 can build M2-Planet and should produce extremely similar output (the only difference being unsigned instead of signed integer instructions)
<OriansJ`>So one would have to either break the ability of cc_x86 to build M2-Planet (major issue for all of us) or compromise both cc_x86 and M2-Planet at the exact same time and hide the exact same backdoor in both C code and hand written (easy to audit) assembly
<bauen1>OriansJ`: i'm fairly aware that you can discover the backdoor easily if you have a trusted compiler of some form
<bauen1>OriansJ`: so i'm mostly trying to attack the initial binary hex0-seed
<OriansJ`>bauen1: yep David A. Wheeler's DDC work
<OriansJ`>250bytes doesn't leave much room to attack
<bauen1>OriansJ`: as far as i understand the code, most (if not all) compilers up to mes use very similar methods of outputting the binary (a write syscall with a size of 1) and similar exit methods (syscall exit without closing the fd)
<OriansJ`>also the initial seed is expected to be visually inspected and hand toggled in; so much harder to hide things at that level
<bauen1>true
<bauen1>mostly a proof of concept for fun
<OriansJ`>well there are certainly behaviors that a kernel could flag on
<bauen1>depending on how similar all compilers are it could be possible to implement a backdoor in hex0-seed with a very small size (less than 1kb) that can still propagate all the way to M2-Planet (and whatever is after that)
<OriansJ`>but you run into a risk of discovery as binary sizes and checksums are known at all times
<bauen1>yeah, a backdoor in such a small binary is very obious
<bauen1>*obvious
<OriansJ`>and it has to avoid detection across multiple independent platforms and operating systems
<OriansJ`>as all architectures can bootstrap and cross-check every step of every binary of all other architectures
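The cross-checking described here amounts to comparing digests of the same artifact built on different hosts. A minimal sketch follows. The host names and byte strings are invented placeholders, and `cross_check` is a hypothetical helper, not part of any stage0 or mescc-tools tooling.

```python
import hashlib

def sha256_hex(blob):
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(blob).hexdigest()

def cross_check(builds):
    """builds maps a host name to the bytes it produced for one target.

    Returns the sorted list of hosts whose digest differs from the
    most common digest.
    """
    digests = {host: sha256_hex(blob) for host, blob in builds.items()}
    counts = {}
    for digest in digests.values():
        counts[digest] = counts.get(digest, 0) + 1
    majority = max(counts, key=counts.get)
    return sorted(host for host, d in digests.items() if d != majority)

# Placeholder outputs: two hosts agree, one has an extra byte appended.
builds = {
    "x86": b"\x7fELF-demo",
    "amd64": b"\x7fELF-demo",
    "armv7l": b"\x7fELF-demo\x90",
}
print(cross_check(builds))
```

The majority digest is not automatically the trustworthy one. Any disagreement at all is the signal that something needs auditing.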
<OriansJ`>*afk*
<bauen1>it already has to implement byte pattern checking against 1 architecture, probably wouldn't be very complicated to add more
<bauen1>but yes, introducing such a backdoor into a relatively trusted system would be extremely hard
<bauen1>it does appear that the move instructions encoded as B0 - BF can be used to replace some multibyte opcodes in hex0 to reduce the size
<bauen1>but not for r8-r15
***terpri__ is now known as terpri
<OriansJ`>bauen1: well mescc-tools and M2-Planet are platform neutral (eg produce identical output regardless of host); thus all architectures supported by them can cross-check each other and all bootstrap steps along the way as well.
<OriansJ`>however I wish for stage0 binaries to be completely untrusted. eg everyone should make their own hex0 that they trust and use it to build the hex0.hex0 for their architecture; thus no single point of compromise could be possible as everything for all architectures become pure source.
<OriansJ`>as for opcode encoding in M2-Planet and stage0; I intentionally selected inefficient encodings to encourage other people to contribute improvements; not to mention all the spelling and grammar errors.
<OriansJ`>that way there are easy paths for people to choose to contribute to the various pieces and "hopefully" spot anything that I have done that isn't completely obvious or potentially malicious.
<bauen1>nice
<bauen1>just think of what i'm trying to do as a proof of concept of why even a 1-2kb binary seed should never be trusted
<OriansJ`>well if one was truly clever, it would be possible to do hex0 in 64bytes
<bauen1>true
<OriansJ`>but that would be harder to reason about and understand how it works; thus I have opted for something absolutely stupidly simple to follow
<bauen1>i could hack it down to maybe 128 bytes and make the backdoor non-obvious, but then the backdoored stage0 won't produce a matching binary when given an unmodified stage0 hex0 code
<OriansJ`>the benefits of thinking like an attacker while designing defenses
<OriansJ`>forced into 7bit ascii only input and state machines that cover every possible input in consistent ways
<bauen1>yeah, it could be a lot simpler (just deal with [0-9A-F])
<OriansJ`>no support for escapes of any kind until cc_x86
<OriansJ`>only single use, non-recursive line macros in M1/M0
<OriansJ`>Line comments and not block comments to prevent false comment attacks
<OriansJ`>all non-hex characters are dropped (I could even add error catching in hex2 and M1 to flag them as outside of a comment field)
<OriansJ`>which makes 0-9, a-f, A-F, space, tab and newline the only valid inputs (with ; and # line comments allowing everything up to \n)
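The input rules listed so far (hex digits become bytes, `;` and `#` open a comment that runs to the newline, everything else is dropped) are small enough to sketch as a two-state machine. This is an illustrative Python model of that behaviour, not the stage0 hex0 source. The function name and the handling of a trailing odd nibble (it is discarded) are assumptions.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"

def hex0(source):
    """Convert hex0-style text (as bytes) into the binary it describes."""
    out = bytearray()
    high = None          # pending high nibble, if any
    in_comment = False
    for b in source:
        if in_comment:
            if b == 0x0A:        # a comment ends only at newline
                in_comment = False
            continue
        ch = chr(b)
        if ch in ";#":
            in_comment = True
        elif ch in HEX_DIGITS:
            nibble = int(ch, 16)
            if high is None:
                high = nibble
            else:
                out.append((high << 4) | nibble)
                high = None
        # any other byte (space, tab, newline, junk) is dropped
    return bytes(out)

text = b"48C7C7 00000000 ; LOADI32_RDI %0 # All is well\nbf 00 00 00 00\n"
print(hex0(text).hex())
```

Because there are no block comments and no escapes, no input can make text that looks like a comment assemble as code, or the reverse.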
<OriansJ`>we already catch duplicate labels in hex2
<OriansJ`>and duplicate DEFINE definitions in M1
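A duplicate-definition check of this kind can be sketched in a few lines. The sketch assumes an M1-style line format of `DEFINE NAME VALUE` with `;` and `#` line comments. The function name is hypothetical and this is not the M1 implementation.

```python
def duplicate_defines(text):
    """Return (name, first_line, duplicate_line) for each repeated DEFINE."""
    seen = {}
    dups = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Strip line comments before tokenizing.
        code = line.split("#", 1)[0].split(";", 1)[0]
        tokens = code.split()
        if len(tokens) >= 2 and tokens[0] == "DEFINE":
            name = tokens[1]
            if name in seen:
                dups.append((name, seen[name], lineno))
            else:
                seen[name] = lineno
    return dups

macros = (
    "DEFINE NOP 90\n"
    "DEFINE RET C3\n"
    "DEFINE NOP 00 # a second definition that would shadow the first\n"
)
print(duplicate_defines(macros))
```

Rejecting the second definition outright removes the question of which one wins, which is what a shadowing attack would depend on.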
<OriansJ`>The more potential attacks of hex0/hex1/hex2, M0/M1 and cc_x86/M2-Planet we know about; the better we can defend in the long term.
<OriansJ`>Plus I want people to instantly catch any "bugdoors" I might introduce along the way
<OriansJ`>Thus salting the ground against would-be attackers
<bauen1>i imagine type checking (and const verification to some extent) only comes at M2-Planet (if at all) ?
<bauen1>not 100% sure how many "features" m2-planet has