IRC channel logs

2020-08-19.log


***nckx is now known as PotentialUser-93
***PotentialUser-93 is now known as nckx
<OriansJ`>bauen1: you are correct in that it isn't until cc_x86 and M2-Planet that there are types (int, char, pointers, array, struct and union) [basically everything one would need to write a compiler or interpreter] with M2-Planet supporting const (and promptly discarding it) and different operations for signed/unsigned integers
<OriansJ`>Other features of cc_x86 and M2-Planet are: functions, globals, locals, conditionals (if/else), iterators (do/while/for), flow control (continue/break/goto/labels), array reference (a[x] = 1; i = b[3];), struct reference (a->next->next = NULL; i = a->value;), custom types with sizeof support and inline assembly
<OriansJ`>if you wish for an example of a program that uses a good set of C features: https://github.com/oriansj/mes-m2
<OriansJ`>and because C's Macro system is stupid for auditability/clarity I made # line comments in M2-Planet/cc_x86 and // as ignored tokens; so one can do //CONSTANT foo 4 and #define foo 4 and get the same behavior or //CONSTANT cell_size sizeof(struct cell) and #define cell_size 1 to compensate for the fact 1 + pointer in M2-Planet/cc_x86 is the next byte and in C could mean 700+ bytes later depending upon the pointer type
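A minimal sketch of the cell_size trick described above, modelling addresses as plain integers. The `Cell` layout and the helper names are hypothetical and only illustrate the arithmetic; the real struct cell in mes-m2 may differ.

```python
import ctypes


class Cell(ctypes.Structure):
    # Hypothetical stand-in for "struct cell"; the real layout may differ.
    _fields_ = [
        ("type", ctypes.c_int),
        ("car", ctypes.c_void_p),
        ("cdr", ctypes.c_void_p),
    ]


CELL_BYTES = ctypes.sizeof(Cell)


def c_pointer_add(base, n, elem_size):
    """Standard C: pointer + n advances by n elements of elem_size bytes."""
    return base + n * elem_size


def m2_pointer_add(base, n):
    """M2-Planet/cc_x86 as described in the log: pointer + n advances by n bytes."""
    return base + n


# //CONSTANT cell_size sizeof(struct cell)   (seen by M2-Planet/cc_x86)
CELL_SIZE_M2 = CELL_BYTES
# #define cell_size 1                        (seen by a standard C compiler)
CELL_SIZE_C = 1


def nth_cell_m2(base, n):
    # Source text: base + n * cell_size, compiled with byte-wise pointer math.
    return m2_pointer_add(base, n * CELL_SIZE_M2)


def nth_cell_c(base, n):
    # Same source text, compiled with element-scaled pointer math.
    return c_pointer_add(base, n * CELL_SIZE_C, CELL_BYTES)
```

Both helpers land on the same address, which is the point of defining cell_size differently for the two compilers.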
<OriansJ`>Other than that; M2-Planet/cc_x86 are a pure C core subset capable of anything you would need
<OriansJ`>The next step is getting MesCC to work on guile (to simplify the task of mes-m2 incrementally becoming compatible)
***V is now known as Guest7215
***V_ is now known as V
<bauen1>so my backdoor poc for hex0-seed can now check for outputs that are elf headers like itself and then modifies the p_filesz/p_memsz to make room for itself and hijacks the entry vector
<bauen1>it's now also ~1kb big, but the code is still very straightforward and unoptimised
*janneke can imagine the news headlines
<bauen1>i could change the way the write and exit syscalls are invoked across all compilers to be simpler and more uniform to make my life easier
<bauen1>janneke: you can still find the backdoor very easily
<janneke>bauen1: yeah, but who reads the rest of the article nowadays?
<janneke>;)
<bauen1>lol
<janneke>most secure so-called "full source bootstrap" compromised
<OriansJ`>bauen1: except for one small problem, not all hex0 are ELF or COM but some are just pure binary. no entry vector, just execute the first byte of the file
<bauen1>true
<bauen1>but that would be even easier to plant a backdoor in
<bauen1>at the same time also easier to detect
<bauen1>just put the bootstrap code up front and the payload at the end (figuring out where the payload is could be a bit complicated)
<OriansJ`>bauen1: except the first instruction would have to be a jump or call and thus the payload would be the very first thing inspected
<OriansJ`>as hex0 does not buffer input or output; so you would have to write your setup before the main program could write a byte or have to shift the entire contents of the file
<OriansJ`>after the fact
<bauen1>i just write my own start code (bootstrap0 a few bytes ) that jumps to the bigger payload (aka. bootstrap1) that just copies the original start code where it should be and jumps to it as if nothing happened
<bauen1>it works quite well for amd64 elf
<OriansJ`>and you also have no idea about output size until after you read all of the input
<bauen1>but all of the backdoor is really dependent on the ABI and what tools are exposed by assembly
<bauen1>which is why i load the memsz from the elf header in my backdoor to figure it out
<bauen1>but you could also do a pattern search and hope for the best
<OriansJ`>but raw binaries could be anything, including indirect hex binaries
<bauen1>oh i see what you're getting at
<bauen1>yes that would be a very big issue
<OriansJ`>eg input could be hex with the output being the source code we know is good
<OriansJ`>really easy to catch additions or subtractions or modifications
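A rough sketch of the check described above: decode hex text to bytes and compare the result against output that is already known to be good. It assumes the usual hex0 conventions (`#` and `;` start a comment that runs to end of line, other non-hex characters are ignored); the function names and the odd-digit error are illustrative choices, not part of any existing tool.

```python
import string


def hex0_to_bytes(source: str) -> bytes:
    """Decode hex0-style text into raw bytes (assumed conventions, see above)."""
    digits = []
    for line in source.splitlines():
        # Drop everything from the first comment marker to the end of the line.
        for marker in "#;":
            idx = line.find(marker)
            if idx != -1:
                line = line[:idx]
        digits.extend(c for c in line if c in string.hexdigits)
    if len(digits) % 2:
        raise ValueError("odd number of hex digits")
    return bytes.fromhex("".join(digits))


def first_difference(actual: bytes, expected: bytes):
    """Return the offset of the first mismatch, or None if the two are identical."""
    for offset, (a, b) in enumerate(zip(actual, expected)):
        if a != b:
            return offset
    if len(actual) != len(expected):
        # One side is a strict prefix of the other: bytes were added or removed.
        return min(len(actual), len(expected))
    return None
```

Any addition, removal or modification shows up as a non-None offset.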
<OriansJ`>also we could do a self-checking sha256sum program; which just outputs its own memory's sha256sum.
<bauen1>but everything that has a common binary header (or start e.g. push rax push rbx push rdx) can be easily detected
<bauen1>yeah that would catch this (assuming you know the correct output)
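A small sketch of the checksum comparison, assuming the expected digest comes from an independently audited build. It hashes a byte string handed to it rather than its own memory, so it is a simplification of the self-checking program proposed above; the function names are made up for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def matches_known_good(data: bytes, expected_hex: str) -> bool:
    """Compare the digest of data against a digest known from a trusted source."""
    return sha256_hex(data) == expected_hex.strip().lower()
```

As noted in the log, this only helps if the correct digest is already known.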
<OriansJ`>bauen1: only if one uses the turbo pascal standard for function calls; M2-Planet doesn't
<OriansJ`>push can be just a mov rax [rbx]; add rbx, 8
<OriansJ`>or sub rbx, 8 if you want the stack to grow in the proper direction (up)
<OriansJ`>as there is pass via register, pass via stack or pass via both (common in optimizing compilers)
<bauen1>OriansJ`: but the platform that runs the binaries (in almost all cases) has some form of ABI defined
<OriansJ`>bauen1: only for system calls such as read and write
<bauen1>OriansJ`: for linux-x86_64 it would be the elf header or that certain registers can be / are clobbered by the kernel
<OriansJ`>everything else we can do arbitrary shit with
<bauen1>also the backdoor could buffer the output in memory until it has enough information (currently done to calculate the correct p_filesz / p_memsz)
<OriansJ`>handwritten assembly uses pass via registers with the callee pushing to save registers from modification (as it is simpler to reason about) but pass via the stack is easier for a state machine (like M2-Planet)
<OriansJ`>bauen1: except elf files have to specify memory required for allocation from the kernel (which is usually rounded up to the next page) and writing past that memory address without a sys_brk to allocate more is a page fault
<OriansJ`>if the program is never supposed to call sys_brk; we have something to flag on in kernel space
<bauen1>OriansJ`: true, but the entire ELF header is loaded into memory and i never really need more than i can stuff there
<bauen1>which would be caught by the sha256sum program
<OriansJ`>So we pass 100MB of input (say a valid ELF program) but no malloc can occur in hex0 and there is no way hex0 will allocate 100MB on load
<bauen1>it is detectable
<OriansJ`>So either the payload will page fault hard (easy to detect) or it must do a sys_brk, which is also something easy to detect
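The page-rounding argument above can be sketched with a little arithmetic. This assumes 4096-byte pages and a single loadable segment, and it ignores the stack and anything else the kernel maps, so it is only a simplified model; the names are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: typical x86 page size


def mapped_end(vaddr: int, memsz: int, page_size: int = PAGE_SIZE) -> int:
    """Address just past the last page backing a segment of memsz bytes at vaddr."""
    end = vaddr + memsz
    return (end + page_size - 1) // page_size * page_size


def slack_bytes(vaddr: int, memsz: int, page_size: int = PAGE_SIZE) -> int:
    """Bytes past the declared image that are still mapped due to page rounding."""
    return mapped_end(vaddr, memsz, page_size) - (vaddr + memsz)


def needs_brk(vaddr: int, memsz: int, extra_bytes: int) -> bool:
    """True if buffering extra_bytes after the image would run off the mapped pages."""
    return extra_bytes > slack_bytes(vaddr, memsz)
```

With a small memsz the slack is at most one page, so buffering something like 100MB would need sys_brk or fault.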
<bauen1>but i don't necessarily need to hold the entire output to gather the information i need to modify the output before it is written
<OriansJ`>bauen1: true valid ELF files do make that easy
<OriansJ`>raw binaries are another story though
<bauen1>so of course it is detectable, but not to the casual user who runs hex0-seed without double checking the size
<bauen1>(or verifying the assembly)
<OriansJ`>absolutely, there is no protection against users who don't properly check
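A sketch of the kind of size check a user could run, assuming a little-endian ELF64 binary whose first program header is the one of interest. It reads the file size, e_entry, p_filesz and p_memsz so they can be compared with values from an audited build. The helper names and the "expected" dictionary are illustrative, not an existing tool.

```python
import struct

# ELF64 file header and program header layouts (little-endian).
ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
ELF64_PHDR = struct.Struct("<IIQQQQQQ")


def read_elf64_summary(image: bytes) -> dict:
    """Extract the fields discussed above from an ELF64 image."""
    if len(image) < ELF64_EHDR.size or image[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    if image[4] != 2 or image[5] != 1:
        raise ValueError("only 64-bit little-endian ELF is handled here")
    ehdr = ELF64_EHDR.unpack_from(image, 0)
    e_entry, e_phoff, e_phnum = ehdr[4], ehdr[5], ehdr[10]
    if e_phnum < 1 or e_phoff + ELF64_PHDR.size > len(image):
        raise ValueError("no readable program header")
    phdr = ELF64_PHDR.unpack_from(image, e_phoff)
    return {
        "file_size": len(image),
        "e_entry": e_entry,
        "p_filesz": phdr[5],
        "p_memsz": phdr[6],
    }


def changed_fields(summary: dict, expected: dict) -> list:
    """Names of the fields that differ from the known-good values."""
    return sorted(k for k in expected if summary.get(k) != expected[k])
```

An empty list from `changed_fields` only says these four values match; it does not replace verifying the assembly.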
<bauen1>making a completely undetectable backdoor is impossible, making one that is hard to detect (actively obfuscate the input and check the output) is hard, making a simple one that can adapt to "normal" source code is easy
<OriansJ`>now guix's build merkle tree could spot an alternate root binary rather easily
<OriansJ`>yep, our goal isn't to make backdoors impossible but rather easy to catch and publicly shame
<OriansJ`>blood-elf would probably be the optimal place to put a backdoor as it is used to generate the dwarf stubs for M2-Planet outputs
<OriansJ`>However its output is 100% M1-macro architecture neutral data segments (with a small requirement of --64 for 64bit architectures)
<OriansJ`>So you would have to hide it in the M1 output and only output when processing a C compiler or blood-elf
<bauen1>so i now have an unoptimised backdoor for hex0-seed that will produce itself when given the hex0-seed hex0 code
<bauen1>only 1264 bytes
<bauen1>*1230 bytes
<bauen1>it requires a small modification to hex0-seed to make finding the syscalls easier
<xentrac>wow!
***nckx is now known as facebook
***facebook is now known as nckx
***Server sets mode: +cnt
<Hagfish>impressive
<Hagfish>will it just propagate itself, or does it have like a payload of some sort?
<bauen1>Hagfish: it propagates itself
<bauen1>Hagfish: it hooks into certain patterns that are write and exit syscalls
<bauen1>Hagfish: in the write syscall it checks if an ELF header is written, if it finds one it modifies it slightly to add space in the final binary, then overwrites the entry and writes calls where write and exit syscalls would be
<bauen1>technically it can also propagate itself up to mes but i might need to adjust the syscalls to write / exit to match `mov rax, <syscall_num> \n syscall`
<bauen1>perhaps not yet to binaries generated by mescc
<bauen1>yeah, since mescc seems to differentiate between text and code it won't work "out of the box"
<bauen1>perhaps also changing the elf header flags to map .text as rw could work
<OriansJ`>.text is already rwx
<OriansJ`> https://github.com/oriansj/mescc-tools-seed/blob/master/x86/hex0_x86.hex0#L63
<OriansJ`>as one would manually need to roll separate segments but you would need padding to align to page sizes to actually get the segments with different permission flags
<OriansJ`>as hex2 doesn't support an "after this, start at address xyz" directive