IRC channel logs
2026-06-11.log
<roconnor>; escape = p_right (char '\\') (alt (p_right (char 'n') (pure '\n')) (sat charesc))
<roconnor>Ah the tragedy of self-hosted compilers: The character literal '\n' means the literal character '\n'.
<Googulator>yeah, that's Ken Thompson's original example for why Trusting Trust attacks are possible
<matrix_bridge><Andrius Štikonas> To be honest, even if you program the compiler in another language, you'll still use the \n trick
<matrix_bridge><Andrius Štikonas> In stage0 I think we introduce it in M0 and it propagates from there
<aggi>what's the tragedy with this? it's an escape sequence which is unavoidable and must be handled with all I/O, and strict rules are known for it
<roconnor>it's a tragedy because the semantics of the language, specifically the semantics of '\n', no longer exist in the source code itself.
<aggi>seems i can't follow, ASCII is the related standard, which defines the semantics
<aggi>or, you're saying the escape-sequence parsing/handling of C compilers is not implemented anymore and propagates binary-only?
<aggi>thanks roconnor, wasn't aware of this
<roconnor>I mean it is implemented, as you can see above, but the implementation's semantics becomes self-perpetuating.
<aggi>escape sequences are problematic for another reason: there are many more than those in ASCII, and various curses implementations and terminals implement them differently
<aggi>and abusing them is a common technique to inject malicious code
<aggi>for example, any irc-client linking against ncurses
<aggi>and a while ago musl-libc also reported a vulnerability which allowed injecting code through misinterpretation of unicode escape sequences
<xentrac>usually IRC clients don't pass through escape sequences from IRC messages to the terminal emulator
<xentrac>but they do pass through Unicode characters
<lanodan>Yeah for escape sequences I'd be more wary of diffs, pagers, … than IRC clients
<roconnor>If it makes you feel better there is still else if(c[1] == '\'') return 39;
<stikonas>roconnor: well, the alternative is worse for 39...
<stikonas>I think I prefer if(c[1] == '\'') return 39 to if(c[1] == 39) return 39
<roconnor>And really, what base is "10" written in anyways :P
<stikonas>it's actually true not just for escape codes
<stikonas>even the ASCII code of other characters is no longer in the source
<stikonas>well, at least not in the compiler's source
<stikonas>e.g. when you write 'a' anywhere in the string
<roconnor>It depends. Sometimes characters are just passed through as whatever byte they are without consideration of their ASCII nature.
<roconnor>or like identifiers become snippets that match their own byte string where they are referenced, and it doesn't entirely matter what their character set is.
<stikonas>but if you want you can rewrite it in a way where the conversion algorithm is lost in the source
<matrix_bridge><Jeremiah Orians> I strongly believe that full understanding of how everything works is essential. If anyone wants, I can cover the BIOS font table to byte lookup logic used in VGA screens
<matrix_bridge><Jeremiah Orians> There are no magic details.
The only thing that is really hard coded is in hex2, hex1 and hex0
<xentrac>this is different: if (ch == 'a') return 97;
<xentrac>so is this: if (ch == 97) return 'a';
<matrix_bridge><Jeremiah Orians> This function only works if the ASCII characters in the files actually line up with the hex values in the ASCII table
<xentrac>I wonder if it would be interesting to create a .hex file format augmented with machine-checked assertions about the machine code it contains
<xentrac>like, "in 72 08 the jump offset points to #:write"
<matrix_bridge><Jeremiah Orians> Well if you notice the M1 instructions in the comments
<xentrac>yeah, that's what inspired me to think about this
<matrix_bridge><Jeremiah Orians> It definitely wouldn’t take too much work to fully validate that the definitions line up with x86 documentation and that the hex lines up with the definitions
<xentrac>the difference from just writing in assembly and looking at a listing file is that (1) the assertions wouldn't have to fully determine the machine code; (2) the assertions could include things that are generally outside the scope of assembly; (3) if there was a mismatch you would get an error message instead of just different machine code; (4) a malicious verifier wouldn't be able to insert
<xentrac>malicious code into the output binary, so you could run several different verifiers, and you only lose if they're all backdoored in the same way
<xentrac>like how Metamath verifiers can be very simple and verify a proof witness produced by some arbitrarily sophisticated prover
<xentrac>my knowledge of formal methods is pretty slender, though I did run L∃∀N for the first time this week. but even very dumb assertions could be useful maybe