IRC channel logs

2026-06-11.log

<roconnor>; escape :: Parser Char
<roconnor>; escape = p_right (char '\\') (alt (p_right (char 'n') (pure '\n')) (sat charesc))
<roconnor>Ah the tragedy of self-hosted compilers: The character literal '\n' means the literal character '\n'.
<Googulator>yeah, that's Ken Thompson's original example for why Trusting Trust attacks are possible
<Googulator>except his version was C
<matrix_bridge><Andrius Štikonas> To be honest, even if you program the compiler in another language, you'll still use the \n trick
<matrix_bridge><Andrius Štikonas> In stage0 I think we introduce it in M0 and it propagates from there
<aggi>what's the tragedy with this? an escape sequence which is unavoidable and must be handled with all I/O, and strict rules known for this
<roconnor>it's a tragedy because the semantics of the language, specifically the semantics of '\n', no longer exist in the source code itself.
<aggi>seems i can't follow, ASCII is the related standard, which defines the semantics
<aggi>or, you're saying that the escape-sequence parsing/handling of C compilers is not implemented anymore and propagates binary-only?
<roconnor>yes exactly.
<aggi>thanks roconnor, wasn't aware of this
<roconnor>I mean it is implemented, as you can see above, but the implementation's semantics become self-perpetuating.
<roconnor>as Googulator mentions Ken perhaps explains this better in STAGE II of https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_ReflectionsonTrustingTrust.pdf
<aggi>escape sequences are problematic for another reason, because there are many more than those in ASCII, with various curses implementations and terminals implementing them differently
<aggi>and it's a common technique to inject malicious code by their abuse
<aggi>for example, any irc-client linking against ncurses
<aggi>and a while ago musl-libc also reported a vulnerability which allowed for injecting code by misinterpretation of unicode escape sequences
<xentrac>usually IRC clients don't pass through escape sequences from IRC messages to the terminal emulator
<xentrac>but they do pass through Unicode characters
<lanodan>Yeah for escape sequences I'd more be wary of diffs, pagers, … than IRC clients
<stikonas>oh actually M2-Planet does implement full \n semantics: https://github.com/oriansj/M2-Planet/blob/761c2af5eee5bc2c27945b0ec896be26b8f5939b/cc_strings.c#L136
<stikonas>it returns 10, not '\n'
<roconnor>If it makes you feel better there is still else if(c[1] == '\'') return 39;
<xentrac>stikonas: hurrah!
<xentrac>bravo for M2-Planet!
<stikonas>roconnor: well, the alternative is worse for 39...
<stikonas>I think I prefer if(c[1] == '\'') return 39 to if(c[1] == 39) return 39
<roconnor>Yeah.
<roconnor>And really, what base is "10" written in anyways :P
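[Editor's note: a minimal sketch of the distinction discussed above, not a copy of the M2-Planet code. The function names are hypothetical. The first decoder inherits the meaning of '\n' from whichever compiler built it; the second writes the value down as a decimal number, assuming an ASCII target.]

```c
/* Self-referential decoder: the value of '\n' is not stated anywhere in
   this source. It is whatever the compiler that builds this file already
   believes '\n' means. */
int escape_self_referential(char c)
{
    if (c == 'n')  return '\n';
    if (c == '\\') return '\\';
    if (c == '\'') return '\'';
    return -1; /* unknown escape */
}

/* Numeric decoder: the values are spelled out as decimal ASCII codes
   (an assumption about the target character set). The comparisons on the
   left still rely on character literals, as noted in the discussion. */
int escape_numeric(char c)
{
    if (c == 'n')  return 10; /* line feed */
    if (c == '\\') return 92; /* backslash */
    if (c == '\'') return 39; /* single quote */
    return -1; /* unknown escape */
}
```

[On an ASCII host built by a conforming compiler the two functions agree; only the second one says in its own source what the answer is supposed to be.]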
<stikonas>it's actually true not just for escape codes
<stikonas>even ASCII code of other characters is no longer in the source
<stikonas>well, at least not in the compiler's source
<stikonas>e.g. when you write 'a' anywhere in the string
<roconnor>It depends. Sometimes characters are just passed through as whatever byte they are without consideration of its ASCII nature.
<roconnor>or like identifiers become snippets that match their own byte string where they are referenced and it doesn't entirely matter what their character set is.
<stikonas>ok, M1-macro.c does have a function to convert chars into hex: https://github.com/oriansj/mescc-tools/blob/d59464d2641de8a90032ad2456bdb34b4db3436a/M1-macro.c#L396
<stikonas>but if you want you can rewrite it in a way where the conversion algorithm is lost in the source
<stikonas>and there is only a lookup table
<stikonas>if (ch == 'a') return 'a'; ...
<matrix_bridge><Jeremiah Orians> I strongly believe that full understanding of how everything works is essential. If anyone wants, I can cover the bios font table to byte lookup logic used in VGA screens
<matrix_bridge><Jeremiah Orians> There are no magic details. The only thing that is really hard coded is in hex2, hex1 and hex0
<matrix_bridge><Jeremiah Orians> The byte patterns for 0123456789ABCDEF
<xentrac>this is different: if (ch == 'a') return 97;
<xentrac>so is this: if (ch == 97) return 'a';
<matrix_bridge><Jeremiah Orians> https://github.com/oriansj/stage0-posix-amd64/blob/master/hex0_AMD64.hex0#L108
<xentrac>0A
<xentrac>23, 3B, 30
<matrix_bridge><Jeremiah Orians> This function only works if the ascii characters in the files actually line up with the hex values in the ASCII table
<xentrac>right
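[Editor's note: a minimal sketch of the dependency described above, written in C rather than hex0 and not a transcription of the linked file. The function names are hypothetical. The decoder compares against numeric byte values only, so it works exactly when the digits in the input file are encoded as in the ASCII table (0x30..0x39 for 0-9, 0x41..0x46 for A-F). The bytes xentrac quotes, 0A, 23, 3B and 30, are the ASCII codes for line feed, '#', ';' and '0'.]

```c
/* Decode one hex digit from its byte value, assuming ASCII.
   Returns 0..15, or -1 if the byte is not one of 0-9 or A-F. */
int hex_digit_value(int byte)
{
    if (byte >= 0x30 && byte <= 0x39) return byte - 0x30;      /* 0..9 */
    if (byte >= 0x41 && byte <= 0x46) return byte - 0x41 + 10; /* A..F */
    return -1;
}

/* Combine two hex digit bytes into one output byte.
   Returns 0..255, or -1 if either input is not a hex digit. */
int hex_byte_value(int high, int low)
{
    int h = hex_digit_value(high);
    int l = hex_digit_value(low);
    if (h < 0 || l < 0) return -1;
    return h * 16 + l;
}
```

[If the input used a different character encoding, the range checks would reject or mis-decode the digits, which is the sense in which the byte patterns for 0123456789ABCDEF are hard coded.]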
<xentrac>I wonder if it would be interesting to create a .hex file format augmented with machine-checked assertions about the machine code it contains
<xentrac>like, "in 72 08 the jump offset points to #:write"
<matrix_bridge><Jeremiah Orians> Well if you notice the M1 instructions in the comments
<xentrac>yeah, that's what inspired me to think about this
<matrix_bridge><Jeremiah Orians> It definitely wouldn’t take too much work to fully validate that the definitions line up with x86 documentation and that the hex lines up with the definitions
<xentrac>the difference from just writing in assembly and looking at a listing file is that (1) the assertions wouldn't have to fully determine the machine code; (2) the assertions could include things that are generally outside the scope of assembly; (3) if there was a mismatch you would get an error message instead of just different machine code; (4) a malicious verifier wouldn't be able to insert
<xentrac>malicious code into the output binary, so you could run several different verifiers, and you only lose if they're all backdoored in the same way
<xentrac>like how Metamath verifiers can be very simple and verify a proof witness produced by some arbitrarily sophisticated prover
<matrix_bridge><Jeremiah Orians> Sounds interesting to see
<xentrac>my knowledge of formal methods is pretty slender, though I did run L∃∀N for the first time this week. but even very dumb assertions could be useful maybe
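[Editor's note: a minimal sketch of one "very dumb assertion" of the kind xentrac describes, namely checking that a short jump such as 72 08 lands on a given offset. The function name and interface are hypothetical; no such .hex assertion format exists in the log. It assumes the x86 short-jump encoding: a one-byte opcode followed by a signed 8-bit displacement measured from the end of the two-byte instruction.]

```c
#include <stddef.h>
#include <stdint.h>

/* Check that the two-byte short jump at code[at] (opcode, rel8) lands on
   byte offset `target`. Returns 1 if it does, 0 otherwise. The checker
   only reads the bytes and never produces output code. */
int short_jump_hits(const uint8_t *code, size_t len, size_t at, size_t target)
{
    if (at + 2 > len) return 0;              /* instruction must fit */
    int8_t rel = (int8_t)code[at + 1];       /* signed displacement */
    long dest = (long)at + 2 + (long)rel;    /* relative to next instruction */
    return dest >= 0 && (size_t)dest == target;
}
```

[A verifier built from checks like this can only accept or reject a file, so running several independently written verifiers costs little, which is the property points (3) and (4) above rely on.]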