IRC channel logs

<roconnor>; escape :: Parser Char

<roconnor>; escape = p_right (char '\\') (alt (p_right (char 'n') (pure '\n')) (sat charesc))

<roconnor>Ah the tragedy of self-hosted compilers: The character literal '\n' means the literal character '\n'.

<Googulator>yeah, that's Ken Thompson's original example for why Trusting Trust attacks are possible

<Googulator>except his version was C

<matrix_bridge><Andrius Štikonas> To be honest, even if you program compiler in another language, you'll still use \n trick

<matrix_bridge><Andrius Štikonas> In stage0 I think we introduce it in M0 and it propagates from there

<aggi>what's the tragedy with this? an escape sequence which is unavoidable and must be handled with all I/O, and strict rules known for this

<roconnor>it's a tragedy because the semantics of the language, specifically the semantics of '\n', no longer exists in the source code itself.

<aggi>seems i can't follow, ASCII is the related standard, which defines the semantics

<aggi>or, you're saying that the escape-sequence parsing/handling of C compilers is no longer implemented in source and propagates binary-only?

<roconnor>yes exactly.

<aggi>thanks roconnor, wasn't aware of this

<roconnor>I mean it is implemented, as you can see above, but the implementation's semantics becomes self-perpetuating.

<roconnor>as Googulator mentions Ken perhaps explains this better in STAGE II of https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_ReflectionsonTrustingTrust.pdf

<aggi>escape sequences are problematic for another reason, because there are many more than those in ASCII, with various curses implementations and terminals implementing them differently

<aggi>and it's a common technique to inject malicious code by their abuse

<aggi>for example, any irc-client linking against ncurses

<aggi>and a while ago musl-libc also reported a vulnerability which allowed for injecting code through misinterpretation of unicode escape sequences

<xentrac>usually IRC clients don't pass through escape sequences from IRC messages to the terminal emulator

<xentrac>but they do pass through Unicode characters

<lanodan>Yeah for escape sequences I'd more be wary of diffs, pagers, … than IRC clients

<stikonas>oh actually M2-Planet does implement full \n semantics: https://github.com/oriansj/M2-Planet/blob/761c2af5eee5bc2c27945b0ec896be26b8f5939b/cc_strings.c#L136

<stikonas>it returns 10, not '\n'

<roconnor>If it makes you feel better there is still else if(c[1] == '\'') return 39;

<xentrac>stikonas: hurrah!

<xentrac>bravo for M2-Planet!

<stikonas>roconnor: well, the alternative is worse for 39...

<stikonas>I think I prefer if(c[1] == '\'') return 39 to if(c[1] == 39) return 39
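
The two styles discussed above can be sketched side by side in C. This is a minimal illustration, not the M2-Planet code: both function names are hypothetical, and the set of escapes handled is an arbitrary small subset. The first function only means "newline" because the compiler that builds it already knows what '\n' means; the second writes the ASCII code into the source as a decimal number.

```c
/* Hypothetical helper: decode the character that follows a backslash.
 * Self-referential style: the value of '\n' is whatever the compiler
 * building this file says it is, so the knowledge lives in the binary. */
int escape_self_referential(int c)
{
    if (c == 'n') return '\n';
    if (c == 't') return '\t';
    if (c == '\'') return '\'';
    return c; /* unknown escapes pass through unchanged */
}

/* Hypothetical helper, grounded style: the ASCII codes are spelled out
 * as decimal numbers, so the mapping is visible in the source text. */
int escape_grounded(int c)
{
    if (c == 'n') return 10;   /* line feed */
    if (c == 't') return 9;    /* horizontal tab */
    if (c == '\'') return 39;  /* apostrophe */
    return c; /* unknown escapes pass through unchanged */
}
```

On an ASCII host the two functions agree on every input. Note that the left-hand sides ('n', 't', '\'') still rely on the compiler's reading of character literals in both versions, which is the residue roconnor points out.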

<roconnor>Yeah.

<roconnor>And really, what base is "10" written in anyways :P

<stikonas>it's actually true not just for escape codes

<stikonas>even ASCII code of other characters is no longer in the source

<stikonas>well, at least not in the compilers source

<stikonas>e.g. when you write 'a' anywhere in the string

<roconnor>It depends. Sometimes characters are just passed through as whatever byte they are without consideration of its ASCII nature.

<roconnor>or like identifiers become snippets that match their own byte string where they are referenced and it doesn't entirely matter what their character set is.

<stikonas>ok, M1-macro.c does have a function to convert chars into hex: https://github.com/oriansj/mescc-tools/blob/d59464d2641de8a90032ad2456bdb34b4db3436a/M1-macro.c#L396

<stikonas>but if you want you can rewrite it in a way where the conversion algorithm is lost in the source

<stikonas>and there is only a lookup table

<stikonas>if (ch == 'a') return 'a'; ...
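
As a rough sketch of the contrast stikonas describes, here are two ways to turn a value 0..15 into an uppercase hex digit. Neither is taken from M1-macro.c; the function names are hypothetical and both assume the caller passes a value in the range 0..15. The arithmetic form keeps the ASCII offsets in the source as numbers, while the table form leaves the digit-to-byte relation to however the compiler encodes the string literal.

```c
/* Hypothetical: arithmetic form. The offsets 48 (ASCII '0') and
 * 55 (ASCII 'A' minus 10) appear in the source as plain numbers. */
int nibble_to_hex_arith(int n)
{
    return n < 10 ? n + 48 : n + 55;
}

/* Hypothetical: table form. Which byte stands for which digit is
 * decided by the compiler's encoding of the string literal. */
int nibble_to_hex_table(int n)
{
    static const char table[] = "0123456789ABCDEF";
    return table[n];
}
```

On an ASCII host both return the same byte for every input from 0 to 15.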

<matrix_bridge><Jeremiah Orians> I strongly believe that full understanding of how everything works is essential. If anyone wants, I can cover the bios font table to byte lookup logic used in VGA screens

<matrix_bridge><Jeremiah Orians> There are no magic details. The only thing that is really hard coded is in hex2, hex1 and hex0

<matrix_bridge><Jeremiah Orians> The byte patterns for 0123456789ABCDEF

<xentrac>this is different: if (ch == 'a') return 97;

<xentrac>so is this: if (ch == 97) return 'a';

<matrix_bridge><Jeremiah Orians> https://github.com/oriansj/stage0-posix-amd64/blob/master/hex0_AMD64.hex0#L108

<xentrac>0A

<xentrac>23, 3B, 30

<matrix_bridge><Jeremiah Orians> This function only works if the ascii characters in the files actually line up with the hex values in the ASCII table
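
The dependence on the ASCII table can be made concrete with a small C sketch of hex-digit decoding. This is not a transcription of the hex0 routine linked above; `hex_value` is a hypothetical name, and accepting only uppercase A-F is an assumption made to keep the example short. The constants are written in hex so they can be compared with the ASCII table directly: 0x30 is '0' and 0x41 is 'A'.

```c
/* Hypothetical helper: map an ASCII hex digit to its value 0..15.
 * Returns -1 for any byte that is not 0-9 or uppercase A-F.
 * Correct only if the input file really is ASCII-encoded. */
int hex_value(int c)
{
    if (c >= 0x30 && c <= 0x39) return c - 0x30; /* '0'..'9' -> 0..9   */
    if (c >= 0x41 && c <= 0x46) return c - 0x37; /* 'A'..'F' -> 10..15 */
    return -1;
}
```

If the source file were in an encoding where the digit characters do not sit at 0x30..0x39, the same machine code would silently decode the wrong values, which is the condition Jeremiah describes.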

<xentrac>right

<xentrac>I wonder if it would be interesting to create a .hex file format augmented with machine-checked assertions about the machine code it contains

<xentrac>like, "in 72 08 the jump offset points to #:write"

<matrix_bridge><Jeremiah Orians> Well if you notice the M1 instructions in the comments

<xentrac>yeah, that's what inspired me to think about this

<matrix_bridge><Jeremiah Orians> It definitely wouldn’t take too much work to fully validate that the definitions line up with x86 documentation and that the hex lines up with the definitions

<xentrac>the difference from just writing in assembly and looking at a listing file is that (1) the assertions wouldn't have to fully determine the machine code; (2) the assertions could include things that are generally outside the scope of assembly; (3) if there was a mismatch you would get an error message instead of just different machine code; (4) a malicious verifier wouldn't be able to insert

<xentrac>malicious code into the output binary, so you could run several different verifiers, and you only lose if they're all backdoored in the same way

<xentrac>like how Metamath verifiers can be very simple and verify a proof witness produced by some arbitrarily sophisticated prover
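
One of the "very simple verifier" checks xentrac has in mind could look roughly like the following C sketch. It is an illustration only: the function name, the idea of passing the label as a byte offset, and the restriction to two-byte `opcode rel8` jumps are all assumptions, and no assertion syntax for the .hex file itself is proposed here. It relies on the x86 rule that a rel8 displacement is signed and counted from the end of the jump instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: is the two-byte instruction at code[at] the given
 * opcode followed by a rel8 displacement that lands on expected_target?
 * Returns 1 if the assertion holds, 0 otherwise. It only reads the
 * bytes and never modifies or emits code. */
int check_rel8_jump(const uint8_t *code, size_t len, size_t at,
                    uint8_t opcode, size_t expected_target)
{
    if (len < 2 || at > len - 2) return 0;  /* instruction must fit */
    if (code[at] != opcode) return 0;       /* wrong instruction */

    int rel = code[at + 1];
    if (rel >= 128) rel -= 256;             /* sign-extend the displacement */

    long target = (long)at + 2 + rel;       /* relative to the next instruction */
    return target >= 0 && (size_t)target == expected_target;
}
```

For the example "in 72 08 the jump offset points to #:write": if the bytes 72 08 sit at offset 0, the check passes only when the label resolves to offset 10. Because the checker produces a yes/no answer rather than a binary, several independently written checkers can be run over the same file.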

<matrix_bridge><Jeremiah Orians> Sounds interesting to see

<xentrac>my knowledge of formal methods is pretty slender, though I did run L∃∀N for the first time this week. but even very dumb assertions could be useful maybe

2026-06-11.log