IRC channel logs

2026-03-14.log

<mwette>Ugh. cpp is trying to tokenize 8B and failing. (8B is not a numeric literal.) I need to look.
<mwette>The overflow may be something janneke mentioned. He has only one char buffer for unread-char, where nyacc needs two. For example, to know if "foo" is a function macro the reader has to see a "(" but there may be spaces before that. So, for example, looking at "foo =" the tokenizer will need to read the space and "=" and then unread "=" then space.
<mwette>Ugh. OK, I need to change the num-reader, so that it does not check for a legal number terminator.
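
A minimal C sketch of why `8B` is legal at this point: the preprocessor lexes it as a single "preprocessing number" (a digit followed by identifier characters), even though it is not a valid numeric literal. The macros `STR` and `PASTE` and the enumerator `REG_8B` are hypothetical names chosen for illustration; they are not the definitions used in tinycc.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper macros, not taken from tinycc. */
#define STR(x) #x
#define PASTE(prefix, x) prefix ## x

/* 8B is one pp-number token; pasting it onto REG_ yields the
   identifier REG_8B, which is then an ordinary enumerator. */
enum { PASTE(REG_, 8B) = 7 };

/* Spelling of the pp-number token, obtained by stringification. */
static const char *pp_number_spelling(void) { return STR(8B); }

/* Value of the enumerator whose name was built by token pasting. */
static int pasted_enum_value(void) { return REG_8B; }

/* Self-check; call it from main() to exercise the macros. */
static void pp_number_demo(void)
{
    assert(strcmp(pp_number_spelling(), "8B") == 0);
    assert(pasted_enum_value() == 7);
}
```

So a cpp tokenizer has to accept `8B` as a token and leave the question of whether it is a valid number to a later stage.
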
<ekaitz>oh the buffer
<ekaitz>yeah we only have one
<mwette>Can you fix on your side? I only need two, I think.
<ekaitz>hm
<ekaitz>let me take a look
<ekaitz>mwette: yes, we have only one buffer called __ungetc_buf
<ekaitz>I don't know how to fix this though
<ekaitz>we should make a file-descriptor <--> buffer association
<ekaitz>this is a little bit harder than it looks
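
A minimal sketch of the file-descriptor <--> buffer association discussed above, assuming a small statically allocated table with two characters of pushback per descriptor. The names `NFILEDES`, `UNREAD_DEPTH`, `fd_unread_char` and `fd_take_unread` are hypothetical; this is not the code from the mes `wip' branch.

```c
#include <assert.h>
#include <stdio.h>              /* EOF */

#define NFILEDES     16         /* hypothetical table size */
#define UNREAD_DEPTH 2          /* two chars of pushback, as nyacc needs */

struct unread_buf {
    int len;                            /* number of pushed-back chars */
    unsigned char ch[UNREAD_DEPTH];     /* used as a small stack */
};

/* One pushback stack per file descriptor, statically allocated. */
static struct unread_buf unread_tab[NFILEDES];

/* Push c back onto fd. Returns c, or EOF if fd is out of range,
   c is EOF, or the stack for fd is already full. */
static int fd_unread_char(int fd, int c)
{
    if (fd < 0 || fd >= NFILEDES || c == EOF)
        return EOF;
    struct unread_buf *b = &unread_tab[fd];
    if (b->len == UNREAD_DEPTH)
        return EOF;
    b->ch[b->len++] = (unsigned char)c;
    return c;
}

/* Pop the most recently unread char for fd, or EOF if there is none.
   A real reader would fall back to reading from the descriptor here. */
static int fd_take_unread(int fd)
{
    if (fd < 0 || fd >= NFILEDES)
        return EOF;
    struct unread_buf *b = &unread_tab[fd];
    return b->len ? b->ch[--b->len] : EOF;
}

/* Self-check mirroring the "foo =" case: unread '=' and then ' '. */
static void unread_demo(void)
{
    assert(fd_unread_char(3, '=') == '=');
    assert(fd_unread_char(3, ' ') == ' ');
    assert(fd_unread_char(3, 'x') == EOF);   /* depth exceeded */
    assert(fd_take_unread(4) == EOF);        /* other fds unaffected */
    assert(fd_take_unread(3) == ' ');
    assert(fd_take_unread(3) == '=');
    assert(fd_take_unread(3) == EOF);
}
```

Characters come back in reverse order of unreading, so after unreading "=" and then the space the reader sees the space first, which restores the original input order.
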
<mwette>stikonas: try to comment out line 689 in nyacc/lex.scm: ;;((char-set-contains? c:ir ch) (bad-sfx (cons ch chl)))
<stikonas>ok, running the test now...
<stikonas>no, same error...
<mwette>Hmm. I was confident that would fix it. lemme try the file
<aggi>fyi: a dozen steps/ are confirmed with tcc-head for tiny-bootstrap up until bash-2.05b, with a few side-effects only
<aggi> https://codeberg.org/aggi/tiny-bootstrap/src/branch/master/steps-tiny
<aggi>if there was any change necessary for any individual live-bootstrap/steps/ for tcc-head then such a step is pulled into tiny-bootstrap/steps-tiny OVERLAY
<gabif>checking size of int... configure: error: cannot compute sizeof (int)
<gabif>See `config.log' for more details.
<gabif>Subprocess error 77
<gabif>ooops, didn't plan to send it here, sorry for the noise
<janneke>snuik: later tell ekaitz: i have a patch for an [fixed] arbitrary length unread buffer on `wip' branch that needs to be reviewed
<snuik>Will do.
<janneke>snuik: botsnack
<snuik>:)
<stikonas>oh so maybe wip branch of mes would work with nyacc 3.04.3...
<janneke>yes, that is to say, as i understand it, the problem is not so much mes+nyacc-3.04.3, but rather some terribly edge-case constructs that upstream tinycc is using
<janneke>stefan provided patches to avoid using such edge cases, much like i did some 9y ago for avoiding mis-use of the comma operator and such, and much like my requests 9y ago, stefan's patches were rejected or reverted
<janneke>iow, yes, the wip branch has a hack to supply a deeper unread buffer, but i'm not happy with it and i haven't decided yet if it will go into 0.28, and i'm looking for perspectives on this
<janneke>and also, imho, we should, in our long term planning, have a way to remove tinycc from the full-source bootstrap as it's antagonistic to our cause and thus a liability
<janneke>*have a way => look for possible ways
<caffe>hello - has anyone been trying to bootstrap ghc recently?
<stikonas>caffe: I don't think recently
<stikonas>but there is some project now that might be useful longer term: https://github.com/augustss/MicroHs
<stikonas>but even microhs is not bootstrappable from C
<mwette>stikonas: arm64-asm.h looks odd: DEF_ASM_REGS(x) => ,TOK_ASM_x0 ... ; is there supposed to be something before the DEF_ASM_REGS(x) on line 69 ? nyacc will see output of cpp as top-level code starting with comma
<stikonas>I don't see arm64-asm.h?
<stikonas> https://gitlab.com/janneke/tinycc/-/tree/mes-0.27?ref_type=heads
<stikonas>or are you looking at upstream tcc?
<stikonas>hmm, even that one only seems to have .c not .h
<caffe>stikonas: thanks! I'm playing around with hugs and old versions of ghc to see how far I can get
<stikonas>well, a lot of people played with it
<stikonas>but I don't think anybody had any luck
<stikonas>and it's also a bit hard to run it all on modern systems
<stikonas>I wonder if hugs can bootstrap microhs
<caffe>stikonas: that's my impression so far as well - I recently got a claude subscription, so I'm seeing if it can sort-of-figure-it-out - it's a patched hugs and a patched ghc 3.02 that I'm working from (I think I've teased anxiety out of the model instead of general intelligence so far)
<stikonas>well, it also depends on the model...
<caffe>stikonas: if it's haskell-98, it probably can do microhs
<stikonas>in general, given unlimited tokens, it might even be able to rewrite modern ghc in a bootstrappable language
<stikonas>but one would still have to review all the output...
<caffe>stikonas: i've tried doing that on C/C++ code-bases and it works pretty well there (mainly C++ -> C so that I don't depend on a C++ compiler)
<stikonas>well, given existing codebase, I guess LLMs have pretty good instructions on what to do
<caffe>Yeah, it doesn't even need to be creative there, it's just a smart transpiler
<stikonas>it doesn't have to convert imprecise spoken language to code but one code to another
<caffe>And it understands convoluted C++ templates way better than I do :D
<mwette>sorry (again); arm64-tok.h
<mwette>in your patch you referenced
<mwette>Is arm64-tok.h #included into a struct or array from *.c file?
<mwette>I see now,
<mwette>I need to test with the enum and string array declarations.
<mwette>another cpp edge case; I'm not sure how to deal with DEF_ASM_VEC_REGS(8B)
<mwette>The 8B does not seem to be the problem. I think more cpp bug on my end.
<mwette>oh, it is. Isolated works, but this fails: https://paste.debian.net/hidden/e87ff234
<matrix_bridge><wildwestrom> What in your opinion qualifies as a piece of machine code being audited? Do I have to go through, line by line, and make sure that every bit of hex corresponds to the instruction I think it does AND know what each register does through the lifetime of the program?
<matrix_bridge><wildwestrom> Is it enough to trust GAS outputting correct assembly? Can I trust a shell script that turns the hex into binary and converts the endianness?
<mwette>In the cpp.algo.pdf file I received; the glue routine says "paste last [token] of left with first [token] of right side"; and the description of the select() function says explicitly that the actual argument is a sequence of tokens.
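
A small C sketch of the glue rule quoted above: each macro argument is a sequence of tokens, and `##` joins only the last token of the left argument with the first token of the right one. The macro name `GLUE` and the variables are hypothetical, chosen only to make the effect observable.

```c
#include <assert.h>

/* Hypothetical macro used only to demonstrate the ## rule. */
#define GLUE(left, right) left ## right

/* The left argument is the token sequence  x + y
   and the right argument is                z * 2
   Only y and z are pasted (giving yz), so the expansion is
       x + yz * 2
   The names y and z never need to be declared. */
static int glue_demo(void)
{
    int x = 1, yz = 20;
    return GLUE(x + y, z * 2);
}

/* Self-check; call it from main() to exercise the macro. */
static void glue_check(void)
{
    assert(glue_demo() == 1 + 20 * 2);
}
```
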
<aggi>janneke> and also, imho, we should, in our long term planning, have a way to remove tinycc from the full-source bootstrap
<aggi>to clarify, it's the transition from M2 towards any fully capable C compiler which is critical
<aggi>the latter seeking "self-hosting", and in this regard GNU/g++/binutils are a _worse_ liability than tinycc
<aggi>for a long term perspective with any C compiler such should meet at least following criteria
<aggi>being bootstrappable from M2 (such as pnut), capable to drive a GNUish/POSIX type system with all required development utilities (such as tinycc)
<aggi>with far less lines of code involved than g++/binutils demand, and remain stable and standards compliant
<aggi>and, almost forgot, compile-time performance, pnut and tinycc are fast
<matrix_bridge><Andrius Štikonas> wilderstorm, well, here we start with "hex0", small commented machine code that self-hosts
<matrix_bridge><Andrius Štikonas> e.g. https://github.com/oriansj/bootstrap-seeds/blob/master/POSIX/x86/hex0_x86.hex0
<mwette>I'm working on rework of cpp-subst for ##. I think I need to paste the entire argument token list and then retokenize.
<aggi>i'll continue with a busybox-1.2.2.1 port to tiny-bootstrap next, then linux-tcc 2.4 kernel
<aggi>with pnut/tcc it's <5minutes to arrive at bash-2.05b; busybox and kernel will add another minute or two, and then that's almost everything done for tiny-bootstrap to arrive at a bootable development host
<aggi>janneke: maybe this could convince tinycc-devel to coordinate for a pnut|mescc -> tcc-head re-integration
<aggi>given the fact, rickmasters while ago reported 13h total compile-time for entire live-bootstrap, if that could shrink to ~<10minutes total for a bootable *nix with development utilities
<matrix_bridge><wildwestrom> Andrius Štikonas: The reason I ask is because I have a hex0, but it was mostly AI generated. It seems legit upon skimming it and it obviously works because it can bootstrap forth and lisp, but that's not the same as an "audit".
<matrix_bridge> https://github.com/wildwestrom/stage0-riscv64-baremetal/blob/main/baremetal/hex0.hex0
<mwette>.
<matrix_bridge><Andrius Štikonas> yeah, with LLMs the boundary between source and various pregenerated files is a bit fuzzy...
<matrix_bridge><Andrius Štikonas> those stage0-posix-riscv64 .hex files are fully written manually, though in some places with the help of handheld calculator to speed up decimal->hex or decimal->binary conversions
<matrix_bridge><Andrius Štikonas> but they are much harder to write than e.g. x86 hex0 files due to fairly complex binary encoding of riscv (where immediates are split into various bits that end up all over the place)
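
To illustrate how scattered those immediates are, here is a minimal C sketch that encodes the RISC-V `jal` instruction (J-type), whose 21-bit offset is stored as imm[20], imm[10:1], imm[11], imm[19:12]. The function name `rv_encode_jal` is hypothetical and is not part of stage0 or any existing tool.

```c
#include <assert.h>
#include <stdint.h>

/* Encode "jal rd, offset" as a 32-bit instruction word.
   J-type layout, from bit 31 down to bit 0:
     imm[20] | imm[10:1] | imm[11] | imm[19:12] | rd | opcode (0x6f) */
static uint32_t rv_encode_jal(unsigned rd, int32_t offset)
{
    uint32_t imm = (uint32_t)offset;    /* two's complement bit pattern */

    assert(rd < 32);
    assert((offset & 1) == 0);                          /* must be even */
    assert(offset >= -(1 << 20) && offset < (1 << 20)); /* 21-bit range */

    return (((imm >> 20) & 0x1u)   << 31)
         | (((imm >> 1)  & 0x3ffu) << 21)
         | (((imm >> 11) & 0x1u)   << 20)
         | (((imm >> 12) & 0xffu)  << 12)
         | ((uint32_t)rd << 7)
         | 0x6fu;
}

/* Self-check with two hand-computed words. */
static void jal_demo(void)
{
    assert(rv_encode_jal(1, 8)  == 0x008000efu);   /* jal ra, +8 */
    assert(rv_encode_jal(0, -4) == 0xffdff06fu);   /* jal x0, -4 */
}
```

The instruction word is stored little-endian, so 0x008000ef would appear in a hex file as the bytes EF 00 80 00.
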
<ekaitz>oh i remember that yeah
<snuik>Welcome back ekaitz you have 1 message
<snuik>ekaitz, janneke says: i have a patch for an [fixed] arbitrary length unread buffer on `wip' branch that needs to be reviewed
<ekaitz>it was fun
<ekaitz>snuik: later tell janneke arbitrary length or many-buffer? I think we need the latter rather than the former
<snuik>Will do.
<stikonas>basically, for riscv64 hex work I was converting from M1 assembly to hex2 first, and then doing hex0. On the other hand, on x86/amd64 it's not hard to go from M1 directly to hex0
<ekaitz>snuik: later tell janneke I just read the patch: so it does both. Couldn't we just have it statically allocated? if it is a global I don't see why we shouldn't... I don't understand that `malloc`
<snuik>Okay.
<ekaitz>stikonas: if you want to try the commit: b87c850a1 you may be able to just cherry-pick it on mes, compile and run
<stikonas>is that mes commit?
<ekaitz>yes
<ekaitz>in wip branch
<ekaitz>maybe it's only in janneke's repo, but you have it also in mine i believe
<mwette>another approach to buffer issue: https://paste.debian.net/hidden/82bffd17
<mwette>^ untested, not proof-read
<ekaitz>mwette: that's like a small version of the commit i posted above, but also looks good
<mwette>It would probably be better to have a _buf[NFILEDES][2].
<mwette> I didn't look. I will
<mwette>if I can find it.
<mwette>I'm reworking the cpp-subst code in nyacc-c99.
<ekaitz> https://codeberg.org/ekaitz-zarraga/mes/commit/b87c850a113c057e61ce72ad73e1f2d3c78029df
<mwette>thanks; yours certainly more thorough; mine is a bit of a hack.
<mwette>stikonas: I cut a new nyacc release: V3.04.4. This one seems to work with the arm64-tok.h file.
<stikonas>thanks
<stikonas>and I still need to try that mes patch...
<mwette>I forgot to fix numeric test, I'll need to release 3.04.5
<mwette>but it's just in the test-suite; the deployed code should still work