IRC channel logs

2022-06-04.log

<fossy>stikonas[m]: glad to see mes ppsyntax is that simple
<stikonas[m]>fossy: yeah, coreutils will be more involved
<stikonas[m]>mes ppsyntax is completely unused, so it is possible to just rm those files
<oriansj>stikonas[m]: do I need to do a minimal sed for mescc-tools-extra as well?
<markjenkinssksp>In the last few weeks, I've taken an interest in fixing the bootstrappable yacc licensing problem in a way that converges with my interest in bootstrapping Lox
<markjenkinssksp>Happy to share my early work-in-progress efforts in a "secret" gist and not a real project repo https://gist.github.com/markjenkins/4229efe7fe36365ea8d5fd392bea33b8
<markjenkinssksp>yeah, I know the channel is publicly logged (I catch up that way sometimes), hence "secret" in quotes. I don't want to put something half baked and experimental in a public project repo yet
<markjenkinssksp>sister repositories and gists. Not easily bootstrappable, sublox1 reference implementation https://github.com/markjenkins/sublox-reference-implementation
<markjenkinssksp>Not easily bootstrappable input pre-processor https://gist.github.com/markjenkins/1a39e62537de9f08648aaf0c82e7d689
<markjenkinssksp>my Lox / sublox implementation that I am targeting for early rebootstrappability https://github.com/markjenkins/lox_compiler_scheme
<muurkha>cool!
<markjenkinssksp>And no new code to be found here, but my expression of interest in taking the oriansj/mes-m2 slow_lisp/rewrite, with intent to strip out stuff I don't need like module systems and macros https://github.com/markjenkins/mes-m2-rewrite/
<markjenkinssksp>The parser generator side of things only computes nullable and FIRST sets so far, but I've been also getting a grasp on the theory behind generating LALA(1) automatons, so I've got a sense of what I need to do next with concepts like partitions and itemsets
<muurkha>LALR(1)?
<muurkha>or is LALA(1) a new sort of parsing automaton I haven't heard about yet?
<muurkha>(I hope I'm not being a dick about a typo, I'm just thinking there's a lot of new parsing stuff I don't know about yet)
<markjenkinssksp>LALR(1) indeed, just a typo
<muurkha>aha, cool
<markjenkinssksp>worth clarifying
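The nullable and FIRST sets mentioned above are usually computed by iterating to a fixed point. The following is a minimal sketch of that textbook computation, not code from the gist; the grammar representation (a dict from nonterminal to a list of symbol tuples) and the names `nullable_and_first` and `GRAMMAR` are assumptions made for illustration.

```python
def nullable_and_first(grammar):
    """Compute nullable nonterminals and FIRST sets by fixed-point iteration.

    grammar maps each nonterminal to a list of production bodies, each body
    being a tuple of symbols. Any symbol that is not a key of grammar is
    treated as a terminal (an assumption of this sketch).
    """
    nullable = set()
    first = {head: set() for head in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                # A head is nullable if every symbol in some body is nullable
                # (trivially true for the empty body).
                if head not in nullable and all(s in nullable for s in body):
                    nullable.add(head)
                    changed = True
                # FIRST(head) absorbs FIRST of each leading symbol, and stops
                # at the first symbol that cannot derive the empty string.
                for sym in body:
                    new = first[sym] if sym in grammar else {sym}
                    if not new <= first[head]:
                        first[head] |= new
                        changed = True
                    if sym not in nullable:
                        break
    return nullable, first


# Hypothetical example grammar: sums of identifiers and parenthesised sums.
GRAMMAR = {
    "E": [("T", "Etail")],
    "Etail": [("+", "T", "Etail"), ()],
    "T": [("id",), ("(", "E", ")")],
}
```

Calling `nullable_and_first(GRAMMAR)` returns the pair of results; the later steps of an LALR(1) generator (item sets and lookaheads) build on these sets.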
<muurkha>the thing that most helped me understand LR parsing was writing a super stupid bottom-up parser
<markjenkinssksp>only re-inventing the implementation for the billionth time, but not the concepts, which are hard enough to grasp as is, see my REFERENCES file
<muurkha>which, instead of having a state table to figure out whether to shift or reduce, had a list of reduction rules, and it would loop over the list of reduction rules trying to apply each one to the top of the stack
<muurkha>if one succeeded, it would restart; if none succeeded, it would shift another token onto the stack
<markjenkinssksp>One reason for me to generate LALR(1) would be to get away with just re-using the yacc C skeletons that are already out there
<muurkha>yeah, I'm just saying if you want to understand LALR(1) intuitively, not as an alternative in practice
<muurkha>there are things LALR can do that this stupid approach can't, like if you have a sequence of tokens that should parse to, say, an initialization in one syntactic context and an assignment in another
<markjenkinssksp>interesting
<muurkha>but I found it really helpful in making the leap to understanding LR parsing at all
<muurkha>(I may not be a great guide to how to understand LR given that I didn't actually go on to write an LR parser generator, so take it with a grain of salt; maybe I don't understand it as well as I think I do)
<muurkha>the rules were basically just CFG productions running backwards: for X ::= Y Z W it would pop Y Z W off the stack (failing if they weren't there), package them up into an X, and push the X on the stack
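The table-free bottom-up parser muurkha describes can be sketched roughly as follows. This is a reconstruction from the description in the chat, not muurkha's code; the node representation `(symbol, children)` and the names `naive_parse` and `RULES` are assumptions of the sketch.

```python
def naive_parse(tokens, rules):
    """Shift-reduce parsing with no state table.

    rules is an ordered list of (head, body) pairs, where body is a tuple of
    symbols. Each stack entry is a (symbol, children) pair.
    """
    stack = []
    pending = list(tokens)
    while True:
        for head, body in rules:
            n = len(body)
            # Empty bodies are skipped: they would match forever.
            if n and len(stack) >= n and tuple(node[0] for node in stack[-n:]) == body:
                children = stack[-n:]
                del stack[-n:]
                stack.append((head, children))  # package the body up into a head
                break  # a rule applied, so restart from the first rule
        else:
            # No rule applied: shift the next token, or stop when input is gone.
            if not pending:
                return stack
            stack.append((pending.pop(0), []))


# Hypothetical grammar: E ::= E + n | n. Rule order matters here because the
# parser greedily applies the first rule that matches the top of the stack.
RULES = [
    ("E", ("E", "+", "n")),
    ("E", ("n",)),
]
```

A parse succeeds when the returned stack holds a single node for the start symbol. Swapping the two rules in `RULES` makes `n + n` get stuck at `E + E`, which is the kind of context-blindness a real LR state table avoids.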
<markjenkinssksp>At this point, I think I grasp how a state can represent that several possible rules/productions may be a match in progress
<muurkha>do you already know about the NFA/DFA equivalency? that was a big stepping stone for me on that question in particular
<markjenkinssksp>That's interesting, doing a web search found me https://neuraldump.net/2017/11/nfa-and-dfa-equivalence-theorem-proof-and-example/ and the summary at the start makes some sense to me
<markjenkinssksp>I took a third year automata course in my CS degree around 2004 or so, mandatory course for my honours degree, may have been exposed to that then, hard to say. May have got a C+ grade
<muurkha>oh, why don't you read https://swtch.com/~rsc/regexp/regexp1.html then
<muurkha>tastes may vary but that's the best explanation of the thing I've found
<unmatched-paren>oh, talking about LALR, i see. that's something i was wanting to understand, actually.
<muurkha>upon skimming the neuraldump page looks pretty much correct but it sort of lacks the context of why this is important
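The NFA/DFA equivalence discussed above rests on the subset construction: each DFA state is the set of NFA states the machine could be in at once, which is the same idea as one parser state standing for several productions in progress. A minimal sketch, assuming an NFA given as plain dicts (`delta` for labelled moves, `eps` for empty moves); all names here are illustrative and not taken from the linked pages.

```python
def epsilon_closure(states, eps):
    """All NFA states reachable from `states` using only empty moves."""
    seen = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return frozenset(seen)


def nfa_to_dfa(start, accepting, delta, eps, alphabet):
    """Subset construction: every DFA state is a frozenset of NFA states."""
    start_set = epsilon_closure({start}, eps)
    dfa = {}
    todo = [start_set]
    while todo:
        current = todo.pop()
        if current in dfa:
            continue
        dfa[current] = {}
        for ch in alphabet:
            moved = {t for s in current for t in delta.get((s, ch), ())}
            if moved:
                target = epsilon_closure(moved, eps)
                dfa[current][ch] = target
                todo.append(target)
    # A DFA state accepts if it contains at least one accepting NFA state.
    dfa_accepting = {s for s in dfa if s & set(accepting)}
    return start_set, dfa_accepting, dfa


def dfa_accepts(start, accepting, dfa, text):
    """Run the DFA over text; a missing transition means rejection."""
    state = start
    for ch in text:
        state = dfa[state].get(ch)
        if state is None:
            return False
    return state in accepting


# Hypothetical NFA over {a, b} accepting strings that end in "ab".
DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
DFA_START, DFA_ACCEPTING, DFA = nfa_to_dfa(0, {2}, DELTA, {}, "ab")
```

Here `dfa_accepts(DFA_START, DFA_ACCEPTING, DFA, "aab")` checks a string against the resulting machine.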
<markjenkinssksp>Just added to my REFERENCES file, http://jsmachines.sourceforge.net/machines/lalr1.html has also been a big help, a nice interactive toy
<muurkha>I think LALR is kind of a distraction most of the time unless you're implementing yacc or debugging a yacc grammar
<muurkha>I mean intellectually LALR is a towering achievement of computer science
<muurkha>but practically using it is more trouble than it's worth most of the time
<markjenkinssksp>agreed, in this case I am trying to re-implement yacc earlier in the bootstrap path for the sake of bash parse.y, if it were not for the licensing problem I never would have gone down this rabbit hole
<muurkha>I was going to say "unless you have yacc handy and no other parser generators" but actually I think implementing a Packrat or unmemoized PEG parser generator from scratch is a better choice in most contexts
<muurkha>I've worked on a Packrat parser in C but I haven't tried writing one from scratch, so that might be an exception
<muurkha>hey, this jsmachines thing is super great
<markjenkinssksp>yep, made sure to archive it
<markjenkinssksp>anyway, that's all for today, next progress report will probably be in a few weeks
<oriansj>keep up the great work markjenkinssksp
<stikonas[m]>oriansj: probably no need for sed, we build GNU sed quite early
<stikonas[m]>There is a bit of need for mescc
<stikonas[m]>But in live bootstrap we just ship the whole mescc script with templated values replaced manually
<stikonas[m]> https://github.com/fosslinux/live-bootstrap/blob/master/sysa/mes-0.24/files/mescc.scm
<stikonas[m]>Original file is http://git.savannah.gnu.org/cgit/mes.git/tree/scripts/mescc.scm.in
<stikonas[m]>oriansj: so I leave it up to you to decide
<stikonas[m]>It's a hack but not too big
<oriansj>doesn't look like it even needs sed but perhaps something rather simpler
<oriansj>replace -f input -o output -m pattern -r replacement would probably be good enough
<Hagfish>that does seem quite elegant, and follows the "principle of least power"
<oriansj>probably should allow output and input files to be one and the same so that it doesn't provide weird truncate behavior
<oriansj>guess it is finally time to add int fileno(FILE* f) to M2libc
<oriansj>or I guess I could just use fread
<stikonas[m]>Input and output don't have to be the same now that we have both cp and rm
<oriansj>well it just requires building an input buffer and that is just a couple lines
<oriansj>and I am just going to forbid null bytes in the input/output because we should only be working on human written text
<muurkha>you can also forbid super long lines by the same token
<oriansj>muurkha: well a strlen and malloc are cheap
<oriansj>and done
<oriansj>feel free to sanity test and yell at me if I didn't clear out all of the bugs yet
<oriansj>might need to add support for --help as well in a minute
<muurkha>oriansj: pretty cheap yeah
<oriansj>158 lines just to support a 5 line replace function
<muurkha>heh
<oriansj>but that replace function is freakin cheap and easy to reason about: https://paste.debian.net/1243050/
<oriansj>38 of them are just parsing the argv values
<oriansj>18 lines to do work; the rest as whitespace, comments and input validation
<oriansj>stikonas[m]: play with it and let me know if you think it is good enough for your needs
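The `replace -f input -o output -m pattern -r replacement` interface discussed above can be sketched as follows. This is an illustration in Python, not oriansj's C implementation from the paste: the flag names come from the chat, while replacing every occurrence, rejecting an empty pattern, and the function names are assumptions of the sketch.

```python
import argparse


def replace_in_file(input_path, output_path, pattern, replacement):
    """Literal (non-regex) replacement of pattern in a text file."""
    # Read the whole input into a buffer before the output is opened, so that
    # input and output may be the same file without truncating it first.
    with open(input_path, "rb") as f:
        data = f.read()
    # Only human-written text is expected, so null bytes are rejected.
    if b"\0" in data:
        raise ValueError("null byte found in input")
    # Assumption of this sketch: every occurrence is replaced.
    result = data.replace(pattern.encode(), replacement.encode())
    with open(output_path, "wb") as f:
        f.write(result)


def main(argv=None):
    parser = argparse.ArgumentParser(prog="replace")
    parser.add_argument("-f", dest="input", required=True, help="input file")
    parser.add_argument("-o", dest="output", required=True, help="output file")
    parser.add_argument("-m", dest="pattern", required=True, help="text to match")
    parser.add_argument("-r", dest="replacement", required=True, help="replacement text")
    args = parser.parse_args(argv)
    if not args.pattern:
        parser.error("pattern must not be empty")
    replace_in_file(args.input, args.output, args.pattern, args.replacement)
```

For example, `main(["-f", "mescc.scm.in", "-o", "mescc.scm", "-m", "@prefix@", "-r", "/usr"])` would fill in one templated value; the file names and the `@prefix@` placeholder are hypothetical.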