<paroneayea>I would need a way to change the hardcoded path when moved to the store
<paroneayea>I could have the build process change it on the fly
<ArneBab_>my guile-emacs build is running now. I’ll see tomorrow whether it worked :)
<bipt>at worst fixing it will involve building two binaries with different hardcoded paths (yuck)
<paroneayea>I think it's better to put them in their own .scm file
<jmd>So I'm implementing a parser in Guile. The traditional way to make parsers is to access and mutate some state variable. But this seems to be a mortal sin in the scheme ethos, where state variables are frowned upon and every function must return a modified value. However doing it that way I have found makes simple tasks excruciatingly painful - one has to return an alist, and inspect the alist for the terminating value - then
<jmd>reparse it, for example. Does anyone have any tips for parser implementation in Scheme?
<artyom-poptsov>jmd: Hi. Recently I faced the problem of implementing a DSV parser for Guile. I saw two options here -- one is to use the LALR parser module, the other is to implement the parser by hand.
<artyom-poptsov>I decided to implement it in a recursive way, because it looks to me like the more straightforward way to solve this task
<adhoc>artyom-poptsov: are you storing state in something like an AST?
<artyom-poptsov>Although I should note that I'm not very experienced in solving such tasks.
<artyom-poptsov>I just use a finite automaton and implement it as a recursive procedure where the state and data are passed through recursive calls.
<jmd>artyom-poptsov: I also decided on that approach.
<artyom-poptsov>There is Guile-JSON too -- I think it's worth looking at (at least I used it as an example).
<mark_weaver>jmd: mutating state is not considered a "mortal sin" in our community. we just try to minimize the use of mutation.
<jmd>But like I said, the requirement to return the state and repass it is painful.
<jmd>Well how else does one call to the parser know the state of the previous?
<mark_weaver>well, another option would be to nest all of the parser sub-procedures within some larger procedure, and the state variables would be kept within that larger procedure.
<mark_weaver>that also has efficiency advantages, because lexical variables and procedures defined within a lexical scope are faster than top-level variables/procedures.
<jmd>You mean like: (let* ((parser (parse1 parser)) (parser (parse2 parser)) (parser (parse3 parser)) ...
<mark_weaver>the reason is that top-level variables can be mutated at any time from anywhere, whereas internal variables can only be set! from within the lexical scope.
<mark_weaver>so the compiler can see which variables are set! and which are never set!. the ones that are never set! can be inlined, etc.
<mark_weaver>jmd: no, I mean like (define (parse str) (let ((state ...)) (define (parse-expr str) ...) (define (parse-term str) ...) ...))
<mark_weaver>now 'parse-expr' and 'parse-term' have access to 'state' without having to pass it in or out.
<mark_weaver>and you can have more than one state variable of course
<mark_weaver>however, I should mention that state machines are very naturally implemented in scheme without keeping the state in a variable at all.
<mark_weaver>since procedure calls in tail position are just GOTOs with arguments, you can simply make one procedure per state, and then to make a transition to another state you simply call it in tail position.
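A minimal sketch of that "one procedure per state" idea, using a hypothetical recognizer for strings of the form a...ab...b (not code from the channel); each state is an internal procedure and each state transition is just a tail call:

    (define (matches-a*b*? str)
      (let ((len (string-length str)))
        (define (state-a i)                ; reading zero or more a's
          (cond ((= i len) #t)
                ((char=? (string-ref str i) #\a) (state-a (+ i 1)))
                (else (state-b i))))
        (define (state-b i)                ; then zero or more b's
          (cond ((= i len) #t)
                ((char=? (string-ref str i) #\b) (state-b (+ i 1)))
                (else #f)))
        (state-a 0)))

    ;; (matches-a*b*? "aaabb") => #t
    ;; (matches-a*b*? "abab")  => #f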
<mark_weaver>and this technique can be generalized to many non-finite state machines as well
<mark_weaver>the key idea here is to not bother returning the state, but instead just keep calling the next procedure, relying on the fact that tail calls are GOTOs with arguments.
<mark_weaver>also, the state needn't be a single variable, so there's no need to pack it up and break it apart again. you can pass the state as several arguments to each procedure. this results in very fast code as well, if all the procedures are internal procedures.
<mark_weaver>jmd: can you show me the grammar that you wish to parse?
<jmd>That's the other problem. It is not formally defined. I'm inferring it as I go along.
<mark_weaver>well, it occurs to me that it would be more helpful to show some example code to illustrate the techniques I've outlined above.
<mark_weaver>but I would rather do that on a language that's close to what you're working on
<jmd>Well try this simple fragment then:
<jmd>One two %a<three four> five %b<six seven %c<eight nine>> ten.
<mark_weaver>and tell me what you want the parser to return for that example?
<jmd>Well when I start off writing a parser, I start with a program which returns success if the input conforms to the specified grammar, and false otherwise. The payload I add later.
<jmd>So far as an informal description is concerned, I suppose the above would be something like:
<jmd>1. A marked region is a % followed by a < followed by a word list followed by a >
<mark_weaver>what about the 'a', 'b', and 'c' between the % and the < in your example?
<jmd>2. A word list is a word optionally followed by another word.
<jmd>3. A word is an alphabetic string or a marked region.
<jmd>yes, there should be a letter following the %
<mark_weaver>so a word list can only contain 1 or 2 words, no more or less?
<jmd>no. It can contain any number
<mark_weaver>okay, give me a few minutes and I'll cook something up
<mark_weaver>example use: (with-input-from-string "One two %a<three four> five %b<six seven %c<eight nine>> ten" parse)
<mark_weaver>there are many styles one might use to write a parser, but this is one possibility
<jmd>mark_weaver: Thanks. I'll digest that later.
<ArneBab>the guile-emacs compile is still running ☺
<jmd>In the documentation for let:
<jmd>"The procedure is called with the init expressions as the formal arguments."
<jmd>What are the "init expressions" ?
<jmd>The first argument is supposed to be a variable. You have a list.
<jmd>I suppose a list is a variable.
<jmd>The documentation doesn't explain what they are.
<mark_weaver>(let loop ((v e) ...) ...) is just a shorthand for this:
<mark_weaver>(let loop ((v1 e1) (v2 e2) ...) <body>) is just a shorthand for (let () (define (loop v1 v2 ...) <body>) (loop e1 e2 ...))
<mark_weaver>the latter form is more readable for newcomers, but the "named let" syntax is quite popular among schemers.
<mark_weaver>jmd: the description of "named let" assumes that you already know about normal 'let', and helpfully includes a link to the documentation of normal 'let', where it is hopefully made clear what init expressions are.
<mark_weaver>it starts with "For the definition of BINDINGS see the documentation about ‘let’ (see Local Bindings)"
<mark_weaver>hmm, that texinfo code uses a @ref variant that doesn't put in parentheses there.
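To make that shorthand concrete, here is a hypothetical countdown loop written both ways; the two forms behave identically:

    ;; "named let" form
    (let loop ((n 5) (acc '()))
      (if (zero? n)
          acc
          (loop (- n 1) (cons n acc))))     ; => (1 2 3 4 5)

    ;; the desugared form mark_weaver shows
    (let ()
      (define (loop n acc)
        (if (zero? n)
            acc
            (loop (- n 1) (cons n acc))))
      (loop 5 '()))                         ; => (1 2 3 4 5)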
<mark_weaver>jmd: I added an annotation to that paste that's the same but avoids the "named let"
<mark_weaver>and maybe comparing the two will help you understand "named let"
<jmd>But the documentation uses @dfn{init} without explaining what init is.
<mark_weaver>okay, how about changing that first sentence to "For the definition of BINDINGS, which includes the VARIABLES and INIT expressions, see ..."?
<jmd>That would be better, yes.
<jmd>Is there a starred version of named let?
<jmd>Well presumably it would apply the bindings in order.
<mark_weaver>and what would happen when the recursive call is made?
<ijp>mark_weaver: the last time I saw someone propose this (named-let foo ((bar baz) ..) zot ..) = (let* ((bar baz) ..) (let foo ((bar bar) ...) zot ...))
<ijp>I never did think to ask how multiple uses of the same binding would be handled
<mark_weaver>although it could lead to surprising behavior if any of those earlier bindings were captured by a closure in the later init expressions and later mutated.
<ijp>if it was an error (an actual one, not nasal demons) or if they were supposed to be merged in the recursive call
<mark_weaver>jmd: anyway, the answer is "no", I've never heard of anyone implementing a named-let*
<mark_weaver>although we do get asked the question occasionally here
<mark_weaver>the thing is, (let ((v1 e1) (v2 e2) ...) <body>) is equivalent to (let () (define (dummy-proc v1 v2 ...) <body>) (dummy-proc e1 e2 ...))
<mark_weaver>and so 'named-let' is a trivial variant where that 'dummy-proc' is given a name and is accessible within <body>
<mark_weaver>but (let* ((v1 e1) (v2 e2) (v3 e3)) <body>) is equivalent to (let ((v1 e1)) (let ((v2 e2)) (let ((v3 e3)) <body>))), i.e. a set of nested single-variable let expressions.
***karswell` is now known as karswell
<mark_weaver>and if you notice, there's no procedure in there that can be 'named' the way the one in "named let" is.
<jmd>It might also be a good idea to change "variable" to something else.
<jmd>Something more distinct from "variables".
<mark_weaver>jmd: can you elaborate on the reason for that? these are actually variables.
<mark_weaver>you mean the variable bound to the procedure, right?
<jmd>call it "iteration-var" or something.
<mark_weaver>thanks for the feedback. I agree that this documentation is hard to comprehend.
<mark_weaver>it's harder for us to notice these things, because we are already so steeped in scheme :)
<jmd>mark_weaver: Your parser example is kind of cheating.
<jmd>Since it uses peek-char
<jmd>which modifies its argument.
<mark_weaver>actually, 'peek-char' doesn't mutate anything, at least not semantically. it is rather 'read-char' that mutates.
<mark_weaver>it was only meant to show a very simple way of doing parsing in scheme
<mark_weaver>the best way I know of to elegantly create parsers in a functional style is to use so-called "parser combinators", and it would be nice to have those in Guile, but we don't have them yet.
<jmd>My problem is writing the equivalent of read-char for a general token source.
<davexunit>I have the beginnings of a parser combinator library written
<mark_weaver>ah, well, you didn't tell me that you needed that :-P
***fds_ is now known as fds
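mark_weaver's paste itself is not preserved in the log; the following is only an illustrative recognizer, in the peek-char/read-char style he describes, for jmd's informal grammar (a word list of alphabetic words and %x<...> marked regions), meant to be used with his with-input-from-string example above. All procedure names are made up:

    (define (parse)
      ;; #t if the whole of the current input port conforms to the grammar,
      ;; #f otherwise.
      (define (skip-spaces)
        (when (and (char? (peek-char)) (char-whitespace? (peek-char)))
          (read-char)
          (skip-spaces)))
      (define (parse-alpha)                 ; one or more alphabetic chars
        (let loop ()
          (let ((c (peek-char)))
            (if (and (char? c) (char-alphabetic? c))
                (begin (read-char) (loop))
                #t))))
      (define (parse-marked-region)         ; % letter < word-list >
        (and (char=? (read-char) #\%)
             (char? (peek-char)) (char-alphabetic? (peek-char))
             (begin (read-char) #t)
             (char? (peek-char)) (char=? (peek-char) #\<)
             (begin (read-char) #t)
             (parse-word-list)
             (char? (peek-char)) (char=? (peek-char) #\>)
             (begin (read-char) #t)))
      (define (parse-word)                  ; alphabetic string or marked region
        (let ((c (peek-char)))
          (cond ((eof-object? c) #f)
                ((char=? c #\%) (parse-marked-region))
                ((char-alphabetic? c) (parse-alpha))
                (else #f))))
      (define (parse-word-list)             ; one or more words
        (and (parse-word)
             (let loop ()
               (skip-spaces)
               (let ((c (peek-char)))
                 (if (or (eof-object? c) (char=? c #\>))
                     #t
                     (and (parse-word) (loop)))))))
      (and (parse-word-list)
           (eof-object? (peek-char))))

    ;; (with-input-from-string
    ;;   "One two %a<three four> five %b<six seven %c<eight nine>> ten" parse)
    ;; => #t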
<davexunit>but there are things to implement that I haven't grokked yet to allow for left-recursive parsers.
<davexunit>my implementation will likely also be too slow, since I'm using SRFI-41 streams.
<mark_weaver>jmd: anyway, the solution to that is easy: 'read-char' and 'peek-char' use (%current-input-port) when no port is explicitly specified.
<mark_weaver>%current-input-port is a "parameter" (not to be confused with procedure "arguments")
<mark_weaver>parameters are implemented as thread-local variables that can be temporarily set within a particular dynamic extent using the 'parameterize' special form.
<mark_weaver>in the example usage I provided for this parser, that is done internally by 'with-input-from-string'
<mark_weaver>jmd: what you can do is create your own port-like object that implements something like 'read-char' and 'peek-char', but provides tokens instead.
<mark_weaver>however, if I had known you needed this I would have written my example parser in a different way. I was taking advantage of ports, which already exist, to keep it simple.
<mark_weaver>jmd: out of curiosity, why do you need an arbitrary "token" source? do you feel strongly that lexical analysis should be in a distinct layer from parsing?
<zacts>davexunit: are you going to move sly over to gitlab?
<zacts>or should I use the gitorious one, or the github one?
<zacts>so, which will be the canonical place for it?
<davexunit>I use a self-signed ssl cert currently, though.
<davexunit>so your browser will give you the scary page before you can get to it
<jmd>mark_weaver: I think it makes it easier in this instance.
<davexunit>I'm not interested in paying a bunch of money for a cert.
<zacts>I wish I had enough hits on my dynamic DNS to host my own git server
<jmd>Particularly when dealing with #include-like things.
<mark_weaver>ijp: do you know of any decent parser combinator libraries that we can use easily, maybe one written in portable R6RS?
<mark_weaver>well, more explanation is needed, so maybe better to ignore this until I can provide some example code.
<mark_weaver>right, if he wanted to use streams for the characters as well as the tokens.
<jmd>davexunit: There are a few gratis certification authorities.
<mark_weaver>the first very simple parser combinator approach I read about over a decade ago, for Haskell, was to have a function for each non-terminal that accepts an input stream and returns a list of (result stream) pairs, one for each possible way of parsing the thing.
<mark_weaver>if there is an unambiguous way to parse it, it returns a single pair consisting of the internal representation for the thing parsed, and the stream of everything that follows.
<mark_weaver>as I recall, the paper that talks about that is called something like "replacing failure with a list of successes"
<mark_weaver>it's very simple and elegant, but not so efficient unfortunately
<mark_weaver>and since then there has been a lot of work on making parser combinators more efficient
<jmd>Having a distinct lexer stage also makes ignoring comments easier, and has a few other advantages too.
<mark_weaver>but they are still very elegant to use, even if there is some complexity under the hood
<mark_weaver>I think the key advantage in this context to having a separate lexer is that 'peek-char' is sufficient when you can decide what to do based on only looking ahead one character.
<mark_weaver>if you replace that with 'peek-token' or equivalent, then you can look ahead one token, which is obviously more than one character in general.
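A minimal sketch of the port-like token source mark_weaver suggests, assuming the tokens already come from some separate lexer as a list; make-token-source, peek-token, and read-token are illustrative names, and '*end-of-tokens* is an arbitrary sentinel:

    (use-modules (ice-9 receive))

    (define (make-token-source tokens)
      ;; returns two procedures analogous to peek-char and read-char
      (let ((remaining tokens))
        (define (peek-token)
          (if (null? remaining) '*end-of-tokens* (car remaining)))
        (define (read-token)
          (let ((tok (peek-token)))
            (unless (null? remaining)
              (set! remaining (cdr remaining)))
            tok))
        (values peek-token read-token)))

    ;; usage:
    (receive (peek-token read-token)
        (make-token-source '(one two three))
      (list (peek-token) (read-token) (peek-token)))
    ;; => (one one two)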
<mark_weaver>other than that, the only advantage I know of to separating the two is efficiency, because lexical analysis can typically be done with a simple finite state machine.
<mark_weaver>davexunit: I've always just used self-signed certs, but I might use the EFF's "Let's Encrypt" service when it's ready.
<davexunit>mark_weaver: that's what I'm waiting for as well.
<davexunit>but it means less visibility for my git repos
<davexunit>mark_weaver: from what I've read about parser combinators, it seems that one needs to memoize heavily and use some special control flow with continuations.
<davexunit>I also got tripped up when I realized I needed lazy evaluation to handle parsers that refer to themselves.
<davexunit>my solution for that was to make the disjunction operator a special form, but that's likely insufficient.
<davexunit>though I can't think of a counterexample to confirm it. maybe it's fine.
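A minimal sketch of the laziness trick davexunit describes (not his actual library), under the assumption that a parser is a procedure taking a list of characters and returning either a (value . rest-of-input) pair or #f; writing the disjunction as a macro delays evaluation of its alternatives, so a grammar may refer to itself. It still loops forever on left recursion, which is the part he says he hasn't solved. The p/ names are made up:

    (define (p/char c)                      ; match one specific character
      (lambda (input)
        (if (and (pair? input) (char=? (car input) c))
            (cons c (cdr input))
            #f)))

    (define (p/seq p q)                     ; run p, then q on what's left
      (lambda (input)
        (let ((r (p input)))
          (and r
               (let ((s (q (cdr r))))
                 (and s (cons (list (car r) (car s)) (cdr s))))))))

    (define p/empty                         ; always succeeds, consumes nothing
      (lambda (input) (cons '() input)))

    (define-syntax p/or                     ; disjunction as a special form:
      (syntax-rules ()                      ; alternatives are only evaluated
        ((_ p ...)                          ; when the parser is applied
         (lambda (input)
           (or (p input) ...)))))

    ;; self-referential grammar: A ::= "a" A | empty
    (define as
      (p/or (p/seq (p/char #\a) as)
            p/empty))

    ;; (as (string->list "aab")) => ((#\a (#\a ())) #\b)
    ;; i.e. the parsed value (#\a (#\a ())) consed onto the rest, (#\b)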