IRC channel logs

2020-12-21.log

back to list of logs

***wowaname is now known as opal
<OriansJ>mihi: please check to see if my kaem test change resolved your issue in regards to locale causing the test failure.
<siraben>So for AoC day 19, I basically made a parser generator :D https://github.com/siraben/haoc-2020/blob/master/day19.hs
<siraben>Would be a nice exercise to make it blynn-compiler compatible
<pder>siraben: my crossly branch can now build crossly with M2-Planet. I only need to modify crossly.hs and precisely.hs to finish the bootstrap.
<pder>M2-Planet does not have long long support, so some of the 64bit primitive types will be missing. Do you think that will be an issue?
<siraben>No. I'll check precisely (which adds arbitrary precision integers) to see if it relies on 64biy
<siraben>64 bit*
<siraben>I have my own implementation of bignums using 32 bit cells; https://github.com/siraben/mini-haskell/blob/master/examples/bignum.hs
<siraben>This is compatible with the classy compiler, it's a fork of blynn-compiler scraped from his website before he made the repo public
<xentrac>siraben: nice!
<siraben>bignums would be nice to have but are not critical for bootstrapping, in fact we should base off of the last compiler before bignums
<siraben>xentrac: :D
<xentrac>siraben: wait, you made a parser generator that depends on Parsec, which is a parser generator
<siraben>xentrac: i'd say parsec is a parsing library, it could be implemented easily
<siraben>in fact, i'd probably take code from https://github.com/siraben/eopl/blob/master/src/LetrecParser.hs which is when I learned about monadic parsing and reimplemented it from scratch
<siraben>I think this is more or less compatible with marginally
<siraben>xentrac: oh, blynn-compiler has its own monadic parsing library as well, which I forgot
<xentrac>is monadic parsing where you use MonadPlus to get sequencing (from the monad operator, is that >>=?) and alternation (the plus part of monadplus?) I'm afraid I don't speak Haskell
<xentrac>basically using lazy lists to get backtracking parsing, with its exponential-time worst case? is there a way to stick cuts in there like you would in a Prolog DCG to get linear-time parsing in practice, even for erroneous inputs?
<siraben>xentrac: here's a great 8 page paper on this https://www.cs.nott.ac.uk/~pszgmh/pearl.pdf
<siraben>Yes! so naively it's really bad because LL(\infty) as in `a +++ b`, parser a can parse an arbitrary amount of tokens before b is run, where the backtracking would happen
<siraben>in the Parsec library, parser a would consume input and not backtrack before trying b
<siraben>also the naive implementation maintains all possible parses in a list
<xentrac>right, which is fine if it's a lazy list
<xentrac>or your grammar is unambiguous
<siraben>right
<xentrac>siraben: the introduction of this paper makes me think it was written before the Packrat dissertation
<siraben>xentrac: what's great about packrat parsing? i haven't looked into it
<xentrac>(although it's still true that Packrat parsers are commonly slower than bottom-up parsers)
<siraben>on reading the introduction of the paper it seems they emphasize the fact that these parsers are first-class values
<xentrac>oh, well, Packrat gives you a guaranteed-linear-time algorithm for a class of languages conjectured to include those generated by all unambiguous context-free grammars, plus some non-context-free languages, and its grammars are composable in a way that for instance LALR(1) grammars and LL(k) grammars aren't
<siraben>instead of some fixed set of combinators
<siraben>wow, that sounds nice
<xentrac>what I meant was that they say, "Of course, recursive descent parsers...lack the efficiency of bottom-up parsers"
<xentrac>it's pretty nice but not quite as nice as it sounds
<siraben>ah
<xentrac>catch #1 with Packrat parsing is that it's not just linear time: it's also linear space, while most parsing algorithms are constant-space or logarithmic space
<xentrac>catch #2 is that the constant factors can be quite large. like 1000 bytes of RAM per byte of input
<siraben>dang
<xentrac>catch #3 is that the version of Packrat described in Bryan Ford's dissertation can't handle left recursion; there are variants of the algorithm that can handle left recursion, but precisely in which cases it works is not well characterized (Warth et al.'s initial claim that it was fully general turned out to be wrong), and it may lose the linear-time guarantee (I'm not sure)
<xentrac>all that said, it's the standard pattern-matching facility in recent versions of Lua, replacing regular expressions, and it's been successfully applied to a wide variety of applications
<xentrac>also it's powerful enough that you can generally get by without a lexer, but that in itself typically entails extra cost
<xentrac>you'll probably be underwhelmed reading the Packrat dissertation in the sense that it's sort of just the (+++) and (<$>) operations plus an iteration construct and a few other things plus memoization
<xentrac>but it's a striking and perhaps unintuitive result that those give you the advantages described above :)
<siraben>xentrac: sounds interesting, if I have some more time and interest in parsing again I'll look into it
<siraben>then again I have yet to read up on LALR parsing and so on
<terpri>i recommend _parsing techniques: a practical guide_ by grune & jacobs for reference, if you *really* want to dig into parsing (partly for its 70-page annotated bibliography)
<fossy>:O
<xentrac>I'm really more interested in very simple general techniques like Packrat than more limited efficiency hacks like LALR
<OriansJ>I love this group. ambitious barely covers the goals here and yet we will achieve probably all of them and then some.
<siraben>terpri: thanks, sounds comprehensive!
<deesix>I'b been playing with this LL(1) grammar for C I found, making a recursive descent out of it. But with all the factorization and left recursion elimination the parser tree feels not easy to work with. The idea is coding it in the M2-Planet subset, make a nice AST for modular (per arch) code generation, but I'm starting to think it's not the best idea. Also, I think there will be problems with
<deesix>associativity.
<deesix>It's the grammar by Mohd Hanafiah Abdullah (cgram1-ll1 posted to comp.compilers). I think I got it from iecc.com.
<deesix>ftp://ftp.iecc.com:21/pub/file/cgram-ll1.gz it seems, but it's not there anymore (this was July or so).
<mihi>OriansJ, as of 2108ac4, kaem tests now fail for me regardless of locale :-P. Tested both on real x86 linux and on cygwin. In both cases, the C locale does not include the fancy quotes in the error message, causing the hash to not match.
<mihi>Also, I noticed in strace output that on x86 32-bit, the fclose syscall closed "random" file descriptors due to a missing "LOAD_INTEGER_ebx". And M2-Planet test #0104 does not properly null-terminate its envp array, resulting in an -EFAULT on execve in case there does not happen to conveniently be a 0 in memory anyway.
<mihi>Fixes are here https://github.com/schierlm/M2-Planet/commit/608fba306f46128fe901302f813d139c635e996a and included in the pull request I will send in a moment
<mihi>That being said, I/O buffering now works for me on x86 architecture, and all M2-Planet tests as well as the bootstrap still work, and hex2 being ~80% faster than before when built with M2-Planet.
<mihi>Pull request is here: https://github.com/oriansj/M2-Planet/pull/9
<mihi>On AMD64 I'm currently having trouble getting it to work, not sure if there is a M2-Planet bug or I'm too dumb to see that I use features not supported by M2-Planet. Even pointer comparisons fail strangely. Here is a short test case: https://github.com/schierlm/M2-Planet/commit/890b7cd25b39e2f165ed65a374103bec1a7b604b
<mihi>I noticed that M2-Planet does not like pointer arithmetic for pointers that point to anything not size 1, therefore I eliminated all pointer arithmetic from my changes, but probably there is another "feature" I'm using that is not supported by M2-Planet properly
<mihi>That being said, my WIP code for AMD64 (untested, or rather tested to be bad) is here https://github.com/schierlm/M2-Planet/commit/24918f2c205dfba35606f75fd0fe324f04fdb577 - but I'll happily fix and send another pull request if anyone tells me how to, or why said test fails on AMD64.
<deesix>Here is the LL grammar; I'd be glad to get any suggestions https://notabug.org/deesix/cgram-ll1/src/master/cgram-ll1
<mihi>oops and due to a mistake in my shell script, my test does not run on x86 - it assumes the elf header always being called test/common_${ARCH}/ELF-${ARCH}.hex2, but it is called common-x86/ELF-i386.hex2. But when fixing that script, the test runs on x86
<pder>mihi: regarding the pointer arithmetic in M2-Planet, the workaround I am using in blynn-compiler is to define a constant like so:
<pder>/CONSTANT CELL_SIZE sizeof(unsigned)
<pder>#define CELL_SIZE 1
<pder>If you have an unsigned *p and want to increment it do p = p + CELL_SIZE
<pder>/CONSTANT CELL_SIZE sizeof(unsigned)
<pder>There should be two forward slashes
<pder>M2-Planet ignores the double slash comment. This way your code should work in both M2-Planet and gcc