IRC channel logs
2014-01-09.log
<ijp>sneek: later tell wingo if by some chance you haven't already, check out "secrets of the glasgow haskell compiler inliner"
<mark_weaver>ijp: thanks for pushing the peval fix. feel free to push your ,br fix also.
<mark_weaver>although actually, I'm about to push some things, so maybe wait a few minutes :)
<ijp>hmm, I thought I had, but it looks like I didn't
<mark_weaver>ijp: okay, I pushed my patches. feel free to push your ,br fix.
<ijp>wow, someone is trying to get geiser to work with scsh
<mark_weaver>The 'r7rs-wip' branch now has a fairly complete implementation of R7RS for stable-2.0, which passes the R7RS test suite from chibi scheme. Apart from lack of docs and tests, the main remaining issue is that the modified 'write' which supports SRFI-38 is much slower.
<mark_weaver>heh, well, printing (iota 1000000) is about 10 times slower on my machine. most of that time is spent in GC.
<mark_weaver>anyway, I'd rather wait until it's really ready. also, I'm deleting and recreating this branch periodically, so beware.
<mark_weaver>I think I'll end up writing a miniature hash table implementation that does no allocation in the common case.
<taylanub>Hrm, if I import binding `foo', then (define bar (let ((priv-foo foo)) (lambda () (use priv-foo)))), then my `priv-foo' won't see changes in the top-level `foo', and will be better optimizable, right ?
<taylanub>If so, this is essentially "static linking"! (Of a single binding.)
<taylanub>But no, a module's body surely isn't executed at compile-time .. does that `let' take effect the first time the module is loaded, perhaps ?
<ArneBab>Is there a way to define a variable as uint32, so bitshift works as in C?
<ArneBab>I implemented the tiny encryption scheme in guile scheme, but my implementation will be horribly slow, because the original algorithm uses properties of bitshift on uint32.
<ArneBab>essentially what I need to do is to run (modulo number 2**32) every time I do anything which could increase the number.
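[The workaround ArneBab describes can be sketched as a small helper that wraps every result back into the uint32 range, as C's uint32_t arithmetic does. The name `uint32` matches the helper mentioned later in the log, but this definition is illustrative, not ArneBab's actual code:]

```scheme
;; Illustrative sketch: wrap a result back into the uint32 range after
;; each operation, mimicking C's modular uint32_t arithmetic.
;; 2^32 is written as a literal constant, per the advice later in the log.
(define (uint32 x)
  (modulo x #x100000000))

(uint32 (+ #xFFFFFFFF 1))   ; wraps around to 0
(uint32 (ash 1 33))         ; 2^33 also wraps to 0
```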
<taylanub>The fixnum API would be the right thing here, IIRC Guile has no fixnums yet ?
<taylanub>Oh, we have it in (rnrs arithmetic fixnums (6)), but no idea if it uses optimized stuff.
<taylanub>And so I just realized that procedures could generally be optimized at creation-time (run-time) to specialize for the types of objects which were unknown at compile-time ..
<taylanub>Not sure if that falls under JIT already, I'm thinking of the simple case where given (define (make-proc foo) (lambda () (use foo))), (make-proc <compile-time-constant>) can be optimized to return a procedure that's specialized on the type of foo, omitting type-checks and all, whereas (make-proc variable) will generally return an unspecialized procedure.
<taylanub>A procedure is generally called more often than it's created, so I wonder if it would be worthwhile to always optimize at procedure creation (no heuristics and such, as is usual in JIT engines AFAIK).
<taylanub>Well, leaving aside vague terms like JIT, it certainly involves run-time code generation/modification.
<taylanub>So I think it would be best after all to have a specialized "static import" that imports an immutable copy of a binding from the current bindings of a module, instead of implementing this complex run-time optimization idea.
<ArneBab>taylanub, civodul: sorry for answering late
<ArneBab>taylanub, civodul: how do I create a fixnum?
*ArneBab is reading the docs right now
<ArneBab>the fixnums seem to be 64bit … (fixnum-width) → 61
<ArneBab>mark_weaver: what I'm looking for is a way to tell scheme to treat a given number as uint32 (because an algorithm requires that)
<ArneBab>calling modulo all the time eats cycles :)
<ArneBab>ok, so I’m currently limited to my workaround
<mark_weaver>'remainder' might be faster, and if the argument is always non-negative then it's the same as modulo.
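[The difference mark_weaver alludes to only shows up for negative dividends; a quick illustration, not from the log itself:]

```scheme
;; remainder and modulo agree for non-negative dividends and differ
;; only when the dividend is negative:
(modulo 10 3)      ; => 1
(remainder 10 3)   ; => 1
(modulo -10 3)     ; => 2   (result takes the sign of the divisor)
(remainder -10 3)  ; => -1  (result takes the sign of the dividend)
```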
<ArneBab>the cost I currently pay is that I have to call (modulo number (integer-expt 2 32)) on every math operation
<mark_weaver>if you're doing exponentiation, then see 'modulo-expt', which is *much* faster than exponentiating and then doing modulo.
<mark_weaver>well, I hope you're precomputing the (integer-expt 2 32), right?
*ArneBab hides in very dark shadows… (re precomputing)
<mark_weaver>also, depending on the operations you're doing, you probably don't *need* to do the modulo after every operation. just often enough to make sure that the intermediate numbers don't get too big.
<mark_weaver>well, I'll tell you right now that precomputing 2^32 will be a *lot* faster.
<ArneBab>(likely the biggest performance leak right now - I did not actually profile the function: I wanted to see whether I can make the code more “natural” to the algorithm)
<mark_weaver>anyway, if you need speed for something like uint32 math, then you should use C for that, as much as I try to avoid C these days.
<ArneBab>I just want to implement a C-based algorithm as naturally as possible
<mark_weaver>rather than precomputing, perhaps you should just put in the numeric constant itself, written as #x100000000
<mark_weaver>You can compute the entire subexpression ((v1<<4) + k0) ^ (v1 + sum) ^ ((v1>>5) + k1) before taking the modulo.
<mark_weaver>the result will always be the same, and it's much faster than taking the modulo after every suboperation.
<mark_weaver>ditto for the other subexpressions that look like that.
<ArneBab>hm, yes, because the xor of the upper bits is irrelevant
<mark_weaver>(logand x #xFFFFFFFF) might be faster, and is the same as (modulo x #x100000000) if x is non-negative.
<ArneBab>that then actually operates on the bits…
<mark_weaver>also, don't use mutation. write a proper loop in scheme style. mutable variables are slower to access.
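[mark_weaver's logand trick, spelled out: for a non-negative x, masking off the top 32 bits gives the same result as modulo by 2^32, while avoiding a division. A quick check (the value of x is arbitrary):]

```scheme
;; Masking with logand is equivalent to modulo by a power of two for
;; non-negative integers, and operates directly on the bits:
(define x 12345678901234)
(= (logand x #xFFFFFFFF)
   (modulo x #x100000000))   ; => #t
```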
<mark_weaver>rather than taking modulo/logand after computing the subexpression ((v1<<4) + k0) ^ (v1 + sum) ^ ((v1>>5) + k1), instead compute the sum of that plus the variable on the left, before taking modulo/logand.
<mark_weaver>v0 and v1 are the only things that really need to be kept within range by chopping off the top bits.
<mark_weaver>(the reason the top bits matter for v0 and v1 is because they are right-shifted)
<ArneBab>uhm, reload that URL - I just pushed
<ArneBab>I think I also have to throw away the upper bits of the result of the subexpression, because that gets subtracted from v0 or v1.
<ArneBab>If I don’t, I might get stray 0-values.
<civodul>oh, (let ((x (values))) x) is optimized to (values), but it shouldn't be
<mark_weaver>in general, in any expression involving only addition, subtraction, and multiplication (which includes left shift) you can postpone the modulo until after the expression.
<ArneBab>I just did a performance test before and after implementing your optimizations, and they improve the speed by more than a factor of 3
<mark_weaver>btw, that's true for any modulus, not just powers of two.
<ArneBab>I have a right-shift in both expressions… but I only subtract them when decrypting, not when encrypting, so I can get rid of one more modulo for encryption.
<mark_weaver>you should probably use 'define-inlinable' for those helper routines like uint32 and v0change/v1change.
<mark_weaver>you only need to take the modulo before updating v0 and v1.
<mark_weaver>those are the only variables that need to be kept in range.
<mark_weaver>anyway, I have to go afk for a while. happy hacking!
<ArneBab>mark_weaver: could I ask you something about wisp in a few hours?
<mark_weaver>also, it would be good to get rid of all top-level variable references within the loop.
<ArneBab>getting rid of the toplevel variables in the loop did not change much.
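[Putting mark_weaver's advice together — compute the whole subexpression on exact integers, and mask only when updating v0 and v1 — one TEA encryption round might look like the sketch below. The names (v0, v1, sum, k0..k3) follow the public C reference description of TEA; this is illustrative, not ArneBab's actual code, and it assumes v0 and v1 arrive already within 32 bits (which holds, since they were masked on the previous update):]

```scheme
;; One round of TEA encryption, chopping the top bits only at the
;; final update of each variable.  Addition, left shift, and xor all
;; commute with the final mask, and the right shift is safe because
;; its operand is already in range.
(define-inlinable (mask32 x)
  (logand x #xFFFFFFFF))

(define (tea-encrypt-round v0 v1 sum k0 k1 k2 k3)
  (let* ((v0 (mask32 (+ v0 (logxor (+ (ash v1 4) k0)
                                   (+ v1 sum)
                                   (+ (ash v1 -5) k1)))))
         (v1 (mask32 (+ v1 (logxor (+ (ash v0 4) k2)
                                   (+ v0 sum)
                                   (+ (ash v0 -5) k3))))))
    (values v0 v1)))
```

For decryption, where the subexpressions are subtracted, ArneBab's point applies: the subexpression needs its own mask before the subtraction, or stray high bits change the result.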
<ArneBab>all in all, the optimizations bring a factor of 5
<ArneBab>the C code still runs 100x faster, though.
<ArneBab>I tried ,profile, but it always replies “no samples recorded”.
<davexunit>ArneBab: too quick of an operation to sample?
<ArneBab>davexunit: ah, yepp: if I just execute the code 10000 times, it works.
<davexunit>yeah, I did a busy loop and cranked up the iterations until it could gather reasonable data: ,pr (let loop ((i 0)) (if (< i 10000000) (loop (1+ i)) i))
<ArneBab>it’s pretty impressive to have the profiler available like that
<davexunit>yeah. maybe racket or some other scheme can tout better tools, but compared to most other languages guile blows them away in terms of interaction.
<ArneBab>in python this would be python -m profile somescript.py
<davexunit>can you profile from the python repl, though?
<davexunit>of course other languages have some great profiling, too, but I just find things to be more pleasant about being able to do so many different things right from the REPL.
<davexunit>that sentence didn't come out right, but you get it.
<ArneBab>it’s import cProfile; cProfile.run("1 + 1")
<ArneBab>1/5th of my runtime is just calls to logxor, so there’s a clear upper limit to how much faster I could get this ☺
<mark_weaver>civodul: any idea why hydra.nixos.org hasn't been doing evaluations of guile-2-0 since November?
<mark_weaver>ah, it says "last checked 2014-01-09 ..., with errors!"
<mark_weaver>civodul: on the guile-master job, what do you make of the "attempt to call something which is not a function but a set" errors?
<civodul>see, error reporting is a difficult part of PL implementations...
<ArneBab>mark_weaver: on wisp: it can now parse macros, and I added bootstrapping via autotools and a minimal testsuite (just checking the output against preconverted files). Do you see anything it is still missing to correctly convert all scheme syntax?
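[The "no samples recorded" message comes from the sampling profiler not getting a chance to fire during a short run, hence davexunit's busy loop. A hypothetical REPL session along his lines (real output is a table of procedures with time percentages; the exact format and numbers vary by Guile version):]

```scheme
;; At the Guile REPL: wrap short-running code in a loop so the
;; sampling profiler (,profile, abbreviated ,pr) can collect samples.
scheme@(guile-user)> ,profile (let loop ((i 0))
                                (if (< i 10000000)
                                    (loop (1+ i))
                                    i))
;; prints a table: %time, cumulative and self seconds per procedure
```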
<mark_weaver>civodul: you said that long ago, you and wingo had a disagreement about whether #vu8(1 2 3) should be equivalent to #u8(1 2 3). Do you remember which side of that argument you took?
<mark_weaver>the reason I ask is that I need to talk to whoever thought they should be distinct.
<mark_weaver>so I'm wondering if I need to try to contact wingo or not.
<sneek>I last saw wingo on Dec 13 at 12:09 pm UTC, saying: that is a test of the stack :) test on loops and the difference is much more.
<civodul>a bit of time without meeting him, indeed
<civodul>i think there was more discussion at another point in time, but i can't find it
<mark_weaver>I think I found this thread before, but couldn't easily find the disagreement you mentioned. I'll look more closely this time.
<mark_weaver>civodul: having looked more closely, I still can't find the disagreement you mentioned in that thread.
<mark_weaver>Andy mentions the need to do type dispatch on SRFI-4 vectors though, e.g. distinguishing u8vector?, s8vector?, etc.
<civodul>"the one thing that I'm not fond of is the switch from disjoint SRFI-4 types to polymorphic types"
<civodul>i was more in favor of keeping types disjoint
<civodul>whereas Andy advocated something more flexible
<civodul>and i think we ended up with something in the middle
<civodul>so you can view any SRFI-4 vector as a bytevector
<civodul>but you cannot take a u8vector as an f64vector, say
<mark_weaver>yeah, I can see the rationale for distinguishing most of those element types. I just don't see why #vu8(...) should be different from #u8(...)
<mark_weaver>well, the real problem I have is that #vu8(1 2 3) is not 'equal?' to #u8(1 2 3).
<mark_weaver>it's a problem because R6RS specifies the #vu8(...) syntax, and R7RS specifies the #u8(...) syntax.
<mark_weaver>most of the bytevector procedures we have create #vu8(...), which causes problems with R7RS.
<mark_weaver>I guess I'll have to wait until I can talk to wingo about this.
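[The incompatibility in concrete terms, as described by mark_weaver above (this reflects the Guile of this log; later versions unified the two literal types):]

```scheme
;; #vu8(...) is the R6RS bytevector literal; #u8(...) is the
;; SRFI-4 / R7RS u8vector literal.  In the Guile under discussion
;; they read as distinct element types, so per mark_weaver:
(use-modules (rnrs bytevectors) (srfi srfi-4))
(equal? #vu8(1 2 3) #u8(1 2 3))   ; #f at the time of this log
;; and bytevector-producing procedures yielded #vu8(...) objects,
;; clashing with R7RS code that expects #u8(...) semantics.
```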
<mark_weaver>I have a patch that hacks 'equal?' to consider them equal, but I'm not sure it's the best fix. it seems better to just unify those two element types.
<civodul>perhaps you could post that and Cc Andy, and wish him a happy new year
<mark_weaver>yes, I think that's the next step. thanks for the info :)
<ijp>sneek is probably full of messages