IRC channel logs



<dsmith-work>Not I
<nalaginrut>morning guilers~
<ijp>sneek: later tell wingo if by some chance you haven't already, check out "secrets of the glasgow haskell compiler inliner"
<sneek>Will do.
<mark_weaver>ijp: thanks for pushing the peval fix. feel free to push your ,br fix also.
<mark_weaver>although actually, I'm about to push some things, so maybe wait a few minutes :)
<ijp>hmm, I thought I had, but it looks like I didn't
<mark_weaver>ijp: okay, I pushed my patches. feel free to push your ,br fix.
<ijp>okay pushed
<ijp>wow, someone is trying to get geiser to work with scsh
<mark_weaver>ijp: would you like to close the bug also?
<ijp>I will do
<civodul>Hello Guilers!
<mark_weaver>The 'r7rs-wip' branch now has a fairly complete implementation of R7RS for stable-2.0, which passes the R7RS test suite from chibi scheme. Apart from lack of docs and tests, the main remaining issue is that the modified 'write' which supports SRFI-38 is much slower.
<civodul>post a summary
<civodul>and benchmarks :-)
<mark_weaver>heh, well, printing (iota 1000000) is about 10 times slower on my machine. most of that time is spent in GC.
<mark_weaver>anyway, I'd rather wait until it's really ready. also, I'm deleting and recreating this branch periodically, so beware.
<civodul>due to a hash table or something?
<mark_weaver>yeah, it's mostly the hash table.
<civodul>anyway that's great news
<mark_weaver>I think I'll end up writing a miniature hash table implementation that does no allocation in the common case.
<mark_weaver>not ideal, but I can't think of a better way.
<mark_weaver>anyway, time for me to sleep. happy hacking, all!
*mark_weaver --> zzz
<taylanub>Hrm, if I import binding `foo', then (define bar (let ((priv-foo foo)) (lambda () (use priv-foo)))), then my `priv-foo' won't see changes in the top-level `foo', and will be better optimizable, right ?
<taylanub>If so, this is essentially "static linking"! (Of a single binding.)
<taylanub>But no, a module's body surely isn't executed at compile-time .. does that `let' take effect the first time the module is loaded, perhaps ?
*taylanub experiments
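A minimal sketch of the behaviour taylanub is testing (the names `foo`, `bar` and `priv-foo` are just illustrative):

```scheme
;; priv-foo captures the value foo has when the let runs (i.e. when
;; the module body is executed at load time), so a later set! on the
;; top-level foo is invisible to bar.
(define foo 1)
(define bar
  (let ((priv-foo foo))
    (lambda () priv-foo)))
(set! foo 2)
(bar)  ; ⇒ 1
```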
<civodul>just stumbled upon this network sniffer by rixed et al.:
<civodul>seems to use a fair amount of Guile
<ArneBab>hi jemarch
<ArneBab>Is there a way to define a variable as uint32, so bitshift works as in C?
<ArneBab>I implemented the tiny encryption scheme in guile scheme, but my implementation will be horribly slow, because the original algorithm uses properties of bitshift on uint32.
<ArneBab>references: C-code:
<ArneBab>my reimplementation:
<ArneBab>essentially what I need to do is to run (modulo number 2**32) every time I do anything which could increase the number.
<ArneBab>(need to do→currently do)
<ArneBab>and that feels horrible…
<taylanub>The fixnum API would be the right thing here, IIRC Guile has no fixnums yet ?
<taylanub>Oh, we have it in (rnrs arithmetic fixnums (6)), but no idea if it uses optimized stuff.
<civodul>it doesn't
<civodul>but just do bitshift & mask, no?
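civodul's bitshift-and-mask suggestion can be sketched like this (the helper name `uint32` is hypothetical):

```scheme
;; For non-negative x, (logand x #xFFFFFFFF) equals
;; (modulo x #x100000000): it keeps only the low 32 bits,
;; which is exactly uint32 wrap-around in C.
(define (uint32 x)              ; hypothetical helper name
  (logand x #xFFFFFFFF))

(uint32 (* 3000000000 2))       ; ⇒ 1705032704, as in C uint32 math
```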
<taylanub>And so I just realized that procedures could generally be optimized at creation-time (run-time) to specialize for the types of objects which were unknown at compile-time ..
<taylanub>Not sure if that falls under JIT already, I'm thinking of the simple case where given (define (make-proc foo) (lambda () (use foo))), (make-proc <compile-time-constant>) can be optimized to return a procedure that's specialized on the type of foo, omitting type-checks and all, whereas (make-proc variable) will generally return an unspecialized procedure.
<taylanub>A procedure is generally called more often than it's created, so I wonder if it would be worthwhile to always optimize at procedure-creation (no heuristics and stuff like it's usually in JIT engines AFAIK).
<taylanub>Well leaving aside vague terms like JIT, it certainly involves run-time code generation/modification.
<taylanub>So I think it would be best after all to have a specialized "static import" that imports an immutable copy of a binding from the current bindings of a module, instead of implementing this complex run-time optimization idea.
<ArneBab>civodul: nice - thanks!
<ArneBab>taylanub, civodul: sorry for answering late
<ArneBab>taylanub: thanks!
<ArneBab>taylanub, civodul: how do I create a fixnum?
<ArneBab>(no, that’s for C)
*ArneBab is reading the docs right now
<ArneBab>the fixnums seem to be 64bit … (fixnum-width) → 61
<mark_weaver>62 bits when you include the sign bit.
<mark_weaver>but that's only on 64-bit platforms.
<ArneBab>ah, yes
<ArneBab>mark_weaver: what I search for is a way to tell scheme to treat a given number as uint32 (because an algorithm requires that)
<ArneBab>calling modulo all the time eats cycles :)
<ArneBab>besides: the manual page on elisp and nil¹, states that the page Compilation² tells me how to activate the warning in case of equality comparisons with #f, '() or nil, but on that page I do not find it. ¹: ²:
<mark_weaver>scheme doesn't have that, nor does guile.
<mark_weaver>(the uint32 thing)
<ArneBab>ok, so I’m currently limited to my workaround
<mark_weaver>'remainder' might be faster, and if the argument is always non-negative then it's the same as modulo.
<mark_weaver>I think remainder might be a VM op, and modulo not.
<mark_weaver>no, nevermind
<mark_weaver>modulo has a VM op too.
<ArneBab>the cost I currently pay is that I have to call (modulo number (integer-expt 2 32)) on every math operation
<mark_weaver>if you're doing exponentiation, then see 'modulo-expt', which is *much* faster than exponentiating and then doing modulo.
<mark_weaver>well, I hope you're precomputing the (integer-expt 2 32), right?
<mark_weaver>and preferably it's a lexical variable.
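For the exponentiation case, Guile's built-in `modulo-expt` never builds the huge intermediate power (the operands below are arbitrary sample values):

```scheme
;; (modulo-expt base exponent modulus) does modular exponentiation
;; internally, so it is much faster than (modulo (expt base e) m)
;; for large exponents.
(modulo-expt 3 1000000 1000003)  ; fast; no million-bit bignum is built
```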
*ArneBab hides in very dark shadows… (re precomputing)
<mark_weaver>also, depending on the operations you're doing, you probably don't *need* to do the modulo after every operation. just often enough to make sure that the intermediate numbers don't get too big.
<mark_weaver>well, I'll tell you right now that precomputing 2^32 will be a *lot* faster.
<mark_weaver>that's a non-trivial operation.
<ArneBab>that definitely, yes…
<ArneBab>(likely the biggest performance leak right now - I did not actually profile the function: I wanted to see whether I can make the code more “natural” to the algorithm)
<civodul>fixnums are a performance hack
<mark_weaver>anyway, if you need speed for something like uint32 math, then you should use C for that, as much as I try to avoid C these days.
<ArneBab>I just want to implement a C-based algorithm as naturally as possible
<ArneBab>and the algorithm uses the properties of uint32, so I need to emulate that:
<mark_weaver>rather than precomputing, perhaps you should just put in the numeric constant itself, written as #x100000000
<ArneBab>that looks better, yes
<ArneBab>#b1000, right?
<ArneBab>ah, no, sorry…
<ArneBab>(mixed it up)
<mark_weaver>You can compute the entire subexpression ((v1<<4) + k0) ^ (v1 + sum) ^ ((v1>>5) + k1) before taking the modulo.
<mark_weaver>it will always be the same, and much faster, than taking the modulo after every suboperation.
<mark_weaver>ditto for the other subexpressions that look like that.
<ArneBab>hm, yes, because the xor of the upper bits is irrelevant
<mark_weaver>(logand x #xFFFFFFFF) might be faster, and is the same as (modulo x #x100000000) if x is non-negative.
<ArneBab>that then actually operates on the bits…
<mark_weaver>also, don't use mutation. write a proper loop in scheme style. mutable variables are slower to access.
<mark_weaver>rather than taking modulo/logand after computing the subexpression ((v1<<4) + k0) ^ (v1 + sum) ^ ((v1>>5) + k1), instead, compute the sum of that plus the variable on the left, before taking modulo/logand.
<mark_weaver>v0 and v1 are the only things that really need to be kept within range before chopping off the top bits.
<mark_weaver>(the reason the top bits matter for v0 and v1 are because they are right-shifted)
<mark_weaver>s/are because/is because/
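Under those constraints, one of the TEA subexpressions can be written with a single mask at the end (the helper name and argument order are illustrative, following the C code):

```scheme
;; +, - and << commute with the final modulo, so the whole
;; subexpression runs in plain (possibly bignum) arithmetic and is
;; masked once at the end.  v1, sum, k0 and k1 are assumed to already
;; be in uint32 range, so the right shift (ash v1 -5) sees no stray
;; high bits.
(define-inlinable (v1change v1 sum k0 k1)   ; hypothetical helper
  (logand (logxor (+ (ash v1 4) k0)
                  (+ v1 sum)
                  (+ (ash v1 -5) k1))
          #xFFFFFFFF))
```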
<ArneBab>mark_weaver: this is what I currently do:
<ArneBab>uhm, reload that URL - I just pushed
<ArneBab>I think I also have to throw away the upper bits of the result of the subexpression, because that gets subtracted from v0 or v1.
<ArneBab>If I don’t, I might get stray 0-values.
<mark_weaver>I don't think so.
<civodul>oh, (let ((x (values))) x) is optimized to (values), but it shouldn't be
<mark_weaver>in general, in any expression involving only addition, subtraction, and multiplication (which includes left shift) you can postpone the modulo until after the expression.
<mark_weaver>right shifts are the only problem operation here.
<ArneBab>I just did a performance test before and after implementing your optimizations, and they improve the speed by more than a factor of 3
<mark_weaver>btw, that's true for any modulus, not just powers of two.
<ArneBab>I have a right-shift in both expressions… but I only subtract them when decrypting, not when encrypting, so I can get rid of one more modulus for encryption.
<mark_weaver>you should probably use 'define-inlinable' for those helper routines like uint32 and v0change/v1change.
<mark_weaver>you can do the same for decryption.
<mark_weaver>(x-y) mod N is the same as (x mod N) - (y mod N)
<mark_weaver>ditto for + and *
<mark_weaver>you only need to take modulo before updating v0 and v1.
<mark_weaver>those are the only variables that need to be kept in range.
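Those identities are easy to check at the REPL (the operands are arbitrary sample values):

```scheme
;; ((x - y) mod N) = (((x mod N) - (y mod N)) mod N), and likewise
;; for + and *, for any modulus N.
(let ((x 123456789012) (y 98765432109) (N #x100000000))
  (= (modulo (- x y) N)
     (modulo (- (modulo x N) (modulo y N)) N)))
;; ⇒ #t
```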
<ArneBab>I see it, yes…
<mark_weaver>anyway, I have to go afk for a while. happy hacking!
<ArneBab>mark_weaver: could I ask you something about wisp in a few hours?
<mark_weaver>you still have (integer-expt 2 32) in the loop.
<ArneBab>missed that - thanks!
<mark_weaver>also, it would be good to get rid of all top-level variable references within the loop.
<mark_weaver>anyway, ttyl!
<mark_weaver>(yes, you can ask later)
<ArneBab>getting rid of the toplevel variables in the loop did not change much.
<ArneBab>all in all, the optimizations bring a factor of 5
<ArneBab>the C-code still runs 100x faster, though.
<ArneBab>I tried ,profile, but it always replies “no samples recorded”.
<ArneBab>To reproduce: $ guile → ,pr (+ 1 2)
<davexunit>ArneBab: too quick of an operation to sample?
<ArneBab>it takes 0.0005s or so…
<ArneBab>davexunit: ah, yepp: If I just execute the code 10000 times, it works.
<davexunit>and I learned about the ,pr command!
<ArneBab>,help profile
<ArneBab>(for more)
<davexunit>yeah I did a busy loop and cranked up the loops until it could gather reasonable data: ,pr (let loop ((i 0)) (if (< i 10000000) (loop (1+ i)) i))
<davexunit>guile never ceases to impress me.
<ArneBab>it’s pretty impressive to have the profiler available like that
<davexunit>yeah. maybe racket or some other scheme can tout better tools, but compared to most other languages guile blows them away in terms of interaction.
<ArneBab>In python this would be python -m profile
<davexunit>can you profile from the python repl, though?
<davexunit>of course other languages have some great profiling, too, but I just find things to be more pleasant about being able to do so many different things right from the REPL.
<ArneBab>yepp, you can:
<davexunit>that sentence didn't come out right, but you get it.
<ArneBab>but it’s not just ,pr (something)
<davexunit>oh, cool.
<davexunit>I'm a python fan, too.
<davexunit>I just prefer the land of lisp.
<ArneBab>it’s import cProfile;"1 + 1")
<ArneBab>I really like both, too.
<ArneBab>1/5th of my runtime is just calls to logxor, so there’s a clear upper limit to how much faster I could get this ☺
<mark_weaver>civodul: any idea why hasn't been doing evaluations of guile-2-0 since November?
<mark_weaver>ah, it says "last checked 2014-01-09 ..., with errors!"
<mark_weaver>civodul: could you take a look?
<civodul>mark_weaver: see ; i'll report it
<mark_weaver>civodul: on the guile-master job, what do you make of the "attempt to call something which is not a function but a set" errors?
<civodul>mark_weaver: oh, lemme check
<civodul>see, error reporting is a difficult part of PL implementations...
<ArneBab>mark_weaver: on wisp: it can now parse macros and I added bootstrapping via autotools and a minimal testsuite (just checking the output against preconverted files). Do you see things it is still missing to correctly convert all scheme syntax?
<ArneBab>…damn, gone…
<mark_weaver>civodul: you said that long ago, you and wingo had a disagreement about whether #vu8(1 2 3) should be equivalent to #u8(1 2 3). Do you remember which side of that argument you took?
<civodul>my side :-)
<mark_weaver>the reason I ask is that I need to talk to whoever thought they should be distinct.
<civodul>lemme see if i can find it
<mark_weaver>so I'm wondering if I need to try to contact wingo or not.
<mark_weaver>sneek: seen wingo?
<sneek>I last saw wingo on Dec 13 at 12:09 pm UTC, saying: that is a test of the stack :) test on loops and the difference is much more.
<mark_weaver>civodul: thanks!
<civodul>a bit of time without meeting him, indeed
<civodul>i think there was more discussion at another point in time, but i can't find it
<mark_weaver>I think I found this thread before, but couldn't easily find the disagreement you mentioned. I'll look more closely this time.
<mark_weaver>civodul: having looked more closely, I still can't find the disagreement you mentioned in that thread.
<civodul>maybe it was just on IRC?
<mark_weaver>Andy mentions the need to do type dispatch on SRFI-4 vectors though, e.g. distinguishing u8vector?, s8vector?, etc.
<mark_weaver>do you remember which position you took?
<civodul>"the one thing that I'm not fond of is the switch from disjoint SRFI-4 types to polymorphic types"
<civodul>i was more in favor of keeping types disjoint
<civodul>whereas Andy advocated something more flexible
<civodul>and i think we ended up with something in the middle
<civodul>so you can view any SRFI-4 vector as a bytevector
<civodul>but you cannot take a u8vector as an f64vector, say
<civodul>i think
<mark_weaver>yeah, I can see the rationale for distinguishing most of those element types. I just don't see why #vu8(...) should be different than #u8(...)
<civodul>different in what sense?
<mark_weaver>well, the real problem I have is that #vu8(1 2 3) is not 'equal?' to #u8(1 2 3).
<mark_weaver>it's a problem because R6RS specifies the #vu8(...) syntax, and R7RS specifies the #u8(...) syntax.
<mark_weaver>most of the bytevector procedures we have create #vu8(...), which causes problems with R7RS.
<mark_weaver>I guess I'll have to wait until I can talk to wingo about this.
<mark_weaver>I have a patch that hacks 'equal?' to consider them equal, but I'm not sure it's the best fix. it seems better to just unify those two element types.
<civodul>it would make sense
<civodul>perhaps you could post that and Cc Andy, and wish him a happy new year
<mark_weaver>yes, I think that's the next step. thanks for the info :)
<ijp>sneek is probably full of messages
<dsmith-work>sneek is a good boy