IRC channel logs

<apteryx>amz3: I didn't know much about Smalltalk; reading on it, it is pretty impressive!

***heroux_ is now known as heroux

<apteryx>is there no function provided by guile out-of-the-box to rename a directory?

***Server sets mode: +nt

***unCork is now known as Cork

<lloda>wingo: morning

<lloda>the segfault is gone

<lloda>otoh it takes a very long time to compile an empty file

<lloda>it's rather strange

<lloda>with THRESHOLD=0

<lloda>any other number is way faster

***rekado_ is now known as rekado

<wingo>moin

<wingo>doing a simple GUILE_JIT_COUNTER_THRESHOLD=1000 GUILE_JIT_LOG_LEVEL=1 meta/guile

<wingo>i see 89 functions compiled

<wingo>84% of which are compiled at their entry point

<wingo>and the rest tier up from loops

<civodul>hey wingo

<wingo>greets

<civodul>so it's possible that a loop within a function is compiled, but not the whole function, right?

<wingo>no

<wingo>a function has a single optimization counter

<wingo>that counter is incremented on function entry and on loop iterations

<wingo>so either could trigger compilation

<civodul>ok

<wingo>when a function is jit-compiled, the whole thing is compiled, even if it was a loop iteration that triggered the compilation

<wingo>if it was a loop iteration, the function will tier up into the middle of the compiled code, instead of the beginning

<civodul>so the whole function is compiled, but the trigger might be a loop entry
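
The single-counter scheme described above can be sketched as a toy model. This is only an illustration of the idea, not Guile's actual JIT code: the names `make-fn-state`, `tick!`, and `compiled-at` are hypothetical, `threshold` stands in for GUILE_JIT_COUNTER_THRESHOLD, and the loop offset 12 is made up.

```scheme
;; Toy model only; all names are hypothetical, not Guile internals.
(define threshold 1000)               ; stands in for GUILE_JIT_COUNTER_THRESHOLD

;; Per-function state: #(counter compiled-entry-offset-or-#f)
(define (make-fn-state) (vector 0 #f))
(define (counter st) (vector-ref st 0))
(define (compiled-at st) (vector-ref st 1))

;; Bump the function's one counter. OFFSET is 0 for a function entry and
;; a nonzero position inside the function for a loop iteration. Once the
;; counter goes above the threshold, the whole function counts as
;; compiled, and we remember where execution tiered up.
(define (tick! st offset)
  (unless (compiled-at st)
    (vector-set! st 0 (+ 1 (counter st)))
    (when (> (counter st) threshold)
      (vector-set! st 1 offset))))

;; A function that is called often: compilation is triggered at entry.
(define called-often (make-fn-state))
(let loop ((i 0))
  (when (<= i threshold)
    (tick! called-often 0)
    (loop (+ i 1))))

;; A function called once that then spins in a loop: compilation is
;; triggered from the loop, so execution resumes mid-function.
(define long-loop (make-fn-state))
(tick! long-loop 0)
(let loop ((i 0))
  (when (< i threshold)
    (tick! long-loop 12)
    (loop (+ i 1))))

(display (compiled-at called-often)) (newline)   ; 0
(display (compiled-at long-loop)) (newline)      ; 12
```

In both cases the model compiles the whole function; only the recorded tier-up position differs.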

<civodul>i should really give it a try

<wingo>indeed! the default in the lightning branch is no jit compilation

<wingo>so you can just kick off a compile to have it around

<wingo>make sure to remove all .go files in that dir tho

<wingo>the binary format was changing without .go abi bumps

<wingo>then when you want to jit-compile, you can do it via the temporarily-available %jit-compile function

<wingo>or you can set GUILE_JIT_COUNTER_THRESHOLD=NNNN

<civodul>ok

<wingo>where NNNN is the counter threshold above which a function gets compiled

<wingo>set GUILE_JIT_LOG_LEVEL=N to see debug output

<wingo>for N in {1,2,3}

<civodul>i see

<civodul>what fraction of bytecode instructions can be compiled?

<wingo>civodul: all of them!

<wingo>even aborts, continuations, reinstating, etc

<civodul>woow!

<wingo>neat, right? :)

<civodul>crazy even! :-)

<wingo>:)

<civodul>so it's where we can start playing with it and see what happens?

<wingo>it's starting to get there, yes. for small examples :)

<wingo>if you set GUILE_JIT_COUNTER_THRESHOLD=0 then *everything* gets compiled

<wingo>takes a while of course

<wingo>as compiling a function takes longer than running it

<civodul>heh

<civodul>at least you can see if it crashes ;-)

<wingo>yes :)

<wingo>i meant to say, i haven't yet gotten the whole test suite to run through. there were some stalls, and i just fixed an important crasher yesterday

<wingo>but it's definitely at the point where if you have a test case, you can see what will happen

<wingo>not time to try it out on whole apps tho

<wingo>also, there seem to be a couple of places where the interpreter is noticeably slower than 2.2 -- like more than 20 or 30% slower. it's related to changes in the bytecode and such, most of them are fixable i think, but if you find a case like that, please lmk so i can have a poke

<wingo>it's somewhat expected to have some cases in the interpreter be slower than 2.2, as 3.0 has generally more instructions than 2.2, because the instructions do less work

<wingo>but it shouldn't be significant, and the JIT should always more than make up the difference

<wingo>fwiw i am seeing startup time slightly less with 3.0 than 2.2 so that's good anyway

<wingo>keeping startup time fast is something i'm keeping an eye on

<civodul>ok

<civodul>slower interpreter could mean longer bootstrap times tho ;-)

<wingo>mmm, not sure

<wingo>by interpreter i mean vm-engine.c

<wingo>i haven't tested the relative speed of eval.scm

<wingo>but i would imagine that JIT-compiled eval.scm should handily beat 2.2's eval.scm

<wingo>haven't tested yet tho :)

<civodul>oh, i was thinking of eval.scm

<wingo>right, i think we're going to have to start calling that the evaluator!

<wingo>dunno

<wingo>more parts, more names, more confusion :)

<civodul>heheh

<civodul>gcc 8 shows interesting warnings:

<civodul>srfi-14.c:2061:42: warning: '%06x' directive writing between 6 and 8 bytes into a region of size 7 [-Wformat-overflow=]

<wingo>neat

<civodul>a bit worrying

<civodul>anyway it's building :-)

<wingo>:)

<civodul>wingo: messages like "jit: vcode: start=0xe30e60,+6 entry=+0" mean that we jumped to jitted code?

<wingo>civodul: no that means that jit code was emitted

<wingo>usually the code will be entered also right after the message

<wingo>but to see all entries and exits, set log level to 3

<civodul>ok

<civodul>that's more verbose indeed :-)

<wingo>:)

<wingo>functions that are called a lot are called a lot :)

<civodul>at the REPL we just see too many of them

<wingo>yeah, better to run at log level 1 generally, if you are interested in when jit happens

<wingo>otherwise pipe stderr somewhere

<wingo>"entry=+0" indicates that it was a function call that triggered JIT

<wingo>nonzero indicates that it was a loop

<civodul>ok, nice

<civodul>if i do: GUILE_JIT_LOG_LEVEL=1 GUILE_JIT_COUNTER_THRESHOLD=1000 ./meta/guile -c '(define (fib i)(if (<= i 1) 1 (+ i (fib (- i 1))))) (fib 42)'

<civodul>presumably only the last few lines correspond to my procedure, right?

<wingo>civodul: probably, check the vcode compared to the addresses from ,x fib

<wingo>civodul: i think in that case actually `fib` is interpreted

<wingo>right?

<wingo>also that's not fib :)
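
For reference, the procedure pasted above makes a single recursive call per level, so it adds up the integers from 1 to i instead of producing Fibonacci numbers. It also makes only a few dozen calls for i = 42, which would presumably keep it under a counter threshold of 1000. A minimal sketch of the difference follows; the name `not-fib` is hypothetical, and the doubly recursive `fib` is the conventional textbook shape, assumed here rather than taken from the log.

```scheme
;; The definition as pasted in the log, renamed for clarity.
;; One recursive call per level: it computes 1 + 2 + ... + i.
(define (not-fib i)
  (if (<= i 1) 1 (+ i (not-fib (- i 1)))))

;; Conventional doubly recursive Fibonacci (assumed shape).
(define (fib n)
  (if (< n 2) n (+ (fib (- n 1)) (fib (- n 2)))))

(display (not-fib 42)) (newline)   ; 903, i.e. 42 * 43 / 2
(display (fib 20)) (newline)       ; 6765
```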

<wingo>civodul: for fib, on lightning, without jit: using the evaluator, (fib 35) takes 10.9s. Using the compiler, 1.05s.

<wingo>with jit, using the evaluator, (fib 35) takes 6.47s, and with the compiler, 0.61s.

<wingo>by way of comparison, with 2.2 and the evaluator, 9.2s, and with the compiler, 0.88s.

<wingo>still some headroom in lightning i think tho

<wingo>if "fib" itself is letrec-bound instead of bound at the top level, 2.2's compiler is 0.78s, lightning's compiler is 0.92s, lightning compiler+jit is 0.51s

<wingo>so, broadly similar results

<wingo>incidentally i don't know how ecraven's benchmark got 40s for this test

<wingo>he did (fib 40) but that result is 8s for me with 2.2, or 10s lightning, or 5.2s lightning+jit

<ecraven>I should rerun everything sometime soon

<ecraven>it's a fast machine, maybe?

<wingo>not really

<ecraven>ah, 40, not 4.. not sure

<wingo>ah sorry didn't mean to neg your machine, i was thinking you were talking about mine :)

<wingo>anyway for 2.2 i would expect something a bit better, but who knows

<wingo>hopefully we can get a 3.0 prerel out soon that will be on par with racket :)

<wingo>ecraven: i was wrong! your tests run fib 40 5 times

<wingo>so the numbers were right

<wingo>irritatingly :)

<ecraven>ok, that's good

<ecraven>I'll do a new run over the next few days, there were some new releases

<civodul>wingo: re fib, interesting! though i guess fib conses a lot (bignums) so it may not be a great benchmark

<civodul>in case someone had a doubt ;-)

<civodul>it's interesting to see that letrec binding helps this much already in 2.2

<wingo>the letrec binding means it can use call-label instead of call

<wingo>and no closure needed either

<wingo>(fib 40) isn't a bignum fwiw

<wingo>so it's "just" call overhead
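
The two binding styles being compared might look like the sketch below. The exact benchmark source is not in the log, so the definitions are assumptions, and the names `fib/local` and `fib/iter` are hypothetical. The iterative helper is there only to check the size of (fib 40) cheaply against Guile's `most-positive-fixnum`.

```scheme
;; Top-level binding: recursive calls go through the `fib` variable.
(define (fib n)
  (if (< n 2) n (+ (fib (- n 1)) (fib (- n 2)))))

;; letrec-bound: the recursive `fib` is a local binding, so the compiler
;; can see every call site of it.
(define (fib/local n)
  (letrec ((fib (lambda (n)
                  (if (< n 2) n (+ (fib (- n 1)) (fib (- n 2)))))))
    (fib n)))

;; Iterative helper (hypothetical), used only to get (fib 40) quickly.
(define (fib/iter n)
  (let loop ((a 0) (b 1) (k n))
    (if (= k 0) a (loop b (+ a b) (- k 1)))))

(display (= (fib 20) (fib/local 20))) (newline)             ; #t
(display (fib/iter 40)) (newline)                            ; 102334155
(display (<= (fib/iter 40) most-positive-fixnum)) (newline)  ; #t
```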

<civodul>oh right

<wingo>chez's results on those benchmarks are offensive

<cmaloney>offensive in a good way?

<wingo>disgustingly good

<wingo>there is no way we can win those benchmarks unless we get some sort of top-level inlining story

<wingo>maybe civodul will let me do https://lists.gnu.org/archive/html/guile-devel/2016-03/msg00027.html one day

<wingo>it's clear that the mbrot results are only possible having proved that many variables are inexact reals

<wingo>and you can't do that without inlining

<civodul>wingo: IMO we should definitely work towards intra-module inlining

<civodul>that needs a transition path: could be issuing warnings for occurrences of "@@" in user code, adding 'declare' statements to explicitly turn inlining on and off, etc.

*wingo nod

<wingo>we'd need more of an idea about what a module is of course :P

<wingo>i.e. when do you know that you've seen all of a module

<civodul>oh yes, that's terrible

<civodul>i've come to love R6RS's 'library' form

<wingo>:)

<wingo>the good news from taking a look at the r7rs benchmarks is that the jit reduces run-time on those benchmarks by 40-60% generally, relative to 2.2

<wingo>the bad news is that there's still a way to go before we eat chez's lunch :)

<wingo>e.g. the peval benchmark for me goes from 20s -> 11.5s

<wingo>that's with GUILE_JIT_COUNTER_THRESHOLD=1000

<wingo>if i could golf that reduction down to the 60-80% range that would make me happy
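
As a quick check of the figures above: going from 20s to 11.5s is a reduction of about 42.5%, and a 60-80% reduction on the same benchmark would mean a run time of roughly 8s down to 4s. A small sketch of that arithmetic, with hypothetical helper names `reduction` and `target`:

```scheme
;; Fractional reduction in run time going from BEFORE to AFTER seconds.
(define (reduction before after)
  (- 1 (/ after before)))

;; Run time implied by a given fractional reduction.
(define (target before fraction)
  (* before (- 1 fraction)))

(display (* 100 (reduction 20 11.5))) (newline)   ; roughly 42.5 (percent)
(display (target 20 0.6)) (newline)               ; roughly 8 (seconds)
(display (target 20 0.8)) (newline)               ; roughly 4 (seconds)
```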

<wingo>to run on your own, clone https://github.com/ecraven/r7rs-benchmarks

<wingo>and run via

<wingo>GUILE_JIT_COUNTER_THRESHOLD=1000 GUILD=/opt/guile/bin/guild GUILE=/opt/guile/bin/guile ./bench guile peval

<wingo>with jit we are still generally 3x slower than racket. probably a lot related to inlining, some related to code generation

<wingo>civodul: intra-module contification would also be nice

<civodul>wingo: 40%-60% is good news, indeed!

<civodul>how does Guile 3 compare to 2.2 without JIT?

<wingo>civodul: see my notes from 13:07

*wingo pastes to not have to scroll up

<wingo><wingo> civodul: for fib, on lightning, without jit: using the evaluator, (fib 35) takes 10.9s. Using the compiler, 1.05s.

<wingo><wingo> with jit, using the evaluator, (fib 35) takes 6.47s, and with the compiler, 0.61s.

<wingo><wingo> by way of comparison, with 2.2 and the evaluator, 9.2s, and with the compiler, 0.88s.

<wingo>that's fairly representative. will see if i can golf the interpreted perf a bit

<wingo>i.e. compiler without jit

<civodul>oh right

<civodul>i wasn't sure if this was representative

<wingo>there are some bits of low-hanging fruit to investigate

<wingo>will need to profile


2018-09-04.log