IRC channel logs

2018-08-29.log

***Server sets mode: +nt
***aminb is now known as Guest47171
***hydraz_ is now known as hydraz
<lloda>wingo: morning
<lloda>I've tried a couple of libraries of mine that are public and a repo of private code but I don't have anything very useful to report :-(
<lloda>lightning seems about 2x or 3x slower compared to stable-2.2 in some cases, without jit
<lloda>when I tried jit on large functions, the speed was about the same
<lloda>I haven't had time to try & isolate a good small benchmark
<lloda>no more crashes though
<lloda>but a ton of what I do relies on array-ref or goes directly to C, so it makes sense that the jit wouldn't be able to help
<wingo>lloda: morning :)
<wingo>that's interesting! counterintuitive as well...
<wingo>you compiled lightning with -O2 I guess?
<wingo>when you tested the jit version, did you also jit-compile all the (most-called) subrs that your functions called? maybe your functions also had inner loops that guile compiled as other functions
<wingo>regarding the slowdown vs stable-2.2, i would expect some due to master/lightning having more bytecodes in general
<wingo>but not 2x or 3x!
<wingo>for me array-sum is slower without jit (1.6s vs 1.4s) but when jitted (and importantly, when array-length and array-ref are jitted) it's a little faster (1.2s)
***rekado_ is now known as rekado
<lloda>wingo: my results on that test are stable-2.2 ~ 0.19 to 0.26, lightning no jit 0.66 to 0.73, lightning jit 0.66 to 0.72 :-/
<wingo>lloda: for your array-sum test you mean?
<lloda>yes
<wingo>how big of an f32vector are you using?
<lloda>#e1e7
<lloda>I'm using f32vector-ref in the loop instead of array-ref actually
<wingo>ah :)
<lloda>have meetings :-/ but I'll be around
<wingo>for guile 2.2.3 i still get like 1.2s here
<wingo>could be that CSE patch that mark weaver found, applied in 2.2.4 but i thought it wasn't needed on master
<wingo>certainly if 2.2 manages to unbox but master doesn't, that's a bug
<wingo>that would certainly account for the difference
<wingo>indeed, that seems to be what's happening
<wingo>yay
<wingo>hum, that's not it, i think i was testing with the wrong version
<wingo>lloda: my results on that test are stable-2.2 0.44s, lightning no jit .77-.88s, lightning jit .54s
<wingo>in stable-2.2, the body of the loop is 15 instructions
<wingo>in lightning, it's 25 instructions, and has two callouts to intrinsics (add/immediate and lsh/immediate), which in 2.2 are dedicated opcodes
<wingo>the lightning branch also has 3 side exits for error conditions that are explicit control flow, which stable-2.2 doesn't have
<wingo>inter-instruction state is kept on the stack. i guess the additional instructions cause more stack traffic. probably the icache footprint is a little higher too, though who knows; array-sum compiles to a little more than 2k of code, which probably should be slimmed a little
<wingo>still, icache misses probably aren't the thing.
<lloda>those numbers are similar to what I get
<lloda>I did compile both lightning & stable-2.2 with -O2. I used gcc 8.2 for lightning and 8.1 for stable
<lloda>in the larger test I did I only jit-compiled a big function that defined small functions inside (in a let). I'll try to jit-compile the small functions instead
<outtabwz>What do i put in my ~/.guile to make the auto-compiler silent except for errors? Alternatively, how to completely disable ac in ~/.guile?
<lloda>outtabwz: to disable ac look here https://www.gnu.org/software/guile/manual/html_node/Compilation.html
<lloda>I don't know how to disable the ac messages. One way is to make sure everything is compiled first
<lloda>,a compile or ,a load also reveal some variables that you can try
<outtabwz>lloda: Thanks. I'm reading up now.
<amz3>o/
***mood_ is now known as mood
<outtabwz>I thought maybe ,option interp #t but it seems that only applies to the REPL, and not scripts invoked from the shell :(
<outtabwz>Putting (setenv "GUILE_AUTO_COMPILE" "0") into ~/.guile doesn't prevent auto-compile of scripts invoked from the shell either :(
***geokon1 is now known as geokon
<wingo>outtabwz: (set! %load-should-auto-compile #f)
<wingo>but better if possible to simply compile the files ahead of time
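For scripts run from the shell, one option that does not depend on `~/.guile` is to pass `--no-auto-compile` on the script's own `#!` line. This is only a sketch: the interpreter path is an assumption, and whether `~/.guile` is read at all for non-interactive runs is not settled in this log.

```scheme
#!/usr/bin/guile \
--no-auto-compile -s
!#
;; The "\" meta switch makes Guile read the rest of its command-line
;; arguments from the second line of the file.
;; /usr/bin/guile is an assumed install path; adjust it to your system.
(display "hello from an uncompiled script")
(newline)
```

This turns auto-compilation off entirely for that script, so it runs interpreted; compiling ahead of time, as suggested above, avoids both the notice and the slowdown.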
<stis>wingo: procedure-properties is awfully slow in 2.2!
<stis>why's that?
<wingo>stis: probably because all the data is stored statically in the elf in a side table; it has to grovel through the DWARF for that data
<wingo>lloda: we should be able to really speed up array-length btw
<wingo>i just stepped through it, it does a bunch of useless crap
<stis>ok, if it's not improving I will move over to hash tables.
<wingo>stis: i think that's a reasonable thing to do
<stis>wingo: what's your thought on optimizing the family of forms like (call-with-values expr (case-lambda ((x) x) (x x)))
<stis>e.g. have a speed version for 1 value return
<stis>I use this to simulate python assignment semantics regarding multiple value returns
<stis>but it really is slow, 1 million operations per second, which is an overhead for me because assignments are everywhere
<stis>it should be able to optimize (call-with-values (lambda () (+ number number)) ...) to (+ number number)
<wingo>how would you distinguish returning '(1 2) from returning (values 1 2) ?
<wingo>stis: do you need to support 0-valued returns?
<stis>well, get better speed for 1 value return, multiple values are turned into a tuple, and I guess that this construct can be done quite effectively, especially for the common one value return. Of course zero becomes the empty list
<stis>which I did not think of, so to avoid that perhaps use (case-lambda ((x) x) ((x . l) (cons x l)))
<stis>in python x=... return 1,2 => x=(1,2), in guile I would expect x=1
<OrangeShark>stis: a multi value return in python is just a tuple, isn't it?
<wingo>lloda: embarrassingly, returns from interpreter -> jit weren't actually working
<wingo>it was staying in the interpreter
<stis>OrangeShark: yep but scheme uses multiple value return and I want to interoperate
<wingo>lloda: hence the 0 speedup until all leaf functions are compiled!
<stis>it's possible now, but a bit too inefficient for my taste, especially since it's an optimization away from getting good bytecode
<stis>maybe i bite the bullet and just implement it as tuples
<amz3>stis: yes, that's what I was going to say.
<amz3>stis: you want to implement a fast python on top of guile vm?
<amz3>fast/faster
<stis>first of all, I want good interoperability and a nice scheme interface. Then speed.
<stis>now it's 50x slower than cython and I could perhaps improve that by a factor of 10 if I could get this device working smoothly
<amz3>device?
<amz3>what device?
<stis>(set! x (wrap expr)), wrap is the device and is (lambda (x) (call-with-values (lambda () x) (case-lambda ((x) x) (x x))))
<stis>Currently it will make a closure and a lambda and then call a primitive with those values. Would be nice with some better rules of optimisation for this case
<amz3>why do you compare with cython in particular? do you have python library in mind that you would like to be able to use?
<stis>I compiled the random module from cython and run one of the tests there just to see how it performs
<amz3>hmm ok
<stis>note the random.py module
<wingo>i still don't know how you plan to distinguish returning '(1 2) from (values 1 2)
<stis>you don't, it's a projection of the two concepts: whether you return multiple values with (values 1 2) or return (list 1 2), both are identical in python but not in scheme, so the expectation of the pythonist would not be broken
<stis>still, if you return values in python, in scheme land they would be distinguished as not a tuple
<wingo>lloda: interestingly, inlining the fast path of "<?" in the VM gets a good perf boost... i wonder if we will have to go back to inlining fast paths for a number of things (both in interpreter and jit)
<wingo>stis: in that case i think your best speed bet is (call-with-values foo (lambda (x . x*) ...))
<wingo>it will be fast in the 1-value case, and for multiple values you can test if x* is null
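A minimal sketch of the shape wingo suggests above, in plain Scheme. The helper name `collect-values` is hypothetical, and a list stands in for a Python tuple. It assumes the producer returns at least one value; a zero-valued return would not match `(x . x*)` and would signal a wrong-number-of-arguments error.

```scheme
;; Hypothetical helper: collapse the values produced by THUNK into one
;; Python-style result. A single value is returned as-is; several
;; values become a list standing in for a tuple.
(define (collect-values thunk)
  (call-with-values thunk
    (lambda (x . x*)
      (if (null? x*)
          x                 ; common case: exactly one value
          (cons x x*)))))   ; several values: gather them into a list

(collect-values (lambda () (+ 1 2)))       ; one value in, one value out
(collect-values (lambda () (values 1 2)))  ; two values in, one list out
```

As discussed above, a thunk returning `(list 1 2)` and one returning `(values 1 2)` give the same result here; that is the projection stis describes, not something the helper can undo.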
<amz3>sneek_: botsnack
<stis>i'll try
<stis>yep 3x faster
<stis>no sorry 5x faster
<stis>and compilation is also faster
<wingo>nice
<lloda>wingo: cool stuff
<lloda>for array-length I think we only need to check the type, if it's an actual array we don't need the handle, and if it's a xxxxvector I don't think we need the handle either if all the types put the length in the same place
<lloda>but I think all of that is eventually going to Scheme, so...
<wingo>yeah
<wingo>btw i found a large part of the issue
<wingo>for that array-sum test, previous numbers were:
<wingo>stable-2.2 0.44s, lightning no jit .77-.88s, lightning jit .54s
<wingo>numbers now lightning no jit 0.60s, lightning jit 0.32s
<wingo>just pushed that fix
<wingo>was a problem in the intrinsics, they were missing a fast path
<lloda>will the jit handle a case-lambda, or should I jit-compile each of the cases?
<lloda>my timings for the array-sum tests are now stable-2.2 0.18, lightning no jit 0.39, lightning jit 0.11
<wingo>lloda: it will handle the case-lambda
<wingo>0.39 vs 0.18 is not nice, should try to improve that; it is possible though that interpreted throughput can be less than 2.2
<wingo>at least 0.11 is better :P
<wingo>yay, prompts and aborts compile well
<janneke>wow!
<stis>cool
<outtabwz>wingo: Still getting the "auto-compilation" notice even with (set! %load-should-autocompile #f) in ~/.guile
<outtabwz>wingo: https://paste.ubuntu.com/p/zDWjJQ3P4P/
<stis>my little wishlist: let/ec compiles to just gotos in case of (let/ec ret (if p? (ret 1)) (ret 2))
<wingo>stis: need a compiler pass to do that, to contify prompts
<wingo>condition is that the prompt tag is fresh and that it doesn't escape the procedure it's in
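For reference, a small runnable version of the `let/ec` form stis mentions, using `let/ec` from `(ice-9 control)`. The procedure name `pick` is hypothetical. Here the escape continuation `ret` is only ever called inside the procedure that creates it and is never stored or passed out, which is the kind of case the conditions above are about.

```scheme
(use-modules (ice-9 control))

;; Hypothetical example: RET is an escape continuation that never
;; leaves PICK, so each (ret ...) call is in effect a local early return.
(define (pick p?)
  (let/ec ret
    (if p? (ret 1))
    (ret 2)))

(pick #t)  ; escapes with 1
(pick #f)  ; falls through to (ret 2)
```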
<outtabwz>wingo: guile (GNU Guile) 2.2.4 built from source yesterday
<wingo>outtabwz: i think that's a bug then, please send a mail to bug-guile@gnu.org
<wingo>alternately of course you can rebind the current warning port, if you just want the messages to go away
<wingo>(current-warning-port (%make-void-port "r"))
<wingo>er
<wingo>(current-warning-port (%make-void-port "w"))
<outtabwz>wingo: I don't want to disable all warnings. I just don't want the verbose chatter every time I update my program.
<rekado>hmm, just found out that mailutils won’t help me process a multipart message.
<rekado>it can get me the parts and tell me that this is a multipart message, but I think I could do this without mailutils.
<civodul>it's disappointing