<lloda> I've tried a couple of libraries of mine that are public and a repo of private code but I don't have anything very useful to report :-(
<lloda> lightning seems about 2x or 3x slower compared to stable-2.2 in some cases, without jit
<lloda> when I tried jit on large functions, the speed was about the same
<lloda> I haven't had time to try & isolate a good small benchmark
<lloda> but a ton of what I do relies on array-ref or goes directly to C, so it makes sense that the jit wouldn't be able to help
<wingo> that's interesting! counterintuitive as well...
<wingo> you compiled lightning with -O2 I guess?
<wingo> when you tested the jit version, did you also jit-compile all the (most-called) subrs that your functions called? maybe your functions also had inner loops that guile compiled as other functions
<wingo> regarding the slowdown vs stable-2.2, i would expect some due to master/lightning having more bytecodes in general
<wingo> for me array-sum is slower without jit (1.6s vs 1.4s) but when jitted (and importantly, when array-length and array-ref are jitted) it's a little faster (1.2s)
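The array-sum test itself isn't quoted in the log; a minimal reconstruction along these lines (element type, vector size, and loop shape are my assumptions, not the actual test code) gives an idea of what is being timed:

```scheme
;; Hypothetical reconstruction of the array-sum microbenchmark
;; discussed here; the real test isn't shown in the log.
(use-modules (srfi srfi-4))   ; uniform numeric vectors (f32vector etc.)

(define (array-sum a)
  (let ((n (array-length a)))
    (let loop ((i 0) (sum 0.0))
      (if (< i n)
          (loop (+ i 1) (+ sum (array-ref a i)))
          sum))))

(array-sum (make-f32vector 1000000 1.0))
```

A loop like this leans entirely on array-length and array-ref (or f32vector-ref), which is why jit-compiling those accessors is what moves the numbers in this conversation.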
<lloda> wingo: my results on that test are stable-2.2 ~ 0.19 to 0.26, lightning no jit 0.66 to 0.73, lightning jit 0.66 to 0.72 :-/
<wingo> lloda: for your array-sum test you mean?
<wingo> how big of an f32vector are you using?
<lloda> I'm using f32vector-ref in the loop instead of array-ref actually
<lloda> have meetings :-/ but I'll be around
<wingo> for guile 2.2.3 i still get like 1.2s here
<wingo> could be that CSE patch that mark weaver found, applied in 2.2.4 but i thought it wasn't needed on master
<wingo> certainly if 2.2 manages to unbox but master doesn't, that's a bug
<wingo> that would certainly account for the difference
<wingo> indeed, that seems to be what's happening
<wingo> hum, that's not it, i think i was testing with the wrong version
<wingo> lloda: my results on that test are stable-2.2 0.44s, lightning no jit .77-.88s, lightning jit .54s
<wingo> in stable-2.2, the body of the loop is 15 instructions
<wingo> in lightning, it's 25 instructions, and has two callouts to intrinsics (add/immediate and lsh/immediate), which in 2.2 are dedicated opcodes
<wingo> the lightning branch also has 3 side exits for error conditions that are explicit control flow, which stable-2.2 doesn't have
<wingo> inter-instruction state is kept on the stack. i guess the additional instructions cause more stack traffic. probably the icache footprint is a little higher too, though who knows; array-sum compiles to a little more than 2k of code, which probably should be slimmed a little
<wingo> still, icache misses probably aren't the thing.
<lloda> those numbers are similar to what I get
<lloda> I did compile both lightning & stable-2.2 with -O2. I used gcc 8.2 for lightning and 8.1 for stable
<lloda> in the larger test I did I only jit-compiled a big function that defined small functions inside (in a let). I'll try to jit-compile the small functions instead
<outtabwz> What do i put in my ~/.guile to make the auto-compiler silent except for errors?
<outtabwz> Alternatively, how do I completely disable auto-compilation in ~/.guile?
<lloda> I don't know how to disable the auto-compilation messages. One way is to make sure everything is compiled first
<lloda> ,a compile or ,a load also reveal some variables that you can try
<outtabwz> I thought maybe ,option interp #t but it seems that only applies to the REPL, and not scripts invoked from the shell :(
<outtabwz> Putting (setenv "GUILE_AUTO_COMPILE" "0") into ~/.guile doesn't prevent auto-compile of scripts invoked from the shell either :(
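The (setenv ...) attempt fails for scripts because ~/.guile is only loaded by the interactive REPL, never by scripts run from the shell; for scripts the variable has to be set in the invoking environment (GUILE_AUTO_COMPILE=0 guile script.scm), or the scripts compiled ahead of time. Collecting the suggestions that come up in this discussion, a possible ~/.guile sketch (whether it fully silences a given Guile version is not guaranteed; see the bug report below):

```scheme
;; Sketch of a ~/.guile fragment for quieter interactive sessions.
;; ~/.guile is only read by the REPL, so none of this affects scripts
;; run from the shell; for those, set GUILE_AUTO_COMPILE=0 in the
;; environment, or compile ahead of time with `guild compile`.
(set! %load-should-auto-compile #f)
;; or, to keep auto-compilation but discard its chatter:
;; (current-warning-port (%make-void-port "w"))
```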
<wingo> outtabwz: (set! %load-should-auto-compile #f)
<wingo> but better if possible to simply compile the files ahead of time
<stis> wingo: procedure-properties is awfully slow in 2.2!
<wingo> stis: probably because all the data is stored statically in the elf in a side table; it has to grovel through the DWARF for that data
<wingo> lloda: we should be able to really speed up array-length btw
<wingo> i just stepped through it, it does a bunch of useless crap
<stis> ok, if it's not improving I will move over to hash tables.
<wingo> stis: i think that's a reasonable thing to do
<stis> wingo: what's your thought on optimizing the family of the kind (call-with-values expr (case-lambda ((x) x) (x x)))
<stis> e.g. have a speed version for 1 value return
<stis> I use this to simulate python assignment semantics regarding multiple value returns
<stis> but it really is slow, 1 million operations per second, which is an overhead for me because assignments are everywhere
<stis> it should be able to optimize (call-with-values (lambda () (+ number number)) ...) to (+ number number)
<wingo> how would you distinguish returning '(1 2) from returning (values 1 2)?
<wingo> stis: do you need to support 0-valued returns?
<stis> well, get better speed for 1 value return; multiple values are turned into a tuple, and I guess that this construct can be done quite effectively, especially for the common one value return. Of course zero becomes the empty list
<stis> which I did not think of, so to avoid that perhaps use (case-lambda ((x) x) ((x . l) (cons x l)))
<stis> in python x=... return 1,2 => x=(1,2), in guile I would expect x=1
<OrangeShark> stis: a multi value return in python is just a tuple, isn't it?
<wingo> lloda: embarrassingly, returns from interpreter -> jit weren't actually working
<wingo> it was staying in the interpreter
<stis> OrangeShark: yep but scheme uses multiple value return and I want to interoperate
<wingo> lloda: hence the 0 speedup until all leaf functions are compiled!
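The device stis describes, plus a rest-argument variant that stays cheap in the common one-value case, can be sketched as procedures taking a thunk (the names wrap and wrap-rest are mine, not from the log):

```scheme
;; stis's device as a thunk-taking procedure: one value passes through,
;; several values are collected into a list (a Python-style "tuple").
(define (wrap thunk)
  (call-with-values thunk
    (case-lambda
      ((x) x)      ; the common one-value case
      (xs xs))))   ; zero or more values become a list

;; Rest-argument variant: fast for one value; multiple values are
;; detected by testing whether x* is null.  (It rejects zero-valued
;; returns, which may or may not matter.)
(define (wrap-rest thunk)
  (call-with-values thunk
    (lambda (x . x*)
      (if (null? x*) x (cons x x*)))))

(wrap (lambda () 42))             ; => 42
(wrap (lambda () (values 1 2)))   ; => (1 2)
```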
<stis> it's possible now, but a bit too inefficient for my taste, especially since it's an optimization away from good bytecode
<stis> maybe I bite the bullet and just implement it as tuples
<amz3> stis: yes, that's what I was going to say.
<amz3> stis: you want to implement a fast python on top of the guile vm?
<stis> first of all, I want good interoperability and a nice scheme interface. Then speed.
<stis> now it's 50x slower than cython and I could perhaps improve that by a factor of 10 if I could get this device working smoothly
<stis> (set! x (wrap expr)), wrap is the device and is (lambda (x) (call-with-values (lambda () x) (case-lambda ((x) x) (x x))))
<stis> Currently it will make a closure and a lambda and then call a primitive with those values. Would be nice with some better optimization rules for this case
<amz3> why do you compare with cython in particular? do you have a python library in mind that you would like to be able to use?
<stis> I compiled the random module with cython and ran one of the tests there just to see how it performs
<stis> note the random.py module
<wingo> i still don't know how you plan to distinguish returning '(1 2) from (values 1 2)
<stis> you don't, it's a projection of the two concepts: you return either (values 1 2) or (list 1 2); both are identical in python but not in scheme, so the expectation of the pythonista would not be broken
<stis> still, if you in python return values, they would in scheme land be distinguished as not a tuple
<wingo> lloda: interestingly, inlining the fast path of "<?" in the VM gets a good perf boost... i wonder if we will have to go back to inlining fast paths for a number of things (both in interpreter and jit)
<wingo> stis: in that case i think your best speed bet is (call-with-values foo (lambda (x . x*) ...))
<wingo> it will be fast in the 1-value case, and for multiple values you can test if x* is null
<stis> and compilation is also faster
<lloda> for array-length I think we only need to check the type; if it's an actual array we don't need the handle, and if it's a xxxxvector I don't think we need the handle either, if all the types put the length in the same place
<lloda> but I think all of that is eventually going to Scheme, so...
<wingo> btw i found a large part of the issue
<wingo> for that array-sum test, previous numbers were:
<wingo> stable-2.2 0.44s, lightning no jit .77-.88s, lightning jit .54s
<wingo> numbers now lightning no jit 0.60s, lightning jit 0.32s
<wingo> was a problem in the intrinsics, they were missing a fast path
<lloda> will the jit handle a case-lambda, or should I jit-compile each of the cases?
<lloda> my timings for the array-sum tests are now stable-2.2 0.18, lightning no jit 0.39, lightning jit 0.11
<wingo> lloda: it will handle the case-lambda
<wingo> 0.39 vs 0.18 is not nice, should try to improve that; it is possible though that interpreted throughput can be less than 2.2
<wingo> yay, prompts and aborts compile well
<outtabwz> wingo: Still getting the "auto-compilation" notice even with (set! %load-should-autocompile #f) in ~/.guile
<stis> my little wishlist: let/ec compiles to just gotos in case of (let/ec ret (if p? (ret 1)) (ret 2))
<wingo> stis: need a compiler pass to do that, to contify prompts
<wingo> condition is that the prompt tag is fresh and that it doesn't escape the procedure it's in
<outtabwz> wingo: guile (GNU Guile) 2.2.4 built from source yesterday
<wingo> outtabwz: i think that's a bug then, please send a mail to bug-guile@gnu.org
<wingo> alternately of course you can rebind the current warning port, if you just want the messages to go away
<wingo> (current-warning-port (%make-void-port "w"))
<outtabwz> wingo: I don't want to disable all warnings.
<outtabwz> I just don't want the verbose chatter every time I update my program.
<rekado> hmm, just found out that mailutils won't help me process a multipart message.
<rekado> it can get me the parts and tell me that this is a multipart message, but I think I could do this without mailutils.