IRC channel logs



<ArneBab-work>Hi, I implemented skynet with Guile Scheme Fibers, but I’m surprised that they are much slower than I expected (factor 100 slower than a serial solution). Do you see what I did wrong? wingo? Here’s my implementation: - to test: git clone; cd skynet/guile-fibers/; ./skynet.scm
<wingo>ArneBab-work: afaiu you are measuring only overhead, not work; dunno
<wingo>a message send/receive is more expensive than a procedure call :)
<ArneBab-work>wingo: Yes, this is an overhead test, but via Skynet I can compare it to other systems. And I wonder whether we can get the overhead down. I profiled it and it spends most of its time in current-dynamic-state, run-fiber, hashv-set!, and the lambda in create-fiber.
<ArneBab-work>wingo: if you want to profile it yourself: git clone; cd skynet/guile-fibers/; guile -L . # (import (skynet)) \\n ,profile (main #t)
<wingo>ACTION doesn't really have time, sorry :/
<ArneBab-work>wingo: no probs - just wanted to ask and pass it to you so you can look if you’d like to. Thank you for your answers so far! (I now have hope that I did not make an obvious mistake in the implementation)
<ArneBab-work>short summary of the result: The runtime with Guile fibers is around 6000 ms on this system, with Go or Java Quasar Fibers it's around 300 ms, in Haskell it seems to vary by a factor of 100 depending on the hardware (between awesomely fast and two times the speed of Guile fibers - on my machine it has 2x the speed of Guile fibers).
<wingo>20x slower than systems with native code generation and one-shot delimited continuations rather than multi-shot and also better gc is not so bad imo
<wingo>i am sure there are improvements to fibers that can make it faster, then native codegen will get faster still, then we'll still be a bit slower than go i think for this particular workload
<ArneBab-work>wingo: on the upside: Guile fibers do run 1 million fibers which communicate through a 6-level-deep hierarchy via channels and combine all their results with an overhead of 6 seconds. Experimentally I see that if each of your 1 million fibers does as little as (apply + (iota 1000)), you get a 50% speedup over the serial code.
<wingo>did you try your skynet example while restricting guile to a single thread?
<wingo>just out of curiosity
<ArneBab-work>wingo: do you mean via #:parallelism 1 or via taskset?
<wingo>perf with taskset will be slightly less because the gc can't run in parallel, but both constrain fibers to a single core
<wingo>which is what i was curious about
<ArneBab-work>as a sidenote: for spawn-fiber I use #:parallel? {level < 2} in there, so all the "subfibers" below level 2 should stay on their CPU (if I got that right)
<manumanumanu>Good morning!
<ArneBab-work>wingo: results:
<wingo>ArneBab-work: what about the less-workload case ? i.e. without the apply and iota
<wingo>ArneBab-work: regarding #:parallel?, that just controls initial thread placement. a fiber may be stolen by another thread fwiw
<civodul>wingo: i'm confused about the two web servers in Fibers: one does its own 'run-fibers', and the other one creates threads
<civodul>i don't see which one is best for use in an already-fiberized program
<civodul>what are your thoughts?
<wingo>civodul: depends :) see
<wingo>one of them threads states through all handlers -- the handler calls themselves are serialized
<wingo>like with the m-word
<wingo>the other one runs handlers in parallel
<civodul>heh :-)
<civodul>the concurrent web server looks like what i want, but it does run-fibers
<civodul>and i'm already doing that
<civodul>so i'm unsure if this is the right thing to do
<civodul>ideally i'd write (spawn-fiber (lambda () (run-server-that-creates-fibers)))
<civodul>or at least that's what i expected :-)
<wingo>yes i see what you mean
<wingo>probably need to provide an interface that does not include the run-fibers
<ArneBab-work>wingo: without parallelism, pure overhead:
<ArneBab-work>#:parallelism 1
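
Note: the benchmark code itself is not reproduced in this log. As a rough sketch of what a skynet-style test with Fibers can look like, the following uses only run-fibers, spawn-fiber, make-channel, put-message and get-message from the Fibers library. The procedure names skynet and run-skynet, and the size and div parameters, are hypothetical and are not taken from ArneBab-work's implementation. It assumes size is a power of div.

```scheme
(use-modules (fibers) (fibers channels))

;; Hypothetical skynet node (not the code discussed above).
;; A leaf sends its own number to its parent; an inner node spawns
;; DIV children and sends the sum of their results to its parent.
(define (skynet result-channel num size div)
  (if (= size 1)
      (put-message result-channel num)
      (let ((children (make-channel))
            (child-size (quotient size div)))
        ;; Spawn one fiber per child subtree.
        (let spawn ((i 0))
          (when (< i div)
            (spawn-fiber
             (lambda ()
               (skynet children (+ num (* i child-size)) child-size div)))
            (spawn (+ i 1))))
        ;; Collect DIV results and pass the sum upwards.
        (let collect ((i 0) (sum 0))
          (if (< i div)
              (collect (+ i 1) (+ sum (get-message children)))
              (put-message result-channel sum))))))

;; Run the whole tree inside one run-fibers call.
;; #:parallelism 1 keeps all fibers on a single kernel thread.
(define* (run-skynet size div #:key (parallelism 1))
  (run-fibers
   (lambda ()
     (let ((result (make-channel)))
       (spawn-fiber (lambda () (skynet result 0 size div)))
       (get-message result)))
   #:parallelism parallelism))
```

Calling (run-skynet 1000 10) sums the leaf numbers 0 to 999. Every leaf does nothing except one put-message, so the run time is almost entirely fiber and channel overhead, which is the point wingo raises.
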
<civodul>bah, now my code gets stuck in a loop calling clock_nanosleep
<civodul>i wonder where that comes from
<manumanumanu>ArneBab-work: what are you working on?
<ArneBab-work>manumanumanu: I read up on Fibers for Java (for work) and ported a benchmark to Guile fibers to get a feeling for the performance:
<civodul>wingo: re barriers, i suppose bad things happen if a fiber gets preempted while it's in a custom-binary-port method, right?
<civodul>hopefully #:hz 0 should prevent that
<wingo>civodul: i can't think what problem would happen
<wingo>even if they were preemptible i don't know what would go wrong -- note tho that custom binary ports aren't preemptible afaiu
<wingo>i could be wrong of course
<civodul>continuations captured while you're in a custom port method cannot be resumed
<civodul>but you could still get the preemption signal while you're there, leading to an abort-to-prompt that cannot be resumed, no?
<dsmith-work>Hey Hi Howdy, Guilers
<amz3>a few years back someone sent me a logo for a project that we were discussing called GNU prime (an rss aggregator that never came to be)
<amz3>is that person still around?
<amz3>I am looking for a logo for gnunet...
<amz3>at least make a proposition for a new GNUnet logo based on that work
<wingo>civodul: if the continuation can't be resumed then no abort is made
<wingo>civodul: the interrupt handler checks suspendable-continuation? from (ice-9 control)
<wingo>and if the continuation is not suspendable then it does nothing
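
Note: the check wingo describes can be sketched with plain Guile, without Fibers. The tag, maybe-yield and run names below are hypothetical stand-ins and are not the actual Fibers interrupt handler. The sketch assumes that with-continuation-barrier puts a frame between the prompt and the handler that cannot be captured, which is similar to the situation inside a custom port method.

```scheme
(use-modules (ice-9 control))

(define tag (make-prompt-tag "demo"))

;; Hypothetical stand-in for a preemption handler: abort to the
;; prompt only if the resulting continuation could be resumed later,
;; otherwise do nothing and let the interrupted code carry on.
(define (maybe-yield)
  (if (suspendable-continuation? tag)
      (abort-to-prompt tag 'yielded)
      'ignored))

;; Run THUNK under the prompt; on abort, return the aborted value.
(define (run thunk)
  (call-with-prompt tag
    thunk
    (lambda (k . args) (car args))))

;; Only Scheme frames between the prompt and the handler.
(define plain-result (run maybe-yield))

;; A continuation barrier between the prompt and the handler.
(define barrier-result
  (run (lambda () (with-continuation-barrier maybe-yield))))
```

In the first case the handler is expected to abort, in the second it should skip the abort, so no unresumable abort-to-prompt happens.
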
<civodul>wingo: ah, good
<civodul>thanks for checking
<manumanumanu>So, people: how would you write a high-level wrapper for the crypto_generichash_init/update/final in ?
<manumanumanu>I thought about using a custom port, but close-port returns only #t or #f
<manumanumanu>the next option is a closure with a procedure that hashes as long as you provide it with bytevectors, but that is a strange solution
<manumanumanu>at least, I think it is... icky
<manumanumanu>maybe the best way is to simply let the user keep track of the state
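
Note: a minimal sketch of the closure option, next to the "user keeps the state" option. The toy-init, toy-update and toy-final procedures are a made-up checksum standing in for an init/update/final style C API. They are not bindings to any real library, and make-hasher is a hypothetical name.

```scheme
(use-modules (rnrs bytevectors))

;; Toy stand-ins for an init/update/final hashing API (assumption:
;; the real functions would be FFI bindings with the same shape).
(define (toy-init) 0)

(define (toy-update state bv)
  (let loop ((i 0) (acc state))
    (if (< i (bytevector-length bv))
        (loop (+ i 1)
              (modulo (+ (* acc 31) (bytevector-u8-ref bv i))
                      4294967296))
        acc)))

(define (toy-final state) state)

;; Closure option: call the hasher with a bytevector to feed data,
;; and with no arguments to finalize and get the digest.
(define (make-hasher)
  (let ((state (toy-init)))
    (case-lambda
      (() (toy-final state))
      ((bv) (set! state (toy-update state bv))))))

;; Explicit-state option: the caller threads the state through.
(define (hash-chunks chunks)
  (let loop ((state (toy-init)) (chunks chunks))
    (if (null? chunks)
        (toy-final state)
        (loop (toy-update state (car chunks)) (cdr chunks)))))
```

With the closure, (define h (make-hasher)) followed by (h (string->utf8 "ab")), (h (string->utf8 "c")) and finally (h) gives the same digest as (hash-chunks (list (string->utf8 "abc"))). The closure hides the state but overloads one procedure with two meanings, which may be the "icky" part.
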
<amz3>manumanumanu: it's similar to industria API, here's an example use