IRC channel logs

2021-01-22.log


<rekado>I just added support for a top-level declaration of required packages
<rekado>so a workflow file that begins with (require-packages "foo" "bar" "baz")
<rekado>…will be evaluated in an ad-hoc environment containing these extra packages
<rekado>(the packages are looked up in the current guix inferior)
<rekado>this actually works, so I’m pretty happy with it :)
<rekado>oof, compiling the workflow down to scripts is pretty slow now. I’ll probably have to memoize inferior package lookups and group derivations and all that.
<rekado_>zimoun: prompted by performance problems I just realized what always bothered me: GWL process templates are compiled to one script per instantiated process
<rekado_>instead there should be *one* script per template which is invoked with different settings (one settings expression per instantiated process)
<rekado_>with inferiors everything is quite a bit slower than before and with my test workflow the culprit is immediately obvious: we’re generating hundreds of scripts that are almost all identical
<rekado_>I guess this means that I’ll work on this intermediate representation of a workflow sooner than I thought: each process will be compiled to a script (shared by process templates) and a settings alist. This should speed things up greatly.
<zimoun>rekado_: ah, “memoize” is not enough?
<rekado_>no.
<rekado_>I probably also don’t do things efficiently enough, but regenerating almost the exact same thing over and over surely plays a part in that.
<zimoun>I do not really have an idea about what these scripts look like.
<zimoun>it is a problem of IO; writing to disk, right?
<rekado_>the problem is talking to the inferior and the daemon.
<rekado_>generating the scripts requires some communication over the inferior and the daemon, and that’s not very fast
<rekado_>it’s also rather silly to only ever execute scripts without any arguments when we could easily reuse them by passing arguments instead
<rekado_>the “payload” of each process is either a code snippet, an S-expression, or a G-expression.
<rekado_>code snippets are “compiled” to simple strings, but whatever variables they refer to are embedded in the string
<rekado_>this is why different instantiations of the same process template result in many different scripts — the payload differs in the embedded arguments
<rekado_>so instead of embedding the arguments we can simply externalize them and let code snippets reference them from an external expression.
<zimoun>you mean it generates many, many scripts from strings, and these scripts are then “compiled”? I am missing concrete details: why does this “compilation” require the inferior and the daemon? This should happen at “run-time”, not “compile-time”.
<rekado_>sorry, I’m using the term “compile” loosely
<rekado_>when you run a workflow, it first evaluates the workflow file, which leads to a bunch of process values and a workflow value; those need to be converted into something that can actually be executed, which is done by process->script
<zimoun>I see, generating all these gwl-*.scm is costly, right?
<zimoun>yeah, IIUC, ’process->script’ should be called once and generate a “template”, then ’make-script’ should fill this template. That’s what you mean, right? Somehow.
<rekado_>zimoun: right now you see something like this: https://elephly.net/paste/1611308536.html
<rekado_>there’s one gwl-trim_galore_pe.scm for every process made from a template trim_galore_pe
<rekado_>contents are almost exactly the same — except that they operate on a different sample to trim
<rekado_>instead of calling all these *different* scripts without arguments we should just call *one* script with different arguments
<rekado_>no need to generate the same script more than once; generating a script also involves looking up packages and ensuring that they are built, so that’s really something that should be avoided (even though the store caches things, talking to the store costs time).
<zimoun>yes. What I am missing, using your example: would you like 3 scripts (1 per name) or only one script?
<rekado_>instead of calling “/bin/sh -c /gnu/store/aaaa…-gwl-trim_galore_pe.scm”, then “/bin/sh -c /gnu/store/bbbb…-gwl-trim_galore_pe.scm”, etc, I want “/bin/sh -c /gnu/store/just-one…-gwl-trim_galore_pe.scm '(configuration aaaa)'”, “/bin/sh -c /gnu/store/just-one…-gwl-trim_galore_pe.scm '(configuration bbbb)'”
<rekado_>etc.
<rekado_>where '(configuration …)' is an S-expression containing all values that the script references by name.
<rekado_>so we only compute /gnu/store/just-one…-gwl-trim_galore_pe.scm once and then reuse it with all the different configurations — one per actual process.
<rekado_>I guess it’ll be clearer to see in the finished code :)
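A minimal sketch of the one-script-per-template idea described above, written in Python rather than Guile purely for illustration; it is not GWL code. The names `compile_template` and `plan`, the `/tmp/store` path, and the sample names other than HBR_Rep1 are hypothetical. The expensive step runs once per template, and each process only contributes a small configuration value.

```python
import hashlib
import json


def compile_template(name, body):
    """Stand-in for the expensive script build; returns a fake store path."""
    digest = hashlib.sha256((name + body).encode()).hexdigest()[:8]
    return f"/tmp/store/{digest}-gwl-{name}.scm"


def plan(template_name, body, samples):
    """Return one (script, configuration) pair per instantiated process."""
    # The script is computed once and shared by every process.
    script = compile_template(template_name, body)
    # Only the configuration differs between processes.
    return [(script, json.dumps({"sample": s})) for s in samples]


invocations = plan(
    "trim_galore_pe",
    "trim the reads of the configured sample",
    ["HBR_Rep1", "HBR_Rep2", "UHR_Rep1"],  # hypothetical sample names
)
for script, config in invocations:
    print(script, config)
```

All three invocations share a single script path; adding a sample adds a configuration, not a script.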
<zimoun>Now, it is clear. :-) And I am guessing that “trim_galore_pe” is a function and the 6 processes are generated by this function, right?
<zimoun>how do you distinguish between 2 processes? Those that are really different, such as trim_galore_pe and salmon_index, and those that are the same with different arguments, such as trim_galore_pe. Based on the process name?
<rekado_>the difference is all their inputs, i.e. package inputs, the procedure (code snippet or s-expr), data inputs, etc.
<rekado_>I just found another bottleneck when generating many scripts. The cache needs to know what outputs relate to what chain of processes, so it currently hashes the actual script files.
<rekado_>the more scripts the more hashing
<zimoun>Script would extract the “functional” part of the process, right?
<zimoun>yeah, but these hashes should be fast, shouldn’t they?
<rekado_>here’s an example to illustrate. This is a wrapper script to run the actual script in a container: https://elephly.net/paste/1611313495.scm.html
<rekado_>and here’s an actual script: https://elephly.net/paste/1611313538.scm.html
<rekado_>(can be run in a container or not)
<rekado_>close to the bottom is the requested work to be performed: running htseq-count
<rekado_>ah, bad example, because it’s not from a process template…
<rekado_>here’s a better one: https://elephly.net/paste/1611313798.scm.html
<rekado_>this is where it comes from: https://elephly.net/paste/1611313820.w.html
<rekado_>now imagine dozens of these that only differ in the sample name
<rekado_>the script above was generated with “sample” bound to “HBR_Rep1”
<rekado_>if you’ve got 20 samples you’ll get 20 scripts that only differ in that one string.
<zimoun>yeah, the “process” is a “function” taking the ’sample’ argument, right? And the “script” is this function applied to the argument, concretely. And instead you would like the script to still be a “function” taking one argument, for instance “HBR_blabla”.
<zimoun>Now, I totally understand your analogy with compilation for scripts. :-)
<zimoun>Somehow, for one process applied to N samples, the script is now a command and the (part of the) workflow is ‘command1; command2; command3; …; commandN’ as in a simple imperative language. And you would like instead that the script is a function, i.e., the workflow becomes a sequence of calls: ’script args1; script args2; script args3; etc.’. Yeah, it totally makes sense. :-)
<rekado_>yes, that’s it
<zimoun>and it scales really poorly. I mean with a hundred samples… Maybe generating these scripts could take longer than running them. ;-)
*zimoun has to go. See you!
<rekado_>that’s exactly what I’ve been observing. It took a very long time to generate the scripts before *anything* was even run.
<Guest96465>I am back
***Guest96465 is now known as zimoun
<zimoun>Sorry, I messed up with IRC. :-)
<civodul>PurpleSym: do you happen to know of a nice notebook example involving matplotlib and (say) numpy?
<PurpleSym>Jupyter you mean? Not really, but binder instances usually have a gallery like this one, which may help find one: https://notebooks.gesis.org/gallery/
<civodul>ah cool, thanks!
<PurpleSym>(You’ll also find a lot of notebooks that do *not* work any more…)
<civodul>interesting :-)
<zimoun>civodul: more gallery examples here https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks
<civodul>oh that one looks even nicer, thanks!
<rekado_>hmm, so packages->manifest is pretty slow when it works with inferior packages
<rekado_>I notice that inferior-eval-with-store creates a new temp directory and connects to a temp socket each time it is called
<civodul>true
<zimoun>rekado_: each process is run in an inferior, right?
<civodul>are you able to identify what's taking time?
<zimoun>civodul: ah, let me guess… you are preparing Jupyter TPs. :-)
***zimoun` is now known as zimoun
<rekado_>civodul: yes, it’s inferior-package->manifest-entry whose results are not cached.
<rekado_>here’s a test case: https://elephly.net/paste/1611343824.scm.html
<rekado_>we see the same packages being converted to manifest entries again and again
<rekado_>so I guess I just need to memoize things here
<rekado_>has to wait until after the baby has fallen asleep
***rekado_ is now known as rekado
<civodul>rekado: oh i see, it must be the propagated input traversal, which itself involves back-and-forth communication with the inferior
<civodul>you could try this: https://web.fdn.fr/~lcourtes/pastebin/inferior-memoization.html
<civodul>it's a bit too much, and perhaps you need to memoize inferior-package-native-search-paths as well
<rekado>hmm, there’s no impact on the execution duration of my example. (Still around 14 seconds.)
<rekado>I’ll try memoizing inferior-package-native-search-paths
<rekado>also no effect, but hacking around in inferior-package->manifest-entry does something, but perhaps I broke something in the process :)
<rekado>here’s what I have: https://elephly.net/paste/1611349633.patch.html
<civodul>so... what does (statprof) say? :-)
<rekado>this at the bottom: https://elephly.net/paste/1611350294.html
<rekado>I’m not sure if I actually changed the expected semantics by doing this
<zimoun>pfiouf! Some evening, I should cross 7 times my fingers before asking a question :-)
<zimoun>rekado: isn’t the slowness due to communication over the socket, so that memoizing does not change the game? The inferior is waiting for the socket, something like that?
<rekado>zimoun: looks like the slowness is due to the fact that manifest entries are recomputed again and again as propagated-inputs are traversed.
<rekado>with my ugly patch the time to execute the reproducer code drops from 14+ seconds to sub-1 second.
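A toy illustration of why caching helps when propagated inputs are traversed, in Python rather than Guile and not taken from the patch linked above. The package graph, `slow_lookup`, and `entry` are hypothetical stand-ins: `slow_lookup` plays the role of a round trip to the inferior, and the counter shows how often it is hit with and without a cache.

```python
stats = {"calls": 0}

# Hypothetical package graph: name -> propagated inputs.
PROPAGATED = {
    "app": ["libA", "libB"],
    "libA": ["core"],
    "libB": ["core"],
    "core": [],
}


def slow_lookup(name):
    """Stand-in for an expensive query; counts how often it is called."""
    stats["calls"] += 1
    return PROPAGATED[name]


def entry(name, cache=None):
    """Build a nested entry for NAME, optionally reusing cached results."""
    if cache is not None and name in cache:
        return cache[name]
    result = (name, tuple(entry(dep, cache) for dep in slow_lookup(name)))
    if cache is not None:
        cache[name] = result
    return result


stats["calls"] = 0
plain = entry("app")
uncached_calls = stats["calls"]

stats["calls"] = 0
cached = entry("app", cache={})
cached_calls = stats["calls"]

print(uncached_calls, cached_calls)
```

Both traversals produce the same value; the cached one looks up the shared input "core" only once. The gap widens quickly as more packages share the same propagated inputs.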
<rekado>I’m now building a new Guix from my commit and will then proceed to build the GWL to see if that’s enough to speed things up.
<rekado>we would need to update the ‘guix’ package in (gnu packages package-management) so that GWL installations elsewhere can benefit from this fix.
<rekado>luckily the number of GWL installations out there is almost certainly less than 5.
<zimoun>with your patch adding the vhash in inferior-package->manifest-entry, right? You are memoizing by hand, aren’t you?
<rekado>ye
<rekado>yes
<rekado>it’s not pretty, but I can do pretty later.
<rekado>… it doesn’t seem to work correctly
<rekado>oh well
<rekado>somehow I seem to be forced to rebuild the world with this Guix
*zimoun says hum?! and is thinking...
<rekado>I’m too tired to think
<civodul>rekado: it's 14 seconds wall-clock time, but IIUC what statprof reports, it's .1s CPU time no?
<civodul>meaning we're mostly waiting for the inferior
<civodul>it might shed some light to do "perf timechart" on this
<zimoun>civodul, yes, that’s my assumption too: waiting for communicating with the socket.
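The wall-clock versus CPU time distinction raised above can be reproduced with a small sketch. This is Python, not statprof or perf, and `measure` is a hypothetical helper; `time.sleep` stands in for waiting on the inferior's socket, so the process spends almost no CPU time while the wall clock keeps running.

```python
import time


def measure(fn):
    """Return (wall-clock seconds, CPU seconds) spent running fn."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


# Sleeping models blocking on I/O: time passes, but no CPU work is done.
wall, cpu = measure(lambda: time.sleep(0.2))
print(f"wall={wall:.3f}s cpu={cpu:.3f}s")
```

A large gap between the two numbers points at waiting (I/O, sockets, another process) rather than computation, which is the situation a CPU-sampling profiler tends to under-report.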