<rekado>I just added support for a top-level declaration of required packages
<rekado>so a workflow file that begins with (require-packages "foo" "bar" "baz")
<rekado>…will be evaluated in an ad-hoc environment containing these extra packages
<rekado>(the packages are looked up in the current guix inferior)
<rekado>this actually works, so I’m pretty happy with it :)
<rekado>oof, compiling the workflow down to scripts is pretty slow now. I’ll probably have to memoize inferior package lookups and group derivations and all that.
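A minimal Python sketch of the memoization idea mentioned above. It is not GWL code: `lookup_package` is a hypothetical stand-in for a slow lookup through the inferior, and the counter only exists to show how often the expensive path is taken.

```python
from functools import lru_cache

# Counts how often the (pretend) expensive lookup actually runs.
lookup_calls = 0

@lru_cache(maxsize=None)
def lookup_package(name):
    """Hypothetical slow lookup; results are cached per distinct name."""
    global lookup_calls
    lookup_calls += 1
    return f"package:{name}"

def resolve_all(names):
    """Resolve every requested name, reusing cached results."""
    return [lookup_package(n) for n in names]

# Twenty processes that each request the same three packages.
requests = ["foo", "bar", "baz"] * 20
resolved = resolve_all(requests)
```

With the cache in place, sixty requests trigger only three real lookups, one per distinct package name.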
<rekado_>zimoun: prompted by performance problems I just realized what always bothered me: GWL process templates are compiled to one script per instantiated process
<rekado_>instead there should be *one* script per template which is invoked with different settings (one settings expression per instantiated process)
<rekado_>with inferiors everything is quite a bit slower than before and with my test workflow the culprit is immediately obvious: we’re generating hundreds of scripts that are almost all identical
<rekado_>I guess this means that I’ll work on this intermediate representation of a workflow sooner than I thought: each process will be compiled to a script (shared by process templates) and a settings alist. This should speed things up greatly.
<rekado_>I probably also don’t do things efficiently enough, but regenerating almost the exact same thing over and over surely plays a part in that.
<zimoun>I do not really have an idea about what these scripts look like.
<zimoun>it is a problem of IO; writing to disk, right?
<rekado_>the problem is talking to the inferior and the daemon.
<rekado_>generating the scripts requires some communication with the inferior and the daemon, and that’s not very fast
<rekado_>it’s also rather silly to only ever execute scripts without any arguments when we could easily reuse them by passing arguments instead
<rekado_>the “payload” of each process is either a code snippet, an S-expression, or a G-expression.
<rekado_>code snippets are “compiled” to simple strings, but whatever variables they refer to are embedded in the string
<rekado_>this is why different instantiations of the same process template result in many different scripts — the payload differs in the embedded arguments
<rekado_>so instead of embedding the arguments we can simply externalize them and let code snippets reference them from an external expression.
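The difference between embedding and externalizing arguments can be sketched in Python as follows. This is only an illustration under assumptions: `trim`, `settings`, and the sample names other than HBR_Rep1 are made up, and the short hash merely stands in for the hash part of a store file name.

```python
import hashlib

SAMPLES = ["HBR_Rep1", "HBR_Rep2", "HBR_Rep3"]  # illustrative sample names

def script_embedded(sample):
    """Embedding: the argument is baked into the script text."""
    return f'trim("{sample}")'

def script_externalized():
    """Externalizing: the script reads its argument from a settings value."""
    return 'trim(settings["sample"])'

def digest(text):
    """Short content hash, standing in for a store item's hash prefix."""
    return hashlib.sha256(text.encode()).hexdigest()[:8]

# One distinct script per sample when arguments are embedded ...
embedded = {digest(script_embedded(s)) for s in SAMPLES}
# ... but a single shared script when arguments live outside of it.
externalized = {digest(script_externalized()) for _ in SAMPLES}
# The per-process differences move into small settings values.
settings = [{"sample": s} for s in SAMPLES]
```

Here `embedded` holds three different hashes while `externalized` holds one, so only the settings grow with the number of samples.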
<zimoun>you mean, it generates many, many scripts from strings and these scripts are then “compiled”? I am missing the concrete details: why does this “compilation” require the inferior and the daemon? This should happen at “run-time”, not “compile-time”.
<rekado_>sorry, I’m using the term “compile” loosely
<rekado_>when you run a workflow, it first evaluates the workflow file, which leads to a bunch of process values and a workflow value; those need to be converted into something that can actually be executed, which is done by process->script
<zimoun>I see, generating all these gwl-*.scm files is costly, right?
<zimoun>yeah, IIUC, ‘process->script’ should be called once and generate a “template”, then ‘make-script’ should fill this template. That’s what you mean, right? Somehow.
<rekado_>there’s one gwl-trim_galore_pe.scm for every process made from a template trim_galore_pe
<rekado_>contents are almost exactly the same — except that they operate on a different sample to trim
<rekado_>instead of calling all these *different* scripts without arguments we should just call *one* script with different arguments
<rekado_>no need to generate the same script more than once; generating a script also involves looking up packages and ensuring that they are built, so that’s really something that should be avoided (even though the store caches things, talking to the store costs time).
<zimoun>yes. What I am missing, using your example: would you like 3 scripts (1 per name) or only one script?
<rekado_>instead of calling “/bin/sh -c /gnu/store/aaaa…-gwl-trim_galore_pe.scm”, then “/bin/sh -c /gnu/store/bbbb…-gwl-trim_galore_pe.scm”, etc, I want “/bin/sh -c /gnu/store/just-one…-gwl-trim_galore_pe.scm '(configuration aaaa)'”, “/bin/sh -c /gnu/store/just-one…-gwl-trim_galore_pe.scm '(configuration bbbb)'”
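The invocation pattern described above, one shared script called with a different configuration expression each time, might look like this in Python. The script path is a placeholder rather than a real store item, the `/bin/sh -c` wrapper is left out to keep the sketch small, and `command_for` is a hypothetical helper.

```python
import shlex

SCRIPT = "/path/to/gwl-trim_galore_pe.scm"  # placeholder, not a real store path

def command_for(config):
    """Build the argv for running the shared script with one configuration."""
    return [SCRIPT, f"(configuration {config})"]

# Same script every time; only the configuration argument changes.
commands = [command_for(c) for c in ["aaaa", "bbbb"]]
# Shell-quoted renderings of those commands, for display or logging.
lines = [shlex.join(argv) for argv in commands]
```

Every entry in `commands` starts with the same script, so that script only has to be generated once.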
<rekado_>now imagine dozens of these that only differ in the sample name
<rekado_>the script above was generated with “sample” bound to “HBR_Rep1”
<rekado_>if you’ve got 20 samples you’ll get 20 scripts that only differ in that one string.
<zimoun>yeah, the “process” is a “function” taking the ‘sample’ argument, right? And, concretely, the “script” is this function applied to the argument. And instead you would like the script to still be a “function” taking one argument, for instance “HBR_blabla”.
<zimoun>Now, I totally understand your analogy with compilation for scripts. :-)
<zimoun>Somehow, for one process applied to N samples, the script is now a command and (that part of) the workflow is ‘command1; command2; command3; …; commandN’, as in a simple imperative language. And you would like instead that the script is a function, i.e., the workflow becomes a sequence of calls: ‘script args1; script args2; script args3; etc.’. Yeah, it totally makes sense. :-)