IRC channel logs
2020-11-23.log
<rekado>the syntax guarantees that no bioinfo person will use it.
<rekado>the limitations they mention are why GWL does not use the Guix store for work outputs.
<civodul>so they haven't really put much thought into that aspect i guess?
<civodul>then again they didn't have a GWL paper they could cite ;-)
<rekado>it’s the bare minimum application of Nix in workflows
<rekado>I should work on this with Roel one day
<civodul>yeah it would be nice to gain more visibility
<civodul>overall it's a positive move that more people are sending the same message
<civodul>it's just the competition aspect that's somewhat annoying
<civodul>re syntax, JSON-like with Bash interspersed has its appeal :-)
<civodul>the "HPC queue integration" section is interesting
<civodul>it's a gross hack (chroot disabled) that does the job
<rekado>syntax-wise the GWL has wisp, which is Python-like with *any* language that can be used for the command section
<zimoun>rekado: Yes, I follow the recommendations, AFAIR. Maybe I did something wrong
<zimoun>civodul, rekado: thanks for the paper; in my reading queue. ;-)
<zimoun>about GWL, one “issue” is that “we” should sit and write real workflows
<zimoun>on the other hand, in my Institute, people do not know what a workflow is, they discover Snakemake or CWL and then it is almost magic. Moreover, a lot of material is out there, so they can copy/paste
<zimoun>and they do not care about the reproducibility issue. They answer Docker. Same thing, because “fashion”.
<civodul>yeah, looks as if they saw GWL and thought "we can do it too!"
<civodul>would be nice to keep the lead in these domains
<zimoun>yeah I think GWL is the most advanced but it lacks a lot of polishing because none of us uses it on a daily basis. For example, it is not clear to me how big data sets are/should be managed
<civodul>big data sets in the store are almost certainly a bad idea tho ;-)
<civodul>despite GWL's limitations, it's probably a better starting point than NixBio it seems
<civodul>a blog post and/or paper about it would be nice
<zimoun>yeah I did one package for one video for the Guix Days ;-) So then, I thought that we could use Guix to distribute the “community” material (videos, papers, etc.) with a channel. Maybe a specific build-system. But then package is not the correct object. And I have not thought about it. :-)
<rekado>zimoun: I hit a roadblock with my Guile bindings to DRMAA, but I think this can be the next big step in making GWL workflows run on more clusters. For data storage we need to figure out ways to communicate with data management systems; a first target could simply be IPFS while we figure out other requirements.
<zimoun>rekado: yeah, from my point of view, we need a good proof of concept for the data storage. But it is not clear to me what it could be; keeping in mind reproducibility (CAS somehow).
<zimoun>IPFS could be a good candidate. I have little (or no) experience with it, especially with data sets of several GB (~10 GB is more or less 1 WGS and a cohort is say ~20)
<rekado>specifically “ggd predict-path” exists for workflow systems (they mention snakemake)
<rekado>not so nice: it’s backed by conda
<rekado>but this may be fine for well-known static data sets
<zimoun>in June, I started to reimplement https://pubmed.ncbi.nlm.nih.gov/28419628/ a tool about HLA typing. The original code is ugly C/C++ glued together with ugly Bash. So I have started to replace the Bash with Snakemake and I am in the process of replacing the C/C++ with Python+BioPython.
<zimoun>I could switch to GWL instead of Snakemake and so it would be a real-world example
<zimoun>Otherwise PiGx is a good candidate but it is much bigger
<civodul>GitHub, AnaConda Cloud, AWS... what else?
<zimoun>civodul: missing: works only on Windows ;-)
<rekado>PiGx is a good candidate, indeed, but … it’s absolutely massive. I suppose I could do it piecemeal and start with the RNAseq pipeline.
<zimoun>rekado: maybe IPFS could ease the CAS. Or maybe git-annex (even if I am not a big fan because Haskell and bootstrappability; which matters for scientific reproducibility)
<zimoun>yeah maybe only the RNAseq pipeline as an example could be a good candidate
<zimoun>from my point of view, starting by implementing the example would show what is missing; pragmatically speaking
<rekado>that’s actually how I found motivation to add a few tweaks to the DSL
<rekado>because I imagined the comments some of my colleagues would throw at me if they had to “map” and “zip” and “find” and use “lambda”
<zimoun>maybe we could launch a shared repo somewhere and start to commit in it, WDYT?
<rekado>you could get push rights to a feature or wip branch
<zimoun>currently, who has the commit rights on GWL on Savannah?
<zimoun>rekado: where do I submit a request for push rights? And I guess I need a GPG key, right?
<rekado>Savannah is pretty cluttered but that’s the only time you need to use the web interface :)
<zimoun>Ok, I will try to do it. But I am really bad with GPG stuff… I am always forgetting my password. Anyway! :-)
<rekado>things became easier for me when I learned to use full sentences
<rekado>a couple of (random) words separated by spaces
<zimoun>I have tried different strategies and I end up with words on a piece of paper that remind me of it; but I have to keep the piece of paper. Bootstrap problem? ;-)
<rekado>my grandfather used to have pieces of paper for bank PINs, passwords, account numbers, but added a layer of secrecy by encrypting it on paper
<rekado>he then used a cipher book to decipher as needed
<rekado>and kept them in separate locations
<zimoun>my Dad is doing something similar :-)
<rekado>when he died it was very difficult to figure out all those numbers he felt important enough to encrypt.
<rekado>so the lesson here is: don’t die!
<zimoun>civodul: in case you are planning to publish the hpc.guix.info announcement for v1.2, hope you have proofread the draft
<civodul>zimoun: we'll see tomorrow maybe, WDYT?
<zimoun>yeah, for sure. It was in case you were in the mood for publishing :-)
<civodul>fun quote from the BioNix paper: "Types are implemented as an abstract data type (ADT) and are tracked using Nix’s passthru features."
<civodul>Nix is dynamically typed, it's "JSON with functions" as Tweag puts it
<civodul>so what they describe here is another gross hack
<civodul>but i think it tells an interesting story
<civodul>like, Nix means Haskell, and Haskell means types, so you've gotta say something about types :-)
<zimoun>and yeah there are so many Haskell blogs about Nix, or the other way around :-) Even the official Hackage repo has, for some packages, a section dedicated to Nix
<civodul>heh yes, Tweag is super aggressive marketing-wise these days :-)
<civodul>which i think is a good idea for them, they should have done it before because they too have a lot to brag about
<civodul>but of course, they don't have release songs and cute illustrations, muahaha
<zimoun>And after that people are still answering Docker+Conda, pwa!