<zimoun>rekado: Yes, I follow the recommendations, AFAIR. Maybe I did something wrong
<zimoun>civodul, rekado: thanks for the paper; in my reading queue. ;-)
<zimoun>about GWL, one “issue” is that “we” should sit and write real workflows
<zimoun>on the other hand, in my Institute, people do not know what a workflow is; they discover Snakemake or CWL and then it is almost magic. Moreover, a lot of material is there, so they can copy/paste
<civodul>yeah, looks as if they saw GWL and thought "we can do it too!"
<civodul>would be nice to keep the lead in these domains
<zimoun>yeah I think GWL is the most advanced but it lacks a lot of polishing because none of us uses it on a daily basis. For example, it is not clear to me how big data sets are/should be managed
<civodul>big data sets in the store are almost certainly a bad idea tho ;-)
<civodul>despite GWL's limitations, it's probably a better starting point than NixBio it seems
<civodul>a blog post and/or paper about it would be nice
<zimoun>yeah I did one package for one video for the Guix Days ;-) So then, I thought that we could use Guix to distribute the “community” material (videos, papers, etc.) with a channel. Maybe a specific build-system. But then a package is not the correct object. And I have not thought about it. :-)
<rekado>zimoun: I hit a roadblock with my Guile bindings to DRMAA, but I think this can be the next big step in making GWL workflows run on more clusters. For data storage we need to figure out ways to communicate with data management systems; a first target could simply be IPFS while we figure out other requirements.
<rekado>but this may be fine for well-known static data sets
<zimoun>In June, I started to reimplement https://pubmed.ncbi.nlm.nih.gov/28419628/ a tool for HLA typing. The original code is ugly C/C++ glued together with ugly Bash. So I have started to replace the Bash with Snakemake and I am in the process of replacing the C/C++ with Python+BioPython.
<zimoun>I could switch to GWL instead of Snakemake and so it would be a real-world example
<zimoun>Otherwise PiGx is a good candidate but it is much bigger
<civodul>GitHub, Anaconda Cloud, AWS... what else?
<zimoun>civodul: missing: “works only on Windows” ;-)
<rekado>PiGx is a good candidate, indeed, but … it’s absolutely massive. I suppose I could do it piecemeal and start with the RNAseq pipeline.
<zimoun>rekado: maybe IPFS could ease the CAS. Or maybe git-annex (even if I am not a big fan because of Haskell and bootstrappability, which matters for scientific reproducibility)
<zimoun>yeah maybe only the RNAseq pipeline as an example could be a good candidate
<zimoun>from my point of view, starting by implementing the example would show what is missing, pragmatically speaking