<PurpleSym>rekado: Any obstacles when adding/updating R packages that I need to be aware of? I have accumulated quite a few in custom channels that I’d like to push into guix proper.
<rekado>PurpleSym: there are a few packages on CRAN and Bioconductor that bundle massive amounts of JavaScript, usually minified and without sources.
<rekado>this is the biggest obstacle
<PurpleSym>Yeah, I wouldn’t add those to guix proper, but guix-science instead. I’ve also seen they usually bundle libraries instead of linking to them, when they provide an interface for them.
<rekado>that’s a tricky subject
<rekado>sometimes I try to unbundle, but in other cases it renders the whole point of the package moot
<PurpleSym>Do you have an example?
<rekado>r-freetypeharfbuzz, for example
<rekado>I decided to unbundle anyway, but the point of this package is to provide a particular version of freetype and harfbuzz
<civodul>uh, scary practices or clever trick?
<PurpleSym>Urgh, scary. I’d mention this unbundling in the description though.
<rekado>I don’t know what to think of it.
<rekado>mentioning it in the description sounds like a good idea
<zimoun`>rekado: I am seeing…-phastCons100way.UCSC.hg19_3.7.2.tar.gz and this is 1G of data. Is it expected to be stored on Berlin?
<rekado>zimoun`: I think it’s fine.
<zimoun`>because there is a lot of simple genomic data in the r-*
<zimoun`>and I am not convinced they are “packages”.
<rekado>I don’t think it hurts us to store these data packages.
<zimoun`>but then they are going to SWH for instance
<zimoun`>it really slows down /gnu/store when I GC, for another instance
<rekado>I think the correct response to that is to improve GC then :)
<zimoun`>in the same way that “workflows” are not packages, data are not packages either. And this should be part of the same discussion: distribute data with something other than packages
<civodul>a single file cannot "slow down GC"
<zimoun`>hehe! that’s not a single file, but a couple of packages.
<zimoun`>by couple, I mean 10-20.
<zimoun`>and I have not examined how many files each package contains
<rekado>zimoun`: I have mixed feelings about this. I don’t think it’s a great idea to host data on CRAN or Bioconductor, and the fact that it is done is merely evidence of a person with a hammer seeing nails everywhere.
<zimoun`>and they are not packages but really flat data.
<rekado>on the other hand: some data files for R are generated from generic data files with the current version of R, so they really are “built” in a way.
<rekado>and the fact that they exist as packages in CRAN and Bioconductor means that R users will want to be able to install them (rather than use some other mechanism to fetch and load them).
<zimoun`>rekado: I do not buy the argument. :-) Otherwise, it would also apply to Dockerfiles for reproducibility: «the fact that it is done».
<rekado>oh, it’s no argument
<rekado>it’s just me laying out why I don’t feel motivated to change anything about it :)
<rekado>I’m not trying to convince you of anything.
<rekado>I would need to be convinced of anything first.
<zimoun`>in my view “r-genome-hg19” is the same story as a dataset for analysis.
<zimoun`>and my point is that an abstraction is missing for data. The “r-genome” ones are an excuse, somehow. :-)
<rekado>to be fair, a lot of R packages are the binary equivalent of half-assed excuses.
<rekado>r-bh comes to mind
<rekado>I often felt weird about packaging R things because some people try very hard to have this completely autonomous R world with copies of libraries because OS packaging is too difficult.
<zimoun`>I agree. I am still convinced that an abstraction for data is missing. Applying such an abstraction to R annotation or experiment packages is another story.
<rekado>r-rcpparmadillo, r-zlibbioc, r-freetypeharfbuzz, etc
<zimoun`>rekado: commit 297531ef58, from 6 days ago, introduces r-circrnaprofiler at version 1.4.0, but this version is not in Bioconductor 3.12. Same for r-chemminer. And I guess r-bioassayr.
<zimoun`>Does Bioconductor update its packages between two releases?