IRC channel logs

2022-09-27.log

<PurpleSym>rekado: We’re thinking about how to provide the entire CRAN repository to our Guix users. One idea was to just build an automated Guix channel, which imports the entire CRAN on a regular basis. Any thoughts?
***civodul` is now known as civodul
<civodul>PurpleSym: i have a dream (might be a nightmare actually) of importing things on the fly
<civodul>like "guix shell cran:whatever"
<civodul>though rekado always reminds me that importers are imperfect and that this would fail in many cases
<civodul>i guess it would work for pure R packages, which is already something
<PurpleSym>Sure, that's another option, civodul. What are the pros and cons of either way? I guess with a channel we could use the time machine to travel.
<PurpleSym>(And it's okay if it works 90% of the time.)
<civodul>true
<civodul>the generated channel is probably easier to implement
<civodul>like there's pretty much no additional development to be done
<civodul>it's just less fancy, less "elegant"
<PurpleSym>Correct, guix import would do the heavy lifting either way.
<PurpleSym>I'm slightly worried about having multiple versions of the same package in different sources (proper+channel).
<rekado>PurpleSym: I’m doing this kind of thing with guix.install
<rekado> https://cran.r-project.org/web/packages/guix.install/index.html
<rekado>it uses guix import when a package isn’t available yet, adds a user module ~/.Rguix to the GUIX_PACKAGE_PATH, and then installs it from there.
<zimoun>PurpleSym: many CRAN and Bioconductor packages are already in Guix proper.
<zimoun>A massive import could work for CRAN because the metadata is more or less clean and so the guix import cran works well, IMHO.
<zimoun>rekado wrote a tiny script for listing the missing Bioconductor packages; it is then easy to feed that list to ‘guix import cran -a bioconductor’.
<rekado>yes, CRAN import works pretty well, which is why I was confident enough to publish guix.install()
<rekado>I see this in my working directory: https://elephly.net/paste/1664292780.scm.html
<rekado>and this for bioc: https://elephly.net/paste/1664292821.scm.html
<rekado>not sure if these are useful
<rekado>I’m having a problem with the texlive-fonts-map hook — it doesn’t run even though texlive-base exists in the profile
<rekado>I’d love to see all CRAN packages in Guix
<rekado>it’s not a lot of work, but it’s tedious.
<rekado>it also makes bulk updates a little more involved — but that’s my problem
<rekado>I have the same wish for texlive — getting all those packages into Guix would be lovely
<rekado>and automatically check them all for completeness
<rekado>I think I found a bug relating to profile hooks when substituting derivations
<rekado>still need to gather details but the effect is real: one profile builds the font maps hook, the other does not.
<rekado>the manifests of these profiles are almost exactly the same (only difference is provenance info), but the output of ‘guix gc -R’ on the profile derivations is vastly different
<PurpleSym>rekado: I’ve seen guix.install, but it does not fit our workflow, I believe. (We don’t install packages using R, but using Guix itself via a manifest.scm.)
<PurpleSym>It’d be nice if we could just have all CRAN packages in Guix, but the descriptions are probably not good enough for an automated import, right?
<PurpleSym>Thus my idea to “circumvent” this quality control via a separate opt-in channel.
<rekado>or we could improve our tools to make them better automatically
<rekado>a lot of CRAN descriptions use incomplete sentences; we can detect and fix that.
<rekado>some packages come with bundled JS that needs fixing; we can detect that.
<PurpleSym>Sure, I’m all for improving imports.
<rekado>the CRAN importer is exceptionally good, in my opinion; it wouldn’t take much to make it good enough to reduce the necessary adjustment work to zero.
<PurpleSym>It’s very good, indeed.
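[Editor's note: rekado's remark above that incomplete CRAN descriptions can be detected automatically could be approximated with a crude heuristic. The sketch below is an assumption for illustration, not what the Guix importer or linter actually does: the function name `description_issues` is hypothetical, and "incomplete" is reduced to two surface checks (capitalised start, sentence-final punctuation).]

```python
def description_issues(description):
    """Return a list of surface-level problems found in a package description.

    Heuristic only (assumed for this sketch): a description is flagged when it
    does not start with an upper-case letter or digit, or when it does not end
    with sentence-final punctuation.
    """
    # Collapse runs of whitespace and newlines so wrapped text is handled.
    text = " ".join(description.split())
    if not text:
        return ["empty description"]

    issues = []
    if not (text[0].isupper() or text[0].isdigit()):
        issues.append("does not start with an upper-case letter or digit")
    if text[-1] not in ".!?":
        issues.append("does not end with sentence-final punctuation")
    return issues


if __name__ == "__main__":
    for sample in ("Tools for parsing dates.", "tools for parsing dates"):
        print(repr(sample), "->", description_issues(sample))
```

[A check like this only finds candidates; deciding how to rewrite a flagged description would still need a person or a smarter tool.]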
<PurpleSym>Let me build a list of CRAN packages tomorrow and then we’ll see how far we are in terms of coverage.
<rekado>there are thousands of CRAN packages; I think we have about 1.5k R packages in total.
<rekado>when things are less busy for me I’d like to continue packaging it all.
<rekado>upgrades are a lot of work, though. (See r-dt.)
<PurpleSym>We have 2075 packages using r-build-system and there are 18730 packages on CRAN. Still a long way to go.
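[Editor's note: the coverage figure PurpleSym is after can be computed from two sets of package names. This is a minimal sketch; the function name `coverage` and the example names are hypothetical, and it assumes both lists have already been normalised to the same naming scheme. Note also that 2075 counts every package using r-build-system, which includes Bioconductor packages, so the ratio below is an upper bound on CRAN coverage.]

```python
def coverage(packaged, upstream):
    """Return (fraction of upstream names already packaged, missing names).

    Both arguments are sets of package names in the same naming scheme.
    """
    if not upstream:
        raise ValueError("upstream package list is empty")
    missing = upstream - packaged
    covered = len(upstream) - len(missing)
    return covered / len(upstream), missing


if __name__ == "__main__":
    # Toy data; a real run would load the two name lists from files.
    fraction, missing = coverage({"r-dt", "r-ggplot2"},
                                 {"r-dt", "r-ggplot2", "r-foo", "r-bar"})
    print(f"{fraction:.0%} covered, missing: {sorted(missing)}")

    # The raw counts quoted in the log give roughly 11%.
    print(f"{2075 / 18730:.1%}")  # prints 11.1%
```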
<rekado>I think last I tried I gave up because of performance problems when stuffing them all in cran.scm :)
<rekado>but that was long ago
<PurpleSym>Regarding packaging every single package on CRAN: There’s also the question of whether this is desirable at all. If we automate most of this there’ll be no sanity checks regarding the package’s contents at all. No-one is going to look at the diffs when updating, etc.
<rekado>what kind of sanity checks?
<rekado>or rather: what kind of sanity checks that cannot be automated?
<PurpleSym>The Python world had all sorts of weird credential-stealing malware on PyPI. Not sure how bad CRAN is in that regard.
<PurpleSym>So, simply put: Do we trust these repositories?
<rekado>I don’t trust Bioconductor
<rekado>but CRAN is pretty good
<rekado>CRAN has reviews on every package update AFAIK
<rekado>Bioconductor has pretty lax license declarations, which is reason enough to be vigilant
<zimoun>I think that if we roughly double the total number of packages in Guix (today 20k, all of CRAN 18k), “guix pull” will be drastically slower. Same for “guix time-machine”.
<zimoun>I have never timed it, but at best the running time is linear in the number of packages.
<zimoun>And the constant (slope) is already really poor on some hardware.
<zimoun>Therefore, I am all for trying it!
<zimoun>It would allow us to spot some issues with scaling up.
<rekado>hah, I did not expect this conclusion after the “therefore” :)
<zimoun>I cannot offer to help with this massive import (proper or channel), but I offer to benchmark the result. :-)
<rekado>I’ll dust off my importer script and see if we can get something that could be stuffed into a channel.
<PurpleSym>For benchmarking, a synthetic channel with controllable variables (like number of packages, packages per file, …) would be more meaningful.
<efraim>I was thinking of adding a simple checker to the gnu-build-system on core-updates that just searches the unpacked source for instances of *.min.js and spits out a warning when it sees one
<efraim>similar to the one in the python-build-system and cythonized code
<efraim>^^ in relation to minified JavaScript in CRAN packages
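[Editor's note: the check efraim describes, searching an unpacked source tree for *.min.js files and warning about each one, can be sketched as follows. This is a standalone Python illustration under stated assumptions, not the Guix build-system phase itself; the function names and the warning wording are hypothetical.]

```python
import sys
from pathlib import Path


def find_minified_js(source_dir):
    """Return sorted relative paths of all *.min.js files under source_dir."""
    root = Path(source_dir)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*.min.js")
        if path.is_file()
    )


def warn_minified_js(source_dir):
    """Print one warning per match to stderr and return the matches."""
    hits = find_minified_js(source_dir)
    for hit in hits:
        print(f"warning: possibly minified JavaScript: {hit}", file=sys.stderr)
    return hits


if __name__ == "__main__":
    # Usage: python check_minjs.py path/to/unpacked/source
    warn_minified_js(sys.argv[1] if len(sys.argv) > 1 else ".")
```

[Matching on the file name alone misses minified code that is not named *.min.js and may flag files that ship alongside their unminified source, so the result is a hint for review rather than a verdict.]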