IRC channel logs

<PurpleSym>rekado: We’re thinking about how to provide the entire CRAN repository to our Guix users. One idea was to just build an automated Guix channel, which imports the entire CRAN on a regular basis. Any thoughts?

***civodul` is now known as civodul

<civodul>PurpleSym: i have a dream (might be a nightmare actually) of importing things on the fly

<civodul>like "guix shell cran:whatever"

<civodul>though rekado always reminds me that importers are imperfect and that this would fail in many cases

<civodul>i guess it would work for pure R packages, which is already something

<PurpleSym>Sure, that's another option, civodul. What are the pros and cons of either way? I guess with a channel we could use the time machine to travel.

<PurpleSym>(And it's okay if it works 90% of the time.)

<civodul>true

<civodul>the generated channel is probably easier to implement

<civodul>like there's pretty much no additional development to be done

<civodul>it's just less fancy, less "elegant"

<PurpleSym>Correct, guix import would do the heavy lifting either way.

<PurpleSym>I'm slightly worried about having multiple versions of the same package in different sources (proper+channel).

<rekado>PurpleSym: I’m doing this kind of thing with guix.install

<rekado> https://cran.r-project.org/web/packages/guix.install/index.html

<rekado>it uses guix import when a package isn’t available yet, adds a user module ~/.Rguix to the GUIX_PACKAGE_PATH, and then installs it from there.

<zimoun>PurpleSym: many CRAN and Bioconductor packages are already in Guix proper.

<zimoun>A massive import could work for CRAN because the metadata is more or less clean and so the guix import cran works well, IMHO.

<zimoun>rekado wrote a tiny script for listing the missing Bioconductor packages, then it is easy to feed ‘guix import cran -a bioconductor’.

<rekado>yes, CRAN import works pretty well, which is why I was confident enough to publish guix.install()

<rekado>I see this in my working directory: https://elephly.net/paste/1664292780.scm.html

<rekado>and this for bioc: https://elephly.net/paste/1664292821.scm.html

<rekado>not sure if these are useful

<rekado>I’m having a problem with the texlive-fonts-map hook — it doesn’t run even though texlive-base exists in the profile

<rekado>I’d love to see all CRAN packages in Guix

<rekado>it’s not a lot of work, but it’s tedious.

<rekado>it also makes bulk updates a little more involved — but that’s my problem

<rekado>I have the same wish for texlive — getting all those packages into Guix would be lovely

<rekado>and automatically check them all for completeness

<rekado>I think I found a bug relating to profile hooks when substituting derivations

<rekado>still need to gather details but the effect is real: one profile builds the font maps hook, the other does not.

<rekado>the manifests of these profiles are almost exactly the same (only difference is provenance info), but the output of ‘guix gc -R’ on the profile derivations is vastly different

<PurpleSym>rekado: I’ve seen guix.install, but it does not fit our workflow, I believe. (We don’t install packages using R, but using Guix itself via a manifest.scm.)

<PurpleSym>It’d be nice if we could just have all CRAN packages in Guix, but the descriptions are probably not good enough for an automated import, right?

<PurpleSym>Thus my idea to “circumvent” this quality control via a separate opt-in channel.

<rekado>or we could improve our tools to make them better automatically

<rekado>a lot of CRAN descriptions use incomplete sentences; we can detect and fix that.
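A rough sketch of how incomplete-sentence descriptions could be flagged automatically. This is not what the Guix CRAN importer does; the function name `description_problems` and the list of leading verbs are assumptions made up for illustration, and a real check would need a far better heuristic.

```python
import re

# Hypothetical heuristic: verbs that often open a subject-less description,
# e.g. "Provides tools for ..." instead of "This package provides tools for ...".
LEADING_VERBS = r"(provides|implements|contains|functions)\b"

def description_problems(text):
    """Return a list of human-readable problems found in a description."""
    stripped = text.strip()
    if not stripped:
        return ["empty"]
    problems = []
    if not stripped[0].isupper():
        problems.append("does not start with a capital letter")
    if stripped[-1] not in ".!?":
        problems.append("does not end with sentence punctuation")
    if re.match(LEADING_VERBS, stripped, re.IGNORECASE):
        problems.append("starts without a subject")
    return problems
```

A description that yields an empty list passes these particular checks; it is not thereby guaranteed to be well written.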

<rekado>some packages come with bundled JS that needs fixing; we can detect that.

<PurpleSym>Sure, I’m all for improving imports.

<rekado>the CRAN importer is exceptionally good, in my opinion; it wouldn’t take much to make it good enough to reduce the necessary adjustment work to zero.

<PurpleSym>It’s very good, indeed.

<PurpleSym>Let me build a list of CRAN packages tomorrow and then we’ll see how far we are in terms of coverage.

<rekado>there are thousands of CRAN packages; I think we have about 1.5k R packages in total.

<rekado>when things are less busy for me I’d like to continue packaging it all.

<rekado>upgrades are a lot of work, though. (See r-dt.)

<PurpleSym>We have 2075 packages using r-build-system and there’s 18730 packages on CRAN. Still a long way to go.
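The coverage implied by these two figures can be worked out in a few lines. This is a sketch using only the numbers quoted above; it assumes every r-build-system package comes from CRAN, which overstates coverage because some come from Bioconductor or elsewhere.

```python
# Figures quoted in the message above.
guix_r_packages = 2075   # packages using r-build-system
cran_packages = 18730    # packages on CRAN

def coverage(packaged, total):
    """Fraction of `total` that is already packaged."""
    return packaged / total

ratio = coverage(guix_r_packages, cran_packages)
missing = cran_packages - guix_r_packages

# Upper bound on CRAN coverage, under the assumption stated above.
print(f"{ratio:.1%} covered, {missing} packages missing")
```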

<rekado>I think last I tried I gave up because of performance problems when stuffing them all in cran.scm :)

<rekado>but that was long ago

<PurpleSym>Regarding packaging every single package on CRAN: There’s also the question whether this is desirable at all. If we automate most of this there’ll be no sanity checks regarding the package’s contents at all. No-one is going to look at the diffs when updating, etc.

<rekado>what kind of sanity checks?

<rekado>or rather: what kind of sanity checks that cannot be automated?

<PurpleSym>The Python world had all sorts of weird credential-stealing malware on PyPI. Not sure how bad CRAN is in that regard.

<PurpleSym>So, simply put: Do we trust these repositories?

<rekado>I don’t trust Bioconductor

<rekado>but CRAN is pretty good

<rekado>CRAN has reviews on every package update AFAIK

<rekado>Bioconductor has pretty lax license declarations, which is reason enough to be vigilant

<zimoun>I think that if we roughly double the total number of packages in Guix (today 20k, all CRAN 18k), “guix pull” will be drastically slower. Same for “guix time-machine”.

<zimoun>I have never timed it, but at best the time taken is linear in the number of packages.

<zimoun>And the constant (slope) is already really poor with some hardware.

<zimoun>Therefore, I am all for it to try it!

<zimoun>It would allow us to spot some issues with scaling up.

<rekado>hah, I did not expect this conclusion after the “therefore” :)

<zimoun>I cannot offer to help with this massive import (proper or channel), but I offer to benchmark the result. :-)

<rekado>I’ll dust off my importer script and see if we can get something that could be stuffed into a channel.

<PurpleSym>For benchmarking a synthetic channel with controllable variables (like number of packages, packages per file, …) would be more meaningful.
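One possible sketch of such a synthetic channel generator, with the number of packages and the packages per file as the controllable variables. Everything here is an assumption for illustration: the module names `(synthetic mod-N)`, the package names, and the choice of `trivial-build-system` with no source. The generated Scheme has not been run against Guix.

```python
import math

# Hypothetical package template; every package is a dependency-free stub.
PACKAGE_TEMPLATE = """\
(define-public synthetic-{n}
  (package
    (name "synthetic-{n}")
    (version "0")
    (source #f)
    (build-system trivial-build-system)
    (synopsis "Synthetic package {n}")
    (description "Synthetic package used only for benchmarking.")
    (home-page "https://example.org")
    (license license:gpl3+)))
"""

MODULE_HEADER = """\
(define-module (synthetic mod-{m})
  #:use-module (guix packages)
  #:use-module (guix build-system trivial)
  #:use-module ((guix licenses) #:prefix license:))
"""

def generate_channel(total_packages, packages_per_file):
    """Return a dict mapping relative file names to module source text."""
    files = {}
    file_count = math.ceil(total_packages / packages_per_file)
    for m in range(file_count):
        start = m * packages_per_file
        stop = min(start + packages_per_file, total_packages)
        body = "\n".join(PACKAGE_TEMPLATE.format(n=n) for n in range(start, stop))
        files[f"synthetic/mod-{m}.scm"] = MODULE_HEADER.format(m=m) + "\n" + body
    return files
```

Writing the returned dict to disk and committing it to a Git repository would be the remaining step; timing “guix pull” for different parameter values is left to the benchmark itself.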

<efraim>I was thinking of adding a simple checker to the gnu-build-system on core-updates that just searches the unpacked source for instances of *.min.js and spits out a warning when it sees one

<efraim>similar to the one in the python-build-system and cythonized code

<efraim>^^ in relation to minified JavaScript in CRAN packages
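The kind of check described here can be sketched outside of Guix as a plain directory walk. This is not the proposed gnu-build-system phase, only an illustration of the idea; the function name `find_minified_js` is hypothetical.

```python
import os

def find_minified_js(source_dir):
    """Return sorted paths (relative to source_dir) of files named *.min.js."""
    hits = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            if name.endswith(".min.js"):
                full_path = os.path.join(root, name)
                hits.append(os.path.relpath(full_path, source_dir))
    return sorted(hits)

def warn_about_minified_js(source_dir):
    """Print one warning per minified file found; return the number of hits."""
    hits = find_minified_js(source_dir)
    for path in hits:
        print(f"warning: possibly minified JavaScript: {path}")
    return len(hits)
```

Matching on the file name alone misses minified code shipped under other names, so this can only serve as a first warning, not as proof that a source tree is clean.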


2022-09-27.log