IRC channel logs
2021-09-20.log
<PurpleSym>civodul: 99% of all wheels are actually just Python source files. pytorch might be a bit of an extreme example.
<civodul>PurpleSym: hi! is it 99% or is that a guess? :-)
<civodul>but yes, i guess "a lot of them" are source
<civodul>perhaps i should clarify that in the text?
<PurpleSym>civodul: “From experience” :) I don’t think there’s a need to clarify in the text, because wheel describes itself as a binary package format, so technically you’re right. But people might nit-pick like I just did.
<zimoun>civodul: I am reading your PyTorch article. From a scientific audience, the argument about build-from-source is not security (personally I do not care that my Krylov HPC linear solver is not secure) but verifiability. All the layers have to be scrutinized to check there is no “bug”; then it becomes a new knowledge that we trust. How Science should work. :-)
<civodul>oh right, that one is also worth referring to :-)
<civodul>you think the article insists too much on security?
<zimoun>not too much. The paragraph before ‘Bundling’ should insist on why “build-from-source” is important for “scientific research” (these days named Reproducible Science, arf!)
<zimoun>again at bundling, keeping things auditable should be the main argument; from a researcher POV. The other ones are details. Important though (-:)
<rekado>I think mentioning security is fine. It *is* a concern that triggers an emotional response in a wider audience. Nobody wants to run “insecure” software.
<rekado>unfortunately, the response also can be shut down by going “well, actually…” on threat models and things like that
<rekado>“how likely is it that pytorch takes over my secrets?”
<rekado>the reproducible research angle is a bit more niche, but it also has a moral or philosophical component that cannot easily be brushed away
<civodul>right, a scientist can hardly argue that they don't care
<civodul>these days, those who don't care have to be silent
<rekado>or they need to claim that it is just exploratory and that for “real” research they’ll do it better…
<civodul>i think this kind of post is just about moving the norm by a few mm
<zimoun>Somehow, my point is to answer: all this work is worth it because otherwise it is not truly Scientific Research but Engineering or Craft or Cooking or Whatever. But based on which principles can we trust the knowledge built on top of it?
<zimoun>«Engineering does not require science. Science helps
<zimoun>a lot but people built perfectly good brick walls long before they knew why
<zimoun>Here, it is somehow the converse. :-)
<zimoun>Guix is applying scientific method (transparency) to package management. Pip & CONDA, not.
<zimoun>rekado: mentioning security is fine, for sure. But security-minded people are already convinced that pip&conda are bad things. So this post is not trying to convince them, I guess. Instead, the post is trying to convince regular scientific people. And they listen to arguments about science, transparency and co; I hope. :-)
<civodul>cluster sysadmins are also interested in security i think
<civodul>like the three of us, more or less :-)
<civodul>(though i hope this post makes sense to more than just the three of us!)
<zimoun>yeah yeah, my comment is not to remove all mention about security, but to switch the argument: transparency because 1. science requires it and 2. security requires it too.
<rekado>zimoun: sounds reasonable to me.
<rekado>“security-minded people” is a bit vague, though. Our cluster admins for example care about security (e.g. they will aim to apply system patches in time and restrict permissions for services and people), but they don’t see anything wrong with people using Conda or even opaque container blobs.
<rekado>getting the message across, that it probably *should* make them feel a little uncomfortable that transparency is missing would be an improvement.
<zimoun>rekado: yeah I agree. To me, the main question: who is the main audience? A post cannot speak to all audiences at the same time. Sadly.
<rekado>on the other hand… it may also work against us, because until recently it was *not* normal for users to install software without involving a sysadmin. We’re almost taking this power shift for granted.
<rekado>I think the security question should be one that is closely entangled with the scientific endeavor.
<rekado>I mean: it’s not primarily a concern for sysadmins, but for scientists and research practitioners.
<zimoun>rekado: I agree that it is the 2 sides of the same coin.
<civodul>right, in the end these are several facets of the same problem
<rekado>if you have bioinfo data that corresponds to potentially sensitive medical information you really cannot afford to trust in intransparent software.
<zimoun>rekado: sadly it is not how it works in France, at least.
<rekado>at the MDC we had two separate clusters on two separate networks in separate data centres: one for patient data (collaborations with the Charité hospitals) and the other for everything else.
<rekado>the second cluster now belongs to a separate institute that we cooperate with.
<rekado>ironically it is the MDC that uses Guix (processing “everything else”), but not the Berlin Institute of Health (processing patient data).
<zimoun>we have too. But a lot of medical stuff use proprietary software.
<rekado>(last I heard they used EasyBuild, because Python > Scheme or something inane like that)
<zimoun>From my point of view, today, the issue is not to convince sysadmins because they install (as best as they can) more or less what the researchers need. If not, researchers bypass them and more than ugly stuff happens. However, scientific people are not aware that they do not apply scientific method to their numerical stack. And I think they have to be convinced with concrete examples such as PyTorch. :-)
<zimoun>Well, thanks for the discussion. It motivates me to write down a presentation I did 2 weeks ago. :-)
<civodul>i think i'm done with the post, i'll publish it this afternoon
<rekado>excellent! Looking forward to sharing it.
<civodul>i realized the OpenCL code is in contrib/ and it's evidently broken
<civodul>hmm i thought it was free, but maybe not
<PurpleSym>civodul: There’s individual LICENSE files in the subdirectories.
<civodul>it's probably something that can't be built from source, but hey, that's modernity
<PurpleSym>We have a node importer in one of the wip-* branches that works quite well actually. Worth a shot as soon as your wounds from packaging pytorch are healed :)
<civodul>we should get that importer in 'master'
<zimoun>which could be tweaked to have a “workadventure”
<zimoun>if work is not already a great adventure ;-)
<zimoun>civodul: how did you produce the graph of dependencies for PyTorch?
<zimoun>BTW, cool! for the nice read :-)
<zimoun>civodul: I mean libreadventure is a fork of workadventure
<civodul>zimoun: re the graph: "guix graph -M2 python-pytorch | fdp -Tsvg > graph.svg"
<zimoun>civodul: ah, it is ’fdp’ that I did not know. Thanks.
<zimoun>For the new ’-M2’, I already had something like that :-)
<civodul>i did not but thought it was long overdue!