<PurpleSym>civodul: 99% of all wheels are actually just Python source files. pytorch might be a bit of an extreme example.
<civodul>PurpleSym: hi! is it 99% or is that a guess? :-)
<civodul>but yes, i guess "a lot of them" are source
<civodul>perhaps i should clarify that in the text?
<PurpleSym>civodul: “From experience” :) I don’t think there’s any need to clarify in the text, because wheel describes itself as a binary package format, so technically you’re right. But people might nit-pick like I just did.
<zimoun>civodul: I am reading your PyTorch article. For a scientific audience, the argument for build-from-source is not security (personally I do not care whether my Krylov HPC linear solver is secure) but verifiability. All the layers have to be scrutinized to check there is no “bug”; then it becomes new knowledge that we trust. How Science should work. :-)
<zimoun>Somehow, my point is to answer: all this work is worth it because otherwise it is not truly Scientific Research but Engineering or Craft or Cooking or Whatever. And on which principles can we trust the knowledge built on top of it?
<zimoun>«Engineering does not require science. Science helps
<zimoun>a lot but people built perfectly good brick walls long before they knew why
<zimoun>rekado: mentioning security is fine, for sure. But security-minded people are already convinced that pip & conda are bad things. So this post is not trying to convince them, I guess. Instead, the post is trying to convince regular scientific people. And they listen to arguments about science, transparency and co; I hope. :-)
<civodul>cluster sysadmins are also interested in security i think
<rekado>“security-minded people” is a bit vague, though. Our cluster admins for example care about security (e.g. they will aim to apply system patches in time and restrict permissions for services and people), but they don’t see anything wrong with people using Conda or even opaque container blobs.
<rekado>getting the message across that it probably *should* make them feel a little uncomfortable that transparency is missing would be an improvement.
<zimoun>rekado: yeah I agree. To me, the main question is: who is the main audience? A post cannot speak to all audiences at the same time. Sadly.
<rekado>on the other hand… it may also work against us, because until recently it was *not* normal for users to install software without involving a sysadmin. We’re almost taking this power shift for granted.
<rekado>I think the security question should be one that is closely entangled with the scientific endeavor.
<rekado>I mean: it’s not primarily a concern for sysadmins, but for scientists and research practitioners.
<zimoun>rekado: I agree that these are two sides of the same coin.
<civodul>right, in the end these are several facets of the same problem
<rekado>if you have bioinfo data that corresponds to potentially sensitive medical information you really cannot afford to trust non-transparent software.
<zimoun>rekado: sadly it is not how it works in France, at least.
<rekado>at the MDC we had two separate clusters on two separate networks in separate data centres: one for patient data (collaborations with the Charité hospitals) and the other for everything else.
<rekado>the second cluster now belongs to a separate institute that we cooperate with.
<rekado>ironically it is the MDC that uses Guix (processing “everything else”), but not the Berlin Institute of Health (processing patient data).
<zimoun>we have that too. But a lot of medical stuff uses proprietary software.
<rekado>(last I heard they used EasyBuild, because Python > Scheme or something inane like that)
<zimoun>From my point of view, today, the issue is not to convince sysadmins, because they install (as best as they can) more or less what the researchers need. If not, researchers bypass them and more-than-ugly stuff happens. However, scientific people are not aware that they do not apply the scientific method to their numerical stack. And I think they have to be convinced with concrete examples such as PyTorch. :-)