IRC channel logs

<rekado_>yes, a lot of the tools they (and other cloud services) offer are orbiting the idea of container blobs

<rekado_>they offer a container registry where you can “docker push” your blobs and then fetch them from there when a machine is initialized.

<rekado_>kubernetes is the same: you define services that are backed by a container blob

<rekado_>shared file systems are difficult to implement well, so copying blobs is the preferred model.

<rekado_>I might replace the current setup (shared EFS, just like our traditional cluster setup) with a “guix pack”-based thing to copy the requested environment to new VMs

<rekado_>it will make VM initialization slower, but this might be an acceptable compromise to make Guix fast.

<PurpleSym>rekado_: Do you have any insight into how long initialization takes in the copy scenario? I found that users are very impatient when it comes to starting applications.

<rekado_>PurpleSym: it takes too long if people are impatient

<rekado_>you can always throw more money at this to transfer data more quickly, but you can’t downgrade to slower networking dynamically

<rekado_>this also has other downsides: you can’t easily use Guix on the created VM

<rekado_>with the shared EFS you can use the whole store and let Guix communicate with a remote daemon to update your environment at runtime.

<rekado_>(e.g. using the Guix kernel for Jupyter)

<rekado_> https://twitter.com/nordholmen/status/1373121559548850176 — “depends on how you create the container” is such a common response, it almost deserves a blog post

<rekado_>it reminds me of Greenspun’s tenth rule.

<rekado_>“sure, if you want reproducibility you need to cache all repositories that your Dockerfile accesses; and you can’t use an arbitrary base image that you don’t control; and ….”

<rekado_>these are things that very few people are even aware of, so the overwhelming majority of containers fail to meet even basic requirements for reproducibility

<rekado_>the water <–> ice analogy is wrong too.

<rekado_>yes, you can’t swim in ice

<rekado_>but you can thaw it

<rekado_>you can’t thaw a container blob.

<PurpleSym>“Impatient” meaning more than 2–3 seconds in my experience, btw.

<rekado_>it definitely takes longer than that

<rekado_>provisioning the VM takes more than 3 seconds already

<PurpleSym>I think O2R/ERC are more concerned about post-research publication, which is why irreversibly freezing the results is probably fine for them.

<PurpleSym>(aka “see, you can run it and get the same results”-reproducibility)

<rekado_>then it has to boot, then it needs to do all the boring stuff that turns the generic AMI into something suitable for working with (e.g. “yum update -y”, “yum install -y amazon-efs-utils”, mount any shared EFS for common data, mount block storage, etc.)

<rekado_>and only *then* can we copy data around

<rekado_>you could make things a little faster by preparing an EBS volume with the relevant /gnu/store subset while the EC2 instance boots, and then attach it once the EC2 instance is ready.
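The idea of preparing the volume while the instance boots is about overlapping two waits, so the total is roughly the longer of the two rather than their sum. A minimal Python sketch of that overlap, assuming nothing about the real EC2/EBS APIs: `boot_instance` and `prepare_store_volume` are hypothetical stand-ins that only sleep, and the actual attach step is left as a comment.

```python
import asyncio
import time
from typing import Tuple


# Hypothetical stand-ins: real code would call the cloud provider's API.
async def boot_instance(seconds: float) -> str:
    await asyncio.sleep(seconds)  # simulate instance boot and provisioning
    return "instance-ready"


async def prepare_store_volume(seconds: float) -> str:
    await asyncio.sleep(seconds)  # simulate filling a volume with a /gnu/store subset
    return "volume-ready"


async def launch(boot_s: float, prep_s: float) -> Tuple[str, str, float]:
    start = time.monotonic()
    # Both steps run concurrently, so the wait is about max(boot_s, prep_s).
    instance, volume = await asyncio.gather(
        boot_instance(boot_s), prepare_store_volume(prep_s)
    )
    # Hypothetical: attach the prepared volume here, once both are ready.
    return instance, volume, time.monotonic() - start


if __name__ == "__main__":
    instance, volume, elapsed = asyncio.run(launch(0.3, 0.2))
    print(instance, volume, f"{elapsed:.2f}s")
```

This only hides the volume preparation behind the boot time; the copy into the volume is still as slow as before.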

<rekado_>but the mere act of copying data around is slow

<rekado_>(copying stuff also depends on features that Guix currently doesn’t expose: we need something between “guix copy” and “guix pack”, a file iterator of sorts, so that we can stream files to a target without intermediate packing)

<PurpleSym>I.e. `guix pack` streaming a tar archive to stdout.

<rekado_>yes, that would work
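The “stream instead of pack” idea can be illustrated with Python's `tarfile` in stream mode (`"w|"`), which writes to a non-seekable output such as a pipe or stdout, so no intermediate archive file is created. A minimal sketch, assuming the list of store paths is already known (for example from `guix gc --requisites`); it is not a Guix feature, and files unpacked this way on the target are not registered with any Guix daemon there.

```python
import sys
import tarfile
from typing import BinaryIO, Iterable


def stream_tar(paths: Iterable[str], out: BinaryIO) -> None:
    """Write a tar archive of `paths` to `out` as a stream.

    Mode "w|" never seeks, so `out` may be a pipe or stdout; the caller's
    file object is left open when the archive is finished.
    """
    with tarfile.open(fileobj=out, mode="w|") as tar:
        for path in paths:
            # add() recurses into directories; drop the leading "/" so the
            # archive holds e.g. "gnu/store/..." and extracts under "-C /".
            tar.add(path, arcname=path.lstrip("/"))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python stream_tar.py PATH... | ssh TARGET 'tar -x -C /'
    stream_tar(sys.argv[1:], sys.stdout.buffer)
```

The receiving side can start unpacking as soon as the first bytes arrive, which is the property a file iterator between `guix copy` and `guix pack` would provide.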

<rekado_>for iterating on the same environment (e.g. running the thing again with an added package) you could get a speed-up by listing just the files and piping the names to rsync to take care of the difference
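A minimal sketch of the “list the files, let rsync handle the difference” approach, assuming the environment's store paths are already known. It walks those paths, emits NUL-separated names, and builds an rsync command that reads them with `--files-from=-` and `--from0`. The destination `vm:/` is a hypothetical host, and the command is only built here, not run.

```python
import os
from typing import Iterable, Iterator, List


def list_files(roots: Iterable[str]) -> Iterator[str]:
    """Yield files and symlinks under each root in a stable order."""
    for root in roots:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # sorting in place also fixes the walk order
            # os.walk lists symlinks to directories in dirnames but does not
            # descend into them, so emit them here as entries of their own.
            links = [d for d in dirnames if os.path.islink(os.path.join(dirpath, d))]
            for name in sorted(filenames + links):
                yield os.path.join(dirpath, name)


def nul_separated(paths: Iterable[str]) -> bytes:
    """Encode paths as NUL-terminated names, the format --from0 expects."""
    return b"".join(os.fsencode(p) + b"\0" for p in paths)


def rsync_command(dest: str) -> List[str]:
    # --files-from=- reads the list from stdin and --from0 splits it on NULs.
    # Names are taken relative to the source "/" (rsync drops the leading
    # slash). -a preserves permissions, times and symlinks.
    return ["rsync", "-a", "--from0", "--files-from=-", "/", dest]
```

Usage would be something like `subprocess.run(rsync_command("vm:/"), input=nul_separated(list_files(paths)), check=True)`. On a second run with one added package, rsync's size and modification-time check skips files that already exist unchanged on the target, so mostly the new package's files travel over the network.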


2021-03-20.log