IRC channel logs

2021-03-20.log


<rekado_>yes, a lot of the tools they (and other cloud services) offer are orbiting the idea of container blobs
<rekado_>they offer a container registry where you can “docker push” your blobs and then fetch them from there when a machine is initialized.
<rekado_>kubernetes is the same: you define services that are backed by a container blob
<rekado_>shared file systems are difficult to implement well, so copying blobs is the preferred model.
<rekado_>I might replace the current setup (shared EFS, just like our traditional cluster setup) with a “guix pack”-based thing to copy the requested environment to new VMs
<rekado_>it will make VM initialization slower, but this might be an acceptable compromise to make Guix fast.
<PurpleSym>rekado_: Do you have any insight into how long initialization takes in the copy scenario? I found that users are very impatient when it comes to starting applications.
<rekado_>PurpleSym: it takes too long if people are impatient
<rekado_>you can always throw more money at this to transfer data more quickly, but you can’t downgrade to slower networking dynamically
<rekado_>this also has other downsides: you can’t easily use Guix on the created VM
<rekado_>with the shared EFS you can use the whole store and let Guix communicate with a remote daemon to update your environment at runtime.
<rekado_>(e.g. using the Guix kernel for Jupyter)
<rekado_>https://twitter.com/nordholmen/status/1373121559548850176 — “depends on how you create the container” is such a common response, it almost deserves a blog post
<rekado_>it reminds me of Greenspun’s tenth rule.
<rekado_>“sure, if you want reproducibility you need to cache all repositories that your Dockerfile accesses; and you can’t use an arbitrary base image that you don’t control; and ….”
<rekado_>these are things that very few people are even aware of, so the overwhelming majority of containers fail to meet even basic requirements for reproducibility
<rekado_>the water <–> ice analogy is wrong too.
<rekado_>yes, you can’t swim in ice
<rekado_>but you can thaw it
<rekado_>you can’t thaw a container blob.
<PurpleSym>“Impatient” meaning more than 2–3 seconds in my experience, btw.
<rekado_>it definitely takes longer than that
<rekado_>provisioning the VM takes more than 3 seconds already
<PurpleSym>I think O2R/ERC are more concerned about post-research publication, which is why irreversibly freezing the results is probably fine for them.
<PurpleSym>(aka “see, you can run it and get the same results”-reproducibility)
<rekado_>then it has to boot, then it needs to do all the boring stuff that turns the generic AMI into something suitable for working with (e.g. “yum update -y”, “yum install -y amazon-efs-utils”, mount any shared EFS for common data, mount block storage, etc.)
<rekado_>and only *then* can we copy data around
<rekado_>you could make things a little faster by preparing an EBS volume with the relevant /gnu/store subset while the EC2 instance boots, and then attach it once the EC2 instance is ready.
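Preparing the EBS volume while the EC2 instance boots amounts to overlapping two independent steps, so wall-clock time approaches the longer of the two rather than their sum. A minimal Python sketch, assuming hypothetical stand-ins `boot_instance` and `prepare_store_volume` that only sleep for arbitrary durations (they call no AWS API and the numbers are not measurements):

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins: each just sleeps to simulate a slow provisioning step.
def boot_instance(seconds: float) -> str:
    time.sleep(seconds)
    return "instance ready"

def prepare_store_volume(seconds: float) -> str:
    time.sleep(seconds)
    return "volume with /gnu/store subset ready"

def provision(boot_s: float, prep_s: float) -> tuple[str, str, float]:
    """Run both steps concurrently; return their results and the elapsed time."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=2) as pool:
        boot = pool.submit(boot_instance, boot_s)
        prep = pool.submit(prepare_store_volume, prep_s)
        # The volume can only be attached once both steps have finished.
        boot_msg, prep_msg = boot.result(), prep.result()
    return boot_msg, prep_msg, time.monotonic() - start

if __name__ == "__main__":
    boot_msg, prep_msg, elapsed = provision(0.3, 0.2)
    print(boot_msg, "|", prep_msg, f"| {elapsed:.2f}s")
```

This hides the preparation behind the boot only when preparation is the shorter step; it does not make the copy itself any faster.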
<rekado_>but the mere act of copying data around is slow
<rekado_>(copying stuff also depends on features that Guix currently doesn’t expose: we need something between “guix copy” and “guix pack”, a file iterator of sorts, so that we can stream files to a target without intermediate packing)
<PurpleSym>I.e. `guix pack` streaming a tar archive to stdout.
<rekado_>yes, that would work
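Emitting a tar stream instead of a finished archive file can be sketched with Python's `tarfile` in its streaming write mode (`"w|"`), which writes entries to a non-seekable stream as they are added. This is a hypothetical helper illustrating the idea, not something `guix pack` provides; the store paths passed in are assumed to be known already (for instance, the output of `guix gc -R` on a profile).

```python
import sys
import tarfile
from typing import BinaryIO, Iterable

def stream_tar(paths: Iterable[str], out: BinaryIO) -> None:
    """Write the given files/directories to `out` as an uncompressed tar stream.

    Mode "w|" never seeks, so `out` can be a pipe (e.g. stdout piped into
    `ssh host tar -x -C /`); nothing is staged in an intermediate file.
    """
    with tarfile.open(fileobj=out, mode="w|") as tar:
        for path in paths:
            # Keep the absolute layout (gnu/store/...) minus the leading slash.
            tar.add(path, arcname=path.lstrip("/"), recursive=True)

if __name__ == "__main__":
    # Hypothetical usage: python stream_tar.py STORE-ITEM... > out.tar
    stream_tar(sys.argv[1:], sys.stdout.buffer)
```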
<rekado_>for iterating on the same environment (e.g. running the thing again with an added package) you could get a speed-up by listing just the files and piping the names to rsync to take care of the difference
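For the incremental case, the file list can be generated locally and handed to rsync, which then transfers only what differs on the target. A minimal sketch, assuming the store items to ship are already known (for instance from `guix gc -R`); `--files-from=-` makes rsync read the relative paths from stdin. The helper names `list_files` and `rsync_files` are hypothetical.

```python
import os
import subprocess
from typing import Iterable, Iterator

def list_files(roots: Iterable[str], base: str = "/") -> Iterator[str]:
    """Yield every file and symlink under `roots`, as paths relative to `base`.

    Empty directories are not listed; rsync's implied --relative still creates
    the parent directories of the listed files on the target.
    """
    for root in roots:
        if not os.path.isdir(root) or os.path.islink(root):
            # A store item can be a single file or symlink, not a directory.
            yield os.path.relpath(root, base)
            continue
        for dirpath, dirnames, filenames in os.walk(root):
            # os.walk puts symlinks to directories in `dirnames` without
            # descending into them; list them so rsync copies the links.
            links = [d for d in dirnames if os.path.islink(os.path.join(dirpath, d))]
            for name in filenames + links:
                yield os.path.relpath(os.path.join(dirpath, name), base)

def rsync_files(roots: Iterable[str], dest: str, base: str = "/") -> None:
    """Pipe the file list to rsync, which skips files already up to date on `dest`."""
    file_list = "\n".join(list_files(roots, base)) + "\n"
    subprocess.run(
        ["rsync", "-a", "--files-from=-", base, dest],
        input=file_list, text=True, check=True,
    )
```

Listing files rather than packing them keeps each rerun cheap: only the names travel up front, and rsync decides per file whether the target already has it.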