IRC channel logs
2021-03-20.log
<rekado_>yes, a lot of the tools they (and other cloud services) offer are orbiting the idea of container blobs
<rekado_>they offer a container registry where you can “docker push” your blobs and then fetch them from there when a machine is initialized.
<rekado_>kubernetes is the same: you define services that are backed by a container blob
<rekado_>shared file systems are difficult to implement well, so copying blobs is the preferred model.
<rekado_>I might replace the current setup (shared EFS, just like our traditional cluster setup) with a “guix pack”-based thing to copy the requested environment to new VMs
<rekado_>it will make VM initialization slower, but this might be an acceptable compromise to make Guix fast.
<PurpleSym>rekado_: Do you have any insight into how long initialization takes in the copy scenario? I found that users are very impatient when it comes to starting applications.
<rekado_>PurpleSym: it takes too long if people are impatient
<rekado_>you can always throw more money at this to transfer data more quickly, but you can’t downgrade to slower networking dynamically
<rekado_>this also has other downsides: you can’t easily use Guix on the created VM
<rekado_>with the shared EFS you can use the whole store and let Guix communicate with a remote daemon to update your environment at runtime.
<rekado_>(e.g. using the Guix kernel for Jupyter)
<rekado_>it reminds me of Greenspun’s tenth rule.
<rekado_>“sure, if you want reproducibility you need to cache all repositories that your Dockerfile accesses; and you can’t use an arbitrary base image that you don’t control; and ….”
<rekado_>these are things that very few people are even aware of, so the overwhelming majority of containers fail to meet even basic requirements for reproducibility
<rekado_>the water <–> ice analogy is wrong too.
<PurpleSym>“Impatient” meaning more than 2–3 seconds in my experience, btw.
<rekado_>it definitely takes longer than that
<rekado_>provisioning the VM takes more than 3 seconds already
<PurpleSym>I think O2R/ERC are more concerned about post-research publication, which is why irreversibly freezing the results is probably fine for them.
<PurpleSym>(aka “see, you can run it and get the same results”-reproducibility)
<rekado_>then it has to boot, then it needs to do all the boring stuff that turns the generic AMI into something suitable for working with (e.g. “yum update -y”, “yum install -y amazon-efs-utils”, mount any shared EFS for common data, mount block storage, etc.)
<rekado_>and only *then* can we copy data around
<rekado_>you could make things a little faster by preparing an EBS volume with the relevant /gnu/store subset while the EC2 instance boots, and then attach it once the EC2 instance is ready.
<rekado_>but the mere act of copying data around is slow
<rekado_>(copying stuff also depends on features that Guix currently doesn’t expose: we need something between “guix copy” and “guix pack”, a file iterator of sorts, so that we can stream files to a target without intermediate packing)
<PurpleSym>I.e. `guix pack` streaming a tar archive to stdout.
<rekado_>for iterating on the same environment (e.g. running the thing again with an added package) you could get a speed-up by listing just the files and piping the names to rsync to take care of the difference
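The idea of streaming a tar archive to stdout, instead of writing an intermediate pack, can be sketched with Python's standard `tarfile` module. Its `"w|"` mode writes the archive as a pure stream with no seeking, so the output can be a pipe. This is an illustration under assumptions, not Guix code: `stream_tar` is a hypothetical helper, and the store paths it archives are assumed to be supplied by the caller, for instance from `guix gc --requisites` run on a profile's store path.

```python
import os
import sys
import tarfile
from typing import BinaryIO, Iterable


def stream_tar(roots: Iterable[str], out: BinaryIO) -> None:
    """Write a tar archive containing every path in `roots` to `out`.

    Mode "w|" makes tarfile emit a pure stream (no seeking), so `out` can be
    a pipe such as stdout and no intermediate archive file is created.
    """
    with tarfile.open(fileobj=out, mode="w|") as tar:
        for root in roots:
            # Store names relative to '/', so extracting with `-C /` recreates
            # the original absolute paths (e.g. gnu/store/...).
            tar.add(root, arcname=os.path.relpath(root, "/"), recursive=True)


if __name__ == "__main__":
    # Hypothetical usage: store paths as arguments, archive bytes on stdout.
    stream_tar(sys.argv[1:], sys.stdout.buffer)
```

As a usage example, something like `python3 stream_tar.py $(guix gc --requisites PROFILE) | ssh vm 'tar -xf - -C /'` would unpack items on the remote side as they arrive. The script name, `PROFILE` and the `vm` host are placeholders.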
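For the rsync approach to iterating on an environment, only a list of names is needed. `rsync --files-from=-` reads the paths to transfer from stdin, and rsync's own quick check skips files that already match on the target, so re-running after adding a package mostly transfers the new store items. Below is a minimal sketch of the listing side. It assumes the environment's store items are passed as arguments, and `iter_files` is a hypothetical helper, not part of Guix.

```python
import os
import sys
from typing import Iterable, Iterator


def iter_files(roots: Iterable[str]) -> Iterator[str]:
    """Yield each root and every entry below it as a path relative to '/'.

    This is the form `rsync --files-from=-` expects when '/' is given as
    the source directory.
    """
    for root in roots:
        yield os.path.relpath(root, "/")
        # os.walk does not descend into symlinked directories by default;
        # such links are still listed (via dirnames) so rsync copies the link.
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # sorting in place also fixes the walk order
            for name in sorted(dirnames + filenames):
                yield os.path.relpath(os.path.join(dirpath, name), "/")


if __name__ == "__main__":
    # One name per line on stdout, ready to be piped into rsync.
    for path in iter_files(sys.argv[1:]):
        print(path)
```

A possible invocation is `python3 list_files.py $(guix gc --requisites PROFILE) | rsync -a --files-from=- / vm:/`, where the script name, `PROFILE` and the `vm` host are placeholders. Because `--files-from` disables the recursion that `-a` normally implies, the sketch lists directories and their contents explicitly.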