<rekado_>no, it’s a high-availability cluster service
<rekado_>you specify nodes as part of a high-availability cluster, define resources, add constraints, and then the system aims to provide the defined resources under the given constraints.
<rekado_>in our case we have two nodes; guix-daemon is a managed resource constrained to only run on the first node; the NFS server is another managed resource that runs preferentially on the first node, but will run on the second node when the first one dies.
<rekado_>the IP address is also a shared resource.
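The setup described above could be sketched with pacemaker's `pcs` CLI roughly as follows; all resource and node names here are invented for illustration, not taken from the actual cluster:

```shell
# Shared service IP, movable between the two nodes:
pcs resource create cluster-ip ocf:heartbeat:IPaddr2 \
    ip=10.0.0.10 cidr_netmask=24

# NFS server: prefers node1, but may fail over to node2:
pcs resource create nfs-server systemd:nfs-server
pcs constraint location nfs-server prefers node1=100

# guix-daemon: pinned to node1, never allowed on node2:
pcs resource create guix-daemon systemd:guix-daemon
pcs constraint location guix-daemon avoids node2=INFINITY
```

The score values express the difference: a finite preference (100) lets the NFS server move when node1 dies, while the INFINITY ban keeps guix-daemon off node2 unconditionally.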
<rekado_>since there are resource constraints (e.g. guix-daemon may only run after we have mounted the cluster file system, and it may never run on the second node)
<rekado_>… certain services may only be started by pacemaker, not by systemd directly
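The "may only run after the file system is mounted" part would be an ordering constraint; a hypothetical sketch (device, mount point, and resource names are assumptions):

```shell
# Cluster file system holding /gnu, managed as a resource:
pcs resource create gnu-fs ocf:heartbeat:Filesystem \
    device=/dev/drbd0 directory=/gnu fstype=ext4

# Never start guix-daemon before the file system is mounted:
pcs constraint order start gnu-fs then start guix-daemon
```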
<rekado_>so you have a *disabled* guix-daemon.service to ensure it won’t just get started when the system boots, so that pacemaker can start it (via systemd) ensuring that the constraints are not violated.
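Concretely, the unit stays installed but disabled, so systemd never starts it at boot on its own; pacemaker starts and stops it through its `systemd:` resource class instead:

```shell
# Keep the unit installed but disabled; it will not start at boot.
systemctl disable guix-daemon.service
systemctl is-enabled guix-daemon.service   # reports "disabled"

# Pacemaker manages it via the systemd resource class
# (e.g. a resource of type "systemd:guix-daemon"), so starts
# only happen when the cluster constraints are satisfied.
```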
<rekado_>the whole point of all this is that /gnu/ and /var/guix/profiles/ should always be available on all clients; in the worst case it’ll be in read-only mode.
<rekado_>so we can take the first node down, upgrade it, change it, and the worst impact on users is that they can’t install new software (because they can’t talk to the daemon). Their cluster jobs, which depend on /gnu remaining available, will not be impacted.
<rekado_>it’s not terribly complicated (because we don’t have that many resources that need managing), but:
<rekado_>a) all these resource definitions are done outside of the configuration management system (puppet) due to the version of pacemaker that we have to use on RHEL
<rekado_>and b) all the rest *is* done in puppet, which is stateful and so makes it really hard to anticipate how it’s going to fail
<rekado_>I’d *love* to do this declaratively with Guix System.