IRC channel logs

2015-06-17.log


***tschwing_ is now known as tschwinge
<rekado_>the guix-daemon is so slow when creating a new profile symlink tree over NFS.
<rekado_>I'm considering just moving /gnu back to the local machine and exporting it as a read-only filesystem over NFS, so that at least the write operations are fast again.
<rekado_>this is probably the last big obstacle to letting users manage their profiles regularly via the management node.
<rekado_>it's just too slow.
<rekado_>e.g. 10+ minutes to install vim into a profile.
<rekado_>the daemon is issuing a lot of lstat, chmod, link, readlink, rename calls when building a new profile version; over NFS this seems to take forever with the number of files that are involved.
<civodul>rekado_: 10 minutes!
<civodul>this is weird
<civodul>rekado_: the lstat, chmod, readlink are all done when building the derivation itself
<civodul>can't you have guix-daemon access the store as a local file system
<civodul>and then have the other machines mount the store over NFS?
<rekado_>the management interface of the NFS server says that we get at most 400 operations per second, and around 200 ops/s when building the new profile version.
<civodul>where one operation = one syscall, basically, right?
<rekado_>yes, moving /gnu back to the local disks is what I'm going to do next.
<rekado_>yes
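A back-of-the-envelope check of the numbers above, as a small Python sketch. It assumes the observed rate of about 200 ops/s held for the whole 10-minute install, which the log does not confirm (the rate was only sampled):

```python
# Rough estimate of how many NFS operations a 10-minute profile build implies,
# using the rates quoted in the log (about 200 ops/s observed, 400 ops/s maximum).
observed_rate = 200      # ops/s while building the new profile version
max_rate = 400           # ops/s ceiling reported by the NFS server
build_seconds = 10 * 60  # "10+ minutes to install vim"

total_ops = observed_rate * build_seconds
best_case_seconds = total_ops / max_rate

print(f"{total_ops} operations; at best {best_case_seconds / 60:.1f} minutes")
```

Under that assumption, even at the server's ceiling the same syscall pattern would still take roughly five minutes, which fits the conclusion that the daemon needs local access to the store rather than a faster NFS link.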
<civodul>yeah i think the daemon definitely needs fast access
<civodul>build processes aren't optimized
<rekado_>I wanted to use our central storage for all guix stuff, but it's just too slow.
<rekado_>"build processes" --- I'm only using substitutes.
<rekado_>it's link traversal and renaming of links that eats up most of the time.
<rekado_>BTW: this is ZFS over NFS.
<civodul>yes, profiles are derivations as well, there's a build process
<civodul>ZFS over NFS?
<civodul>is that possible? :-)
<rekado_>yeah, it's a solaris box.
<civodul>incredible
<civodul>is it a SPARC?
<rekado_>I don't know what I'm talking about, though. I'm just parroting whatever my sysadmin colleague says :)
<civodul>:-)
<mark_weaver>rekado_: profile generation used to be a lot slower, and at some point I optimized it as well as I could without radical changes.
<mark_weaver>the first idea for a radical change that comes to mind is to store in the sqlite database the complete list of files and directories in each store item.
<mark_weaver>so that we could avoid all the lstat calls.
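The idea could look roughly like this. It is a minimal Python sketch, not Guix code: the `StoreFiles` table, its columns and the helper names are hypothetical, and the real daemon database has a different schema. The listing of a store item is recorded once, when the item is registered, so that building a profile can read it back from SQLite instead of calling lstat on every file over NFS:

```python
import sqlite3

# Hypothetical schema, NOT Guix's actual database layout.
# One row per file, directory or symlink inside a store item.
SCHEMA = """
CREATE TABLE IF NOT EXISTS StoreFiles (
    item   TEXT NOT NULL,   -- store item, e.g. /gnu/store/aaaa-vim
    path   TEXT NOT NULL,   -- path relative to the item root
    kind   TEXT NOT NULL CHECK (kind IN ('file', 'directory', 'symlink')),
    target TEXT,            -- symlink target, NULL otherwise
    PRIMARY KEY (item, path)
);
"""

def open_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db

def record_item(db, item, entries):
    """Store the listing of ITEM once, when the item is registered.

    ENTRIES is a list of (path, kind, target) tuples."""
    db.executemany(
        "INSERT INTO StoreFiles (item, path, kind, target) VALUES (?, ?, ?, ?)",
        [(item, p, k, t) for (p, k, t) in entries],
    )
    db.commit()

def list_item(db, item):
    """Return the listing of ITEM without a single lstat() on the store."""
    rows = db.execute(
        "SELECT path, kind, target FROM StoreFiles WHERE item = ? ORDER BY path",
        (item,),
    )
    return rows.fetchall()
```

The trade-off is that the listing is written once per store item, while it is read every time a profile containing that item is built, which is the case that hurts over NFS.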
<davexunit>eventually we'll have to start making some of these radical changes I suppose.
<davexunit>we've already begun to diverge from nix in other ways.
<mark_weaver>the deduplication machinery as currently implemented also costs a lot of lstat calls.
<mark_weaver>in theory it would be possible to move that into the sqlite database as well.
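A sketch of what moving deduplication into SQLite might mean, assuming a hypothetical `Links` table mapping a content hash to the first file seen with that content. This stands in for probing a links directory on disk and is not how guix-daemon is implemented; it also ignores garbage collection, which would have to remove stale rows:

```python
import hashlib
import os
import sqlite3

# Hypothetical index: content hash -> canonical path of the first identical file.
def open_index(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS Links (hash TEXT PRIMARY KEY, path TEXT NOT NULL)")
    return db

def file_hash(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def deduplicate(db, path):
    """Hard-link PATH to an earlier file with identical contents, if any.

    Returns True if PATH now shares its inode with an earlier identical file,
    False if PATH is (or became) the canonical copy."""
    digest = file_hash(path)
    row = db.execute("SELECT path FROM Links WHERE hash = ?", (digest,)).fetchone()
    if row is None:
        db.execute("INSERT INTO Links (hash, path) VALUES (?, ?)", (digest, path))
        db.commit()
        return False
    canonical = row[0]
    if canonical == path:
        return False
    # rename(2) is a no-op when both names are the same inode, which would
    # leave the temporary link behind, so skip files that are already linked.
    if not os.path.samefile(canonical, path):
        tmp = path + ".dedup-tmp"
        os.link(canonical, tmp)
        os.replace(tmp, path)  # atomic: PATH never disappears
    return True
```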
<mark_weaver>unfortunately, with NFS, you are paying a lot of overhead in order to handle the worst case of a mutable filesystem, when in fact /gnu/store is mostly immutable and could be implemented much more efficiently.
<rekado_>true.
<rekado_>I'm annoyed by rpm, yum, and Fedora. I really want to install GuixSD on my workstation in the office.
<rekado_>I'll need to check what software still needs to be packaged to be able to join the domain and do LDAP authentication.
<mark_weaver>oooh, we have wayland now!
<mark_weaver>thanks iyzsong :)
<mark_weaver>I'm currently working on cogl, on the road to clutter and totem.
<davexunit>oh wow
<davexunit>exciting
<mark_weaver>I have a package that builds when tests are disabled, but _all_ the tests fail with an error that probably means the library is currently not functional: "Failed to create a CoglContext: The OpenGL version could not be determined"
<mark_weaver>so, some debugging will be needed.
<davexunit>mark_weaver: does it need xvfb or something?
<davexunit>probably needed in order to get an opengl context
<mark_weaver>hmm, dunno! I'm quite ignorant of GL stuff.
<mark_weaver>if xvfb were needed, I guess I'd expect it to fail at compile time instead of runtime, no?
<davexunit>not needed to compile, but needed for the test suite
<mark_weaver>I guess I should post the recipe I currently have in case someone else more knowledgeable of this stuff wants to poke at it.
<davexunit>there are other packages that have this requirement, such as guile-sdl
<mark_weaver>okay
<mark_weaver>oh, I see what you mean.
<mark_weaver>right
*davexunit is reading https://docs.docker.com/userguide/dockerizing/
<davexunit>docker's CLI is quite good
<davexunit>for the most part. need to do something similar for guix.
<mark_weaver>davexunit: alas, adding xorg-server to native-inputs didn't change anything.
<davexunit>mark_weaver: and you also ran Xvfb before running the test suite?
<mark_weaver>ah, no I didn't!
<davexunit>no promises, but that might do the trick *if* X is really the only issue here
<mark_weaver>yeah, good idea, thanks!
*mark_weaver copies from guile-sdl package
<davexunit>:)
<mark_weaver>well, it's failing in a more interesting way now :)
<mark_weaver>lots of errors like this are being printed in between the failures: "_FontTransOpen: Unable to Parse address ${prefix}/share/fonts/X11/misc/"
<mark_weaver>I guess I need to pass better xorg.conf or something
<mark_weaver>well, I have to put this down for a while and go afk.
<mark_weaver>davexunit: thanks for the tips!
<mark_weaver>*clues
<davexunit>mark_weaver: I'm pretty sure I got a lot of those warnings with guile-sdl's test suite but they were bening
<davexunit>benign*
<davexunit>looks very familiar
<davexunit>progress, anyway. see ya later.
***tschwing_ is now known as tschwinge
<davexunit>civodul: do you think that we ought to mount /sys/fs/cgroup by default? just noticing now that this isn't the case.
<rekado_>so, I want to regularly rsync a local /gnu directory to /gnu_remote (the slow NFS share). As long as the two directories differ by only a couple of minutes, that's a usable workaround.
<rekado_>I just don't know what rsync flags to pass.
<rekado_>"rsync -aHr --delete --delete-before"; not sure about whether to add -k/-K for dir symlinks.
<rekado_>this works but it still takes minutes. I wonder if I could combine inotify with rsync to speed things up.
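One possible way to cut the rsync time, sketched in Python under an assumption taken from the discussion: store items are immutable, so an item already present on both sides can be skipped without comparing its files. The helpers are hypothetical; they only compute what to copy and delete and build an rsync command line without running it:

```python
import os

def store_diff(local_store, remote_store):
    """Compare the top-level entries of two store directories.

    Items present on both sides are assumed identical (store items are
    immutable) and skipped. Dot entries such as .links are mutable, so they
    are left out here and would need separate handling."""
    def items(store):
        return {name for name in os.listdir(store) if not name.startswith(".")}

    local, remote = items(local_store), items(remote_store)
    return sorted(local - remote), sorted(remote - local)

def rsync_command(local_store, remote_store, new_items):
    """Build (but do not run) an rsync invocation copying only NEW_ITEMS.

    -a preserves symlinks, permissions and times; -H preserves hard links
    among the files transferred in this one invocation."""
    sources = [os.path.join(local_store, item) for item in new_items]
    return ["rsync", "-aH", *sources, remote_store.rstrip("/") + "/"]
```

Items in the second list can then be removed on the NFS side. A real setup would also have to avoid copying an item that is still being built, since a half-copied item would never be refreshed under this scheme.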
<civodul>davexunit: yes, probably
<civodul>rekado_: i wonder if NFS has worthy optimizations when it's mounted read-only
<davexunit>civodul: cool, I'll come up with a patch when I have the chance.
<mark_wea`>civodul: I've noticed that very often, hydra offloads all of its mips jobs to hydra-slave0, leaving librenote with nothing to do. very strange.
<mark_wea`>for example, look at the log at http://hydra.gnu.org:3000/build/507460
<mark_wea`>it samples the loads of both mips slaves:
<mark_wea`>load on machine 'hydra-slave0.gnu.org' is 3.86 (normalized: 1.93)
<mark_wea`>load on machine 'librenote.netris.org' is 0.01 (normalized: 0.005)
<mark_wea`>and then it decides to use hydra-slave0. why?
<civodul>mark_wea`: could be broken logic in 'guix offload'
<mark_wea`>so now we have hydra-slave0 compiling both gcc-4.9 and 5.1, and librenote is just sitting idle.
<civodul>bah
<civodul>sounds silly
<mark_wea`>indeed :-/
<civodul>'guix offload' must be sorting in reverse order or something
<civodul>mark_wea`: i'm happy to have another eyeball on the choose-build-machine procedure, if you have time
<mark_wea`>civodul: sure, I'll take a look
***jchmrt_ is now known as jchmrt
<efraim>rekado_: you could mount the NFS mount async
<mark_wea`>civodul: 'sort' might not cope well with the fact that the 'machine-power-factor' may return different values for a given machine during the sorting.
<mark_wea`>well, I guess it shouldn't matter in this case
<mark_wea`>but it would be good to avoid sampling a machine's load more than once during the sorting.
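Sampling each load only once is the classic decorate-sort-undecorate pattern. A minimal Python sketch (the real procedure is Guile code in `guix offload`); the normalization used here, dividing by a per-machine factor, is an assumption chosen because a factor of 2 reproduces the two log lines above (3.86 → 1.93, 0.01 → 0.005):

```python
def choose_build_machine(machines, sample_load, factor):
    """Return the machine with the lowest normalized load, or None.

    SAMPLE_LOAD is called exactly once per machine, so the ordering cannot
    be confused by a load that changes while the sort is running."""
    # Decorate: pair each machine with its normalized load, sampled once.
    decorated = [(sample_load(m) / factor[m], m) for m in machines]
    if not decorated:
        return None
    # Sort on the load only; machine names never take part in comparisons.
    decorated.sort(key=lambda pair: pair[0])
    # Undecorate: keep the machine itself.
    return decorated[0][1]
```

With the loads from the log, this picks librenote.netris.org.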
<mark_wea`>the log output of http://hydra.gnu.org/build/507460 does indeed indicate that hydra-slave0 was 'best' in 'choose-build-machine'.
<mark_wea`>hmm.
<mark_wea`>well, I'll have to look at this more closely.
<mark_wea`>hmm, I lost my connection to freenode about an hour ago, but mark_weaver is still here.
<mark_wea`>civodul: I see the problem.
<mark_wea`>the procedure returned by 'undecorate' should return a boolean, not an element.
<mark_wea`>so it should just return (pred machine1 machine2)
<mark_wea`>but also, this should be improved to only sample the load of each machine just once.
<mark_wea`>I'll work on it.
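Why returning an element instead of a boolean breaks the sort: in Scheme every value other than #f is true, so a "less than" procedure that returns machine1 or machine2 answers "yes" to every comparison. A Python sketch of the corrected shape; `sort_with_less` and the (machine, extra) pair layout are hypothetical stand-ins for the Guile code:

```python
from functools import cmp_to_key

def sort_with_less(items, less):
    """Sort ITEMS using a Scheme-style "less than" predicate."""
    def cmp(a, b):
        if less(a, b):
            return -1
        if less(b, a):
            return 1
        return 0
    return sorted(items, key=cmp_to_key(cmp))

def undecorate(pred):
    """Lift PRED, which compares machines, to (machine, extra) pairs.

    The result must be PRED's boolean answer. Returning machine1 or machine2
    instead is always truthy, so less(a, b) and less(b, a) would both hold
    and the sort could produce any order."""
    return lambda pair1, pair2: pred(pair1[0], pair2[0])
```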
<civodul>mark_wea`: re undecorate, indeed, good catch!
<civodul>silly me
<civodul>thanks for looking into it!
<mark_wea`>np!
<mark_wea`>civodul: I posted both patches to guix-devel
<mark_wea`>(one to fix the bug, another to memoize 'machine-load')
<mark_wea`>I'm a bit concerned about the fact that freenode thinks mark_weaver is still connected, but on my end I lost that connection over 1.5 hours ago
<mark_wea`>I wonder if there's someone I can talk to about this.
<mark_wea`>this has never happened to me before.
<mark_wea`>makes me wonder if someone hijacked my connection
<davexunit>mark_wea`: you can "ghost" that connection
<davexunit>to kick it off
<davexunit>and log back in
<mark_wea`>davexunit: how do I do it?
<davexunit>I think you do something like: /msg NickServ ghost <username> <password>
<mark_wea`>okay, thanks
<civodul>mark_wea`: just replied regarding offload, thanks
<civodul>terrible bug
<mark_weaver>davexunit: hmm, it told me that mark_weaver was not online.
<mark_weaver>even though we never got notification mark_weaver disconnecting, and so my client thought mark_weaver was still online.
<mark_weaver>anyway, it's all good now, thanks :)
<davexunit>cool :)
<davexunit>yw
<davexunit>there's been strange netsplits today
<davexunit>I had something similar happen earlier
<mark_weaver>civodul: okay, I pushed the sort fix. how best to deploy it on hydra? should I "make install" from a newly built guix from git, or just copy offload.{scm,go} into place?
<mark_weaver>or would you like to do it?
<mark_weaver>my guess of the appropriate configure flags is: --localstatedir=/nix/var --disable-daemon
*civodul checks
<civodul>mark_weaver: right, i used --localstatedir=/nix/var --with-store-dir=/gnu/store
<civodul>you could update the daemon while you're at it
<civodul>it has some GC optimizations
<mark_weaver>oh, right, so I should *not* pass --disable-daemon. and I don't need --with-store-dir either now.
<mark_weaver>okay, I'll work on it.
<mark_weaver>civodul: given that I already have a very recent guix built on hydra with --localstatedir=/nix/var --disable-daemon, do you think it's safe to just ./configure --localstatedir=/nix/var && make, or should I make clean?
<mark_weaver>I made clean from a few commits ago with --disable-daemon
<mark_weaver>bah: configure: error: C++ compiler 'g++' does not support the C++11 standard
<mark_weaver>I guess we'll have to build this within a pure guix environment on hydra.
<mark_weaver>or update the guix snapshot and use that directly.
<mark_weaver>our 'guix-devel' package, I mean.
<mark_weaver>wdyt?
<mark_weaver>civodul: ^^
<civodul>aaah, indeed
<civodul>hmm
<mark_weaver>in the meantime, is it safe for me to 'make install' with --disable-daemon? will it leave the existing installed daemon alone and everything will continue to work?
<civodul>yes, sounds good
<mark_weaver>okay, will do
<civodul>i was wondering about the consequences if we built in an environment
<civodul>GC and all that
<mark_weaver>our new binary install method may become more important now :-/
<mark_weaver>yeah, I don't know either.
<mark_weaver>oh, if we use the existing guix package in guix, then it will have the wrong localstatedir
<mark_weaver>could that be fixed with just a symlink?
<civodul>we have GCC 4.4 there
<civodul>mark_weaver: yes, that would work
<mark_weaver>would there be consequences to having the guile modules in a different place?
<mark_weaver>how many places assume /usr/local?
<mark_weaver>maybe it would be sufficient to make symlinks for guix and guix-daemon from /usr/local/bin?
<mark_weaver>or would there be other issues?
<civodul>guix-register also needs to be visible
<civodul>so basically, we could (1) install Guix in a profile, (2) symlink guix and guix-{daemon,register} in /usr/local, (3) symlink /var/guix
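Steps (2) and (3) amount to a handful of symlinks. A Python sketch with hypothetical parameters: `programs` maps each program name to wherever it lives in the profile from step (1) (left to the caller, since the exact locations aren't stated), `root` lets the sketch run in a scratch directory, and the assumption is that the packaged Guix looks for its state in /var/guix while hydra's lives in /nix/var/guix (from --localstatedir=/nix/var):

```python
import os

def link_guix_into_place(programs, root="/", state_dir="/nix/var/guix"):
    """Create the /usr/local/bin and /var/guix symlinks under ROOT."""
    def under_root(path):
        return os.path.join(root, path.lstrip("/"))

    # Step (2): point /usr/local/bin/<program> at the profile's copy,
    # replacing whatever is currently installed there.
    bin_dir = under_root("/usr/local/bin")
    os.makedirs(bin_dir, exist_ok=True)
    for name, target in programs.items():
        link = os.path.join(bin_dir, name)
        if os.path.lexists(link):
            os.remove(link)
        os.symlink(target, link)

    # Step (3): make the expected /var/guix point at the existing state.
    var_guix = under_root("/var/guix")
    os.makedirs(os.path.dirname(var_guix), exist_ok=True)
    if not os.path.lexists(var_guix):
        os.symlink(under_root(state_dir), var_guix)
```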
<mark_weaver>civodul: okay, sounds good
<mark_weaver>first step: update guix-devel
<mark_weaver>I'll work on it
<civodul>great
<zacts>salvete
<zacts>salut
<zacts>aloha
<zacts>:-)
<mark_weaver>civodul: if I run "./pre-inst-env guix build guix" from my git checkout on hydra, will hydra still show the build logs for it when master is later evaluated?
<civodul>mark_weaver: i think it won't show it, but i can't remember why
<mark_weaver>okay, I'll wait for hydra to do it then.
***y is now known as init
<davexunit>civodul: it seems that on other distros it's systemd that creates the cgroup hierarchy. do you think it's still okay for us to do during boot time?
<davexunit>I was thinking immediately after mounting /sys in mount-essential-file-systems
<davexunit>cgroups aren't "essential", so I'll write a mount-cgroups procedure that's run after mount-essential-file-systems
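What a mount-cgroups step might do, sketched in Python rather than Guile and returned as data so it can be inspected without root. It follows the usual cgroup v1 layout (a tmpfs at /sys/fs/cgroup with one cgroup hierarchy per controller mounted below it); the default controller list is an assumption, since the controllers actually available are listed in /proc/cgroups:

```python
def cgroup_mounts(controllers=("cpuset", "cpu", "cpuacct", "memory",
                               "devices", "freezer", "blkio")):
    """Return the mount(2) calls, in order, as (source, target, fstype, options).

    Each per-controller target directory must be created inside the tmpfs
    before it is mounted."""
    mounts = [("cgroup_root", "/sys/fs/cgroup", "tmpfs", "")]
    for controller in controllers:
        mounts.append((controller, "/sys/fs/cgroup/" + controller,
                       "cgroup", controller))
    return mounts
```

Because this only needs /sys to be mounted, running it right after mount-essential-file-systems, as proposed, fits that dependency.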