IRC channel logs

2014-03-29.log

back to list of logs

<phant0mas>the number of patches used in the debian package of glibc is...unbelievable
<zacts>phant0mas: that's interesting, so is it basically a fork of glibc?
<zacts>do they not like using an upstream glibc for some reason?
*zacts thought they used eglibc by default.
<phant0mas>from some search I did they hurd guys actually have pushed some patches to debian instead of upstream
<civodul>good night/day!
<phant0mas>they use the same eglibc
<phant0mas>but sometimes (because it's easier?) they prefer to work on the debian source and not the upstream
***phantomas is now known as Guest70694
***Guest70694 is now known as ph4n70m4s
<zacts>lo
<ph4n70m4s>good morning guix
<civodul>Hello Guix!
<davexunit>hello civodul
<davexunit>civodul: I was thinking that adding levenshtein edit distance to improve `guix package -A foo` output would be a fun hack for this weekend.
<mark_weaver>davexunit: shortly after you asked about this before, I implemented a procedure that computes the edit distance and also shows the associated list of minimal edits. however, for what you're talking about it to be reasonably efficient, we'd need something more elaborate than what I wrote. I found links on wikipedia for such things, but now I've forgotten them. levenshtein automata maybe? or something like that?
<mark_weaver>anyway, it definitely sounds like a fun hack.
<davexunit>mark_weaver: oh really?
<davexunit>I thought it would be easier than that.
<mark_weaver>the issue is that you'd have to use 'levenshtein-edit-distance' to compare what the user typed to every known package in the system, which is a large number.
<davexunit>yeah
<mark_weaver>but there are efficient methods for dealing with that.
<mark_weaver>I think this would be a good tool for the job: http://en.wikipedia.org/wiki/Levenshtein_automaton
<davexunit>I'll take a look
<davexunit>but I might punt on it.
<mark_weaver>heh :)
<davexunit>I didn't expect it to be so difficult.
<davexunit>this is made even more difficult given that the input is a regexp.
<davexunit>I implemented the naive version using the dynamic programming approach via the wagner-fischer algorithm.
<davexunit>now to learn about this automaton.
<civodul>davexunit: i'm all for this hack!
<civodul>well, whichever gets to it first :-)
<davexunit>well, if mark_weaver does more hacking on it then I will yield to him.
<civodul>let's make a contest! ;-)
<mark_weaver>heh, I have too much to do already, and I probably won't get around to it for a long while.
<mark_weaver>davexunit: if you're interested in doing it, I think you should! it would be a fun and useful hack :)
<davexunit>mark_weaver: okay. :)
<davexunit>How should we display this information to the user? I was thinking that if there are no packages that match the given pattern, we display a message like "No packages match 'whatever'. Did you mean one of these?" and then give a list of packages that have an edit distance of 2 or so.
<mark_weaver>davexunit: one thing: transpositions are one of the most common errors that people make, so if the maximum edit distance is small, make sure to take that into account. there are a couple of variants of levenshtein that consider transpositions to be a single edit.
<mark_weaver>as for how to display it, that's a good question! let's get the hard part implemented first, and we can worry about those details later.
<davexunit>mark_weaver: okay, I'll keep transpositions in mind.
<davexunit>I'm still hunting for information about the automaton
<mark_weaver>other possible algorithms, possibly better, are from Ukkonen in 1985, and the "bitap" algorithm.
<mark_weaver>both are mentioned in the "Applications" section of http://en.wikipedia.org/wiki/Edit_distance
<mark_weaver>The References section on that page looks useful
<mark_weaver>if some of the papers you'd like to read are not available gratis on the internet, you could use the MIT libraries, which are open to the public.
<mark_weaver>(unlike the Harvard ones)
<mark_weaver>Ukkonen has done some great work in this area.
<mark_weaver>here's the Ukkonen 1985 paper: http://www.cs.helsinki.fi/u/ukkonen/JAlg.pdf
<davexunit>mark_weaver: thanks
<mark_weaver>this more recent paper might also be useful: http://www.cis.uni-muenchen.de/projects/Irgroup/levaut6.ps.gz
<davexunit>I've been reading that one.
<davexunit>it's a lot to work through.
<mark_weaver>if you're motivated, and interested in this general topic, it would be a great learning experience. if not, that's fine too :)
<davexunit>I'm interested. I might shift my attention towards bitap for the time being.
<mark_weaver>sounds good. I have relatively little knowledge in this area.
<mark_weaver>it sounds like some work as already been done to deal with regexps, so it might be a better fit anyway.
<phant0mas>mark_weaver: when I try to copy a file in another folder with '(copy-file "libpthread/include/pthread/pthread.h" "bits")'
<phant0mas>I get ERROR: In procedure copy-file:
<phant0mas>ERROR: In procedure copy-file: Is a directory
<phant0mas>any ideas?
<mark_weaver> http://laurikari.net/tre/about/
<mark_weaver>davexunit: ^^
<mark_weaver>phant0mas: I suspect the destination has to be the new name. the 'cp' program is clever and figures out what you want based on whether the destination is an existing directory, but 'copy-file' is lower-level.
<phant0mas>bits is a folder which I want to copy the file in
<mark_weaver>I think the destination should be "bits/pthread.h"
<phant0mas>aaa
*mark_weaver goes offline for a bit...
<phant0mas>so that's how copy-file works
<phant0mas>ok
<phant0mas> will *.diff files work as well or I can only use *.patches?
<mark_weaver>I don't see why *.diff wouldn't work, but for consistency I think you should rename to *.patch.
<phant0mas>ok
***sea` is now known as sea