***amz31 is now known as amz3`
<amz3`> davexunit: hi! is it from SICP that you learned about the functional/monadic interface?
<quigonjinn> Is there an alternative to (%site-ccache-dir), where you can install a compiled .go file as a non-root user, so that the Guile interpreter has access to the provided module, without passing any arguments or calling load-compiled?
<amz3`> seems like the double keyboard interrupt bug is gone in 2.1.3
<amz3`> so many things to do, not enough energy.
<amz3`> I read a lot of whoosh documentation and had a look at the lucene code
<amz3`> I have a better picture of the features required for a search engine
<amz3`> I discovered that wiredtiger does some kind of prefix compression, which allows inserting rows that look like ("term1", doc-1, position-1) ("term1", doc-2, position-2) efficiently
<amz3`> which means I don't have to pack term occurrences into a single row
<amz3`> lucene, for instance, does not use an ACID database; the code is cluttered with exception handling
<amz3`> "if ... happens, it will corrupt the index": all the code looks like that
<amz3`> I still don't know how whoosh/lucene interpret the query without going through the whole database
<amz3`> given a boolean query like (and "keyword1" (or "keyword2" "keyword3")) for instance
<amz3`> going through the whole database and applying the query to each document is very easy
<stis> hmm, some kind of indexing. Is there a good wikipedia page for this? It might give you some clues
<stis> Is the keyword matching à la regular expressions?
<stis> or whole words? prefixes?
<amz3`> given I don't know regex algorithms, I can't say, but it's a clue where to look
<amz3`> keyword matching? I don't know that regex feature
<stis> well, are you matching whole words?
<amz3`> basically (and "abc" "def") means retrieve documents where both "abc" and "def" appear
<amz3`> doing regex search is another story I guess, but a similar one based on what I am reading
<stis> You can make a tree of words where each word points to a set of documents.
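The row layout amz3` describes, with one row per occurrence keyed as (term, doc, position), already gives stis's "word points to a set of documents" without scanning the whole database. Because the keys are kept sorted, every row for one term is contiguous, so a lookup can seek to the first key starting with that term and stop at the first key that doesn't. Here is a minimal sketch that assumes a sorted Python list standing in for an ordered key-value table. It does not use the WiredTiger API, and `PostingStore`, `docs_for` and `index_documents` are hypothetical names.

```python
from bisect import bisect_left, insort


class PostingStore:
    """Hypothetical stand-in for an ordered key-value table: keys are kept
    sorted (as in a B-tree), so all rows for one term are contiguous."""

    def __init__(self):
        self.rows = []  # sorted list of (term, doc_id, position) tuples

    def add(self, term, doc_id, position):
        # One row per occurrence instead of packing occurrences per term.
        insort(self.rows, (term, doc_id, position))

    def docs_for(self, term):
        """Return the set of doc ids containing `term` (whole-word match).

        (term,) sorts before every (term, doc, pos) row, so bisect_left
        seeks to the first row for `term`; the scan then stops as soon
        as it leaves that contiguous range."""
        i = bisect_left(self.rows, (term,))
        docs = set()
        while i < len(self.rows) and self.rows[i][0] == term:
            docs.add(self.rows[i][1])
            i += 1
        return docs


def index_documents(store, docs):
    """Index a dict of {doc_id: text}, splitting text on whitespace."""
    for doc_id, text in docs.items():
        for position, word in enumerate(text.split()):
            store.add(word, doc_id, position)


store = PostingStore()
index_documents(store, {1: "abc def", 2: "abc ghi", 3: "def abc abc"})
print(store.docs_for("abc"))  # {1, 2, 3}
print(store.docs_for("ab"))   # set(): "ab" is not a whole word here
```

Within one term's range the rows are also ordered by doc id. That matters for merge-style set intersection.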
<stis> Then you can take intersections and unions of the sets
<amz3`> now I recall an algorithm for computing the intersection of sorted sets
<amz3`> the problem is that my index is not sorted
<stis> How many elements can a set have?
<stis> if it is less than 10,000 you can represent all elements as bit vectors. Then you do a union or intersection in a microsecond or so
<amz3`> well, fewer documents match a given keyword than there are documents in total. So (length set) < (length documents)
<amz3`> for a start, I'll go the route you describe without the bitvector optimisation, since I really don't know how many documents there will be in the db
<stis> you can use hash lists as well as sets. There the intersection and union are quite efficient
<stis> for really large sets you end up with data in secondary storage and then you need something else
<amz3`> I am designing the database index right now
<stis> database indexes should be a library; is it so hard to find a good one for your needs?
<amz3`> I use wiredtiger, it's a database engine
* amz3` is in a wikipedia hell loop
<amz3`> «one of the problems with ranked lists is that they might not reveal relations that exist among some of the result items.»
<amz3`> otherwise you can download via github, but the gnulib submodule is massive :
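The set approach stis suggests can evaluate a boolean query like (and "keyword1" (or "keyword2" "keyword3")) using only the postings of the words it mentions. Each `and` becomes an intersection and each `or` becomes a union. The sketch below also shows two variants from the discussion. The first is a merge intersection of two sorted doc-id lists, which amz3` recalled. It only works on sorted input, but with rows keyed as (term, doc, position), a scan of one term's range returns doc ids in ascending order. The second is stis's bit-vector idea, with a Python int used as the bit vector. The nested-tuple query encoding and all function names are assumptions for illustration. This is not how whoosh or lucene implement it.

```python
from functools import reduce


def evaluate(query, postings):
    """Evaluate a nested query such as
    ("and", "keyword1", ("or", "keyword2", "keyword3"))
    where `postings` (hypothetical) maps each word to a set of doc ids."""
    if isinstance(query, str):
        return set(postings.get(query, ()))  # copy so callers can't mutate postings
    op, *args = query
    sets = [evaluate(arg, postings) for arg in args]
    if op == "and":
        return reduce(set.intersection, sets)  # assumes at least one argument
    if op == "or":
        return reduce(set.union, sets, set())
    raise ValueError(f"unknown operator: {op!r}")


def intersect_sorted(a, b):
    """Merge-style intersection of two ascending, duplicate-free lists of
    doc ids in O(len(a) + len(b)); wrong results if inputs are unsorted."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def to_bits(doc_ids):
    """Encode small non-negative doc ids as the set bits of one integer."""
    bits = 0
    for d in doc_ids:
        bits |= 1 << d
    return bits


def from_bits(bits):
    """Decode an integer bit vector back into a set of doc ids."""
    return {i for i in range(bits.bit_length()) if (bits >> i) & 1}


postings = {"keyword1": {1, 2, 3}, "keyword2": {2}, "keyword3": {3, 4}}
print(evaluate(("and", "keyword1", ("or", "keyword2", "keyword3")), postings))  # {2, 3}
print(intersect_sorted([1, 3, 5, 7], [2, 3, 7, 9]))  # [3, 7]
print(from_bits(to_bits({1, 2, 3}) & to_bits({2, 3, 4})))  # {2, 3}
```

With bit vectors, `&` and `|` on the integers replace set intersection and union. This pays off when doc ids are dense and small, which matches stis's "less than 10,000" condition.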