IRC channel logs



<amz31>héllo #guile!
***amz31 is now known as amz3`
<amz3`>davexunit: hi! is it from SICP that you learned about the functional/monadic interface?
<amz3`>sneek: sicp?
<sneek>From what I understand, sicp is Structure and Interpretation of Computer Programs,
<quigonjinn>Is there an alternative to (%site-ccache-dir), where you can install a compiled .go file as a non-root user, so that the guile interpreter has access to the provided module, without passing any arguments or calling load-compiled?
<stis>heya guilers!
<catonano>stis: ehya ;-)
<amz3`>seems like the double keyboard interrupt bug is gone in 2.1.3
<amz3`>I'm bored today
<amz3`>so many things to do, not enough energy.
<amz3`>I read a lot of whoosh documentation and had a look at lucene code
<stis>what was your finding?
<amz3`>not much
<amz3`>I have a better picture of the required features for a search engine
<amz3`>I discovered that wiredtiger does some kind of prefix compression which allows inserting rows that look like ("term1", doc-1, position-1) ("term1", doc-2, position-2) efficiently
<amz3`>which means I don't have to pack term occurrences in a single row
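The layout amz3` describes — one row per (term, doc, position) posting, sorted so that rows for the same term share a key prefix — can be sketched with a plain sorted list standing in for the wiredtiger table (the terms, doc ids, and positions here are made-up examples):

```python
import bisect

# One row per (term, doc, position) posting, kept in sorted order;
# consecutive rows for the same term share a common key prefix,
# which a prefix-compressing store like wiredtiger stores cheaply.
postings = []

def insert(term, doc, pos):
    # bisect.insort keeps the list sorted by (term, doc, pos)
    bisect.insort(postings, (term, doc, pos))

insert("term1", "doc-1", 1)
insert("term2", "doc-1", 4)
insert("term1", "doc-2", 2)
# postings is now sorted, with both "term1" rows adjacent:
# [("term1", "doc-1", 1), ("term1", "doc-2", 2), ("term2", "doc-1", 4)]
```

Because all postings for a term are adjacent, a lookup is a range scan starting at the first row whose key begins with that term, with no need to pack occurrences into one value.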
<amz3`>lucene for instance, does not use an ACID database, the code is cluttered with exception handling
<amz3`>"if ... happens, it will corrupt the index" all the code looks like that
<amz3`>I still don't know how whoosh/lucene manage to interpret the query without going through the whole database
<amz3`>given a boolean query like (and "keyword1" (or "keyword2" "keyword3")) for instance
<amz3`>going through the whole database and applying the query to each document is very easy
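The "very easy" full-scan approach can be sketched as follows — evaluate the boolean query against every document's word set in turn (the document contents and query here are invented examples, in the same (and ... (or ...)) shape used above):

```python
# Naive boolean-query evaluation: scan every document and test the
# query against its set of words. A query is either a bare keyword
# string or a tuple ("and", ...) / ("or", ...).

def matches(query, words):
    if isinstance(query, str):
        return query in words
    op, *subs = query
    if op == "and":
        return all(matches(q, words) for q in subs)
    if op == "or":
        return any(matches(q, words) for q in subs)
    raise ValueError(f"unknown operator: {op}")

docs = {
    "doc-1": {"keyword1", "keyword2"},
    "doc-2": {"keyword1", "keyword3"},
    "doc-3": {"keyword2"},
}

query = ("and", "keyword1", ("or", "keyword2", "keyword3"))
hits = [d for d, words in docs.items() if matches(query, words)]
# hits == ["doc-1", "doc-2"]
```

This is O(documents) per query, which is exactly the cost an index is meant to avoid.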
<stis>hmm some kind of indexing; is there a good wikipedia page for this? It might give you some clues
<stis>Is the keyword matching ala regular expressions?
<stis>or whole words? prefices?
<stis>err prefixes?
<amz3`>I understand, let me think
<amz3`>given I don't know the regex algorithm, I can't say, but it's a clue about where to look
<amz3`>keyword matching? I don't know that regex feature
<stis>well are you matching whole words?
<amz3`>basically (and "abc" "def") means retrieve documents where both "abc" and "def" appear
<amz3`>yes, why?
<amz3`>doing regex search is another story I guess, but a similar one based on what I am reading
<stis>You can make a tree of words where each words points to a set of documents.
<stis>Then you can take intersections and unions of the sets
<stis>and set difference
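The scheme stis outlines — map each word to the set of documents containing it, then evaluate boolean queries with set operations — is a classic inverted index. A minimal sketch (document names and contents are invented):

```python
from collections import defaultdict

# Inverted index: word -> set of document ids containing that word.
index = defaultdict(set)

docs = {
    "doc-1": ["abc", "def"],
    "doc-2": ["abc"],
    "doc-3": ["def", "ghi"],
}
for doc_id, words in docs.items():
    for word in words:
        index[word].add(doc_id)

# (and "abc" "def") -> intersection, (or ...) -> union, plus difference:
both = index["abc"] & index["def"]      # documents containing both words
either = index["abc"] | index["ghi"]    # documents containing either word
only_def = index["def"] - index["abc"]  # "def" but not "abc"
```

Each query then touches only the postings of the queried words, not the whole database.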
<amz3`>it's not performant
<amz3`>I think of something
<amz3`>now I recall an algorithm for computing the intersection of sorted sets
<amz3`>the problem is that my index is not sorted
<amz3`>sorry, it is
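The algorithm amz3` recalls is most likely the two-cursor merge intersection: walk both sorted posting lists in lockstep, advancing whichever cursor points at the smaller element. A sketch, assuming postings are sorted lists of comparable document ids:

```python
def intersect_sorted(a, b):
    """Intersect two sorted lists in O(len(a) + len(b)) time."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a's element is smaller, skip it
        else:
            j += 1  # b's element is smaller, skip it
    return out

intersect_sorted([1, 3, 5, 8, 9], [2, 3, 8, 10])  # -> [3, 8]
```

Since a wiredtiger index stores rows in key order, postings come out already sorted, which is what makes this merge applicable.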
<stis>How many elements can a set be?
<stis>if it is less than 10,000 you can represent all elements as bit vectors. Then you do a union or intersection in a microsecond or so
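stis's bit-vector idea can be sketched with a Python integer used as a bitset (this assumes document ids are small non-negative integers; the helper names are made up):

```python
# Represent a set of document ids (0..N-1) as a bit vector packed
# into one integer: bit i is set iff document i is in the set.
# Union and intersection then become single bitwise operations.

def to_bits(doc_ids):
    bits = 0
    for i in doc_ids:
        bits |= 1 << i
    return bits

def from_bits(bits):
    return {i for i in range(bits.bit_length()) if bits >> i & 1}

a = to_bits({1, 4, 7})
b = to_bits({4, 7, 9})
from_bits(a & b)  # intersection -> {4, 7}
from_bits(a | b)  # union        -> {1, 4, 7, 9}
```

In a language with fixed-width words this is one AND/OR per 64 documents, which is where the "microsecond" figure comes from.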
<amz3`>well, there are fewer documents matching a given keyword than there are documents. So (length set) < (length documents)
<amz3`>oh really
<amz3`>for a start, I'll go the route you describe without using the bitvector optimisation, since I really don't know how many documents there will be in the db
<stis>you can use hash tables as well as sets. There the intersection and union are quite efficient
<stis>for really large sets you end up with data in secondary storage and then you need something else
<amz3`>I am designing the database index right now
<stis>database indexes should be a library, is it so hard to find a good one for your need?
<amz3`>I use wiredtiger, it's a database engine
<amz3`>here is an explanation describing how to create the correct table in wiredtiger: !topic/wiredtiger-users/LfziOpPIWZU
***amz3` is a wikipedia hell loop
<amz3`>thx again stis
<amz3`>«one of the problems with ranked lists is that they might not reveal relations that exist among some of the result items.»
<amz3`>sneek: help
<amz3`>sneek: later tell spk121 the download link for guile-curl doesn't work, it's a 403
<sneek>Got it.
<amz3`>otherwise you can download via github, but the gnulib submodule is massive :