IRC channel logs



<amz31>héllo #guile!
***amz31 is now known as amz3`
<amz3`>davexunit: hi! is it from SICP that you learned about the functional/monadic interface?
<amz3`>sneek: sicp?
<sneek>From what I understand, sicp is Structure and Interpretation of Computer Programs,
<quigonjinn>Is there an alternative to (%site-ccache-dir), where you can install a compiled .go file as a non-root user, so that the guile interpreter has access to the provided module, without passing any arguments or calling load-compiled?
<stis>heya guilers!
<catonano>stis: ehya ;-)
<amz3`>seems like the double keyboard interrupt bug is gone in 2.1.3
<amz3`>I'm bored today
<amz3`>so many things to do, not enough energy.
<amz3`>I read a lot of whoosh documentation and had a look at lucene code
<stis>what was your finding?
<amz3`>not much
<amz3`>I have a better picture of the required features for a search engine
<amz3`>I discovered that wiredtiger does some kind of prefix compression which allows inserting rows that look like ("term1", doc-1, position-1) ("term1", doc-2, position-2) efficiently
<amz3`>which means I don't have to pack term occurrences in a single row
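The layout amz3` describes — one row per (term, doc, position) posting, sorted so that rows for the same term share a key prefix — can be sketched with a plain sorted list standing in for the wiredtiger table (the terms, doc ids, and positions here are made-up examples):

```python
import bisect

# One row per (term, doc, position) posting, kept in sorted order;
# consecutive rows for the same term share a common key prefix,
# which a prefix-compressing store like wiredtiger stores cheaply.
postings = []

def insert(term, doc, pos):
    # bisect.insort keeps the list sorted by (term, doc, pos)
    bisect.insort(postings, (term, doc, pos))

insert("term1", "doc-1", 1)
insert("term2", "doc-1", 4)
insert("term1", "doc-2", 2)
# postings is now sorted, with both "term1" rows adjacent:
# [("term1", "doc-1", 1), ("term1", "doc-2", 2), ("term2", "doc-1", 4)]
```

Because all postings for a term are adjacent, a lookup is a range scan starting at the first row whose key begins with that term, with no need to pack occurrences into one value.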
<amz3`>lucene for instance, does not use an ACID database, the code is cluttered with exception handling
<amz3`>"if ... happens, it will corrupt the index" all the code looks like that
<amz3`>I still don't know how whoosh/lucene manage to interpret the query without going through the whole database
<amz3`>given a boolean query like (and "keyword1" (or "keyword2" "keyword3")) for instance
<amz3`>going through the whole database and applying the query to each document is very easy
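The "very easy" full-scan approach can be sketched as follows — evaluate the boolean query against every document's word set in turn (the document contents and query here are invented examples, in the same (and ... (or ...)) shape used above):

```python
# Naive boolean-query evaluation: scan every document and test the
# query against its set of words. A query is either a bare keyword
# string or a tuple ("and", ...) / ("or", ...).

def matches(query, words):
    if isinstance(query, str):
        return query in words
    op, *subs = query
    if op == "and":
        return all(matches(q, words) for q in subs)
    if op == "or":
        return any(matches(q, words) for q in subs)
    raise ValueError(f"unknown operator: {op}")

docs = {
    "doc-1": {"keyword1", "keyword2"},
    "doc-2": {"keyword1", "keyword3"},
    "doc-3": {"keyword2"},
}

query = ("and", "keyword1", ("or", "keyword2", "keyword3"))
hits = [d for d, words in docs.items() if matches(query, words)]
# hits == ["doc-1", "doc-2"]
```

This is O(documents) per query, which is exactly the cost an index is meant to avoid.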
<stis>hmm some kind of indexing; is there a good wikipedia page for this? It might give you some clues
<stis>Is the keyword matching ala regular expressions?
<stis>or whole words? prefices?
<stis>err prefixes?
<amz3`>I understand, let me think
<amz3`>given I don't know the regex algorithm, I can't say, but it's a clue about where to look
<amz3`>keyword matching? I don't know that regex feature
<stis>well are you matching whole words?
<amz3`>basically (and "abc" "def") means retrieve documents where both "abc" and "def" appear
<amz3`>yes, why?
<amz3`>doing regex search is another story I guess, but a similar one based on what I am reading
<stis>You can make a tree of words where each words points to a set of documents.
<stis>Then you can take intersections and unions of the sets
<stis>and set difference
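The scheme stis outlines — map each word to the set of documents containing it, then evaluate boolean queries with set operations — is a classic inverted index. A minimal sketch (document names and contents are invented):

```python
from collections import defaultdict

# Inverted index: word -> set of document ids containing that word.
index = defaultdict(set)

docs = {
    "doc-1": ["abc", "def"],
    "doc-2": ["abc"],
    "doc-3": ["def", "ghi"],
}
for doc_id, words in docs.items():
    for word in words:
        index[word].add(doc_id)

# (and "abc" "def") -> intersection, (or ...) -> union, plus difference:
both = index["abc"] & index["def"]      # documents containing both words
either = index["abc"] | index["ghi"]    # documents containing either word
only_def = index["def"] - index["abc"]  # "def" but not "abc"
```

Each query then touches only the postings of the queried words, not the whole database.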
<amz3`>it's not performant
<amz3`>I think of something
<amz3`>now I recall an algorithm for computing the intersection of sorted sets
<amz3`>the problem is that my index is not sorted
<amz3`>sorry, it is
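The algorithm amz3` recalls is most likely the two-cursor merge intersection: walk both sorted posting lists in lockstep, advancing whichever cursor points at the smaller element. A sketch, assuming postings are sorted lists of comparable document ids:

```python
def intersect_sorted(a, b):
    """Intersect two sorted lists in O(len(a) + len(b)) time."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a's element is smaller, skip it
        else:
            j += 1  # b's element is smaller, skip it
    return out

intersect_sorted([1, 3, 5, 8, 9], [2, 3, 8, 10])  # -> [3, 8]
```

Since a wiredtiger index stores rows in key order, postings come out already sorted, which is what makes this merge applicable.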
<stis>How many elements can a set be?
<stis>if it is less than 10,000 you can represent all elements as bit vectors. Then you do a union or intersection in a microsecond or so
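stis's bit-vector idea can be sketched with a Python integer used as a bitset (this assumes document ids are small non-negative integers; the helper names are made up):

```python
# Represent a set of document ids (0..N-1) as a bit vector packed
# into one integer: bit i is set iff document i is in the set.
# Union and intersection then become single bitwise operations.

def to_bits(doc_ids):
    bits = 0
    for i in doc_ids:
        bits |= 1 << i
    return bits

def from_bits(bits):
    return {i for i in range(bits.bit_length()) if bits >> i & 1}

a = to_bits({1, 4, 7})
b = to_bits({4, 7, 9})
from_bits(a & b)  # intersection -> {4, 7}
from_bits(a | b)  # union        -> {1, 4, 7, 9}
```

In a language with fixed-width words this is one AND/OR per 64 documents, which is where the "microsecond" figure comes from.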
<amz3`>well, there are fewer documents matching a given keyword than there are documents. So (length set) < (length documents)
<amz3`>oh really
<amz3`>for a start, I'll go the route you describe without using the bitvector optimisation, since I really don't know how many documents there will be in the db
<stis>you can use hash tables as well as sets. There the intersection and union are quite efficient
<stis>for really large sets you end up with data in secondary storage and then you need something else
<amz3`>I am designing the database index right now
<stis>database indexes should be a library, is it so hard to find a good one for your need?
<amz3`>I use wiredtiger, it's a database engine
<amz3`>here is an explanation describing how to create the correct table in wiredtiger: !topic/wiredtiger-users/LfziOpPIWZU
***amz3` is a wikipedia hell loop
<amz3`>thx again stis
<amz3`>«one of the problems with ranked lists is that they might not reveal relations that exist among some of the result items.»
<amz3`>sneek: help
<amz3`>sneek: later tell spk121 the download link for guile-curl doesn't work, it's a 403
<sneek>Got it.
<amz3`>otherwise you can download via github, but the gnulib submodule is massive :