IRC channel logs


back to list of logs

<daviid>heya guilers!
<daviid>I am facing problem in g-golf, that although I can see what causes the exception to be raised, can't find a rational explamation to why g-golf 'reaches that weird state', nor a solution to the problem ...
<daviid>so I am going to 'speak it loudly', lets see if it helps :), or if someone here can point to a solution ...
<mwette>go for it
<daviid>all this is related to the Glib/GObject GSource and GClosure, provided so one can define a procedure, then ask Glib/GObject to execute it 'on demand', such as a callback, just to name ean example
<daviid>in g-golf (as any 'advnaced ' language binding to glib/gobject and/or gobject introspection, the g-idle-add and g-idle-timeout* procedures are implemetd 'manually', becaue they manipulate a 'high level' object (a goops instance in the case of guile) and can't 'just' call the provided g_idle_add, g_timeout_add*( functions ... these steps are very well defined, recommended in the glib manual
<daviid>so, let's focus on g-idle-add, the code is very short and here - line 6- - 69
<daviid>as it has to, it needs a (pointer to a) GSource instance, called source in the g-gold code, and a GClosure instance, called closure (actually a goops instance, i'll come to it in a second
<daviid>the source is needed because it hold' on the gclosure and is attached to a main context glib main-loop
<daviid>th gclosure is the part that provides the interface between gobject and the languae, it will execute a scheme rocedure on demand, and it does so by the use of what is called a marshaller, in g-golf the closure/ŋclosure infrastructure is define here
<daviid>now, here is how it works: (a) a user calls g-idle-add, passing a lambda that accept no argument and returns a boolean, here is a simple example
<daviid>(g-idle-add (lambda () (display ".") #f))
<daviid>the lambda is used to define a <closure> instance
<daviid>and all tha matters for those who follow, is the (specialized) initialize instance method, defined in line 73 - 89
<daviid>and what it does instantiate a gclosure (gobject) instance, tells it what is the marshaller is should use (the marshaller will execute the procedure), sink and ref the gclosure so its count ref is 1
<daviid>it also permit to associate a g-closure-add-invalidate-notifier, which will be called when the gclosure count (un)ref reaches zero, which means glib will recover its memory
<daviid>it also does a very important thing, which is to hold on a reference to the scheme procedure the <closure> instance is being created 'for', and that is line 83 in the closure module: (gi-closure-cache-set! g-closure function)
<daviid>which associated in a hash table, to the address of the pointer to the gclosure, the procedure, see line 287 - 289 i the same module, if interested, but for now, it is enough to 'trust' that the scheme procedure is cached, hence 'protected from gc'
<daviid>one more word about g-idle-add, then i'll aspte the example that raies the exmption :)
<daviid>the lambda passed to g-idle-add returns a boolean, and that boolean deterines if the idle is kept or not in the maincontext of the glib main-loop is is associated with
*RhodiumToad pays careful attention
<daviid`>hum, bad internet connection
***daviid is now known as Guest29292
***daviid` is now known as daviid
<RhodiumToad>we saw "the lambda passed to g-idle-add returns a boolean ... main-loop is is associated with" as the last line from you
<daviid>RhodiumToad: ah, let me check the log to rewrite what misses :), sorry about this
<daviid>only one sentence missed, here it is
<daviid>the standard way of doing things, using g-idle-add, is a 'one shot callback', so as in the example above, the lambda returns #f, which means that once the main-loop has excecuted the gclosure, it destroys (g-source-destroy source) the source
<daviid>the example bove is this one
<daviid>(g-idle-add (lambda () (display ".") #f))
<daviid>here is a repl example, just to make things a bit easier to understand, before the exmple that crashes and see what people think ..
<daviid>this is a manual 'one shot' run of a closure, as a idle and a glib main loop
<RhodiumToad>(I already discovered that you can crash guile-gnome by heavy use of idle cb's)
<RhodiumToad>I assume this is an attempt to do that bit right :-)
<daviid>to show that after the code has been executed, the source has been destroyed, _but_ not unref, neither the gclosure, till their respective ref_count reches zero ...
<daviid>RhodiumToad: yes, let's focus on g-golf
<daviid>all this mens that the user, not g-golf but the user, must free the source and the gclosure 'when approriate', and her comes the difficulty, which can be illustrated by running the following example
<RhodiumToad>hm, that sounds like a really bad idea to me
<daviid>which will finally let me point to the bug
<daviid>line 33, is the g-gidle-add call
<daviid>being done in a thread, repeatidly using a named let, perfect
<daviid>within the loop, the source and the closure instances are 'accumulated' in the to-free variable, and when the loop counter reaches n-cycle, it calls unref-and-fre, which, if and only if the source has been destroyed, unref both the source and the closure till their ref count reaches zero, so glib recovers the memory ...
<daviid>so far so good
<daviid>when a closure is freed, it triggers the g-closure-invalidate-notifier, which removes the (gclosure procedure) entry from the hash table, so the guile gc can recover the procedure memory
<daviid>but in realatity, wht happens is this
<daviid>in the gclosure/procedure hash table, i can see that gclosure pointer addresses are always new, but the procedure is always 'the same', because guile's compiler detects the idle 'core' is always the same, and to prove this, i added (loclly didn't push of course) a few displays _and_ what happens, 'the bug'
<daviid>here is a paste of a runnig session of the example that raises the exception
<daviid>this trace i added has been added in the gclosure marshaller
<daviid>line 226, instaed of (function (gi-closure-cache-ref g-closure))
<mwette>yikes ...
<daviid>i have lcally (function (dimfi g-closure " " (g-closure-ref-count g-closure) " " (gi-closure-cache-ref g-closure)))
<daviid>and that shows 'the impossible': the scheme procedure has been freed before the gclosure has been executed
<daviid>which in principle is impossible
<RhodiumToad>ok. so the first question that comes to mind is this, there are obviously two different threads in play, how sure are you of the thread safety of this?
<daviid>RhodiumToad: the problem is glib does not provide a way to know when a idel has been executed and its source destroyed
<daviid>RhodiumToad: things only happen i one thread
<RhodiumToad>not so?
<daviid>the named let runs in one thread
<daviid>so the idle call and the unref-and-free procedure run in the same thread
<RhodiumToad>the idle call will come from the thread running g-main-loop, surely
<daviid>of course the gclosure will be executed by the glib main-loop in a unknown time, but, afaict, unref-and-free will only run if and only f (g-source-is-destroyed? source) is true
<RhodiumToad>as will the invalidate notifier
<RhodiumToad>both threads are touching the hash table, yes?
<daviid>as said above, unref-and-free will only free those sources and closures for which the source has been destroyed
<daviid>both threads are indeed, but let me add one explanation
<daviid>(a) the scheme thread is always been called 'before', what triggers the invalidate norifier is the fact that the gclosure ref-count reaches zero, which only happens because unref-and-free calls (free closure), line 145 - 148 of the clsoure module
<daviid>that code should never be executed unless the (parent source) has been destroyed
<daviid>then (b), the invalidate notifier, in this case, does nothing because the hash entry has been removed already
<daviid>the problem that raises the bug is that the hash entry has been removed befre the idle has benn executed
<daviid>it's a bit like qauntum mechanic: you can read in the example that the unref-and-free code will only run if the source has been destroyed, which in principle can only happen in the glib main-loop and only if the idle has been executed
<daviid>RhodiumToad: do yo have a cpoy of g-golf by any chance?
<RhodiumToad>not at present, though I can try it out later
<daviid>as you wish, i was curios
<RhodiumToad>still reading the code
<daviid>i'm like facing a quantu mecanic stae
<RhodiumToad>I make heavy use of idle callbacks in another language binding, but not at present using threads which obviously complicates things
<daviid>although nothing of this can happen if the source hasn't been destroyed, it appears it does
<daviid>RhodiumToad: yeah, this is complicated
<daviid>if you had g-golf, i can paste an exmaple of almost the same code that never crashes
<RhodiumToad>aha. this situation is described in the docs
<RhodiumToad>see the part "taking hashtables as an example"
<daviid>ok, let me read
<daviid>RhodiumToad: meanwhile, here is a example that works indefinitely, does not crash nor leaks, but i consider not robust, hence my 'other' attempts that leaded to this dscussion
<daviid>in this example that 'runs ok', it is just a matter of 'luck'
<daviid>just because of the usleep, the idle has alawys beene xecuted before the free procs are called ...
<daviid>RhodiumToad: not sure, as i tried to explain, there is only one thread touching the hash
<daviid>in my case
<daviid>not in the pste of the bug, the seconf value, 4, is the ref count of the closure: thismeans that the invalidate-notifier has _not_ been called
<RhodiumToad>you're missing the point that the hashtable can contain more or less anything
<RhodiumToad>you seem to be thinking of the hashtable as an atomic thing which is either updated or not, which is false
<RhodiumToad>and I don't understand why you think it's accessed only from one thread,
<daviid>ah, yes i thought it was
<RhodiumToad>because the main thread calls the invalidate notifier which calls gi-closure-cache-remove! which modifies the hash table,
<RhodiumToad>and the loop thread also modifies the hash table
<daviid>the invalidate notifer as not been called
<RhodiumToad>it's been called on some other object
<daviid>but if, it is a reasult of the unref, that happens in one thread
<RhodiumToad>you're queueing up thousands of idle callbacks for the main thread, no?
<RhodiumToad>modifying the hash table from one thread while modifying a different value from another thread can screw it up, if I read the docs right
<daviid>or the second thread i should say
<RhodiumToad>clearly, the hash table might be reorganized as a result of one of the inserts, and that could interact poorly with modifications from another thread
<daviid>hum, we are talkng passed each other
<RhodiumToad>have you done much work with threads?
<daviid>have to change place, bb in a min
<RhodiumToad>time to go read the guile sources...
***catonano_ is now known as catonano
<RhodiumToad>anyway, there's a fairly easy way to confirm or refute my theory here.
<daviid>:), i am going to remove the invalidate notifier, give me a sec
<daviid>as an experiment
<RhodiumToad>what I was going to suggest was using a (with-mutex) form around the hashtable accesses in the gi-closure-cache functions
<daviid>RhodiumToad: will try that to, many thanks for your help
<spk121>daviid: Not sure if I've followed the whole point, but, the idle will be called from the glib-main-loop thread, while your guile code is running in the guile main thread, so maybe a mutex is the way to go
<daviid>spk121: the idle is called from the main thread, but the unref-free always from the scheme thread
<daviid>anyway, will try the suggestions
<spk121>oh, ok
<daviid>and i'm certain the invalidate notifier is never called _before_ the unref-free, but will try to prove that in a second
<RhodiumToad>that isn't the point, you're still thinking of the hashtable as being atomic with respect to threads
<RhodiumToad>and it's really not
<daviid>RhodiumToad: within one thread it must be, no?
<RhodiumToad>not while another thread might be modifying it, even to change a completely separate value
<daviid>the other thread doesn not modify it
<daviid>nbut let me prove this
<daviid>maybe :)
<RhodiumToad>what, I'm just imagining this call to gi-closure-cache-remove! in g-closure-invalidate-notifier?
<daviid>exatcly, I have removed the call to the invalidate
<daviid>and it still crashes
<RhodiumToad>can you paste the actual code changes you made
<daviid>here is the 'new definition', which comments the call to g-closure-add-invalidate-notifier
<RhodiumToad>ok, so the invalidate notifier is out of play
<RhodiumToad>but your debug output is from g-closure-marshal?
<daviid>the marshaller runs in the main thread
<RhodiumToad>which again is being called on the main thread
<daviid>right, but no thread could have deleted the source before the mashaler is run
<RhodiumToad>but the other thread could be modifying the hashtable.
<daviid>not until the idle hs been executed
<RhodiumToad>if one thread is inserting into the hashtable while another thread reads from it, there is no guarantee that it will correctly find even an existing value
<daviid>ah ok, i issed that
<daviid>my bad
<RhodiumToad>the idle for object 1 could be running while the main thread is inserting object 2, no?
<RhodiumToad>and if object 2 causes the hash table to be resized/rehashed, then the main thread might search for object 1 and not find it
<daviid>i thought from one thread, reading the entry on a gclosure address woud never fail
<daviid>even if removing another entry, but these ops are in the same thread as well ... but that 'a removing' may affect a retreival puzzles me
<RhodiumToad>it's probably the insertion that affects the retrieval and not the removal
<daviid>probbly, because if i wait 'enogh time', it works
<RhodiumToad>if I read this hashtable code right, then on a rehash, it swaps in the empty "new" vector before populating it,
<RhodiumToad>so the likely failure mode of a concurrent lookup is to find nothing
<daviid>so, i should 'rehabilitate the invalidate-notifier and protect with a mutex, line 11
<daviid>RhodiumToad: that explains everything, mny thanks
<RhodiumToad>that's the thing to test to see if this is a correct analysis of the problem
<daviid>i wonder how performance will be affected
<RhodiumToad>you can try it even without putting the invalidate-notifier back
<RhodiumToad>just protect every single hash-ref/hash-set/hash-remove call on this hashtable with the same mutex
<RhodiumToad>see if that fixes it
<RhodiumToad>(with-mutex) does the whole dynamic-wind dance. you might use the lower-level mutex functions instead, but I'd worry about thrown errors
<daviid>i'll start usig with-mutex, then improve later ...
<daviid>in this code, you would then protect in the definition of gi-closure-cache-ref (line 282), gi-closure-cache-set! and gi-closure-cache-remove! right?
<daviid>long time didnt use with-mutex ...
<RhodiumToad>right (as far as I can tell from just reading this)
<daviid>of course
<daviid>i bear the full responsability :)
<RhodiumToad>of course if I'm wrong then adding the mutex won't help and we'll have to take another look
*RhodiumToad still very interested in having working gui functionality from guile
<daviid>RhodiumToad: yeah, when this is solved, i'll work o signals, then g-golf becomes 'fully' usable
<daviid>should i define the mutex as a top-level variable?
<RhodiumToad>define it where you defined the hash table
<RhodiumToad>they are conceptually a single object
<daviid>RhodiumToad: bingo, it works fine now! spk121 thanks as well ... thanks both
<RhodiumToad>ok. so the immediate problem analyzed, I'm wondering if this is actually the best approach
<RhodiumToad>you said that glib doesn't provide a way to know when an idle source is destroyed,
<daviid>RhodiumToad: I wish the user wouldnt have to free the source and the gclosure, but although we can check if a source has been destroyed, there is o (very nfrtunately) g-source-destroy-invalidate-norifier
<RhodiumToad>but surely this is what GDestroyNotify in g_idle_add_full is for
<daviid>RhodiumToad: but, that is obly avalable to those bindigs that use g_idle_add_full, which g-golf can't
<RhodiumToad>why can't it?
<daviid>because g-golf is pure scheme, and closures are impemented using goops
<RhodiumToad>yes, and?
<daviid>but it is a glib design mistake really, i complained about this but ...
<RhodiumToad>ah. GDestroyNotify can only be a C function? not a closure?
<daviid>RhodiumToad: g-golf does not even call g_idle_add, not g_timeout_add functions
<daviid>yeah, it's complicated i don't have time to explain this to you now, but feel free to look at the code, (g-golf hl-api glib) and (g-golf hl-api closure)
<RhodiumToad>oh I see what you mean
<daviid>the thing is, glib does not provide g-source-add-invalidate-notifier
<daviid>which is terrible
<daviid>maybe i can write one mysekf, i'll see about that
<daviid>it's an idea actually
<daviid>GDestroyNotify 'has to be a C function' is not the roblem, marshallers also have to be C functions ... the proble is that without usig g_idle_add_full, there is no way to 'attach' a destroy norifier, afaict
<daviid>I mean these must be defined and 'respect' a C function template, if you look at g-closure-marshal and %g-closure-marshal
<daviid>the problem is even if i define a scheme procedure that respect the GDestroyNotify C def template, there is no way i can attach it to the source 'as it is' (but maybe i ca cook something in C i'll see about that)
<str1ngs>hello daviid, this was a pretty long thread. do you need me to test anything here?
<daviid>str1ngs: no, many thanks
<daviid>it's all solved now
<str1ngs>oh great, sorry for the slow response. I just saw your query now
<daviid>no problem, i was actually speaking loud, but lucky to receive help from li
<daviid>RhodiumToad: and spk121 to ..
<daviid>str1ngs: I pushed a fix, you should pull and run the make dance ...
<daviid>have to run, tx again, b tomorrow ...
<zig>hello #guile
<zig>new episode of gnu guile hacking where I debug a babelia:
<rekado>spk121: hey, I was going to upgrade the Guix package for guile-gi, but the “init_check” test fails.
<rekado>it’s the only test that fails.
<rekado>Do you have an idea why that might be?
<spk121>rekado: Hmmm. I don't know.
<spk121>It is a pretty simple test, but it relies on the characteristics of the Gtk typelib and library
<spk121>rekado: is your build environment console only? Gtk init_check may just be failing because Gtk thinks that there is no X11
<rekado>spk121: yes, this might be it. I’ll try to work around it. Thanks for the hint!
<stis>tja guilers!
<str1ngs>rekado: assuming this is a guix package. with nomad I use this hack which I snarfed from a package in guix.
<str1ngs>nomad also need to init gtk for tests. so maybe that will help.
<zig>I will just use wiredtiger memory backend as thread safe hashtable. That will lead to less lines of code, which is known to keep problems away. Bonus, I can reuse that very in-memory okvs (written in C) to store the enigmatic parsed query AND parsed texts. Much much much less lines of code.
<zig>and maybe performance will at the rendez-vous :)
<zig>if cache tuning is not involved.