IRC channel logs

2018-06-25.log

<wingo>moo
<janneke>o/
<wingo>good morning civodul :)
<wingo>hum, it seems gnu lightning doesn't know about atomic memory access
<civodul>hello!
<civodul>oh
<wingo>we can call out to intrinsics written in c of course
<wingo>probably a reasonable strategy in the beginning anyway
<civodul>yes, that makes sense
<civodul>so you're already playing with lightning? :-)
<wingo>civodul: in my mind :) seeing what needs to be intrinsics and what not
<wingo>about 20 more intrinsics to go
<civodul>heh, that's exciting anyway!
<manumanumanu>Good morning everyone! Such a beautiful day. I finally finished my big garden project (building a 25m long brick wall for our flowerbeds). Now I want to do some coding
<manumanumanu>What are you all up to?
<manumanumanu>I also ordered plants for almost 15000SEK :|
<manumanumanu>I am building a watering system with guile controlling the pumps
<rekado_>manumanumanu: oh, that’s interesting!
<rekado_>I’m still working to get rid of the old lawn to replace it with growing beds.
<rekado_>really tough soil under dry grass roots.
<manumanumanu>I am more interested in the whole constructing a garden and letting my wife tend to it, but I am starting to get really into it. Apart from the hard labour of digging out 15 metric tonnes of dirt
<rekado_>I really enjoy gardening, though I’d love to automate irrigation.
<rekado_>I have been finding a rather large number of may bug larvae under the grass. I expect lots of may bugs next spring :-/
<snape>and if you automate irrigation you may have guile bugs as well as may bugs
<rekado_>:)
<rekado_>our garden is a little too far away from where we live, so I’d also like to set up some other kinds of monitoring.
<manumanumanu>I have 1.1k m2 of garden for our house. Too much is covered by a _huge_ patio that was here when we built the house. That is the one we now framed with flower beds. The next thing is to buy about 40 pallet collars. Those are pretty neat to grow things in
<manumanumanu>is there any way to start guile without .guile?
<manumanumanu>ArneBab: Sorry, there won't be any post-checks in a while. That is a pretty darn huge change that I'm not really sure I'm able to crowbar in without harming performance
<manumanumanu>pre-checks: easy. Post-checks: hard.
<manumanumanu>Pre-checks before the body is executed are simple, since they can just be placed before any inner loop, but post-checks have to be passed to the innermost loop, which requires me to refactor a pretty hairy macro.
<civodul>wingo: code that tail calls does: (reset-frame x) ... (tail-call n)
<civodul>but "tail-call" itself also does RESET_FRAME
<civodul>is it necessary?
<wingo>probably not!
<civodul>ok
<civodul>also, when doing reset-frame, isn't there a window during which sp may point above the "actual" sp, meaning that scm_i_vm_mark_stack could leave a few items unmarked?
<wingo>when reset-frame happens, that indicates that everything beyond the sp to which it resets is trash
<civodul>ok
<wingo>does that answer the question?
<civodul>yes
<wingo>generally, guile should mark everything inside sp and nothing outside sp
<civodul>sure
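
The "mark everything inside sp and nothing outside sp" rule can be sketched in C as below. This is a toy model, not Guile's scm_i_vm_mark_stack: the struct, the callback type, and the function names are all hypothetical, and it assumes a stack that grows down so that the live slots are the range from sp up to stack_top. The real marker also consults slot maps, which this sketch leaves out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy stack, not Guile's struct scm_vm.  The stack grows
   down, so the live slots are [sp, stack_top). */
struct toy_stack {
  uintptr_t *stack_bottom;  /* lowest address of the allocation */
  uintptr_t *sp;            /* current stack pointer */
  uintptr_t *stack_top;     /* one past the highest slot */
};

/* Hypothetical callback invoked once per live slot. */
typedef void (*mark_fn) (uintptr_t word, void *data);

/* Visit every slot "inside" sp and nothing below it.  Slots in
   [stack_bottom, sp) are treated as trash and never looked at.
   Returns the number of slots visited. */
static size_t
mark_live_slots (const struct toy_stack *s, mark_fn mark, void *data)
{
  size_t visited = 0;
  for (const uintptr_t *p = s->sp; p < s->stack_top; p++)
    {
      mark (*p, data);
      visited++;
    }
  return visited;
}

/* Example callback: count the slots that hold a non-zero word. */
static void
count_nonzero (uintptr_t word, void *data)
{
  if (word != 0)
    ++*(size_t *) data;
}
```

If sp is published too high at the moment the collector looks, the slots between the real sp and the published one are skipped, which is the window civodul asks about.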
<civodul>it's not entirely clear to me that reset-frame is only used in those cases, but that's mostly because i don't fully understand this
<civodul>like i'm looking at the assembly of call-with-output-string
<wingo>ACTION updates his guile 2.2
<civodul>clearly the 'port' variable must be kept after the 'reset-frame' instruction so it can be passed to 'get-output-string'
<wingo>yes the disassembly is terribly confusing
<wingo>i have been meaning to annotate values with e.g. "f0" or "s0" to indicate whether they are relative to fp or sp, or rewrite them all in terms of slots-from-fp
<wingo>anyway the port is stored in fp[-1] which is fp slot 0
<civodul>right
<civodul>so even after RESET_FRAME(2), it's still markable
<wingo>i never see (reset-frame 2) before the tail call?
<civodul>i mean "tail-call" itself does RESET_FRAME(2)
<wingo>or are you referring to the RESET_FRAME inside the tail call inst
<wingo>yes
<wingo>yes, no problem there. at a point of a call, the closure/callee arg and the other args are all "inside" SP
<wingo>so, no problem there.
<civodul>right
<civodul>i have a case where this 'port' is zeroed out
<wingo>rly
<civodul>which suggests its references weren't added to the mark stack
<civodul>so i'm trying to enumerate the possibilities
<wingo>that could happen if the mark stack was moved and somewhere forgot to CACHE_SP() after a call that could have moved the stack...
<wingo>s/mark stack was moved/stack was moved/
<civodul>if the stack was moved? what do you mean?
<civodul>stack expansion?
<wingo>yes
<wingo>i guess that's unlikely though
<civodul>yeah, that doesn't seem to be the case (i have a printf in expand_stack :-))
<wingo>:)
<wingo>i assume also we don't have any weird off-by-one in the MADV_UNUSED code that returns stack frames to the OS
<wingo>i just mention this for completeness of course
<civodul>MADV_DONTNEED isn't called in this case
<wingo>right, that :)
<wingo>ok
<civodul>another option is a race condition where the world would somehow not be fully stopped when we start marking
<wingo>i thought you were able to repro this problem even with disabling the precise stack marker
<wingo>is that not the case?
<civodul>yes, i disabled the slot_map thing
<civodul>but i left marking from sp to stack_top
<wingo>right
<wingo>so it's not the slot_map thing deciding to NULL out that slot (and even then, it nulls out with SCM_UNDEFINED i think)
<civodul>yes
<civodul>note that sometimes it's the stack slot itself that's zeroed, and sometimes it's a heap object (like this string port) that's zeroed, although it's referenced from the VM stack
<wingo>it's interesting because it's a reference from the stack; it's not prone to out-of-bounds writes from other adjacent objects stomping it
<civodul>right
<civodul>and i have "reasons to believe" that the zeroed scm_t_port was added back to a freelist because its first word looks like a pointer to the next item in the freelist
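
For readers unfamiliar with the symptom, here is a minimal sketch of why a reclaimed object can look "zeroed with a pointer in its first word". It assumes a toy fixed-size-cell allocator whose freelist is threaded through the first word of each free cell; the names (union cell, toy_free, looks_like_free_cell) and the cell size are hypothetical and are not libgc's API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { CELL_WORDS = 4 };  /* hypothetical fixed object size, in words */

/* A free cell reuses its own first word as the "next" link. */
union cell {
  union cell *next_free;
  void *words[CELL_WORDS];
};

/* Return a cell to the freelist: clear it, then link it in. */
static void
toy_free (union cell **freelist, union cell *c)
{
  memset (c, 0, sizeof *c);
  c->next_free = *freelist;
  *freelist = c;
}

/* Debugging heuristic: the first word is NULL or points into the same
   pool, and every other word is zero. */
static int
looks_like_free_cell (const union cell *c, const union cell *pool, size_t n)
{
  uintptr_t next = (uintptr_t) c->next_free;
  uintptr_t lo = (uintptr_t) pool;
  uintptr_t hi = (uintptr_t) (pool + n);

  if (next != 0 && (next < lo || next >= hi))
    return 0;
  for (size_t i = 1; i < CELL_WORDS; i++)
    if (c->words[i] != NULL)
      return 0;
  return 1;
}
```

A live object that still passes this check was most likely reclaimed while something, such as a VM stack slot, still referenced it.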
<wingo>is this on x86-64 ?
<civodul>yes
<wingo>you could try putting a memory barrier (before and?) after every write to vp->sp, if that's a concern......
<wingo>difficult to tell!
<wingo>you've never caught the marker in the act making an odd mark, have you?
<civodul>making an odd mark, like?
<civodul>in this case i would need to catch it *not* marking something it should mark
<wingo>civodul: hum! i guess i was thinking something was zeroing out that slot. but perhaps you were thinking that maybe the value was already zero
<wingo>the SCM value was zero and then was written to the slot
<wingo>is this particular form of the bug reproducible?
<civodul>wingo: it's reproducible to some extent: you let the code run for a couple of minutes, and you get a crash of one form or another
<wingo>and this form is a usual form?
<wingo>are threads created once up front or are they constantly being created?
<civodul>it's the code at https://bugs.gnu.org/28211
<civodul>threads are created once for all
<wingo>it would be interesting if you could get backtraces when threads are stopped by libgc; i.e. where is the mutator when it is stopped by the collector
<wingo>if the mutator isn't making its changes visible to the collector, then obviously an error would occur, but later, in a seemingly unconnected way
<civodul>the problem is that most of the time we won't see anything interesting since it's non-deterministic
<civodul>wingo: it appears to crash much less frequently with brute-force memory barriers: https://paste.debian.net/1030656/
<civodul>like it took 30+ minutes before it would crash
<civodul>but i'm missing an __atomic_load in scm_i_vm_mark_stack, for instance
<civodul>so maybe it's just that
<wingo>that is a huge hammer to use SEQ_CST there...
<civodul>of course, but it's a simple way to test the hypothesis :-)
<wingo>:)
<civodul>now, what do we do with this, idk
<wingo>i think it's only a test if you can prevent the crashes
<civodul>it's expensive, so do we want to do this at each SYNC_IP/CACHE_SP? maybe not
<wingo>it could be the barriers just alter the timings without solving the problem
<civodul>could be
<civodul>really hard to tell :-/
<civodul>it could also be that all we need is a compiler barrier, so that vp->sp happens really where we write it
<wingo>right
<civodul>dunno if that instruction can end up being moved elsewhere
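
The two options being weighed, a compiler-only barrier versus the SEQ_CST "hammer", might look roughly like this. It is a sketch using GCC/Clang builtins, not the patch from the paste: struct toy_vm and the helper names are hypothetical stand-ins, and only the vp->sp field name comes from the discussion.

```c
#include <stdint.h>

/* Hypothetical stand-in for the VM state; only the sp field matters. */
struct toy_vm {
  uintptr_t *sp;
};

/* Compiler-only barrier (GCC/Clang syntax): stops the compiler from
   moving memory accesses across this point.  It emits no instruction,
   so it does not constrain the CPU. */
static inline void
compiler_barrier (void)
{
  __asm__ __volatile__ ("" ::: "memory");
}

/* Cheap variant: plain store, then keep the compiler from sinking it
   past whatever follows. */
static inline void
set_sp_compiler_barrier (struct toy_vm *vp, uintptr_t *new_sp)
{
  vp->sp = new_sp;
  compiler_barrier ();
}

/* Expensive variant: sequentially consistent atomic store, which also
   orders the store with respect to other threads at the hardware level. */
static inline void
set_sp_seq_cst (struct toy_vm *vp, uintptr_t *new_sp)
{
  __atomic_store_n (&vp->sp, new_sp, __ATOMIC_SEQ_CST);
}

/* Matching load for the reader side, e.g. a marker thread. */
static inline uintptr_t *
load_sp (struct toy_vm *vp)
{
  return __atomic_load_n (&vp->sp, __ATOMIC_SEQ_CST);
}
```

On x86-64 the hardware does not reorder one store with another, so if the compiler-only variant were enough to make the crashes go away, that would point at compiler reordering rather than CPU reordering. As wingo notes, either variant could also just be changing the timing.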
<wingo>of course walking through the disassembly of vm_regular_engine is an option ;)
<wingo>not one i would want to do tho in 2.2 :/
<wingo>i suppose you could look for every ALLOC_FRAME tho
<ArneBab_work>manumanumanu: would you like to write a short article about your for-loops for with-guise-and-guile? http://www.draketo.de/proj/with-guise-and-guile/ — to get started you can simply copy one of the org-files from https://bitbucket.org/ArneBab/with-guise-and-guile/src/default/
<civodul>wingo: fun fact: i've disabled ASLR and it's always the exact same VM (pointer) that's bogus
<civodul>it crashes in different ways, but it's always the same one
<civodul>it's the first thread in the 'all_threads' list (but not the main thread)
<wingo>that's promising!
<wingo>civodul: i ran into something interesting... it's possible for a thread to exit but the thread object is still marked
<wingo>while investigating something unrelated
<wingo>what happens if you add a /* Prevent this thread from being marked in the future. */
<wingo> t->handle = SCM_PACK (0);
<wingo>in on_thread_exit, right after removing the thread from the all_threads list
<civodul>ACTION looks
<wingo>i think there can be a case then that an exited thread might be marked
<civodul>though, it'd be a leak, but the problem here is the converse :-)
<wingo>and then in mark_stack(), there's that call to return_unused_stack_to_os, which from my reading might call MADV_DONTNEED on a mmap region that it doesn't control
<wingo>no, because when a thread exits, its stack is unmapped directly
<civodul>and i typically don't see MADV_DONTNEED in my tests
<civodul>i thought i'd share a backtrace: https://paste.debian.net/1030684/
<civodul>so here the port's 'read_buf' and 'write_buf' vectors are both corrupt
<civodul>they look like a freelist with the next-item pointers at regular intervals
<wingo>do you use soft ports at all?
<wingo>(could this be related to weak sets, in any horrible way? there is the port weak set)
<wingo>that would be sad :P
<civodul>i don't use soft ports here
<civodul>the port weak set, hmm, dunno
<civodul>it's a string port, so i think it's not added to the port weak set
<civodul>only file ports do, AIUI
<wingo>will it get a finalizer? i think that will happen only if it gets iconv descriptors
<civodul>string_port_type->flags is 0, so it doesn't have SCM_PORT_TYPE_NEEDS_CLOSE_ON_GC, so no finalizer
<wingo>yes but see prepare_iconv_descriptors
<wingo>incidentally i think the refcounting in release_port appears to be bogus; if the refcount is higher than 1, i don't think it ever gets properly decremented
<civodul>heh, 2nd bug that you find already :-)
<daviid>hello guilers!
<daviid>ArneBab: I just answered your email
<manumanumanu>ArneBab: Sure thing. Not before this weekend though :)
<mwette> /quit
<rekado>davexunit: I’ve been playing with collision layers in tiled 1.1.5 and it seems that chickadee misinterprets the object coordinates.
<rekado>when rendering the collision rectangles they all appear to be flipped vertically.
<rekado>maybe that’s because of the “renderorder” property.
<rekado>mine is “right-down” and in your example game it’s “right-up”. I’ll change this in my map, but I guess I should contribute some tests for the parser.
<rekado>I take that back. It seems wrong also in the map that you use in your lisp game jam experiment.
<rekado>it looks fine in the tiled editor, but in the game (when highlighting collisions) they are all flipped vertically.
<ArneBab>manumanumanu: cool! Thank you!