IRC channel logs


back to list of logs

<Gooberpatrol66>is the discussion here in 2013-09-18 about pci arbitration obsolete since pci-arbiter has been added?
<damo22>would be good if Avadis Tevanian Jr was lurking around to help with the vm map locking :D
<damo22>vm_allocate returns with the vm_map lock in a read lock
<damo22>is that correct
<damo22>if map->wiring_required
<damo22>ah no it unlocks the map in Bailout
<damo22>Gooberpatrol66: we have solved most of the problems that would otherwise prevent virtio from working
<damo22>rump has virtio disk, we could use it
<damo22>the only reason its disabled is i had a compilation failure and had to remove it temporarily
<damo22>otherwise it would actually work
<damo22>although its not clear to me why its any different from having a working AHCI
<damo22>solid_black: hi
<damo22>solid_black: im starting to think there is no bug with the vm locking
<damo22>i am seeing all threads in W state with no vm lock calls
<damo22>just in cpu idle
<damo22>eg :
<damo22> 65 ./test.inf(786) (f61001e8): 8 threads:
<damo22> 0 (f64bf998) .W..NF 0xc10b40e0
<damo22> 1 (f61bfe08) .W.O..(mach_msg_continue) 0
<damo22> 2 (f5eed500) .W..N. 0xc10b40e0
<damo22> 3 (f5eed800) .W..N. 0xc10b40e0
<damo22> 4 (f5eed980) .W..N. 0xc10b40e0
<damo22> 5 (f67cc7e0) .W..N. 0xc10b40e0
<damo22> 6 (f5439368) .W..N. 0xc10b40e0
<damo22> 7 (f61f7350) .W..N. 0xf7de3f6c
<damo22>oh but the backtraces of sleeping threads do show vm stuff
<damo22>solid_black: i reprod with three threads
<damo22>./smp ./test.inf 3
<solid_black>cool, let me see
<solid_black>so one thread is still doing unwire
<solid_black>the second thread is just trying to lock the map (for vm_allocate, but that doesn't matter)
<solid_black>the first thread is wiring
<solid_black>are they wiring/unwiring the same pages again?
<solid_black>the VM regions are disjoint: f8892000,f8893000 vs f8894000,f8895000
<damo22>its wiring and unwiring the same thing?
<damo22>the first arg is the ipc_kernel_map
<solid_black>oh yes indeed, it is the same entry, great point!
<solid_black>can you see what that entry's actual start/end VM addresses are?
<solid_black>not sure what we'll learn from that though; it would match eirher one region or the other
<damo22>isnt the kernel supposed to not do this?
<damo22>ie, wire/unwire same object while the map is unlocked
<solid_black>yes, eaxctly
<solid_black>it's supposed to not do that, that's what "well behaved" refers to
<solid_black>but here it does that due to some bug
<solid_black>see, it really thinks it's wiring/unwiring two different regions of virtual memory, but somehow that ends up being the same entry / vm_page
<damo22>but the parameters it passes to the wire/unwire faults are the same
<solid_black>yes, it's the same vm_map_entry_t (for some reason)
<solid_black>and yet the address pairs passed to vm_map_pageable are different, can you see that?
<damo22>somehow the entry is being reused
<solid_black>so could entry lookup be bugged?
<damo22>yes the pageable pairs are different
<solid_black>I really really wish qemu supported rr with smp, we'd have this figured out a long time ago :|
<damo22>does anything check that the entries are unique?
<solid_black>there must be something wrong with how vm_map_pageable_scan args are displayed
<damo22>in my trace?
<damo22>kdb guesses how to print them from the stack
<solid_black>yes, there's not much to it on i386 (though I wonder how it works on x86_64, and what we could do on aarch64)
<solid_black>but vm_map_pageable_scan is a static func, it might be partly inlined or something
<solid_black>which explains why the stack doesn't contain what kdb expects
<damo22>yesterday i was getting proper backtrace in gdb
<damo22>it happened to stop when the program hung
<damo22>i cant reproduce now
<solid_black>could this (the entry reuse) be due to vm_map_clip_start?
<solid_black>it should be a no-op if it doesn't have to do any clipping -- but what if it does?
<solid_black>the implementation of clipping is ok, under the assumption that things are properly locked
<solid_black>but in our case, they're not, because the other thread has released the map lock while still referencing an entry from the map
<damo22>which line
<solid_black>vm/vm_map.c:1791 here in my tree, might be different for you
<solid_black>next to a comment that says "Pass 1. Clip entries, check for holes and protection mismatches"
<solid_black>it's not just clipping entries though... *lots* of things must be broken in the same way
<solid_black>for instance entry coalescing that I've implemented
<solid_black>all these things (rightfully) assume that the map is locked and noone else is looking at the entries, so we can change them
<solid_black>but this is not true in this one case
<solid_black>this could very well explain what we're seeing
<damo22>can we hack it to test?
<damo22>eg remove the unlock?
<solid_black>you could try to disable coalescing for example
<solid_black>make vm_map_coalesce_entry just "return FALSE;"
<solid_black>that wouldn't take care of clipping, but cleipping should not happen if things are not coalesced in the first place
<solid_black>but yeah, this sounds very plausible
<solid_black>and maybe try reverting dc0c9160173d2a5b40e9d6e29c1800102b17b1ec, ee65849bec5da261be90f565bee096abb4117bdd
<solid_black>because it also coalesces the objects
<solid_black>let me think of a proper fix though
<solid_black>can we just disable the "HACK HACK HACK HACK"?
<solid_black>a read lock on the map would be enough to prevent this
<solid_black>why does it do that? is that for perfomance reasons, or needed for correctness / avoiding some other deadlock?
<solid_black>hm, it says "unlock the map to avoid deadlocks"
<solid_black>what kind of a deadlock did they have in mind?
<damo22>wiring in the kernel map or submap
<solid_black>guess something somewhere (in vm_fault) could try to jsut allocate memory, and that would deadlock
<solid_black>so for one thing, vm_fault.c doesn't actually care about the passed-in entry being a proper entry in the map
<solid_black>it just needs uses address, end address, is_sub_map, object, offset
<damo22>can we remove the unlock hack, but do a read lock further down?
<solid_black>so we could, for instance, make a copy of the entry, and pass that copy
<solid_black>but this of course gets more complicated when there are multiple entries on the region
<solid_black>vm_map_copy_overwrite_nested? what does that have to do with this?
<damo22>can we copy the entry and overwrite it, then discard it?
<solid_black>that's kind of what I'm suggesting, make a temp on-stack copy of the entry, unlock the map, then pass the copy to vm_fault_wire
<solid_black>that should work as long as other stuff is indeed well-behaved
<solid_black>but then again, this only works for a single entry; how do you deal with multiple ones? alloca?
<solid_black>anyway, let me try implementing something like this
<solid_black>well, it didn't hang
<solid_black>I got "no more room in ipc_kernel_map"
<solid_black> /root/damo22s-test/test: Cannot get mach_port_names: (os/kern) resource shortage
<solid_black>but the system is alive & well otherwise
<damo22>what did you do?
<damo22>did you really fill up kernel memory?
<solid_black>perhaps I should boot with more ram?
<damo22>how much are you using?
<solid_black>though it's not physical ram, it's virtual space
<solid_black>but, like, the fact that it no longer panics (I added an assertion that there is only a single entry), and no longer hangs, and all the 8 CPU cores are busy, is reassuring
<solid_black>well s/no longer panics/doesn't panic/, it didn't panic before either
<solid_black>is there some other way I could test it, with some other test?
<damo22>./smp /bin/bash && make -j4
<damo22>in gnumach source
<damo22>you will need a chunk of memory
<damo22>actually dont run that
<solid_black>that would require me to mount the fs writable :|
<damo22>you can run stress instead
<damo22>see stress --help
<solid_black>I don't seem to have that installed?
<damo22>apt install stress
<solid_black>ah, let me reboot into the full system on non-smp then
<damo22>can you send me a patch
<damo22>i can test too
<damo22>my old test with openmp fails
<solid_black>oh great, my (non-SMP) VM locked up after "stress --vm 8"
<solid_black>fails how?
<solid_black>then post the traces please?
<solid_black>you mean the gsync one?
<damo22>maybe not a good test
<solid_black>stress --cpu 8 does hog up all the cpu cores
<damo22>root@zamhurd:/part3/test-ipc# ./smp ./test 7
<damo22>./test: Cannot get mach_port_names: (os/kern) resource shortage
<damo22>thats with no prints
<solid_black>yes, so same as mine
<solid_black>it runs a lot of successful iterations before that error, doesn't it?
<damo22>i made it not print anything
<solid_black>yes, but you should be able to tell by e.g. cpu load
<solid_black>and/or time it rakes
<damo22>do you think gsync_wait is still broken?
<damo22>thats my original openmp test
<solid_black>no, this looks benign
<solid_black>can you see who holds the lock?
<damo22>sorry its gone
<damo22>try running the infinite loop test
<damo22>the one you got a shortage on
<damo22>sometimes it hasngs
<damo22>no more room in c10b40e0 ()
<damo22>random init open failed
<damo22>rumpdisk: mach_vm_object_pages: Success
<damo22>/hurd/startup: Crashing system; essential task rumpdisk died
<damo22>startup: notifying ext2fs /dev/wd0s3 of reboot...done
<damo22>stress --vm hangs because it chews too much ram i think
<damo22>you need to dial down the amoung
<damo22>amount it uses per thread
<damo22>solid_black: i removed the HACK block altogether
<damo22>now it does similar behaviour to your patch but still breaks on gsync_wait
<solid_black>damo22: but that would deadlock if the vm_fault_wire handler wants to allocate some memory
<solid_black>maybe not for ipc_kernel_map though
<solid_black>but at least we now have a fairly good understanding of why the VM wire/unwire deadlock happens
<solid_black>the gsync deadlock, we should also take a closer look at, but clearly it's a different one
<damo22>does vm_fault_wire allocate anything ?
<solid_black>it might, who knows
<solid_black>vm_fault certainly does e.g. allocate a new object for a copy, though vm_map_pageable_scan tries to handle that before calling vm_fault_wire
<damo22>lets try to fix gsync
<damo22>how do i repair a git tree? i get bad object
<damo22>if i try to git blame
<youpi>damo22: git fsck
<solid_black>git fsck
<solid_black>...and faulting in the page from the object can allocate memory too, depending on what the object is
<damo22>why does gsync_wait lock the task->map but then remove objects from the kernel_map?
<solid_black>ACTION looks
<solid_black>that's the temp mapping it sets up
<solid_black>in case of a remote lookup
<solid_black>that not our case, is it? we should not hit the remote branch
<damo22>i showed Richard Braun my traces about the deadlock earlier and he has interesting feedback:
<damo22>(05:12:38 PM) damo22:
<damo22>(08:30:38 PM) braunr: hm it doesn't look like a deadlock
<damo22>(08:30:55 PM) braunr: the unwire call is blocked inside the critical section, isn't it ?
<damo22>(08:33:59 PM) braunr: yeah that's what it looks to me
<damo22>(08:34:29 PM) braunr: vm_fault_page blocks for some unknown reason, and the rest is just serialized after that operation
<damo22>(08:34:34 PM) braunr: it's not a locking issue
<damo22>(08:35:36 PM) braunr: it could be a missed event around thread_block
<damo22>(08:36:06 PM) braunr: which is the main reason i redesigned my version as thread_sleep, directly with the interlock as an argument
<damo22>(08:36:16 PM) braunr: that's the critical difference that allows not missing events
<damo22>(08:36:49 PM) braunr: it's the same idea as with condition variables, and why the mutex associated with a condvar must be locked when signalling
<solid_black>he's wrong
<solid_black>(look at me, having the confidence)
<solid_black>this is a locking issue
<solid_black>and the reason vm_fault_page blocks is beacuse is waits for the page to lose its busy bit
<solid_black>and the thread that owns the busy bit is waiting to lock the map
<solid_black>=> deadlock
<damo22>solid_black: but could it unblock if there was a missed event ? maybe there is something more to it
<damo22>solid_black: its possible the third thread is waiting for a missed event to unbusy the page that the others are locked on
<damo22>thats what braunr says
<damo22>theyre not all waiting on a lock
<solid_black>damo22: I really doubt it's anything else but the deadlock on map/page that I'm describing
<solid_black>you can't tell this from the trace alone, it indeed looks unclear what the unwiring thread is blocking on
<solid_black>which is why I added all the printfs yesterday, and connected gdb, and inspected it
<solid_black>it really is the case that the unwiring thread (which holds the lock) is waiting on a busy page, and the page is marked busy just prior to that by another thread, and that other thread is waiting in vm_map_verify() to lock the map
<solid_black>that all we knew yesterday; the question was why then would two threads both access the same page, if they supposedly deal each with their own VM region
<solid_black>and yesterday I actually observed two threads wiring and unwiring *the same* VM address range, which is super weird, and makes me think there could be more going on
<solid_black>but today, thanks to your observation, we known that in your trace, the two threads were accessing adjacent address ranges, but somehow the same entry
<solid_black>them accessing the same entry of course explains why the ended up with the same page
<solid_black>so then, why would they access the same entry? -- well, because of clipping/coalescing of entries, of course
<solid_black>hence, my patch, that tries to work around that
<solid_black>and evidently makes that case not deadlock
<solid_black>if gsync still deadlocks, that must be caused by something else
<solid_black>let me see if I can reproduce the gsync hang
<damo22>its easy to reproduce with 4 threads
<damo22>just run the old test
<damo22>the one with openmp and lots of printing
<damo22>i read the gsync.c code, i think the linked list and hashing should be ok
<damo22>i tried to assume it was broken and stepped through it to see if it did what i expected
<damo22>but i dont understand the locking side
<solid_black>I got a crash
<solid_black>as in, a Unix SIGABRT
<solid_black>"writing core file" is of course a lie, the fs is read-only :|
<solid_black>second attempt (with no change), still running & spitting out port info
<solid_black>no deadlocks, no crashes
<solid_black>it's seemingly not consuming multiple CPU cores though, judging by CPU usage
<damo22>thats because printing takes too long
<solid_black>maybe, yeah
<solid_black>still going
<damo22>mine hangs at around 8000 iterations
<solid_black>how do I know the number of iterations?
<solid_black>it spits things out so fast I cannot see anything
<solid_black>let me take a screenshot I guess
<solid_black>iteration 5598
<damo22>it stops at 10000
<solid_black>ok, let's wait some more
<damo22>if you get back to a shell, it completed without hanging after 10000 iterations
<damo22>did you run with ./smp ./test 4
<damo22>it should hang
<solid_black>for one thing, it doesn't iterate in order, so the current printed number is rather meaningless
<solid_black>it's done, exit code 0
<damo22>ive not seen that happen before
<solid_black>I guess my patch works wonders :)
<damo22>did you change anything, or same as before that you posted
<solid_black>same as before
<solid_black>just in case, yes nproc is 8, and I ran it as /root/damo22s-test/smp /root/damo22s-test/test-gsync 8
<damo22>i just ran it and it hung
<damo22>using your patch
<solid_black>ok, let's investigate then
<solid_black>get the trace, connect gdb
<damo22>hmm its probably easier if you reproduce
<solid_black>and I'll try to reproduce it again
<damo22>(10:12:30 PM) braunr: if there is enough memory pressure, the swapper could attempt to balance segments
<damo22>(10:12:36 PM) braunr: calling vm_page_seg_balance_page
<damo22>(10:13:45 PM) braunr: this function does set the busy bit to false, but doesn't wake up any thread beyond that apprently
<damo22>(10:14:46 PM) braunr: unlike say vm_object_copy_slowly
<damo22>(10:14:53 PM) braunr: see PAGE_WAKEUP_DONE
<damo22>(10:15:26 PM) braunr: i'd check whether the busy bit really still is true while the thread waiting for it is blocking
<damo22>(10:15:33 PM) braunr: if it's true, it's a logic problem
<damo22>(10:15:54 PM) braunr: otherwise, the thread has simply not been awoken
<solid_black>the busy bit is still set, yes
<solid_black>good point about vm_page_seg_balance_page not doing a wakeup (unlike PAGE_WAKEUP_DONE), does it not do that intentionally?
<solid_black>I wouldn't expect it to steal a busy page anyway, that sounds disastrous?
<solid_black>test-gsync 8 completed with 0 the second time too
<damo22>how fast is your machine
<damo22>mine is i7-3xxx
<solid_black>Intel i7-8650U (8) @ 4.200GHz (no idea what that means though)
<damo22>super fast
<solid_black>(this is Microsoft Surface Pro 6)
<damo22>youpi: is there some kind of problem we could be facing due to threads running completely concurrently that we would never have seen before, eg with system calls and glibc?
<youpi>damo22: with system calls themselves, probably not, since they're on the stack only & such
<youpi>the non-hurdish part of glibc, of course not since it's used widely on linux
<youpi>the hurdish part of glibc, possibly
<youpi>the translators, probably
<youpi>that's why I recommended testing progressively
<damo22>i gtg to sleep, its my birthday tomorrow :)
<solid_black>I got it to hang!
<solid_black>and if you're already gone, I'm going to have to debug this all by myself :|
<solid_black>ok, same as your case, all the threads are waiting for a lock on the map, it's unclear what holds this lock
<solid_black>this lock has a single reader and no writers
<solid_black>I don't know how to debug this further; none of the threads look like they could be holding the lock
<youpi>adding __FILE__ & __LINE__ ?
<solid_black>but that would only remember one of the readers
<solid_black>the last one to enter, not the remaining one
<solid_black>wait, no, gsync_wake() locks it first
<solid_black>before calling into gsync_prepare_key
<solid_black>that's asking for a deadlock
<solid_black>told you it was a separate issue!
<solid_black>lock_t doesn't let new readers in if there are waiting writers
<solid_black>so you cannot lock it for reading recursively
<solid_black>also, told you that gsyncwas broken :)
<youpi>not the way we were discussing ;)
<solid_black>it's broken in way too many ways :)
<youpi>that's still to explain
<youpi>I mean
<youpi>I *very* often hear people barf at something, just because they haven't actually understood how it works
<solid_black>it still blocks the current thread, instead of just delaying the response message
<solid_black>which is what my old patch tried to fix
<solid_black>that's my biggest gripe with it
<youpi>note that it currently works with a gsync_waiter on the stack
<youpi>so you cannot just get rid of the stack
<solid_black>my patch moved that onto the heap IIRC
<youpi>that's expensive
<solid_black>really, this is breaking rpctracing any non-trivial program
<youpi>(13:03:27) solid_black: lock_t doesn't let new readers in if there are waiting writers
<youpi>(13:03:34) solid_black: so you cannot lock it for reading recursively
<youpi>which writers are you thinking of?
<solid_black>it certainly is more expensive than putting things on the stack, but also not that expensive compared to making a syscall, looking up ports, etc
<youpi>ah, that affects rpctrace
<solid_black>wdym? in general, anyone trying to lock_write() the same lock
<youpi>then it might be useful to do indeed
<solid_black>oh course it affects rpctrace
<youpi>solid_black: sure, but they'll release it at some point
<youpi>solid_black: remember that this is already a long ago
<solid_black>who, the writers? they won't release it if they never manage to get it
<youpi>I have *WAY* too much on my plate to remember about everything that goes on
<youpi>solid_black: why shouldn't they never manage to get it ?
<youpi>I mean, if a read holds the lock for long, sure
<youpi>but that's not supposed to happen
<solid_black>sure, I don't expect you to *remember*, but it's only logical: rpctrace is the primary user of _make remote calls for something that a task/thread normally does to itself_
<youpi>it's logical when you think about it
<solid_black>so anything that breaks that, immediately breaks rpctrace
<youpi>on my daily work I'm mostly working on linux
<youpi>where strace is not affected by futex problems
<youpi>so, no, I wouldn't magically think about rpctrace
<youpi>which has indeed different concerns
<solid_black>yes, normally the outstanding readers should release the lcok eventually, but not if one of the readers is the same thread who now waits to take the same lock *again*, recursively, and is blocked on the writer!
<solid_black>really, see gsync_wake
<solid_black>it does vm_map_lock_read (task->map); first, and that succeeds
<youpi>"write" doesn't appear in the file
<youpi>remember that I'm currently at work, I cannot spend time on checking things and stuff
<solid_black>then it calls gsync_prepare_key() -> probe_address() -> vm_map_lookup(), while holding the read lock
<youpi>sure, if you have a read lock you cannot expect to be able to acquire a write lock
<solid_black>vm_map_lookup also tries to get a read lock
<solid_black>see the issue?
<solid_black>what if another thread calls lock_write() in the middle?
<youpi>ah so another read, not a write
<solid_black>another thread who wants to write
<youpi>ah, so a read then write then read
<youpi>(write being another thread)
<solid_black>now the nested lock_read() inside vm_map_lookup() is waiting for the writer to go away, but the writer is waiting for the original read lock to go away
<youpi>that was completely not obvious from what you wrote above
<solid_black>maybe short real-time messages is not the best medium for complex topics :|
<youpi>it isn't
<solid_black>or maybe I'm just bad at explaning my thoughts
<youpi>no, it just isn't
<solid_black>so, one way to fix / work around this would be to make gsync get a *write* lock, and then allow recursion
<solid_black>it's a shame, because it doesn't need a write lock, though
<youpi>I'd say I would really *not* be surprised that various code in gnumach has such double-read-lock
<solid_black>and yes, exactly, it's a more general problem
<youpi>and thus we might want to have some debugging tool to trace such errors
<youpi>I wonder if we really need to keep a read lock on the map
<youpi>and whether we could rather keep some lock on the object instead
<youpi>as the vm_map_lookup() way of working suggests, actually
<solid_black>sneek: later tell damo22: I reproduced and investigated your gsync deadlock, it's a completely unrelated (to the VM wire/unwire thing) issue, purely related to locking (readers vs writers) inside gsync
<sneek>Will do.
<youpi>vm_map_enter takes the object and not the map, after all
<solid_black>why does gsync_wake need to take the explicit read lock on the map at all?
<youpi>it was probably to make sure that the user doesn't release the memory in the meanwhile
<solid_black>it has to synchronize with gsync_wait, sure, but the loking inside vm_map_lookup would accomplish that
<youpi>but we can keep some lock on the object itself instead
<youpi>to be sure that even if the user releases its mappipng, the underlying memory is still there and we can still synchronize on it
<youpi>it's the memory object that matters in the end anyway
<solid_black>I don't think it matters, gsync_wake only uses the memory address as a key to find waiters to wake, it doesn't care about the underlying memory cell at all, unlike gsync_wait
<solid_black>(please point out if that's not so)
<youpi>I was looking at gsync_wait indeed
<youpi>which we have to fix too anyway
<solid_black>and the deadlock is inside gsync_wake
<solid_black>perhaps wait too, let me look
<youpi>wait will have the same issue
<youpi>wake can need to mutate the value
<youpi>when told so
<youpi>so it does care about the underlying memory
<youpi>but there as well it's essentially the memory object that matters
<solid_black>well, if the user is doing "serious" operations on the VM region concurrently (such as unmapping this object and mapping another one in its place), I wouldn't expect futex/gsync to behave in any nice manner (other than not crashing) anyway
<solid_black>ah, it can mutate the memory, ok
<solid_black>so yes I would say we care about the object
<youpi>it does make sense to see unmap concurrently with futex / gsync operations
<solid_black>but we need to convince ourselves that whatever new locking scheme we devise, it won't let a wake() slip in between wait() checking the number and going to sleep
<youpi>it poses a lot of problems in userland implementations of e.g. semaphores
<youpi>if you do sem_post(&struct->sem); free(struct);
<youpi>yes, it looks bogus, it kinda is
<youpi>but there are various cases where userland wants to do such things
<solid_black>how would that work?
<solid_black>the other threads are still possibly accessing it?
<youpi>another would be on sem_wait(&struct->sem); yes
<youpi>and not accessing anything else in it
<youpi>again, yes it looks bogus, I'm just taking the simple example to show my point
<solid_black>you need a happens-before between the other threads stopping to access the semaphore, and you releasing it
<youpi>you have the same kind of concerns with pthread_mutex_lock/unlock vs pthread_mutex_destroy
<youpi>it's *hard* to propperly implement pthread_mutex_destroy, making sure that the last thread doing pthread_mutex_unlock really has finished working on the mutex
<youpi>and then you'll be able to free it
<youpi>solid_black: again, I was just using a small dumb example to explain the kind of thing I'm talking about
<youpi>actual cases are much more involved
<solid_black>I understand that it looks bogus on purpose
<solid_black>I don't understand why it's supposedly not actually bogus and something that we should support
<youpi>if you don't want to take that there is a point to make but it's long to explain
<youpi>I'm not sure I'll take the time to explain
<youpi>=klmqsjd n$*
<youpi>qsdkl "fje'
<solid_black>but here, you know for sure that thread1 is completely done with it, because you've witnessed blah(&x->state) being true *after* you grabbed the mutex, meaning thread1 has already released the mutex
<youpi>no you don't
<youpi>you don't know whether pthread_mutex_unlock() is really finished with accessing the memory
<solid_black>ah, that, true
<solid_black>so your gsync_wake() could still be in process of completing when the vm region is already deallocated
<youpi>that kind of stuff happens, yes
<youpi>but if we keep a ref or lock on the object, we can still continue fine
<youpi>that's also important for cross-process semaphores and such
<youpi>sem_post(&shared_mem->p);; munmap(shared_mem); , I believe posix probably says it's fine
<youpi>and again, that's just a simple example
<youpi>you might have whatnot odd examples
<solid_black>so in other words, [you believe] that pthread primitive do establish a happens-before between unlock/post *being done accessing the mutex/sempahore* in a thread that held the mutex/semaphire first, and unlock/post *returning* in a thread that held it next
<youpi>not unlock/post for the latter, lock/wait for the latter
<youpi>and I'm not actually talking about that
<youpi>I'm just talking about the fact that memory can disappear under your feet
<solid_black>that'd be an even stronger assertion
<youpi>especially where you're talking about another task's memory
<youpi>? stronger?
<youpi>that's the whole point of synchronization primitives
<youpi>sem_post() in one thread, sem_wait() in the other
<youpi>you have a happesn-before between the two
<solid_black>I have a happens-before between thread1 *entering* post and thread2 *exiting* wait, yes
<youpi>ah, you talk about enter/exit
<solid_black>and you're kind of saying that I also have a happens-before thread1 *exiting* post and something in thread2, which (the latter) should at least be existing the *next* post
<youpi>I'm not saying that it is there, I'm saying that in various situations people need it :)
<youpi>IIRC glibc implements that in pthread_mutex/cond_destroy
<youpi>something like it makes sure that there is no unlock left
<solid_black>pthread_mutex_destroy in nptl really looks like it's doing nothing to me
<solid_black>other than returning EBUSY and setting kind to -1
<youpi>I don't remember the details
<youpi>but I do remember that it's tricky
<youpi>and glibc does some stuff about it
<solid_black>what we really care about is that a call to gsync_wake() can not happen in between gsync_wait() reading the value and going to sleep
<youpi>sure, but that's already handled by the hashed mutex
<youpi>no need to have interferences with map & object
<youpi>ok, see __pthread_cond_destroy instead
<youpi>it's even more complex than the example I showed above
<youpi>but again, I was just showing the kind of problem
<youpi>not going to take the time to write a reproducer
<solid_black>ok, so we lock the kmutex (/me wonders how that differes from locks, but whatever), then check, if it matches we insert ourselves as a waiter
<solid_black>makes sense
<solid_black>the locking on the map doesn't really serve any purpose then, does it?
<youpi>as I wrote above
<youpi>it's meant to keep the memory alive
<youpi>but I'd say we can probably replace that with keeping a ref or lock on the memory object instead
<youpi>since that's what we use to do the mem map
<youpi>ah but there is also the direct access
<youpi>which uses addresses
<youpi>so we indeed do not want the adderss to idsappear
<youpi>(and keeping a ref/lock on the object wil lnot be enough)
<solid_black>vm_map_lookup() mutates the map if it has to CoW or make a new VM object
<solid_black>so, vm_allocate a new page, call gsync on it with GSYNC_MUTATE, guaranteed deadlock
<solid_black>SMP or not
<solid_black>let me actually test this, sounds too ~good~ bad to be true
<youpi>I wouldn't be surprised at all that such a case exist
<solid_black>🤔️ that did not deadlock
<solid_black>ah, in the SMP build the locks are no-ops of course
<solid_black>non-SMP build I mean
<solid_black>(or are they?)
<youpi>depends which locks
<youpi>simple_lock are no-ops indeed
<youpi>but read/write locks as used by vm are not no-ops
<solid_black>yes, seems so
<solid_black>then why didn't it deadlock
<solid_black>there only are a handful of users of vm_map_lookup()
<solid_black>we could try to add a boolean_t map_locked argument like some other map AIPs have
<youpi>problem is: there are submaps
<youpi>see the loop inside vm_map_lookup
<youpi>possibly that can be handled somehow
<youpi>just not so trivially
<solid_black>yes, but that's not relevant for gsync
<solid_black>'cause that doesn't happen for userland memory
<youpi>why not?
<youpi>ah, userland memory doesn't have it?
<solid_black>submaps are for kernel only
<youpi>but we have to make sure of that
<youpi>with asserts etc.
<youpi>in the submap case, yes
<solid_black>maybe what we really want is the (sub)map being *left* locked by vm_map_lookup
<solid_black>that sounds cleaner
<youpi>I was wondering about that possibility indeed
<youpi>there could be a parameter for this
<solid_black>or -- do you think maybe we should use the existing vm_map_version_t/vm_map_verify/vm_map_verify_done mechanism?
<solid_black>it does relocking basically
<solid_black>yeah let's try that
<youpi>it'd mean a retry loop
<youpi>it looks more complex to me than just making the lookup leave the lock held
<solid_black>it is a lot more complex indeed :|